Unbounded strings direct access library for Ada
Breaking bad with Ada 2012 references
Borrowing pointers from Unbounded_String
Constructing fat pointers
Rev. | Date | Author | Message |
---|---|---|---|
f841794ddb83 | 2018-06-19 09:32:06 | Ivan Levashev 卜根 <bu_ | Unbounded_String to fixed String bridge (tip) |
Motivation:

I was highly disappointed with how Ada's various unbounded strings work. I could take the standard Ada.Strings.Unbounded.Unbounded_String, but it lacks features I need. I could use some third-party implementation like League.Strings.Universal_String from Matreshka, but then all other routines have to be taken from the same library. Any other library will not be aware of it, and I would have to convert (that is, copy in memory) data all the time. Yet another option is to write my own proper string type, but then there would be no libraries supporting it at all.

So I came to a simple idea: it is better to just fix The Standard's strings than to come up with yet another solution. First, fixed strings are more widely recognized than unbounded ones, so instead of writing a library like League that has everything I need, I just create a fast path between fixed strings and unbounded ones. Second, I bind to The Standard's unbounded strings, so I get interoperability with many existing libraries. In practice, this solves my problems very well.

One particular problem I was solving when I decided I had had enough was reading a JSON file. I can open the file using Stream_IO and get a Stream_Access, but what's next? For years I used String'Read into a fixed String on the stack and passed it to GNATCOLL.JSON.Read, which takes a String, but this does not scale: short JSON files can be read this way, but there is not enough stack for big ones. If I try to 'Read into an Unbounded_String directly, behind the scenes it goes through To_Unbounded_String, which means heavy stack usage and data copying. A good-enough solution would probably be to read the file in small chunks and append them to an Unbounded_String one by one. But this has to be programmed, I need to pick an appropriate chunk size, and there are still redundant copy operations. I wanted different semantics.
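For comparison, the chunked workaround described above can be sketched with the standard library alone. This is a minimal sketch, not part of Fast_Strings: the package Chunked_Reading, the function Read_File and its 64 KiB default chunk size are hypothetical. Only one chunk lives on the stack at a time, but every chunk is still copied twice (bytes into Chunk, then Chunk into the Unbounded_String, plus any reallocation while it grows), which is the redundancy mentioned above.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

--  Hypothetical helper package, not part of Fast_Strings.
package Chunked_Reading is
   --  Reads the whole file named File_Name into an Unbounded_String,
   --  Chunk_Size stream elements at a time (64 KiB by default, an
   --  arbitrary choice), so only one chunk lives on the stack.
   function Read_File
     (File_Name  : String;
      Chunk_Size : Stream_Element_Offset := 64 * 1024)
      return Unbounded_String;
end Chunked_Reading;

package body Chunked_Reading is
   function Read_File
     (File_Name  : String;
      Chunk_Size : Stream_Element_Offset := 64 * 1024)
      return Unbounded_String
   is
      use Ada.Streams.Stream_IO;
      File   : File_Type;
      Buffer : Stream_Element_Array (1 .. Chunk_Size);
      Last   : Stream_Element_Offset;
      Result : Unbounded_String;
   begin
      Open (File, In_File, File_Name);
      while not End_Of_File (File) loop
         Read (File, Buffer, Last);
         declare
            --  Copy 1: raw stream elements to characters.
            Chunk : String (1 .. Natural (Last));
         begin
            for I in 1 .. Last loop
               Chunk (Natural (I)) := Character'Val (Buffer (I));
            end loop;
            --  Copy 2: characters into the growing Unbounded_String,
            --  which may also reallocate (and copy) its own buffer.
            Append (Result, Chunk);
         end;
      end loop;
      Close (File);
      return Result;
   end Read_File;
end Chunked_Reading;
```

Picking Chunk_Size is exactly the tuning burden the author wanted to avoid: small chunks mean many appends, large chunks bring back the stack pressure.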
Preallocate an Unbounded_String big enough using To_Unbounded_String (Length : in Natural), then "mount" it as a fixed read-write string, populate it with String'Read in one go, then "unmount" it, and feed it directly into GNATCOLL.JSON.Read.

I am aware of Ada.Strings.Unbounded.Aux, but it is not quite correct. It is suitable for reading contents, but not for editing: there is no way to force the string to be unique (not shared with other Unbounded_String objects). Also, that Big_String stuff is ugly. I wanted to create fat pointers to fixed strings with the proper length that can naturally be passed anywhere fixed String arguments are expected.

So I implemented safe ways to access unbounded string contents. A string can be mounted either in read-only mode or in read-write mode.

In read-only mode the unbounded string becomes immutable because a shadow copy of it is held inside the Ada 2012 reference. Other owners of the unbounded string won't modify the shared data in place, and the Ada 2012 reference uses a "not null access constant" discriminant, so modification through the read-only reference is not possible either. Since it is a reference, it can be passed to any subprogram expecting a fixed string. To_String_View serves as a To_String replacement that eliminates the data copy while still providing correct operation:

    Ada.Text_IO.Put_Line (To_String (Value_1));       -- copies data
    Ada.Text_IO.Put_Line (To_String_View (Value_1));  -- does not copy data

Another option is to "mount" the fixed string reference:

    declare
       Value_1_View : String renames To_String_View (Value_1);
    begin
       ...
    end;

This is similar to "Value_1_Copy : constant String := To_String (Value_1)", but without the additional copy.

When the reference is mounted explicitly (as a String_View object rather than as a String renaming), the same data can be accessed as either a fixed string or an unbounded one at the same time (!). First, this way there is no need to keep the original unbounded string value around. Second, it enables better sharing of unbounded string data.
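A portable, copying stand-in for the read-only view helps show how such a reference works. This is a minimal sketch, not the Fast_Strings implementation: the package Portable_Views is hypothetical, and instead of sharing the unbounded string's buffer it copies the contents into a heap String that a Limited_Controlled object frees on finalization. The Implicit_Dereference aspect on the access discriminant Element is what lets a String_View result be passed wherever a String is expected.

```ada
with Ada.Finalization;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

--  Hypothetical copying fallback; the real library borrows the
--  unbounded string's buffer instead of copying it.
package Portable_Views is
   --  Read-only reference: "not null access constant" forbids writes.
   type String_View (Element : not null access constant String) is
     limited private
     with Implicit_Dereference => Element;

   function To_String_View (Source : Unbounded_String) return String_View;

private
   type String_Access is access String;

   type String_View (Element : not null access constant String) is
     new Ada.Finalization.Limited_Controlled with record
      Copy : String_Access;  --  heap copy that Element designates
   end record;

   overriding procedure Finalize (View : in out String_View);
end Portable_Views;

with Ada.Unchecked_Deallocation;

package body Portable_Views is
   procedure Free is new Ada.Unchecked_Deallocation (String, String_Access);

   function To_String_View (Source : Unbounded_String) return String_View is
      --  This is the copy the real library avoids.
      Copy : constant String_Access := new String'(To_String (Source));
   begin
      return View : String_View (Element => Copy) do
         View.Copy := Copy;
      end return;
   end To_String_View;

   overriding procedure Finalize (View : in out String_View) is
   begin
      Free (View.Copy);  --  release the copy when the view goes away
   end Finalize;
end Portable_Views;
```

With this sketch, a call such as Ada.Text_IO.Put_Line (To_String_View (Value_1)) should compile unchanged: the function result is implicitly dereferenced to Element.all, and the result object (with its copy) is finalized only after the enclosing call completes.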
If some procedure expects an Unbounded_String argument, it can be provided, and if the procedure stores it somewhere, the reference counter will just be incremented instead of the whole data being copied.

There is also a read-write mount that works in a completely different way. One declares a local discriminated object that borrows the data pointer from the unbounded string and gives it back when destroyed. While this object is alive, the unbounded string is empty. Strictly speaking, it can still be modified, but any such change will be overwritten on commit. The editor exposes a read-write reference to a fixed string that is usable wherever fixed strings are appropriate. The shadow unbounded string is not exposed, because it cannot be shared while editing is in progress; exposing it would break the correct operation of the editor. Plain To_Unbounded_String has to be used in contexts where an Unbounded_String is expected. Sample editor usage:

    declare
       Value_1_Editor : String_Editor (Value_1'Access);  -- Value_1 must be aliased
       Value_1_Edit   : String renames Value_1_Editor.Edit;
    begin
       Value_1_Edit (2) := 'S';
    end;

This block enters editing of Value_1 (Value_1 becomes an empty string), makes changes through direct read-write fixed string access, and commits the changes on leaving. Provided that Value_1 was unique, no additional data copy happens. As far as I understand, these programming interfaces make unbounded strings much more CPU-friendly while still enforcing correctness.

The library involves a lot of hacks that might make it non-portable. However, a portable fallback implementation could be provided. String_View could instead allocate a copy of the fixed String, provide a reference to it, and be Controlled in order to deallocate it. String_Editor could likewise allocate a copy of the fixed String and empty Target instead of swapping it with the shadow copy, and similarly it could commit changes using Set_Unbounded_String on Target instead of swapping pointers. That would be the same API, only slower, since data would have to be copied all the time.
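The String_Editor half of that fallback can be sketched the same way. Again this is a hypothetical sketch (the package Portable_Editors, the type String_Reference and all bodies are assumptions, not the library's code): Initialize copies Target into a heap String and empties Target, Edit returns a read-write reference with Implicit_Dereference, and Finalize commits with Set_Unbounded_String and frees the copy. String_Editor is declared tagged so that the prefixed call Value_1_Editor.Edit from the sample is legal.

```ada
with Ada.Finalization;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

--  Hypothetical copying fallback; the real library swaps buffer
--  pointers instead of copying on entry and on commit.
package Portable_Editors is
   --  Read-write reference to the fixed string being edited.
   type String_Reference (Element : not null access String) is
     limited null record
     with Implicit_Dereference => Element;

   type String_Editor (Target : not null access Unbounded_String) is
     tagged limited private;

   function Edit (Editor : String_Editor) return String_Reference;

private
   type String_Access is access String;

   type String_Editor (Target : not null access Unbounded_String) is
     new Ada.Finalization.Limited_Controlled with record
      Copy : String_Access;  --  fixed String being edited
   end record;

   overriding procedure Initialize (Editor : in out String_Editor);
   overriding procedure Finalize (Editor : in out String_Editor);
end Portable_Editors;

with Ada.Unchecked_Deallocation;

package body Portable_Editors is
   procedure Free is new Ada.Unchecked_Deallocation (String, String_Access);

   function Edit (Editor : String_Editor) return String_Reference is
   begin
      return (Element => Editor.Copy);
   end Edit;

   overriding procedure Initialize (Editor : in out String_Editor) is
   begin
      --  Copy the contents out, then leave Target empty while editing.
      Editor.Copy := new String'(To_String (Editor.Target.all));
      Editor.Target.all := Null_Unbounded_String;
   end Initialize;

   overriding procedure Finalize (Editor : in out String_Editor) is
   begin
      --  Commit: overwrite Target with the edited copy, then free it.
      if Editor.Copy /= null then
         Set_Unbounded_String (Editor.Target.all, Editor.Copy.all);
         Free (Editor.Copy);
      end if;
   end Finalize;
end Portable_Editors;
```

The sample editor block should work against this sketch as is. The price is two full copies per editing session, one on entry and one on commit, which is what swapping pointers avoids.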
Thus Fast_Strings is (in theory) more portable, correct and safe than Ada.Strings.Unbounded.Aux. GNAT.SPITBOL makes heavy use of Ada.Strings.Unbounded.Aux to achieve execution speed, so such tricks are generally fine in real-world code.

The library is also of educational interest. It demonstrates how to tear down the walls of the standard library, how to construct fat pointers with customized constraints, and advanced hacks using Ada 2012 references.

Possible future work:

- Fallback implementation.
- Wide and Wide_Wide variants.
- Stream_Element_Array, Storage_Array and char_array bridges.
- AdaMagic-compatible Ada 95 port.

It is possible to introduce yet another kind of tricky reference for interfacing with C code. Such references can store a shadow (shared) copy of the unbounded string and make sure that there is a NUL character after the end of the string: there is a chance of spare room after the "official" end of the string, so a NUL can simply be written there. Then a special reference can be created whose discriminant is of type not null access constant Interfaces.C.Strings.chars_ptr, that is, literally an access to an access, but Implicit_Dereference will make this reference look like a plain Interfaces.C.Strings.chars_ptr (one level of indirection). This way one can write C function invocations without bothering to store temporary results in declare-begin-end blocks; Ada 2012 references will take care of it. You could have one-liners like this:

    Check_Error_Code (some_c_function (To_C (GNATCOLL.JSON.Write (...))));

To_C is a function constructing the Ada 2012 reference in question.

==============================================================================

This work was produced to meet internal development goals, and further development is driven by internal needs. The code is licensed under the Apache License 2.0, as recommended by the FSF. That license is friendly to the community and to businesses, but not to patent trolls.

Ivan Levashev, Barnaul, 2018