
Fast_Strings: Repository summary

Unbounded strings direct access library for Ada
Breaking bad with Ada 2012 references
Borrowing pointers from Unbounded_String
Constructing fat pointers


Recent commits

Rev.          Date                 Author                   Message
f841794ddb83  2018-06-19 09:32:06  Ivan Levashev 卜根 <bu_  [tip] Unbounded_String to fixed String bridge

Recently changed tags

Name  Rev.          Date                 Author
tip   f841794ddb83  2018-06-19 09:32:06  Ivan Levashev 卜根 <bu_

Branches

Name     Rev.          Date                 Author                   Message
default  f841794ddb83  2018-06-19 09:32:06  Ivan Levashev 卜根 <bu_  Unbounded_String to fixed S...

README.txt

Motivation:

I was highly disappointed with how Ada's (various) unbounded strings work. I 
could take the standard Ada.Strings.Unbounded.Unbounded_String, but it lacks 
features I need. I could use a third-party implementation like 
League.Strings.Universal_String from Matreshka, but then all other routines 
have to be taken from the same library. Other libraries will not be aware of 
it, and I'll have to convert (that is, copy in memory) data all the time. Yet 
another option is to write my own proper string type, but then no libraries 
would support it at all.

So I came to a simple idea: it is better to just fix The Standard's strings 
than to come up with yet another solution. First, fixed strings are more 
widely supported than unbounded ones, so instead of writing a library like 
League that has everything I need, I just create a fast path between fixed 
strings and unbounded ones. Second, I bind to The Standard's unbounded 
strings, so I get interoperability with many existing libraries. In practice, 
this solves my problems very well.

One particular problem I was solving when I decided I had had enough was 
reading a JSON file. I can open the file using Stream_IO and get a 
Stream_Access, but what's next? For years I used String'Read followed by 
GNATCOLL.JSON.Read, which takes a String, but that does not scale: small JSON 
files can be read this way, but there is not enough stack for big ones. If I 
try to 'Read into an Unbounded_String directly, behind the scenes it uses 
To_Unbounded_String, which means lots of stack usage and data copying. A 
good-enough solution would probably be to read the file in small chunks and 
append them to the Unbounded_String one by one. This has to be programmed, an 
appropriate chunk size has to be picked, and there are still redundant copy 
operations.
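For comparison, the chunked approach could be sketched like this with only the 
standard library. It is a minimal illustration, not part of Fast_Strings: the 
procedure name, the file name "sample.json" and the 4096-character chunk size 
are assumptions, and it assumes (as with GNAT) that each Character occupies 
one stream element. Every chunk is still copied into the Unbounded_String by 
Append, which is exactly the redundant copying described above.

```ada
with Ada.Streams.Stream_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Chunked_Read is
   package SIO renames Ada.Streams.Stream_IO;

   --  Assumed file name and chunk size, for illustration only.
   File_Name  : constant String := "sample.json";
   Chunk_Size : constant := 4096;

   File   : SIO.File_Type;
   Result : Unbounded_String;
begin
   --  Write a small JSON file first so the example is self-contained.
   SIO.Create (File, SIO.Out_File, File_Name);
   String'Write (SIO.Stream (File), "{""key"": ""value""}");
   SIO.Close (File);

   --  Read it back chunk by chunk; each chunk is a fixed String on the
   --  stack, which Append then copies into Result.
   SIO.Open (File, SIO.In_File, File_Name);
   declare
      Remaining : SIO.Count := SIO.Size (File);
   begin
      while Remaining > 0 loop
         declare
            Length : constant Natural :=
              Natural (SIO.Count'Min (Remaining, Chunk_Size));
            Chunk  : String (1 .. Length);
         begin
            String'Read (SIO.Stream (File), Chunk);
            Append (Result, Chunk);
            Remaining := Remaining - SIO.Count (Length);
         end;
      end loop;
   end;
   SIO.Close (File);

   Ada.Text_IO.Put_Line (To_String (Result));
end Chunked_Read;
```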

I wanted different semantics: preallocate a big enough Unbounded_String using 
To_Unbounded_String (Length : in Natural), then "mount" it as a read-write 
fixed string, populate it with String'Read in one hop, then "unmount" it and 
feed it directly into GNATCOLL.JSON.Read.

I am aware of Ada.Strings.Unbounded.Aux, but it is not quite correct. It is 
suitable for reading contents, but not for editing: there is no way to force 
the string to be unique. Also, that Big_String stuff is ugly. I wanted to 
create fat pointers to fixed strings with the proper length that could 
naturally be passed anywhere fixed string arguments are expected.

So I implemented safe ways to access unbounded string contents. One can mount 
a string either in read-only mode or in read-write mode. In read-only mode the 
unbounded string becomes immutable because a shadow copy of it is kept inside 
the Ada 2012 reference. Other owners of the unbounded string won't modify this 
string, and the Ada 2012 reference is of "not null access constant" mode, so 
modification through the read-only reference is not possible either. Since it 
is a reference, one can pass it to any subprogram expecting a fixed string. 
To_String_View serves as a To_String replacement that eliminates the data copy 
while still operating correctly:

   Ada.Text_IO.Put_Line (To_String (Value_1));      -- copies data
   Ada.Text_IO.Put_Line (To_String_View (Value_1)); -- does not copy data

Another option is to "mount" the fixed string reference:

   declare
      Value_1_View : String renames To_String_View (Value_1);
   begin
      ...
   end;

This is similar to "Value_1_Copy : constant String := To_String (Value_1)", 
but without the additional copy. When mounting the reference explicitly (as 
String_View rather than String), one can access the same data as either a 
fixed string or an unbounded one at the same time (!). First, this way there 
is no need to keep the original unbounded string value. Second, it enables 
better sharing of unbounded string data: if some procedure expects an 
unbounded string argument, it can be provided, and if the procedure stores it 
somewhere, the reference counter is just incremented instead of the data 
being copied completely.
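The read-only view builds on the standard Ada 2012 Implicit_Dereference 
aspect. A minimal sketch of just that mechanism, with the hypothetical names 
Constant_String_Ref and View, could look as follows. It does not borrow 
anything from an Unbounded_String: the view simply designates an aliased 
fixed String, and the explicitly aliased parameter is what allows the function 
to return a reference to its argument without copying it.

```ada
with Ada.Text_IO;

procedure Reference_Demo is
   --  Hypothetical read-only reference. The access discriminant is
   --  "not null access constant", so the String cannot be modified through
   --  it, and Implicit_Dereference lets the reference act as a String.
   type Constant_String_Ref (Element : not null access constant String) is
     limited null record
     with Implicit_Dereference => Element;

   Data : aliased constant String := "Hello, Ada";

   --  Hypothetical view constructor: returns a reference, no data copy.
   function View (S : aliased String) return Constant_String_Ref is
   begin
      return (Element => S'Access);
   end View;
begin
   --  Passed directly where a String is expected.
   Ada.Text_IO.Put_Line (View (Data));

   --  "Mounted" with a renaming, in the style of the README example.
   declare
      Text : String renames View (Data);
   begin
      Ada.Text_IO.Put_Line (Natural'Image (Text'Length));
   end;
end Reference_Demo;
```

The renaming keeps the reference object alive until the end of the declare 
block, so Text can be used like an ordinary constant String there.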

There is also a read-write mount that works in a completely different way. 
One needs to declare a local discriminated object that borrows the pointer 
from the unbounded string and gives it back when destroyed. While this object 
is alive, the string becomes empty. Actually, it can still be modified, but 
such changes will be overwritten on commit. The editor exposes a read-write 
reference to a fixed string that is usable wherever fixed strings are 
appropriate. The shadow unbounded string is not exposed because it cannot be 
shared while editing is in progress; exposing it would break the editor's 
correctness. Plain To_Unbounded_String has to be used in contexts where an 
Unbounded_String is expected.

Sample editor usage:

   declare
      Value_1_Editor : String_Editor (Value_1'Access);
      Value_1_Edit : String renames Value_1_Editor.Edit;
   begin
      Value_1_Edit (2) := 'S';
   end;

This block enters editing of Value_1 (Value_1 becomes an empty string), makes 
changes through direct read-write fixed string access, and commits the 
changes on leaving. Provided Value_1 was unique (not shared), no additional 
data copy happened.

As far as I understand, these programming interfaces make unbounded strings 
much more CPU-friendly while still enforcing correctness.

This library involves a lot of hacks that might make it non-portable. 
However, a portable fallback implementation could be provided. String_View 
could instead allocate a copy of the fixed String, provide a reference to it, 
and be Controlled so as to deallocate it. String_Editor could also allocate a 
copy of the fixed String and empty Target instead of swapping it with the 
shadow copy; similarly, it could commit changes using Set_Unbounded_String on 
Target instead of swapping pointers. That would be the same API, but slower, 
since data would have to be copied all the time.
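The copying fallback for String_Editor could be sketched with standard Ada 
only. This is a hypothetical illustration, not the library's implementation: 
Fallback_Editor, String_Ref and Edit are assumed names. The editor copies 
Target into a heap buffer on creation and empties Target, and Finalize commits 
the buffer back with Set_Unbounded_String. Usage mirrors the String_Editor 
sample above.

```ada
with Ada.Finalization;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Fallback_Editor_Demo is

   package Editors is
      --  Hypothetical copying editor for an aliased Unbounded_String.
      type Fallback_Editor (Target : not null access Unbounded_String) is
        new Ada.Finalization.Limited_Controlled with private;

      --  Hypothetical read-write reference to the fixed String being edited.
      type String_Ref (Element : not null access String) is
        limited null record
        with Implicit_Dereference => Element;

      function Edit (Self : Fallback_Editor) return String_Ref;

   private
      type String_Access is access String;

      type Fallback_Editor (Target : not null access Unbounded_String) is
        new Ada.Finalization.Limited_Controlled with record
         Buffer : String_Access;  -- private copy of Target's contents
      end record;

      overriding procedure Initialize (Self : in out Fallback_Editor);
      overriding procedure Finalize (Self : in out Fallback_Editor);
   end Editors;

   package body Editors is
      procedure Free is new Ada.Unchecked_Deallocation (String, String_Access);

      overriding procedure Initialize (Self : in out Fallback_Editor) is
      begin
         --  Copy the contents out, then leave Target empty while editing.
         Self.Buffer := new String'(To_String (Self.Target.all));
         Self.Target.all := Null_Unbounded_String;
      end Initialize;

      overriding procedure Finalize (Self : in out Fallback_Editor) is
      begin
         --  Commit: anything written to Target meanwhile is overwritten.
         if Self.Buffer /= null then
            Set_Unbounded_String (Self.Target.all, Self.Buffer.all);
            Free (Self.Buffer);
         end if;
      end Finalize;

      function Edit (Self : Fallback_Editor) return String_Ref is
      begin
         return (Element => Self.Buffer);
      end Edit;
   end Editors;

   use Editors;

   Value_1 : aliased Unbounded_String := To_Unbounded_String ("hello");
begin
   declare
      Value_1_Editor : Fallback_Editor (Value_1'Access);
      Value_1_Edit   : String renames Value_1_Editor.Edit;
   begin
      Value_1_Edit (1) := 'H';
      Ada.Text_IO.Put_Line ("While editing: """ & To_String (Value_1) & """");
   end;
   --  Value_1_Editor has been finalized here, committing the change.
   Ada.Text_IO.Put_Line ("After commit: " & To_String (Value_1));
end Fallback_Editor_Demo;
```

The program should print an empty string while editing and "Hello" after the 
block is left. Both copies (in Initialize and in Finalize) are the price of 
portability that the paragraph above mentions.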

Thus, Fast_Strings is (in theory) more portable, correct and safe than 
Ada.Strings.Unbounded.Aux. GNAT.SPITBOL makes heavy use of 
Ada.Strings.Unbounded.Aux to achieve execution speed, so tricks like these 
are generally acceptable in real-world code.

It is also of educational interest: it demonstrates how to tear down the walls 
of the standard library, how to construct fat pointers with customized 
constraints, and advanced hacks using Ada 2012 references.

Possible future work:
  - Fallback implementation.
  - Wide_String and Wide_Wide_String variants.
  - Stream_Element_Array, Storage_Array and char_array bridges.
  - An AdaMagic-compatible Ada 95 port.

It is possible to introduce yet more tricky references for interfacing with C 
code. Such a reference can store a shadow (shared) copy of the unbounded 
string and make sure that there is a NUL character after the end of the 
string: there is a chance that there is spare room after the "official" end 
of the string, so NUL can simply be written there. Then a special reference 
can be created that is discriminated by not null access constant 
Interfaces.C.Strings.chars_ptr, that is, literally an access to an access, 
but Implicit_Dereference will make this reference look like an 
Interfaces.C.Strings.chars_ptr (one level of indirection).

This way one can write C function invocations without bothering to store 
temporary results in declare-begin-end blocks; Ada 2012 references will take 
care of it. You could have one-liners like this:

   Check_Error_Code (some_c_function (To_C (GNATCOLL.JSON.Write (...))));

To_C is a function constructing the Ada 2012 reference in question.
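A portable, copying variant of such a reference could be sketched with the 
standard Interfaces.C.Strings package. Everything here is hypothetical: 
C_String_Ref and To_C_Ref stand in for the library's To_C, and the C 
library's strlen stands in for some_c_function. Instead of writing NUL into 
spare room of a shared buffer, this sketch allocates a NUL-terminated copy 
with New_String and frees it in Finalize, when the enclosing statement 
completes.

```ada
with Ada.Finalization;
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;
with Interfaces.C;         use Interfaces.C;
with Interfaces.C.Strings; use Interfaces.C.Strings;

procedure C_Ref_Demo is

   package C_Refs is
      type Chars_Ptr_Box is access chars_ptr;

      --  Hypothetical reference discriminated by an access to a chars_ptr
      --  (an access to an access); Implicit_Dereference makes it usable
      --  wherever a chars_ptr is expected.
      type C_String_Ref (Element : not null access constant chars_ptr) is
        new Ada.Finalization.Limited_Controlled with record
         Box : Chars_Ptr_Box;  -- owned box, released in Finalize
      end record
        with Implicit_Dereference => Element;

      overriding procedure Finalize (Self : in out C_String_Ref);

      --  Hypothetical constructor: allocates a NUL-terminated C copy of S.
      function To_C_Ref (S : String) return C_String_Ref;
   end C_Refs;

   package body C_Refs is
      procedure Free_Box is
        new Ada.Unchecked_Deallocation (chars_ptr, Chars_Ptr_Box);

      function To_C_Ref (S : String) return C_String_Ref is
         New_Box : constant Chars_Ptr_Box := new chars_ptr'(New_String (S));
      begin
         return (Ada.Finalization.Limited_Controlled
                 with Element => New_Box, Box => New_Box);
      end To_C_Ref;

      overriding procedure Finalize (Self : in out C_String_Ref) is
      begin
         if Self.Box /= null then
            Free (Self.Box.all);  -- release the C copy made by New_String
            Free_Box (Self.Box);  -- release the box that held the pointer
         end if;
      end Finalize;
   end C_Refs;

   use C_Refs;

   --  The C library's strlen, imported for illustration.
   function C_Strlen (S : chars_ptr) return size_t
     with Import, Convention => C, External_Name => "strlen";
begin
   --  One-liner: the temporary reference and its C copy live until the
   --  enclosing statement completes.
   Ada.Text_IO.Put_Line (size_t'Image (C_Strlen (To_C_Ref ("hello"))));
end C_Ref_Demo;
```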

==============================================================================

The work was produced to accomplish internal development goals, and further 
development is driven by internal demands. The code is licensed under the 
Apache License 2.0, as recommended by the FSF. That is friendly to the 
community and to businesses, but not to patent trolls.


Ivan Levashev,
Barnaul,
2018