How to use StringBuilder for char* used as output #88

Closed
mardukbp opened this issue May 26, 2022 · 10 comments

Comments

@mardukbp

I generated bindings for a closed-source C API that contains many functions of the form

int getSomething(char* input, char *output)

In the generated C# code these functions are mapped to

int getSomething(CString input, CString output)

The problem is that the C code expects output to point to a caller-allocated buffer of a given size that it can write into. With P/Invoke this is usually accomplished by marshalling a StringBuilder.

Is it possible to map the output char * to StringBuilder and the input char* to CString? What other solution do you propose?

@lithiumtoast
Member

Is char* really a "C style string" in your API? Are we talking about a data structure of "8-bit integers followed by a null character (0x0)"? I ask because you say it's a buffer of a given size.

@mardukbp
Author

Yes, they are null-terminated strings. I apologize for my imprecise wording. I still have a lot to learn. I checked the code again. Basically it copies a char* into char *output using strcpy. So I guess what is needed is a writable reference to a large enough chunk of memory. How can this be accomplished?

@lithiumtoast
Member

You can allocate a buffer large enough (either on the stack or on the heap) and create a CString from the pointer.

unsafe
{
    // stackalloc needs an element type; the memory lives until the method returns.
    byte* buffer = stackalloc byte[1024];
    var cString = new CString(buffer);
    my_function(cString);
    var outputString = Marshal.PtrToStringAnsi(cString.ToIntPtr());
}

@mardukbp
Author

Perfect! Thank you. I ended up allocating a managed buffer in order to avoid using unsafe blocks. I also added a new constructor for CString:

public CString(byte[] buffer)
{   
    fixed (byte *ptr = buffer)
    {
        _pointer = (nint)ptr;
    }
}

Would you consider adding it to the generated bindings?

@lithiumtoast
Member

The pointer obtained from a fixed statement is only valid for the scope of the fixed block. A GCHandle would need to be used to prevent the byte[] from moving around in memory and to get a stable pointer address to it. I think that's a bit outside the scope of CString, since it's supposed to be just a blittable type for char*.
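
A minimal sketch of the GCHandle approach mentioned above, assuming the native function only needs the address for the duration of the call. FakeNativeGetSomething is a hypothetical managed stand-in for the real C function (it is not part of C2CS or of the closed-source API); it just copies a null-terminated string into the buffer so the example runs without a native library.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class PinnedBufferExample
{
    // Hypothetical stand-in for a native function that writes a
    // null-terminated string into a caller-provided buffer.
    private static int FakeNativeGetSomething(IntPtr output)
    {
        byte[] payload = Encoding.ASCII.GetBytes("hello\0");
        Marshal.Copy(payload, 0, output, payload.Length);
        return 0;
    }

    public static string ReadOutput()
    {
        var buffer = new byte[1024];

        // Pin the array so the garbage collector cannot move it
        // while unmanaged code holds its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr pointer = handle.AddrOfPinnedObject();
            int error = FakeNativeGetSomething(pointer);
            if (error != 0)
            {
                throw new InvalidOperationException("Native call failed: " + error);
            }

            // Copy the bytes into a managed string before unpinning.
            return Marshal.PtrToStringAnsi(pointer) ?? string.Empty;
        }
        finally
        {
            // Always release the handle, otherwise the array stays pinned.
            handle.Free();
        }
    }
}
```

The handle is freed in a finally block because a pinned array that is never released stays fixed in place and fragments the managed heap.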

Instead, I could see opening up the conversation about offering the MarshalAs(UnmanagedType.LPStr) attribute as an option (#74).

int getSomething(char* input, char *output)
int getSomething([MarshalAs(UnmanagedType.LPStr)] string input, [MarshalAs(UnmanagedType.LPStr)] out string output)

@mardukbp
Author

My understanding from the documentation is that the fixed statement pins a managed variable so that it is not moved by garbage collection. And Microsoft recommends using it for creating fixed-size buffers. I called System.GC.Collect() directly after the constructor of CString and also before the call to the foreign function and the program worked as expected.

I believe that Rust's CString is similar in spirit to the CString of c2cs. Passing a pointer to a constant-size buffer looks like this:

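let v = vec![0; size];
// from_vec_unchecked, the FFI call and from_raw are all unsafe operations.
let s = unsafe { CString::from_vec_unchecked(v) };
let ptr = s.into_raw();
let err = unsafe { call_to_ffi(ptr, size) };
assert_eq!(err, 0);
let s = unsafe { CString::from_raw(ptr) };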

As I see it, the _pointer field of a c2cs CString may be a read-only pointer (like Rust's CString as_ptr) or a mutable pointer (like Rust's CString into_raw). Which means that one can indicate the intent of the CString in the constructor. A CString from a string is for passing data to C, whereas a CString from a byte array is for reading data from C.

Unless there is a technical reason why this may not work in general, I would stick to using the CString abstraction in the generated bindings.

@lithiumtoast
Member

"My understanding from the documentation is that the fixed statement pins a managed variable so that it is not moved by garbage collection."

Yes, for the scope of the fixed statement. In your constructor the pointer value escapes the fixed statement; this can lead to the pointer no longer pointing to what you think it points to if your array changes its location in memory (which the garbage collector is allowed to do). What you are doing is okay as long as you throw away the pointer value as soon as you have used it (don't store it or use it later).

@mardukbp
Author

You are right. I guess the right way to do it would be

string get_info() {
  unsafe {
    fixed (byte *ptr = new byte[1024]) {
      var cstring = new CString(ptr);
      c_get_info(cstring);
      return cstring.ToString();
    }
  }
}

which is very verbose.

The Mono Project recommends using a StringBuilder as the simplest solution, since the runtime takes care of the marshaling.

I do not think it is a good idea to use MarshalAs(UnmanagedType.LPStr) out string output since strings in C# are immutable and therefore you would not pass a string when the intent is for it to be modified.

@lithiumtoast
Member

lithiumtoast commented May 31, 2022

  1. Be careful with CString.ToString: it calls CStrings.String(value), which keeps track of the C char* and the C# String for caching purposes.
  2. I would rather keep things "explicit" by not using the runtime's default marshalling, especially for return types that are a C# class. A bit of "verbose" code is easier to understand than default marshalling, which most people are unfamiliar with or do not fully understand. This is also why there is a strict policy of "only blittable types" for C2CS; there is no hidden control flow or allocation, which keeps things simple.
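
One way to keep the conversion explicit and avoid any cache is to decode the output buffer by hand. The following is only a sketch: OutputBufferDecoder and DecodeNullTerminated are hypothetical names, not part of C2CS, and it assumes the native code wrote UTF-8 (or plain ASCII) text into the buffer.

```csharp
using System;
using System.Text;

public static class OutputBufferDecoder
{
    // Decodes the bytes up to (not including) the first null terminator.
    // If no terminator is found, the whole buffer is decoded.
    public static string DecodeNullTerminated(ReadOnlySpan<byte> buffer)
    {
        int length = buffer.IndexOf((byte)0);
        if (length < 0)
        {
            length = buffer.Length;
        }

        return Encoding.UTF8.GetString(buffer.Slice(0, length));
    }
}
```

Called on the buffer right after the native function returns, this produces a fresh managed string each time and leaves nothing behind that refers to the unmanaged pointer.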

@lithiumtoast
Member

I'm going to close this as a solution was provided that appears to be working.
