DLL Hell, MQL5 edition : UNICODE vs ANSI
Many, many years ago, when we were kids, in the early years of the crazy 90s, two languages were at war in the developer world: Pascal, with a down-to-earth, easy-to-understand syntax well suited to a high-level language, and C++, with a more cryptic but faster-to-type syntax well suited to its medium level. C++ won the battle, and everything that was done in Windows ended up compiled in C++ and bore its marks: null-terminated strings and what was known at that time as the standard calling convention.
The null-terminated strings were plain strings, known as ANSI strings; at that time there was no UNICODE yet. Every character was a single byte and the strings had a dynamic length, as they were supposed to end with a null (a zero byte). Applications therefore received a pointer indicating where to read a string from, and found where the string ended by looking for the zero byte. As for the standard calling convention, on a procedure call the C++ compiler pushed the parameters on the stack starting with the last and finishing with the first.
Null-terminated string (ANSI)
    |---------------|
    |c1|c2|....| 0  |
    |b1|b2|....|bn+1|
    |---------------|
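To make the "look for the zero byte" rule concrete, here is a minimal C++ sketch. The helper name `ansi_length` is made up for this illustration; it does roughly what the standard `strlen` does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts bytes up to (not including) the first zero
// byte, which is how a consumer of a null-terminated ANSI string finds
// where the string ends.
std::size_t ansi_length(const char* p) {
    std::size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}
```

Called on `"ABC"` it walks over the bytes 65, 66, 67 and stops at the terminating 0, so the length is 3 even though 4 bytes are stored.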
Pascal was the exact opposite of C++ in all these matters. Strings were ANSI too, one byte per character, but they had a fixed length of 255 bytes, or a compiler-defined one. They had an extra byte at the front specifying the logical length of the string (how many bytes were actually used). The calling convention was reversed as well: in the pascal calling convention, parameters were pushed on the stack from the first to the last.
Standard Pascal string (ANSI)
    |------------------|
    |ln|c1|c2|....|c255|
    |b1|b2|b3|....|b256|
    |------------------|
This is why Pascal strings could be passed whole to functions, by value, without any need to pass them by reference, which is the only way a null-terminated string can be passed in C++.
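The layout described above can be modelled in C++ as a fixed 256-byte block. This is only an illustrative sketch: `PascalString`, `make_pascal` and `to_std` are names invented here, not part of any Pascal runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of a classic Pascal string: one length byte followed
// by 255 bytes of storage, 256 bytes in total.
struct PascalString {
    unsigned char length;  // logical length, 0..255
    char data[255];        // character storage; bytes past `length` are unused
};

// Builds a PascalString from text, truncating to 255 characters.
PascalString make_pascal(const std::string& text) {
    PascalString p{};
    std::size_t n = text.size() < 255 ? text.size() : 255;
    p.length = static_cast<unsigned char>(n);
    for (std::size_t i = 0; i < n; ++i) {
        p.data[i] = text[i];
    }
    return p;
}

// Takes the string BY VALUE: the whole 256-byte block is copied, which is
// possible because its size is fixed and known at compile time.
std::string to_std(PascalString p) {
    return std::string(p.data, p.length);
}
```

Because the size never changes, the block can be copied onto the stack like any other value, whereas a null-terminated string of unknown length can only be handed over as a pointer.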
As C++ won the battle, Pascal compilers had to adapt. The calling convention was the easy part. Strings were more complicated: developers had to struggle with PCHAR, the pointer type used with arrays of one-byte elements that were supposed to hold C++ null-terminated strings and were passed by reference.
As if this were not enough for developers, the UNICODE standard came in.
UNICODE is a complicated standard, and I don’t know it entirely. The difference from ANSI is that UNICODE characters are wider: generally they span two bytes each, but there are also encodings with 4-byte characters. In the beginning, UNICODE strings seemed something new and awkward, so they were called wide strings. Windows API functions working with wide strings were given names ending in W; pointers to null-terminated ANSI strings were char*, so pointers to null-terminated UNICODE strings had to be called wchar_t*.
Null-terminated strings (UNICODE)
    |------------------------------------------|
    | c1  | c2  |....|    cn     |      0      |
    |b1|b2|b3|b4|....|b 2n-1|b 2n|b 2n+1|b 2n+2|
    |------------------------------------------|
MQL5, like most programming environments nowadays, is UNICODE. Even the simple strings you use regularly are UNICODE: they look like ANSI, but their internal representation is UNICODE. This works because ANSI can be packed into UNICODE by filling the unneeded bytes with 0.
ANSI packaged in UNICODE (MQL5 normal strings)
    |------------------------------------------|
    | c1  | c2  |....|    cn     |      0      |
    |b1|0 |b3|0 |....|b 2n-1| 0  |  0   |  0   |
    |b1|b2|b3|b4|....|b 2n-1|b 2n|b 2n+1|b 2n+2|
    |------------------------------------------|
So, in ANSI text packed as UNICODE, every even-numbered byte is 0.
But what if you have an older C++ DLL that uses null-terminated ANSI strings?
That means it expects and returns null-terminated ANSI strings.
So, if you send an “ABC” string to such a DLL, it arrives mapped in bytes as: 65, 0, 66, 0, 67, 0.
The DLL will see the first 0 as the null terminating the string and will understand only “A” from the entire string.
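The truncation can be reproduced outside MetaTrader with a small C++ sketch. The helper names are invented for this illustration, and the byte order (low byte first) is written out explicitly, matching the little-endian layout used on Windows.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: lays out a 16-bit-per-character string in memory,
// low byte first, followed by the two-byte terminator.
std::vector<char> utf16le_bytes(const std::u16string& s) {
    std::vector<char> out;
    for (char16_t c : s) {
        out.push_back(static_cast<char>(c & 0xFF));  // low byte
        out.push_back(static_cast<char>(c >> 8));    // high byte
    }
    out.push_back('\0');
    out.push_back('\0');
    return out;
}

// What ANSI-only code measures: the number of bytes before the first zero.
std::size_t length_seen_by_ansi_code(const std::vector<char>& bytes) {
    return std::strlen(bytes.data());
}
```

For `u"ABC"` the buffer starts with 65, 0, 66, 0, 67, 0, so `std::strlen` stops after the very first byte and reports a length of 1.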
If you receive “ABC” from this type of DLL, you get the bytes: 65, 66, 67, 0.
UNICODE MQL5 will read the first character from bytes 65 and 66 (giving something Chinese-looking) and the second character from 67 and 0, which maps to “C”. Then it will keep reading, if there is no access violation, until it finds a 0 and 0 pair making up a null, and the result is complete gibberish. The access violation might be avoided because MT5 might allocate enough space for receiving the string.
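The misreading can be sketched in C++ as well. `read_as_utf16le` is a hypothetical helper that treats a byte buffer as 16-bit characters the way a UNICODE consumer would; unlike the real situation, it takes the buffer size so the sketch never reads past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reads bytes as little-endian 16-bit characters and
// stops at the first 16-bit zero. `size` bounds the scan for safety.
std::u16string read_as_utf16le(const unsigned char* bytes, std::size_t size) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        char16_t c = static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8));
        if (c == 0) {
            break;
        }
        out.push_back(c);
    }
    return out;
}
```

Given the bytes 65, 66, 67, 0 the first character comes out as 0x4241 (66*256 + 65), which lies in the CJK range, and the second as 0x0043, which is “C”. What happens next depends entirely on the bytes that follow in memory.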
Sadly, MQL5 doesn’t have an ansistring type to handle conversions automatically. On the bright side, in both cases strings are passed by reference, so this is a problem of interpretation rather than a conflict between passing by value and passing by reference.
This means you have to send UNICODE strings whose bytes decode correctly as ANSI, and receive ANSI strings that you must convert to UNICODE before using them.
When you receive an ANSI string in UNICODE form, read the UNICODE characters one at a time, typecasting each to an unsigned short, then split it into its two ANSI bytes by dividing by 256: the remainder (the value modulo 256) is the first ANSI code and the quotient is the second. Append both, in that order, to the resulting UNICODE string. So each 2 bytes of the original ANSI map into 4 bytes (2 UNICODE characters).
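As a worked example of this arithmetic, here is a C++ sketch of the unpacking step. It is an illustration with an invented name (`unpack_ansi`), not the MQL5 library code, and it assumes a zero byte marks the end of the ANSI data.

```cpp
#include <cassert>
#include <string>

// Sketch of the unpacking step: each 16-bit character carries two ANSI
// bytes. The low byte (value % 256) is the first ANSI character and the
// high byte (value / 256) is the second. A zero byte ends the ANSI string.
std::u16string unpack_ansi(const std::u16string& packed) {
    std::u16string out;
    for (char16_t c : packed) {
        char16_t low = c % 256;
        char16_t high = c / 256;
        if (low == 0) {
            break;
        }
        out.push_back(low);
        if (high == 0) {
            break;  // the ANSI string had an odd length
        }
        out.push_back(high);
    }
    return out;
}
```

For the received “ABC”, the first character is 16961 (0x4241): 16961 % 256 = 65 (“A”) and 16961 / 256 = 66 (“B”). The second is 67 (0x0043): low byte 67 (“C”), high byte 0, which ends the string.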
When you want to pack an ANSI-range UNICODE string, such as an ordinary MQL5 string, as ANSI, read the UNICODE characters two at a time and forcibly typecast each to unsigned char, the size of an ANSI character. Then combine them into a larger unsigned short, with the first character as the low byte and the second as the high byte (second*256 + first), and append that value as the code of a new character to the resulting UNICODE string.
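The packing step can be sketched in C++ along the same lines. `pack_ansi` is a name invented for this illustration, and the sketch assumes every input character fits in one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the packing step: two one-byte characters are combined into one
// 16-bit character, the first as the low byte and the second as the high
// byte. A missing second character (odd length) is packed as 0.
std::u16string pack_ansi(const std::u16string& text) {
    std::u16string out;
    for (std::size_t i = 0; i < text.size(); i += 2) {
        unsigned low = static_cast<unsigned char>(text[i]);
        unsigned high = 0;
        if (i + 1 < text.size()) {
            high = static_cast<unsigned char>(text[i + 1]);
        }
        out.push_back(static_cast<char16_t>(high * 256 + low));
    }
    return out;
}
```

Packing “ABC” gives two characters: 66*256 + 65 = 16961 (0x4241) and 0*256 + 67 = 67 (0x0043). In memory, low byte first, that is 65, 66, 67, 0, exactly the bytes an ANSI DLL expects.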
The following is the code of the two conversion functions, written as an include file. Make sure you save it as stringlib.mqh in the include folder.
    //+------------------------------------------------------------------+
    //|                                                    stringlib.mqh |
    //|                                       Copyright Bogdan Caramalac |
    //|                                           http://mqlmagazine.com |
    //+------------------------------------------------------------------+
    #property library
    #property copyright "Bogdan Caramalac"
    #property link      "http://mqlmagazine.com"
    #property version   "1.00"

    // Unpacks ANSI bytes that arrived two per UNICODE character
    // into a normal UNICODE string.
    string ANSI2UNICODE(string s)
      {
       string res="";
       for(int i=0;i<StringLen(s);i++)
         {
          ushort mychar=StringGetCharacter(s,i);
          ushort m=(ushort)(mychar%256);   // low byte: first ANSI character
          ushort d=(ushort)(mychar/256);   // high byte: second ANSI character
          if(m==0) break;                  // null terminator reached
          res+=ShortToString(m);
          if(d==0) break;                  // the ANSI string had an odd length
          res+=ShortToString(d);
         }
       return(res);
      }

    // Packs a UNICODE string holding only ANSI-range characters
    // two characters per UNICODE character, so its bytes read as ANSI.
    string UNICODE2ANSI(string s)
      {
       string res="";
       int leng=StringLen(s);
       for(int ipos=0;ipos<leng;ipos+=2)
         {
          // uchar typecast because each double-byte char is actually one byte
          uchar m=(uchar)StringGetCharacter(s,ipos);
          uchar d=0;
          if(ipos+1<leng)
             d=(uchar)StringGetCharacter(s,ipos+1);
          res+=ShortToString((ushort)(d*256+m));
         }
       return(res);
      }
When using the include you simply write
    #include <stringlib.mqh>
as in the following example:
    //+------------------------------------------------------------------+
    //|                                                  teststrings.mq5 |
    //|                                       Copyright Bogdan Caramalac |
    //|                                           http://mqlmagazine.com |
    //+------------------------------------------------------------------+
    #property copyright "Bogdan Caramalac"
    #property link      "http://mqlmagazine.com"
    #property version   "1.00"
    #include <stringlib.mqh>
    //+------------------------------------------------------------------+
    //| Script program start function                                    |
    //+------------------------------------------------------------------+
    void OnStart()
      {
       string original_unicode,ansi,converted_unicode;

       original_unicode="EvenString";
       ansi=UNICODE2ANSI(original_unicode);
       converted_unicode=ANSI2UNICODE(ansi);
       Print(original_unicode," -> ",ansi," -> ",converted_unicode);

       original_unicode="OddString";
       ansi=UNICODE2ANSI(original_unicode);
       converted_unicode=ANSI2UNICODE(ansi);
       Print(original_unicode," -> ",ansi," -> ",converted_unicode);
      }
    //+------------------------------------------------------------------+
Thanks a lot for this.
Great help for string communication between MQL and DLL’s
Now I know how to bite the chinese DLL string , thanks