Help with cross-platform wchar_t to utf8 conversion

Mon Feb 2 22:34:43 PST 2009

David Bruce wrote:
> Hi,
> 
> I'm the upstream maintainer for Tux Math and Tux Typing.  I just
> contributed a current portfile for tuxmath, and would like to add
> tuxtype as well.  The tuxtype port is almost ready to go, but a
> significant part of the game doesn't work because the existing code to
> convert a wchar_t string to utf8 doesn't work as built by macports.
> 
> Tuxtype is a C program that uses SDL, without any Mac-specific
> libraries.  The existing conversion uses iconv, with a check to take
> into account Microsoft's two-byte wchar_t.  I had assumed the Mac
> would work just like Linux in this regard, but so far it hasn't.
> 
> The relevant OS-specific part works like this:
> 
>  //Microsoft uses a different wchar_t from the rest of the world - grrr...
> #ifdef WIN32
>  DEBUGCODE {fprintf(stderr, "WIN32, using UTF-16LE for wchar_t\n");}
>  conv_descr = iconv_open("UTF-8", "UTF-16LE");
> #else
>  conv_descr = iconv_open("UTF-8", "UTF-32");
> #endif
> 
>  bytes_converted = iconv(conv_descr,
>                          &wchar_t_Start, &in_length,
>                          &UTF8_Start, &out_length);
> 
> and so forth.
> 
> On Mac, the UTF-32 version gets selected, but it doesn't work.
> 
> So, as a quick fix:
> 1. is a Mac wchar_t string UTF-32BE, UTF-32LE, or something else?
> 2. what is the best preprocessor symbol to test for (i.e. __APPLE__,
> MACOSX, etc) to know if we are building for the Mac?  This same code
> may also be built with CMake.

Having to know the encoding a priori seems to partly defeat the purpose
of iconv. According to the iconv_open man page, you can pass "wchar_t"
as tocode or fromcode, and it is interpreted according to the machine,
the OS and the current locale.
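
Here is a minimal sketch of what that could look like. It assumes your
iconv accepts the "wchar_t" encoding name (GNU libiconv and glibc do, as
far as I know). wchar_to_utf8 is a made-up helper name for illustration,
not something taken from the tuxtype source:

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not from tuxtype): convert a NUL-terminated
 * wchar_t string to UTF-8 without hard-coding UTF-16LE or UTF-32.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 on failure. */
size_t wchar_to_utf8(const wchar_t *in, char *out, size_t out_size)
{
    if (in == NULL || out == NULL || out_size == 0)
        return (size_t)-1;

    /* Assumption: this iconv implementation knows the "wchar_t" name. */
    iconv_t cd = iconv_open("UTF-8", "wchar_t");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv works on bytes, so the input length is in bytes too. */
    char *in_ptr = (char *)in;
    size_t in_left = wcslen(in) * sizeof(wchar_t);

    /* Reserve one byte for the terminating NUL. */
    char *out_ptr = out;
    size_t out_left = out_size - 1;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;   /* invalid input or output buffer too small */

    *out_ptr = '\0';
    return (size_t)(out_ptr - out);
}
```

If that name is supported on all your targets, the #ifdef WIN32 branch
should no longer be needed, since the same call would be used everywhere.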

- Josh