[MacRuby-devel] Strings, Encodings and IO

Benjamin Stiglitz ben at tanjero.com
Wed Apr 8 11:40:13 PDT 2009


>> When doing force_encoding, convert to a ByteString in the old  
>> encoding, then try to convert to an NSString in the new encoding.  
>> If we succeed, great. If not, leave as a tagged ByteString (and  
>> probably whine about it).
>
> That's actually wrong. All force_encoding does is change the
> encoding attribute of the string; it shouldn't change the internal
> encoding of the bytes. The encoding attribute is basically a switch
> to describe which set of string methods should be used on the bytes.

We have to go through this dance to get force_encoding to play nicely
with NSString. Namely, NSString is always backed by an array of UTF-16
code units. So, to reinterpret, we have to convert the internal rep
back out to whatever the old external encoding was, then back in,
converting to UTF-16 from the new external encoding.
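
To make the round trip concrete, here is a rough sketch in plain Ruby
(1.9-style String API, plus String#b from 2.0). It is not MacRuby's
implementation: the reinterpret helper is a made-up name, and a UTF-16LE
Ruby string stands in for the NSString backing store.

```ruby
# Hypothetical helper, not part of MacRuby: simulates force_encoding
# for a string whose internal representation is always UTF-16.
def reinterpret(internal, old_encoding, new_encoding)
  # Step 1: recover the external bytes by encoding the internal rep
  # back out to the old external encoding. (This sketch does not handle
  # the case where that conversion itself fails.)
  bytes = internal.encode(old_encoding).b

  # Step 2: tag a copy of those bytes with the new encoding and see
  # whether they form a valid sequence in it.
  candidate = bytes.dup.force_encoding(new_encoding)

  if candidate.valid_encoding?
    # Success: convert back in to the UTF-16 internal rep.
    candidate.encode(Encoding::UTF_16LE)
  else
    # Failure: hand back the raw bytes (the "tagged ByteString" case;
    # a real implementation would probably warn here).
    bytes
  end
end
```

For instance, the UTF-8 bytes C3 A9 mis-tagged as Shift_JIS decode to two
half-width katakana; passing that string through reinterpret with
Encoding::Shift_JIS and Encoding::UTF_8 recovers U+00E9.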

> We're in the same hypothetical HTTP library as before, and this  
> library author has decided to
> _always_ force encoding to Shift JIS because he hates humanity:
>
>  response = HTTP.get('http://example.com')
>  response.body.encoding #=> Encoding::Shift_JIS
>
> If MacRuby internally forces the body encoding to Shift JIS,
> information might get lost. So when
> someone decides to make it right afterwards:
>
>  encoding = response.header['Content-type'].split(';').last.split('=').last
>  encoding #=> 'utf-8'
>
> They might get into trouble here:
>
>  response.body.force_encoding(Encoding::UTF_8)
>
> Cuz'
>
>  Encoding.compatible?(Encoding::Shift_JIS, Encoding::UTF_8) #=> nil

Vincent already answered this part; we’re still doing reinterpretation
of what is essentially the original bytestream. Are there any
encodings that map multiple byte sequences to the same code point?
(And I’m not talking about Unicode NFC/NFD/&c.; that still makes it
through the UTF-16 link alright.)
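
One way to probe that question is to test whether a given byte sequence
survives the trip through UTF-16 unchanged. A minimal sketch in plain
Ruby, assuming Ruby's own transcoders as a stand-in for whatever
converters MacRuby would use; round_trips? is a made-up helper name:

```ruby
# Hypothetical check: do these bytes come back identical after being
# decoded from `encoding` into UTF-16 and encoded back out again?
def round_trips?(bytes, encoding)
  original = bytes.b
  internal = original.dup.force_encoding(encoding).encode(Encoding::UTF_16LE)
  internal.encode(encoding).b == original
rescue EncodingError
  # Invalid or unconvertible input cannot round-trip at all.
  false
end
```

Any input for which this returns false, despite being valid in the
given encoding, would be a case where the reinterpretation loses the
original bytes.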

-Ben

