[MacRuby-devel] Strings, Encodings and IO

Manfred Stienstra manfred at gmail.com
Wed Apr 8 00:25:15 PDT 2009


On Apr 8, 2009, at 7:23 AM, Benjamin Stiglitz wrote:

> When doing force_encoding, convert to a ByteString in the old  
> encoding, then try to convert to an NSString in the new encoding. If  
> we succeed, great. If not, leave as a tagged ByteString (and  
> probably whine about it).

That's actually wrong. All force_encoding does is change the encoding
attribute of the string; it shouldn't change the bytes themselves.
The encoding attribute is basically a switch that selects which set of
string methods is used to interpret the bytes.
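A minimal sketch of that behaviour as Ruby 1.9 defines it. The helper name retag is made up for illustration and is not part of any library:

```ruby
# Hypothetical helper (not a core method): returns a copy of +str+ that
# carries a different encoding tag. String#force_encoding only swaps the
# tag; it never rewrites the underlying bytes.
def retag(str, encoding)
  str.dup.force_encoding(encoding)
end

utf8 = "caf\u00E9"                        # \u escapes yield a UTF-8 string
sjis = retag(utf8, Encoding::Shift_JIS)   # same bytes, different tag

puts utf8.encoding                        # UTF-8
puts sjis.encoding                        # Shift_JIS
puts utf8.bytes == sjis.bytes             # true

# Retagging back recovers the original string, because nothing was converted.
puts retag(sjis, Encoding::UTF_8) == utf8 # true
```

The round trip only works because the bytes are left alone, which is the property an implementation has to preserve.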

For more information see: http://blog.grayproductions.net/articles/ruby_19s_string

An example:

  We're in the same hypothetical HTTP library as before, and this
  library author has decided to _always_ force the encoding to Shift JIS
  because he hates humanity:

   response = HTTP.get('http://example.com')
   response.body.encoding #=> Encoding::Shift_JIS

  If MacRuby internally converts the body's bytes to Shift JIS,
  information might get lost. So when someone decides to make it right
  afterwards:

   encoding = response.header['Content-type'].split(';').last.split('=').last
   encoding #=> 'utf-8'

  They might get into trouble here:

   response.body.force_encoding(Encoding::UTF_8)

  Cuz'

   Encoding.compatible?(Encoding::Shift_JIS, Encoding::UTF_8) #=> nil
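The repair step in the example above can be sketched in plain Ruby without the HTTP library. The helpers charset_from and retag_body are hypothetical names, and the mislabelled body is simulated with force_encoding. The sketch assumes force_encoding leaves the bytes untouched, as it does in Ruby 1.9:

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type
# value. Deliberately naive: it does not handle quoted parameter values.
def charset_from(content_type)
  content_type[/charset=([^;\s]+)/i, 1]
end

# Hypothetical helper: retag +body+ with the charset the server declared.
# Encoding.find raises ArgumentError for names Ruby does not know.
def retag_body(body, content_type)
  name = charset_from(content_type)
  return body if name.nil?
  body.force_encoding(Encoding.find(name))
end

raw  = "\u65E5\u672C\u8A9E"                         # UTF-8 text from the server
body = raw.dup.force_encoding(Encoding::Shift_JIS)  # mislabelled by the library

fixed = retag_body(body, 'text/html; charset=utf-8')
puts fixed.encoding         # UTF-8
puts fixed.valid_encoding?  # true
puts fixed == raw           # true
```

If the library's force_encoding had converted or validated the bytes against Shift JIS first, the final comparison could not be relied on.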

I think the best course of action is to expand the String specs in
RubySpec for 1.9. After that, anyone can freely hack away at an
optimal solution without fear of incompatibility. Reading those specs
is also likely to suggest the most elegant solution.

Manfred

