That's actually wrong. All force_encoding does is change the encoding attribute of the string; it doesn't change the underlying bytes. The encoding attribute is basically a switch that describes which set of string methods should be used on the bytes.
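To illustrate (a plain-Ruby 1.9 sketch, independent of MacRuby): force_encoding only relabels the string, while encode actually transcodes the bytes.

```ruby
# force_encoding relabels the string; the bytes stay the same.
s = "\xE3\x81\x82"                      # the UTF-8 bytes of "あ"
s.force_encoding(Encoding::ASCII_8BIT)
s.bytes.to_a                            # => [227, 129, 130], unchanged

# encode, by contrast, transcodes the bytes into the target encoding.
s.force_encoding(Encoding::UTF_8)
t = s.encode(Encoding::UTF_16LE)
t.bytes.to_a                            # => [66, 48], i.e. U+3042 in UTF-16LE
```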
That's what force_encoding does in Ruby 1.9, but it's not possible to do the same if we want to use NSStrings as much as possible.
response = HTTP.get('http://example.com')
response.body.encoding
# => Encoding::Shift_JIS
(...)
response.body.force_encoding(Encoding::UTF_8)
If MacRuby internally forces the body encoding to Shift_JIS, information might get lost.
No, it would not. If it was valid Shift_JIS, converting back from UTF-16 to Shift_JIS should recover the original data (as long as the encoding conversion tables round-trip correctly). And if the string was not valid Shift_JIS, we keep it as bytes, so nothing is lost.
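A quick plain-Ruby sketch of that round trip (sample data of mine, not MacRuby internals): a string that is valid Shift_JIS comes back byte-identical after passing through UTF-16.

```ruby
# "こん" as Shift_JIS bytes.
sjis = "\x82\xB1\x82\xF1".force_encoding(Encoding::Shift_JIS)
sjis.valid_encoding?                      # => true

utf16 = sjis.encode(Encoding::UTF_16LE)   # what would be stored internally
back  = utf16.encode(Encoding::Shift_JIS) # ...converted back on demand

back.bytes.to_a == sjis.bytes.to_a        # => true, nothing was lost
```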
I think the best course of action is to expand the String specs in RubySpec for 1.9; after that anyone can freely hack away at an optimal solution without fear of incompatibility. Reading those specs is also likely to suggest the most elegant solution.
I think everyone agrees that having Ruby 1.9 String specs will be a necessity. We'll also need to decide which parts of them to follow and which parts we don't need to. For example, handling access to characters in a string with a partly invalid encoding exactly the same way as 1.9 does seems hard:
s  # a string in UTF-8 with a broken first byte
# => "\x00\x81\x93んにちは\n"
s.length
# => 8
[s[0], s[1], s[2], s[3], s[4], s[5]]
# => ["\x00", "\x81", "\x93", "ん", "に", "ち"]
Handling everything as bytes when the encoding is invalid would be easy, but handling only the bad part that way seems hard unless you are willing to write code for each encoding. And UTF-16 support should also be made better in MacRuby than it is in Ruby 1.9.
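As a sketch of the easy "everything as bytes" fallback mentioned above (char_at is a hypothetical helper, not a MacRuby API; note it deliberately does not match 1.9's per-byte-within-valid-text behavior shown earlier):

```ruby
# Hypothetical helper: per-character access when the declared encoding is
# valid, raw per-byte access for the whole string when it is not.
def char_at(s, i)
  if s.valid_encoding?
    s[i]                                  # normal character indexing
  else
    b = s.bytes.to_a[i]                   # fall back to raw bytes
    b && b.chr.force_encoding(Encoding::ASCII_8BIT)
  end
end

good = "こんにちは"
bad  = "\x81\x93んにちは".force_encoding(Encoding::UTF_8)  # broken first bytes

char_at(good, 0)   # => "こ"
char_at(bad, 0)    # => "\x81", a single raw byte
```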