That's actually wrong. All force_encoding does is change the encoding attribute of the string; it doesn't change the underlying bytes. The encoding attribute is basically a switch that describes which set of string methods should be used on the bytes.
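To illustrate (a plain-Ruby 1.9 sketch, independent of MacRuby): force_encoding only relabels the string, while encode actually transcodes the bytes.

```ruby
# force_encoding relabels the string; the bytes stay the same.
s = "\xE3\x81\x82"                      # the UTF-8 bytes of "あ"
s.force_encoding(Encoding::ASCII_8BIT)
s.bytes.to_a                            # => [227, 129, 130], unchanged

# encode, by contrast, transcodes the bytes into the target encoding.
s.force_encoding(Encoding::UTF_8)
t = s.encode(Encoding::UTF_16LE)
t.bytes.to_a                            # => [66, 48], i.e. U+3042 in UTF-16LE
```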
That's what force_encoding does in Ruby 1.9, but it's not possible to do the same if we want to use NSStrings as much as possible.
response = HTTP.get('http://example.com')
response.body.encoding
# => Encoding::Shift_JIS
(...)
response.body.force_encoding(Encoding::UTF_8)
If MacRuby internally forces the body encoding to Shift_JIS, information might get lost.
No, it would not. If it was valid Shift_JIS, converting back from UTF-16 to Shift_JIS should recover the original data (as long as the encoding conversion tables round-trip correctly). And if the string was not valid Shift_JIS, we keep it as bytes, so nothing is lost.
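A quick plain-Ruby sketch of that round trip (sample data of mine, not MacRuby internals): a string that is valid Shift_JIS comes back byte-identical after passing through UTF-16.

```ruby
# "こん" as Shift_JIS bytes.
sjis = "\x82\xB1\x82\xF1".force_encoding(Encoding::Shift_JIS)
sjis.valid_encoding?                      # => true

utf16 = sjis.encode(Encoding::UTF_16LE)   # what would be stored internally
back  = utf16.encode(Encoding::Shift_JIS) # ...converted back on demand

back.bytes.to_a == sjis.bytes.to_a        # => true, nothing was lost
```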
I think the best course of action is to expand the String specs in RubySpec for 1.9; after that anyone can freely hack away at an optimal solution without fear of incompatibility. Reading those specs is also likely to suggest the most elegant solution.
I think everyone agrees that having Ruby 1.9 String specs will be a necessity. We'll also need to decide which parts of them to follow and which parts we don't need to. For example, handling access to characters in a string with a partly invalid encoding exactly the same way as 1.9 does seems hard:
s  # a string in UTF-8 with a broken first byte
# => "\x00\x81\x93んにちは\n"
s.length
# => 8
[s[0], s[1], s[2], s[3], s[4], s[5]]
# => ["\x00", "\x81", "\x93", "ん", "に", "ち"]
Handling everything as bytes when the encoding is invalid would be easy, but handling only the bad part that way seems hard unless you are willing to write code for each encoding. And UTF-16 support should also be made better in MacRuby than it is in Ruby 1.9.
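As a sketch of the easy "everything as bytes" fallback mentioned above (char_at is a hypothetical helper, not a MacRuby API; note it deliberately does not match 1.9's per-byte-within-valid-text behavior shown earlier):

```ruby
# Hypothetical helper: per-character access when the declared encoding is
# valid, raw per-byte access for the whole string when it is not.
def char_at(s, i)
  if s.valid_encoding?
    s[i]                                  # normal character indexing
  else
    b = s.bytes.to_a[i]                   # fall back to raw bytes
    b && b.chr.force_encoding(Encoding::ASCII_8BIT)
  end
end

good = "こんにちは"
bad  = "\x81\x93んにちは".force_encoding(Encoding::UTF_8)  # broken first bytes

char_at(good, 0)   # => "こ"
char_at(bad, 0)    # => "\x81", a single raw byte
```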