[MacRuby-devel] Strings, Encodings and IO

Manfred Stienstra manfred at gmail.com
Mon Apr 6 23:15:05 PDT 2009


On Apr 7, 2009, at 7:47 AM, Vincent Isambart wrote:

I have two small comments and a general statement about your essay:

> A few functions of 1.9 may also be disabled (like force_encoding). Of
> course it would be possible to add the full functionality of Ruby 1.9
> strings on ByteString but it wouldn't be worth it.

The force_encoding method will be absolutely _vital_ to working with  
encodings in Ruby. Most library authors don't know anything about  
character encoding and _will_ do the wrong things. And I'm not even  
talking about libraries written for 1.8 which are totally unaware of  
the String changes. For example, in a fictional HTTP library that  
totally doesn't exist today:

   response = HTTP.get('http://www.google.com')
   response.body.encoding #=> #<Encoding:US-ASCII>

This happens even though the headers clearly say "Content-Type:
text/html; charset=UTF-8", so we need force_encoding to fix these
problems. Even the library author probably needs the force_encoding
method, because somewhere deep down in the library there might be
C / Obj-C code that returns a byte string to Ruby without specifying
the encoding.
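
To make the point concrete, here is a minimal sketch of the fix in Ruby 1.9
terms. The byte array and the header string are made-up stand-ins for what
the fictional HTTP library would hand back; only pack, force_encoding and
valid_encoding? are real String / Array methods.

```ruby
# Hypothetical response body: the UTF-8 bytes for "caf" plus e-acute,
# mislabeled as US-ASCII the way the fictional library might return them.
body = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack("C*").force_encoding("US-ASCII")
body.valid_encoding?   # false: 0xC3 and 0xA9 are not ASCII bytes

# Hypothetical header value; the charset is pulled out with a simple regexp.
content_type = "text/html; charset=UTF-8"
charset = content_type[/charset=([\w-]+)/i, 1]

# force_encoding only changes the label; the bytes themselves stay untouched.
fixed = body.dup.force_encoding(charset)
fixed.encoding         # Encoding::UTF_8
fixed.valid_encoding?  # true
fixed.length           # 4 characters, still 5 bytes
```

Note that nothing is transcoded here, which is exactly why force_encoding
(and not encode) is the right tool when the label is wrong but the bytes
are fine.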

> Ruby 1.9 also has default code and default external encodings
> different depending on the environment, but I think always both of
> them set to UTF-8 would be the best. (we may even completely ignore
> the encoding pragmas in the code not to complicate the parser).

Also a no-go. ERB uses this pragma to signal what the encoding of the
template is, and encoding handling will break if you ignore it.
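
As a small illustration of how ERB relies on the pragma, the sketch below
uses a made-up template whose first tag is an encoding comment. It assumes
the ERB from Ruby's standard library; the template text and the choice of
ISO-8859-1 are only examples.

```ruby
require "erb"

# Hypothetical template: the leading comment tag carries the encoding pragma.
template = "<%# -*- coding: ISO-8859-1 -*- %>" \
           "source encoding: <%= __ENCODING__ %>"

erb = ERB.new(template)

# ERB reads the pragma and evaluates the generated code under that encoding.
erb.encoding         # Encoding::ISO_8859_1
output = erb.result  # "source encoding: ISO-8859-1"
```

If the parser ignored the pragma, __ENCODING__ inside the template would
report the default source encoding instead, and string literals in the
template would be tagged accordingly.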

Finally, I don't think it's a good idea to discuss this at great length
without actual code, but in order to write a compatible implementation
most (if not all) of the String awkwardness will have to be implemented.

Manfred

