[MacRuby-devel] UTF8 Strings

Laurent Sansonetti lsansonetti at apple.com
Sat Dec 5 16:22:57 PST 2009


Hi Steve,

On Dec 5, 2009, at 1:45 PM, s.ross wrote:

> My code receives XML data from a Web Service API call that is in  
> UTF8 encoding. This winds up in a string.
>
>     return_data = NSURLConnection.sendSynchronousRequest(@request, returningResponse: response, error: error)
>     str = NSString.alloc.initWithData(return_data, encoding: NSUTF8StringEncoding)
>     puts "******* response encoding it #{str.encoding}"
>
> The result of the puts above is 'MACINTOSH'.
>
> I suspect the encoding of the string is not UTF-8, because when I  
> try to parse the XML using REXML, I get:
>
> RegexpError: too short multibyte code
>
> This occurs way in REXML:
>
> /Library/Frameworks/MacRuby.framework/Versions/0.5/usr/lib/ruby/1.9.0/rexml/text.rb:132:in `check:'
>
> In any case, my questions are:
>
> 1) If anyone has run across this what did you do?

I don't believe REXML works on MacRuby at the moment. In any case, I
would recommend not using it. Since you're already using Cocoa, why not
give NSXMLDocument a try?

> 2) Why might the encoding be MACINTOSH and not UTF-8, as specified  
> in the initWithData method call?

String#encoding returns the fastest encoding available for the receiver.
You may specify UTF-8 when creating the string, but if Cocoa can pick a
smaller encoding at runtime (such as ASCII), it will. That is why you
see MACINTOSH reported even though the data was decoded as UTF-8.

This is different from the Ruby 1.9 semantics, and we plan to fix that
in 0.6.
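
For comparison, here is a minimal sketch of the Ruby 1.9 semantics,
meant to be run under stock Ruby 1.9 or later rather than MacRuby 0.5.
The helper name describe_encoding and the sample XML strings are only
illustrative. In stock Ruby, a string reports the encoding it was tagged
with, even when all of its bytes would fit in a smaller encoding.

```ruby
# Hypothetical helper: summarizes how stock Ruby (1.9+) sees a string.
def describe_encoding(str)
  {
    :encoding   => str.encoding.name,   # the encoding the string is tagged with
    :ascii_only => str.ascii_only?,     # true if every character is 7-bit ASCII
    :valid      => str.valid_encoding?  # true if the bytes are valid for that tag
  }
end

# ASCII-only content, explicitly tagged as UTF-8.
plain = "<reply>ok</reply>".encode("UTF-8")

# Content with a non-ASCII character (e with acute accent).
accented = "<reply>caf\u00E9</reply>".encode("UTF-8")

p describe_encoding(plain)
p describe_encoding(accented)
```

Both strings should report UTF-8 here. Only the ascii_only flag
differs, which is the information MacRuby 0.5 currently folds into the
value returned by #encoding.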

> 3) Suggestions?

See my comment in 1) :)

Laurent

