[MacRuby-devel] UTF8 Strings
Laurent Sansonetti
lsansonetti at apple.com
Sat Dec 5 16:22:57 PST 2009
Hi Steve,
On Dec 5, 2009, at 1:45 PM, s.ross wrote:
> My code receives XML data from a Web Service API call that is in
> UTF8 encoding. This winds up in a string.
>
> return_data = NSURLConnection.sendSynchronousRequest(@request,
> returningResponse: response, error: error)
> str = NSString.alloc.initWithData(return_data, encoding:
> NSUTF8StringEncoding)
> puts "******* response encoding it #{str.encoding}"
>
> The result of the puts above is 'MACINTOSH'.
>
> I suspect the encoding of the string is not UTF-8, because when I
> try to parse the XML using REXML, I get:
>
> RegexpError: too short multibyte code
>
> This occurs deep inside REXML:
>
> /Library/Frameworks/MacRuby.framework/Versions/0.5/usr/lib/ruby/
> 1.9.0/rexml/text.rb:132:in `check:'
>
> In any case, my questions are:
>
> 1) If anyone has run across this what did you do?
I don't believe REXML works in MacRuby. In any case, I would recommend not
using it. Since you're already using Cocoa, why not give NSXMLDocument a
try?
> 2) Why might the encoding be MACINTOSH and not UTF-8, as specified
> in the initWithData method call?
#encoding returns the fastest encoding available for the receiver. You
may specify UTF-8 when creating the string, but if Cocoa can pick a
smaller encoding at runtime (such as ASCII), it will.
This differs from Ruby 1.9 semantics, and we plan to fix it in 0.6.
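For comparison, here is a minimal sketch of the Ruby 1.9 semantics referred to above, written for standard Ruby (1.9 or later; String#b needs 2.0), not for MacRuby 0.5. In 1.9 a string's encoding is a label on its bytes: force_encoding changes the label without converting anything, and valid_encoding? checks the bytes against that label. The helper name to_utf8 is hypothetical, used only for illustration.

```ruby
# Sketch for standard Ruby 1.9+ semantics (NOT MacRuby 0.5 behaviour).
# `to_utf8` is a hypothetical helper: it relabels raw bytes as UTF-8
# and refuses them if they are not actually valid UTF-8.
def to_utf8(bytes)
  # Work on a copy so the caller's string keeps its original label.
  str = bytes.dup.force_encoding(Encoding::UTF_8)
  # force_encoding only changes the label; it never checks the bytes.
  raise ArgumentError, "input is not valid UTF-8" unless str.valid_encoding?
  str
end

# Bytes for "<name>café</name>" as they might arrive over the network,
# labelled as binary (ASCII-8BIT), the usual label for raw I/O data.
raw = "<name>caf\xC3\xA9</name>".b

puts raw.encoding   # prints ASCII-8BIT
xml = to_utf8(raw)
puts xml.encoding   # prints UTF-8
puts xml.length     # prints 17 (characters; the byte count is 18)
```

Validating before parsing means a bad payload fails early with a clear error instead of surfacing later as an obscure regexp error inside the parser.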
> 3) Suggestions?
See my comment in 1) :)
Laurent