[MacRuby-devel] UTF8 Strings

s.ross cwdinfo at gmail.com
Sat Dec 5 17:10:23 PST 2009


Laurent--

Thanks for the quick reply. See comments below:


On Dec 5, 2009, at 4:22 PM, Laurent Sansonetti wrote:

> Hi Steve,
> 
> On Dec 5, 2009, at 1:45 PM, s.ross wrote:
> 
>> My code receives UTF-8 encoded XML data from a web service API call. This winds up in a string.
>> 
>>    return_data = NSURLConnection.sendSynchronousRequest(@request, returningResponse: response, error: error)
>>    str = NSString.alloc.initWithData(return_data, encoding: NSUTF8StringEncoding)
>>    puts "******* response encoding is #{str.encoding}"
>> 
>> The result of the puts above is 'MACINTOSH'.
>> 
>> I suspect the encoding of the string is not UTF-8, because when I try to parse the XML using REXML, I get:
>> 
>> RegexpError: too short multibyte code
>> 
>> This occurs way down in REXML:
>> 
>> /Library/Frameworks/MacRuby.framework/Versions/0.5/usr/lib/ruby/1.9.0/rexml/text.rb:132:in `check:'
>> 
>> In any case, my questions are:
>> 
>> 1) If anyone has run across this, what did you do?
> 
> I don't believe REXML works. In any case, I would recommend not using it. Since you're already using Cocoa, why not give NSXMLDocument a try?

What I really want to use is Nokogiri. My main issue is that I'm having to reimplement XML-RPC because the Ruby standard library version is broken over SSL. Even if it weren't, it has never been thread-safe and thus can't operate asynchronously. As a result, what I have is an XML document inside an XML-RPC response envelope. That means I have to parse the response once to get the contents of the envelope (which are HTML-escaped), then parse those contents to get an XML document I can work with. I've been using XPath for that, and that's why I haven't moved over to NSXMLDocument.
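
To make the two-pass parse concrete, here is a rough sketch written against REXML on stock MRI (not MacRuby 0.5, where REXML fails as described above). The envelope, its element names, and the helper name unwrap_payload are all made up for illustration; a real XML-RPC response will look different.

```ruby
require "rexml/document"

# Hypothetical XML-RPC response: one string value holding an escaped XML document.
ENVELOPE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <methodResponse>
    <params>
      <param>
        <value><string>&lt;items&gt;&lt;item id="1"&gt;caf\u00e9&lt;/item&gt;&lt;/items&gt;</string></value>
      </param>
    </params>
  </methodResponse>
XML

# Pass 1: parse the envelope and pull out the escaped payload with XPath.
# Pass 2: parse the payload itself as a second XML document.
def unwrap_payload(envelope_xml)
  outer = REXML::Document.new(envelope_xml)
  node = REXML::XPath.first(outer, "/methodResponse/params/param/value/string")
  raise ArgumentError, "no string value in envelope" if node.nil?

  # Element#text returns the text with &lt; and &gt; already expanded,
  # so the result is raw XML ready for the second parse.
  REXML::Document.new(node.text)
end

inner = unwrap_payload(ENVELOPE)
REXML::XPath.each(inner, "//item") do |item|
  puts "#{item.attributes['id']}: #{item.text}"
end
```

The same shape should carry over to NSXMLDocument or Nokogiri, since both support XPath queries; only the parser calls would change.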

Maybe I'm missing a bet here and should shift my strategy. I'll do some more reading...

>> 2) Why might the encoding be MACINTOSH and not UTF-8, as specified in the initWithData method call?
> 
> #encoding returns the fastest encoding available for the receiver. You may specify UTF-8 during the string creation, but if Cocoa can pick a smaller encoding at runtime (like ASCII) it will.
> 
> This is different from the Ruby 1.9 semantics and we have a plan to fix that in 0.6.

This is kind of surprising behavior. The 1.9 semantics are sufficiently different from 1.8.x that code that works correctly on 1.8.7 breaks awkwardly on 1.9. OK, I fixed that for MRI, but then the gotcha above broke my MacRuby version. Now that I know this, I guess I can deal with it.
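
For comparison, a minimal sketch of the Ruby 1.9 string semantics on stock MRI (this is not how MacRuby 0.5 behaves, per the explanation above). Under those semantics a string keeps whatever encoding label it was given, and relabeling bytes is an explicit step. The helper name as_utf8 and the sample bytes are assumptions for illustration only.

```ruby
# Bytes as they might arrive from a network read: UTF-8 data labeled as binary.
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::BINARY)

# Relabel a byte string as UTF-8 without changing its bytes, and refuse
# input that is not well-formed UTF-8 instead of failing later in a parser.
def as_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "payload is not valid UTF-8" unless text.valid_encoding?
  text
end

text = as_utf8(raw)
puts raw.length      # counts bytes, because the label is binary
puts text.length     # counts characters, because the label is UTF-8
puts text.encoding   # the label that was assigned, not a "fastest" guess
```

Checking valid_encoding? up front tends to give a clearer failure than an error such as "too short multibyte code" raised deep inside a regexp match.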

> 
>> 3) Suggestions?
> 
> See my comment in 1) :)
> 
> Laurent
> _______________________________________________
> MacRuby-devel mailing list
> MacRuby-devel at lists.macosforge.org
> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
