Hi Steve,

On Dec 5, 2009, at 1:45 PM, s.ross wrote:
My code receives XML data from a Web Service API call that is in UTF-8 encoding. This winds up in a string.
    return_data = NSURLConnection.sendSynchronousRequest(@request, returningResponse: response, error: error)
    str = NSString.alloc.initWithData(return_data, encoding: NSUTF8StringEncoding)
    puts "******* response encoding is #{str.encoding}"
The result of the puts above is 'MACINTOSH'.
I suspect the encoding of the string is not UTF-8, because when I try to parse the XML using REXML, I get:
RegexpError: too short multibyte code
This occurs deep inside REXML:
/Library/Frameworks/MacRuby.framework/Versions/0.5/usr/lib/ruby/1.9.0/rexml/text.rb:132:in `check:'
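To check the suspicion above outside of MacRuby, a minimal sketch in plain Ruby (1.9+ String API; `String#b` requires Ruby 2.0 or later) tags the raw response bytes as UTF-8 and asks whether they are valid. The helper name `utf8_report` and the sample payloads are hypothetical, and MacRuby 0.5 strings, which are backed by NSString, may not behave the same way.

```ruby
# Hypothetical helper (plain Ruby String API, not MacRuby-specific):
# tag raw response bytes as UTF-8 and report whether they are valid.
def utf8_report(bytes)
  s = bytes.dup.force_encoding(Encoding::UTF_8)
  {
    encoding: s.encoding.name,
    valid: s.valid_encoding?,   # false for truncated multibyte sequences
    ascii_only: s.ascii_only?
  }
end

# Hypothetical sample payloads, as binary (ASCII-8BIT) byte strings.
good = "<name>caf\u00e9</name>".b   # complete two-byte UTF-8 sequence for e-acute
bad  = "<name>caf\xC3</name>".b     # only the first byte of that sequence

p utf8_report(good)
p utf8_report(bad)
```

If `valid` comes back false for the real response bytes, the data itself is not well-formed UTF-8; if it is true, the problem more likely lies in how the string's encoding is reported or handled downstream.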
In any case, my questions are:
1) If anyone has run across this what did you do?
I don't believe REXML works in MacRuby. In any case, I would recommend not using it. Since you're already using Cocoa, why not give NSXMLDocument a try?
2) Why might the encoding be MACINTOSH and not UTF-8, as specified in the initWithData method call?
#encoding returns the fastest encoding available for the receiver. You may specify UTF-8 when creating the string, but if Cocoa can pick a smaller encoding at runtime (such as ASCII), it will. This differs from Ruby 1.9 semantics, and we plan to fix it in 0.6.
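For contrast, a minimal sketch of standard Ruby 1.9 behavior (as in CRuby, not MacRuby 0.5): a string keeps the encoding it was tagged with even when its content is pure ASCII, and only an explicit `String#encode` changes that tag. The variable names are illustrative.

```ruby
# Standard Ruby semantics: the encoding tag sticks, even for ASCII-only content.
s = "plain ascii text".dup.force_encoding(Encoding::UTF_8)
puts s.encoding      # UTF-8
puts s.ascii_only?   # true

# Changing the encoding is explicit: String#encode returns a re-tagged copy.
a = s.encode(Encoding::US_ASCII)
puts a.encoding      # US-ASCII
```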
3) Suggestions?
See my comment in 1). :)

Laurent