[MacRuby-devel] [MacRuby] #339: YAML error with UTF-16 string

Matthias Neeracher neeracher at apple.com
Sat Nov 14 12:17:27 PST 2009


On Nov 14, 2009, at 15:44 , MacRuby wrote:

> #339: YAML error with UTF-16 string
> ---------------------------+------------------------------------------------
> Reporter:  dev@…          |        Owner:  lsansonetti@…        
>     Type:  defect         |       Status:  closed               
> Priority:  critical       |    Milestone:  MacRuby 0.5          
> Component:  MacRuby        |   Resolution:  fixed                
> Keywords:  YAML encoding  |  
> ---------------------------+------------------------------------------------
> 
> Comment(by jazzbox@…):
> 
> {{{
> $ macruby -e 'require "yaml"; puts "Rübe".to_yaml'
> --- "R\xFCbe"
> $ ruby1.9 -e 'require "yaml"; puts "Rübe".to_yaml'
> --- "R\xC3\xBCbe"
> }}}
> 
> seems to work now! MacRuby escapes to UTF-16 and Ruby 1.9 escapes to
> UTF-8.

Actually, it seems to me (though I'm willing to be corrected on this) that the ruby1.9 encoding is simply wrong: it translates the accented character into UTF-8 and then escapes the two UTF-8 bytes separately. What this ends up encoding is "RÃ¼be", which is not what you want.
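
To make that concrete, here is a minimal sketch in plain Ruby (1.9 or later). The helper decode_x_escapes_as_code_points is hypothetical, not part of any YAML library; it simply assumes that every \xNN escape names the Unicode code point U+00NN and shows what each of the two outputs would then decode to.

```ruby
# Hypothetical helper (not a YAML library API): treat each \xNN escape
# as the 8-bit Unicode code point U+00NN and return a UTF-8 string.
def decode_x_escapes_as_code_points(text)
  text.gsub(/\\x(\h\h)/) { [$1.hex].pack("U") }
end

macruby_output = 'R\xFCbe'      # scalar emitted by macruby (single quotes keep the backslash)
ruby19_output  = 'R\xC3\xBCbe'  # scalar emitted by ruby1.9

puts decode_x_escapes_as_code_points(macruby_output)  # Rübe
puts decode_x_escapes_as_code_points(ruby19_output)   # RÃ¼be
```

Under this reading, \xC3 and \xBC become U+00C3 and U+00BC, which is where the stray "Ã¼" comes from.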

> I didn't find anything in YAML docs that describes that behaviour, both methods seem to be correct.

They can't possibly be BOTH correct, as interpreting the output of one according to the theory of the other would give a different result. If you look at the section in the YAML spec: <http://www.yaml.org/spec/1.2/spec.html#id2776092>, you will see 

	[57] "Escaped 8-bit Unicode character."

This is NOT a UTF-8 byte: the escape names an 8-bit Unicode character, so \xFC stands for the code point U+00FC.
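
For comparison, the opposite theory (each \xNN is a raw byte, and the bytes are then read as UTF-8) can be sketched the same way. The helper decode_x_escapes_as_utf8_bytes is again hypothetical, and String#b assumes Ruby 2.0 or later. Under this theory the macruby output is no longer even valid text, which illustrates why the two conventions cannot both be right.

```ruby
# Hypothetical helper (not a YAML library API): treat each \xNN escape
# as one raw byte, then label the resulting byte string as UTF-8.
def decode_x_escapes_as_utf8_bytes(text)
  bytes = text.b.gsub(/\\x(\h\h)/) { $1.hex.chr }
  bytes.force_encoding(Encoding::UTF_8)
end

from_ruby19  = decode_x_escapes_as_utf8_bytes('R\xC3\xBCbe')
from_macruby = decode_x_escapes_as_utf8_bytes('R\xFCbe')

puts from_ruby19                   # Rübe
puts from_ruby19.valid_encoding?   # true
puts from_macruby.valid_encoding?  # false: a lone 0xFC byte is not valid UTF-8
```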

> But ruby 1.8 fails to load the UTF-16 YAML. That is not astonishing because IMHO there is no way to guess which escaping mode is correct.

It's not astonishing because (a) 1.8 has very poor Unicode support anyway and (b) this would hardly be the only bug in syck.

> I think escaping is not necessary here because the encoding of input and
> output is the same. This can easily be tested by
> 
> {{{
> $ macruby -e 'require "yaml"; puts YAML::load "--- Rübe"'
> Rübe
> }}}

That's an interesting point. I think you're right that the YAML spec does not require escaping of printable characters >\u007F. However, non-printable characters DO have to be escaped, and for the printable ones, it could be argued that erring on the side of escaping helps readability if the OS does not have font coverage for some printable characters. In any case, the current implementation tries to be conservative in what it generates and liberal in what it accepts. I'm open to persuasion that we should avoid escaping characters, provided there is a low-cost test for printability of general Unicode characters (I have not yet checked whether one of the built-in CFCharacterSets can give that; the descriptions were inconclusive).
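
As a rough illustration of what "escape only the non-printable characters" could look like, here is a sketch in plain Ruby. It uses Ruby's [[:print:]] regexp class as a stand-in printability test; that is an assumption made for the example, not the CFCharacterSet check discussed above, and escape_nonprintable is a hypothetical helper, not what MacRuby does.

```ruby
# Hypothetical helper: leave printable characters alone and emit
# YAML-style \x, \u or \U escapes for everything else.
# Assumes a UTF-8 string; [[:print:]] is only a stand-in printability test.
def escape_nonprintable(text)
  text.gsub(/[^[:print:]]/) do |ch|
    cp = ch.ord
    if cp <= 0xFF
      format('\x%02X', cp)   # single quotes keep the backslash literal
    elsif cp <= 0xFFFF
      format('\u%04X', cp)
    else
      format('\U%08X', cp)
    end
  end
end

puts escape_nonprintable("R\u00FCbe")        # Rübe
puts escape_nonprintable("R\u00FCbe\u0007")  # Rübe\x07
```

Whether a regexp class like this agrees with the YAML spec's definition of printable for every character is exactly the open question; the sketch only shows the shape of the approach.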

Matthias
 