[MacRuby-devel] [MacRuby] #339: YAML error with UTF-16 string

B. Ohr jazzbox at 7zz.de
Sun Nov 15 01:15:42 PST 2009


On 14.11.2009 at 21:17, Matthias Neeracher wrote:

>
> On Nov 14, 2009, at 15:44 , MacRuby wrote:
>
>> #339: YAML error with UTF-16 string
>> ---------------------------+------------------------------------------------
>> Reporter:  dev@…          |        Owner:  lsansonetti@…
>>     Type:  defect         |       Status:  closed
>> Priority:  critical       |    Milestone:  MacRuby 0.5
>> Component:  MacRuby        |   Resolution:  fixed
>> Keywords:  YAML encoding  |
>> ---------------------------+------------------------------------------------
>>
>> Comment(by jazzbox@…):
>>
>> {{{
>> $ macruby -e 'require "yaml"; puts "Rübe".to_yaml'
>> --- "R\xFCbe"
>> $ ruby1.9 -e 'require "yaml"; puts "Rübe".to_yaml'
>> --- "R\xC3\xBCbe"
>> }}}
>>
>> seems to work now! MacRuby escapes to UTF-16 and Ruby 1.9 escapes to
>> UTF-8.
>
> Actually, it seems to me (though I'm willing to be corrected on
> this) that the ruby1.9 encoding is simply wrong: it translates the
> accented character into UTF-8, and then escapes the two UTF-8
> bytes separately. What this ends up encoding is "Rübe", which
> is not what you want.
>
>> I didn't find anything in YAML docs that describes that behaviour,  
>> both methods seem to be correct.
>
> They can't possibly be BOTH correct, as interpreting the output of
> one according to the theory of the other would give a different
> result. If you look at the section in the YAML spec:
> <http://www.yaml.org/spec/1.2/spec.html#id2776092>, you will see
>
> 	[57] "Escaped 8-bit Unicode character."
>
> This is NOT a UTF-8 character.
>
>> But ruby 1.8 fails to load the UTF-16 YAML. That is not astonishing
>> because IMHO there is no way to guess what is the correct escaping
>> mode.
>
> It's not astonishing because (a) 1.8 has very poor Unicode support  
> anyway and (b) this would hardly be the only bug in syck.
>

OK, you are right!
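
To make the difference concrete, here is a minimal Ruby sketch (an illustration only, not the actual MacRuby or syck code) that decodes each \xNN escape as one Unicode code point, which is how rule [57] reads. The helper name decode_x_escapes is hypothetical.

```ruby
# Hypothetical helper: treat every \xNN escape as ONE Unicode code point
# (the reading of rule [57]), not as one byte of a UTF-8 sequence.
def decode_x_escapes(text)
  text.gsub(/\\x(\h\h)/) { [Regexp.last_match(1).hex].pack("U") }
end

macruby_output = 'R\xFCbe'      # single quotes keep the backslashes literal
ruby19_output  = 'R\xC3\xBCbe'

puts decode_x_escapes(macruby_output)  # one code point: U+00FC
puts decode_x_escapes(ruby19_output)   # two code points: U+00C3, U+00BC
```

Read this way, the MacRuby output comes back as the original word, while the ruby1.9 output comes back as the mangled two-character form quoted above.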

When I started generating YAML in MacRuby and importing it into Ruby
1.8, I had not done anything with Unicode before, so I am not very
experienced with it yet.


>> I think escaping is not necessary here because the encoding of
>> input and output is the same. This can easily be tested by
>>
>> {{{
>> $ macruby -e 'require "yaml"; puts YAML::load "--- Rübe"'
>> Rübe
>> }}}
>
> That's an interesting point. I think you're right that the YAML spec  
> does not require escaping of printable characters >\u007F. However,  
> non-printable characters DO have to be escaped, and for the  
> printable ones, it could be argued that erring on the side of  
> escaping helps readability if the OS does not have font coverage for  
> some printable characters. In any case, the current implementation  
> tries to be conservative in what it generates and liberal in what it  
> accepts. I'm open to persuasion that we should avoid escaping  
> characters, provided there is a low-cost test for printability of  
> general Unicode characters (I have not yet checked whether one of  
> the built-in CFCharacterSets can give that; the descriptions were  
> inconclusive).
>
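
The "liberal in what it accepts" side can be checked with a small round trip. This is only a sketch, assuming a current Ruby whose stock yaml library is Psych (not the MacRuby or syck implementations discussed in this thread) and a UTF-8 source file:

```ruby
require "yaml"

# %q(...) keeps the backslash literal, so the parser sees the \xFC escape.
escaped   = YAML.load(%q(--- "R\xFCbe"))
# The same scalar with the character written out unescaped.
unescaped = YAML.load("--- R\u00FCbe")

puts escaped == unescaped
```

If both forms load to the same string, a loader that accepts either is free to pair with a dumper that always escapes.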

The YAML spec, Chapter 5.1 Character Sets says:

 > "To ensure readability, YAML streams use only the printable subset of the Unicode character set"

 > [1]	c-printable	::=	  #x9 | #xA | #xD | [#x20-#x7E]          /* 8 bit */
 > | #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] /* 16 bit */
 > | [#x10000-#x10FFFF]                     /* 32 bit */

Only characters that are not "c-printable" MUST be escaped, and this
set is well defined. (Inside double-quoted strings, " and \ have to be
escaped as well.)
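
As a rough illustration, production [1] is cheap to test with a handful of integer ranges. The following Ruby sketch is an assumption about how such a check could look, not code from MacRuby; the names C_PRINTABLE_RANGES, c_printable? and must_escape? are hypothetical.

```ruby
# Code point ranges copied from production [1] (c-printable) quoted above.
C_PRINTABLE_RANGES = [
  0x9..0x9, 0xA..0xA, 0xD..0xD, 0x20..0x7E,  # 8 bit
  0x85..0x85, 0xA0..0xD7FF, 0xE000..0xFFFD,  # 16 bit
  0x10000..0x10FFFF                          # 32 bit
].freeze

# True if the code point belongs to the c-printable set.
def c_printable?(codepoint)
  C_PRINTABLE_RANGES.any? { |range| range.cover?(codepoint) }
end

# True if a character MUST be escaped inside a double-quoted scalar:
# everything outside c-printable, plus the quote and the backslash.
def must_escape?(char)
  !c_printable?(char.ord) || char == '"' || char == "\\"
end

puts must_escape?("\u00FC")  # the u-umlaut from the example word
puts must_escape?("\u0007")  # BEL, a control character
```

This covers only the MUST part; the SHOULD part for allowed but non-printable characters would need real character property tables.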

 > "...In addition, any allowed characters known to be non-printable SHOULD also be escaped.
 > This isn't mandatory since a full implementation would require extensive character property tables."

So it is a SHOULD rather than a MUST because a complete check would be
too expensive. The YAML spec is a little confusing in how it
distinguishes "allowed characters" from "non-printable characters".

Bernd




