[MacRuby] #906: Double BOM force_encoding bug (from HAML)

MacRuby ruby-noreply at macosforge.org
Wed Sep 15 20:26:11 PDT 2010


#906: Double BOM force_encoding bug (from HAML)
---------------------------------+------------------------------------------
 Reporter:  timmfin@…            |       Owner:  lsansonetti@…        
     Type:  defect               |      Status:  new                  
 Priority:  major                |   Milestone:                       
Component:  MacRuby              |    Keywords:                       
---------------------------------+------------------------------------------
 I was recently exploring some ideas for a desktop Mac app, and I wanted
 to embed HAML/SASS (http://github.com/nex3/haml) inside it. My first
 thought was to get HAML running under MacRuby, since then I could bridge
 cleanly from Objective-C to MacRuby (rather than running C Ruby as an
 external process).

 Note: I'm only partially familiar with Ruby and totally new to MacRuby, so
 there's a good chance I'm being an idiot in some way.

 So far I've had very little success getting HAML to run in MacRuby. I
 first tried MacRuby 0.6 but ran into an error. Then I built the latest
 0.7 head to check whether the problem was still there; it was. Here's the
 error:

 {{{
 /Users/timmfin/Development/haml/lib/haml/util.rb:561:in `block': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
         from /Users/timmfin/Development/haml/lib/haml/util.rb:517:in `check_sass_encoding:'
         from /Users/timmfin/Development/haml/lib/sass/engine.rb:222:in `check_encoding!'
         from /Users/timmfin/Development/haml/lib/sass/engine.rb:202:in `_to_tree'
         from /Users/timmfin/Development/haml/lib/sass/engine.rb:167:in `to_css'
 }}}

 The line numbers don't match up, but here's the relevant code:
 http://github.com/nex3/haml/blob/master/lib/haml/util.rb#L579
 Oh fun. Encodings.

 This code builds up a hash of regular expressions, one per encoding,
 which is then used to figure out the encoding of the incoming input text.
 In other words, each regex matches '\uFEFF@charset ".*"' or a bare
 '\uFEFF', with those literals transcoded into the encoding in question.
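
 To make the "various encodings" part concrete, here is a small Ruby
 sketch (not HAML's code) showing that U+FEFF has a different byte
 sequence in each encoding once it is transcoded and relabeled as BINARY.
 `enc_bytes` is a hypothetical stand-in for HAML's private `_enc`,
 assuming it does the same `encode(e).force_encoding("BINARY")` seen in
 the inlined version of the regex in this ticket; `detect_bom` is a
 simplified, prefix-based stand-in for the regex lookup.

```ruby
# Hypothetical stand-in for HAML's private _enc helper (assumed behavior):
# transcode `str` into encoding `e`, then relabel the bytes as BINARY.
def enc_bytes(str, e)
  str.encode(e).force_encoding("BINARY")
end

# U+FEFF (the byte order mark) in a few encodings, keyed by encoding name.
BOMS = %w[UTF-8 UTF-16BE UTF-16LE].to_h { |e| [e, enc_bytes("\uFEFF", e)] }

# Simplified stand-in for the per-encoding regex lookup: return the name of
# the first encoding whose BOM bytes prefix the raw input, or nil.
def detect_bom(input)
  raw = input.dup.force_encoding("BINARY")
  BOMS.find { |_e, bom| raw.start_with?(bom) }&.first
end

BOMS.each do |e, bom|
  puts "#{e}: #{bom.bytes.map { |b| format('%02X', b) }.join(' ')}"
end
# UTF-8: EF BB BF
# UTF-16BE: FE FF
# UTF-16LE: FF FE

puts detect_bom("\uFEFFa { b: c }".encode("UTF-16LE")) # prints UTF-16LE
```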

 MacRuby dies the first time it reaches line 596 (when h = {} and e =
 "UTF-8"):

 {{{
 Regexp.new(/\A(?:#{_enc("\uFEFF", e)})?#{
     _enc('@charset "', e)}(.*?)#{_enc('"', e)}|\A(#{
     _enc("\uFEFF", e)})/)
 }}}
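
 For comparison, here is a sketch of that same pattern built for
 e = "UTF-8" and exercised under C Ruby, where the ticket reports it
 working. `enc_bytes` and `sniff` are hypothetical helpers for
 illustration (again assuming `_enc` just transcodes and relabels as
 BINARY). Since every non-ASCII piece of the pattern is BINARY, the
 sketch relabels the input as BINARY before matching.

```ruby
# Hypothetical stand-in for HAML's _enc (assumed: transcode, then relabel as BINARY).
def enc_bytes(str, e)
  str.encode(e).force_encoding("BINARY")
end

e       = "UTF-8"
bom     = enc_bytes("\uFEFF", e)
charset = enc_bytes('@charset "', e)
quote   = enc_bytes('"', e)

# Same shape as the HAML line: an optional BOM followed by @charset "...",
# or else a bare BOM at the very start of the input.
CHARSET_RE = /\A(?:#{bom})?#{charset}(.*?)#{quote}|\A(#{bom})/

# Hypothetical helper: the pattern's BOM bytes are BINARY, so match the
# input's raw bytes rather than its UTF-8 characters.
def sniff(input)
  m = CHARSET_RE.match(input.dup.force_encoding("BINARY"))
  return nil unless m
  m[1] ? "charset #{m[1]}" : "bare BOM"
end

puts sniff("\uFEFF@charset \"ISO-8859-1\";\na { b: c }") # charset ISO-8859-1
puts sniff("\uFEFFa { b: c }")                            # bare BOM
p sniff("a { b: c }")                                     # nil
```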

 After stripping out the parts of the regex that don't matter and inlining
 _enc, you get:

 {{{
 Regexp.new(/#{
         "\uFEFF".encode("UTF-8").force_encoding("BINARY")
     } #{
         '@charset "'.encode("UTF-8").force_encoding("BINARY")
     } #{
         '"'.encode("UTF-8").force_encoding("BINARY")
     } #{
         "\uFEFF".encode("UTF-8").force_encoding("BINARY")
     }/)
 }}}

 When I run the code above on C Ruby 1.9 it works fine, but on a MacRuby
 0.7 dev build (from a few days ago) it dies with the same "incompatible
 character encodings" error.

 Here's the minimal test case I've created (also at
 http://gist.github.com/581906).

 {{{
 #!/usr/local/bin/macruby

 s = "#{
     "\uFEFF".encode("UTF-8").force_encoding("BINARY")
 }"
 puts "A single BOM Worked!"

 s = "first: #{
     "\uFEFF".encode("UTF-8").force_encoding("BINARY")
 } second: #{
     "\uFEFF".encode("UTF-8").force_encoding("BINARY")
 }"
 puts "Two BOMs Worked!"
 }}}

 The output is:

 {{{
 timmfin-pro in ~/pending
 $ ./regex-encoding-test.rb
 A single BOM Worked!
 incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
 }}}

 The first interpolation works fine, but the second one kills MacRuby. It
 looks like something goes wrong in MacRuby when two BOMs that went through
 force_encoding are interpolated (i.e. concatenated) into the same string.
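
 For reference, here is a sketch of the C Ruby 1.9+ rule that makes the
 two-BOM string legal there (based on how C Ruby's `Encoding.compatible?`
 behaves, not on anything in MacRuby's source): an ASCII-only UTF-8
 literal like "first: " is compatible with a BINARY (ASCII-8BIT) string
 containing high bytes, and the result takes the BINARY encoding. The
 "incompatible character encodings" error should only come up when both
 sides contain non-ASCII bytes in different encodings.

```ruby
# The same expression as in the test case: a BOM relabeled as BINARY (ASCII-8BIT).
bom = "\uFEFF".encode("UTF-8").force_encoding("BINARY")

# ASCII-only UTF-8 text + non-ASCII BINARY text: compatible, result is BINARY.
p Encoding.compatible?("first: ", bom) == Encoding::ASCII_8BIT # true

# Two BINARY strings: trivially compatible.
p Encoding.compatible?(bom, bom) == Encoding::ASCII_8BIT       # true

# So interpolating two BOMs is legal and yields a 22-byte BINARY string.
s = "first: #{bom} second: #{bom}"
p s.encoding == Encoding::ASCII_8BIT                           # true
p s.bytesize                                                   # 22

# C Ruby raises this error only when BOTH sides hold non-ASCII bytes in
# different encodings, e.g. a UTF-8 "é" concatenated with the BINARY BOM.
begin
  "\u00E9" + bom
rescue Encoding::CompatibilityError => err
  puts "raised: #{err.message}"
end
```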

 Thanks for listening. I hope I've uncovered a real bug and not just some
 wacko code that lives in HAML.

-- 
Ticket URL: <http://www.macruby.org/trac/ticket/906>
MacRuby <http://macruby.org/>



More information about the macruby-tickets mailing list