#906: Double BOM force_encoding bug (from HAML)
---------------------------------+------------------------------------------
 Reporter:  timmfin@…            |       Owner:  lsansonetti@…
     Type:  defect               |      Status:  new
 Priority:  major                |   Milestone:
Component:  MacRuby              |    Keywords:
---------------------------------+------------------------------------------
I was messing with some thoughts I have for a desktop mac app recently, and I wanted to embed HAML/SASS (http://github.com/nex3/haml) inside of it. My first thought was to get HAML running under macruby, since then I would be able to cleanly link from objective-c to macruby (rather than open c-ruby via an external process).

Note, I'm only partially familiar with ruby and totally new to macruby. So there is a large chance I'm being an idiot in some way.

So far I have had very little success getting HAML to run in macruby. I first tried macruby .6 but ran into an error. Then I tried building the latest .7 head to make sure that it still had the same problem. Here's the error.

{{{
/Users/timmfin/Development/haml/lib/haml/util.rb:561:in `block': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
        from /Users/timmfin/Development/haml/lib/haml/util.rb:517:in `check_sass_encoding:'
        from /Users/timmfin/Development/haml/lib/sass/engine.rb:222:in `check_encoding!'
        from /Users/timmfin/Development/haml/lib/sass/engine.rb:202:in `_to_tree'
        from /Users/timmfin/Development/haml/lib/sass/engine.rb:167:in `to_css'
}}}

The line numbers don't match up, but here's the relevant code: http://github.com/nex3/haml/blob/master/lib/haml/util.rb#L579

Oh fun. Encodings. This code is building up a map of regular expressions, which will be used to figure out the encoding of incoming input text. AKA, match '\uFEFF@charset ".*"' or '\uFEFF' in various encodings.
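
For anyone following along, here is a rough sketch of the underlying idea, not HAML's actual code. It uses the same encode-then-force_encoding("BINARY") trick as `_enc`, but swaps the regular expressions for a plain byte-prefix check to keep it short. The names `bytes_in`, `bom_table` and `sniff_bom`, and the three-encoding list, are made up for illustration.

```ruby
# Hypothetical helper modeled on the inlined _enc call from this ticket:
# transcode the text, then relabel the resulting bytes as binary.
def bytes_in(text, encoding)
  text.encode(encoding).force_encoding("BINARY")
end

# Map each candidate encoding name to the raw bytes of its BOM.
# The default list is an illustrative assumption, not HAML's list.
def bom_table(encodings = ["UTF-8", "UTF-16BE", "UTF-16LE"])
  encodings.each_with_object({}) do |name, map|
    map[name] = bytes_in("\uFEFF", name)
  end
end

# Return the name of the first encoding whose BOM prefixes the input,
# or nil when no BOM from the table is present.
def sniff_bom(input)
  raw = input.dup.force_encoding("BINARY") # compare bytes, not characters
  found = bom_table.find { |_name, bom| raw.start_with?(bom) }
  found && found.first
end

p sniff_bom("\uFEFF@charset \"UTF-8\";")
p sniff_bom("body { color: red }")
```

Because both sides of the comparison are relabeled as binary, the check never mixes a UTF-8 string with non-ASCII binary bytes, which is the combination the error message complains about.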

Macruby dies the first time it hits line 596 (when h = {} and e = "UTF-8"):

{{{
Regexp.new(/\A(?:#{_enc("\uFEFF", e)})?#{
  _enc('@charset "', e)}(.*?)#{_enc('"', e)}|\A(#{
  _enc("\uFEFF", e)})/)
}}}

After taking out parts of the regex that don't matter and inlining _enc you get:

{{{
Regexp.new(/#{
  "\uFEFF".encode("UTF-8").force_encoding("BINARY")
} #{
  '@charset "'.encode("UTF-8").force_encoding("BINARY")
} #{
  '"'.encode("UTF-8").force_encoding("BINARY")
} #{
  "\uFEFF".encode("UTF-8").force_encoding("BINARY")
}/)
}}}

When I run the above code against c-ruby 1.9 it works fine. But it dies (with the same incompatible character encodings error) against macruby .7 dev (from a few days ago).

Here's the minimal test case I've created (also at http://gist.github.com/581906).

{{{
#!/usr/local/bin/macruby

s = "#{ "\uFEFF".encode("UTF-8").force_encoding("BINARY") }"
puts "A single BOM Worked!"

s = "first: #{ "\uFEFF".encode("UTF-8").force_encoding("BINARY") } second: #{ "\uFEFF".encode("UTF-8").force_encoding("BINARY") }"
puts "Two BOMs Worked!"
}}}

The output is:

{{{
timmfin-pro in ~/pending $ ./regex-encoding-test.rb
A single BOM Worked!
incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
}}}

The first string interpolation works fine, but the second one kills macruby. Looks like something awkward is going on in macruby when you try to substitute/concatenate two force_encoded BOMs into a string.

Thanks for listening, I hope that I've uncovered a bug and not just some wacko code that lives in HAML.

-- 
Ticket URL: <http://www.macruby.org/trac/ticket/906>
MacRuby <http://macruby.org/>