[MacRuby] #906: Double BOM force_encoding bug (from HAML)
MacRuby
ruby-noreply at macosforge.org
Wed Sep 15 20:26:11 PDT 2010
#906: Double BOM force_encoding bug (from HAML)
---------------------------------+------------------------------------------
Reporter: timmfin@… | Owner: lsansonetti@…
Type: defect | Status: new
Priority: major | Milestone:
Component: MacRuby | Keywords:
---------------------------------+------------------------------------------
I was messing with some ideas for a desktop Mac app recently,
and I wanted to embed HAML/SASS (http://github.com/nex3/haml) inside of
it. My first thought was to get HAML running under MacRuby, since then I
would be able to cleanly link from Objective-C to MacRuby (rather than
run C Ruby via an external process).
Note, I'm only partially familiar with ruby and totally new to macruby. So
there is a large chance I'm being an idiot in some way.
So far I have had very little success getting HAML to run in MacRuby. I
first tried MacRuby 0.6 but ran into an error. Then I built the latest
0.7 head and confirmed that it has the same problem. Here's the error.
{{{
/Users/timmfin/Development/haml/lib/haml/util.rb:561:in `block':
incompatible character encodings: UTF-8 and ASCII-8BIT
(Encoding::CompatibilityError)
from /Users/timmfin/Development/haml/lib/haml/util.rb:517:in
`check_sass_encoding:'
from /Users/timmfin/Development/haml/lib/sass/engine.rb:222:in
`check_encoding!'
from /Users/timmfin/Development/haml/lib/sass/engine.rb:202:in
`_to_tree'
from /Users/timmfin/Development/haml/lib/sass/engine.rb:167:in
`to_css'
}}}
The line numbers don't match up, but here's the relevant code
http://github.com/nex3/haml/blob/master/lib/haml/util.rb#L579 . Oh fun.
Encodings.
This code is building up a map of regular expressions, which will be used
to figure out the encoding of incoming input text. That is, match
'\uFEFF@charset ".*"' or '\uFEFF' in various encodings.
Macruby dies the first time it hits line 596 (when h = {} and e =
"UTF-8"):
{{{
Regexp.new(/\A(?:#{_enc("\uFEFF", e)})?#{
_enc('@charset "', e)}(.*?)#{_enc('"', e)}|\A(#{
_enc("\uFEFF", e)})/)
}}}
After taking out parts of the regex that don't matter and inlining _enc
you get:
{{{
Regexp.new(/#{
"\uFEFF".encode("UTF-8").force_encoding("BINARY")
} #{
'@charset "'.encode("UTF-8").force_encoding("BINARY")
} #{
'"'.encode("UTF-8").force_encoding("BINARY")
} #{
"\uFEFF".encode("UTF-8").force_encoding("BINARY")
}/)
}}}
When I run the above code against C Ruby 1.9 it works fine. But it dies
(with the same incompatible character encodings error) against MacRuby 0.7
dev (from a few days ago).
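For reference, here is a minimal sketch of what each inlined piece
evaluates to, assuming _enc does nothing more than the
encode/force_encoding chain shown above. The name enc_sketch is made up
for this example and is not HAML's helper. On C Ruby this should print the
binary encoding and the three UTF-8 BOM bytes.

```ruby
# Hypothetical stand-in for HAML's _enc, based only on the inlined form
# above: transcode the string, then relabel its bytes as binary.
def enc_sketch(string, encoding)
  string.encode(encoding).force_encoding("BINARY")
end

bom = enc_sketch("\uFEFF", "UTF-8")

puts bom.encoding         # the binary (ASCII-8BIT) encoding
puts bom.bytes.inspect    # [239, 187, 191], i.e. EF BB BF
puts bom.valid_encoding?  # binary strings are always considered valid
```

Comparing this output between C Ruby and MacRuby would show whether a
single force_encoded BOM already differs before any interpolation happens.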
Here's the minimal test case I've created (also at
http://gist.github.com/581906).
{{{
#!/usr/local/bin/macruby
s = "#{
"\uFEFF".encode("UTF-8").force_encoding("BINARY")
}"
puts "A single BOM Worked!"
s = "first: #{
"\uFEFF".encode("UTF-8").force_encoding("BINARY")
} second: #{
"\uFEFF".encode("UTF-8").force_encoding("BINARY")
}"
puts "Two BOMs Worked!"
}}}
The output is:
{{{
timmfin-pro in ~/pending
$ ./regex-encoding-test.rb
A single BOM Worked!
incompatible character encodings: UTF-8 and ASCII-8BIT
(Encoding::CompatibilityError)
}}}
The first string interpolation works fine, but the second one kills
macruby. Looks like something awkward is going on in macruby when you try
to substitute/concatenate two force_encoded BOMs into a string.
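One way to narrow this down is to build the same string one piece at a
time and record where it breaks. The sketch below assumes that String#+
goes through the same compatibility check as interpolation, which may not
hold in MacRuby; binary_bom and trace_concat are hypothetical helper names
made up for this example.

```ruby
# Hypothetical helper: the same expression the test case interpolates.
def binary_bom
  "\uFEFF".encode("UTF-8").force_encoding("BINARY")
end

# Hypothetical helper: append pieces one at a time with String#+ and log
# the resulting encoding after each step, or the error that stopped it.
def trace_concat(pieces)
  log = []
  result = pieces.first.dup
  pieces.drop(1).each_with_index do |piece, index|
    begin
      result += piece
      log << [index + 1, :ok, result.encoding.to_s]
    rescue Encoding::CompatibilityError => error
      log << [index + 1, :failed, error.message]
      break
    end
  end
  log
end

# Same pieces, in the same order, as the "Two BOMs" string above.
trace_concat(["first: ", binary_bom, " second: ", binary_bom]).each do |step|
  puts step.inspect
end
```

On C Ruby every step should be reported as :ok. If MacRuby reports a
:failed step, its index shows which append trips the compatibility check.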
Thanks for listening, I hope that I've uncovered a bug and not just some
wacko code that lives in HAML.
--
Ticket URL: <http://www.macruby.org/trac/ticket/906>
MacRuby <http://macruby.org/>