[MacRuby-devel] [MacRuby] #225: regexp engine broken when a string contains non ascii characters
MacRuby
ruby-noreply at macosforge.org
Sun Mar 1 01:33:36 PST 2009
#225: regexp engine broken when a string contains non ascii characters
-------------------------------------+--------------------------------------
Reporter: mattaimonetti@… | Owner: lsansonetti@…
Type: defect | Status: new
Priority: critical | Milestone: MacRuby 0.4
Component: MacRuby | Keywords: regexp, bug
-------------------------------------+--------------------------------------
Here is some sample code that reproduces the problem:
{{{
html = %{<p><a
href="http://www.flickr.com/people/jeanelietrujillo/">jeanelietrujillo</a>
posted a photo:</p>
<p><a href="http://www.flickr.com/photos/jeanelietrujillo/2211862262/"
title="Galgani Décoration"><img
src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
width="240" height="240" alt="Galgani Décoration" /></a></p>}
html.scan(/<img\s+src="(.+?)"/)[0][0]
}}}
Ruby 1.9 returns:
{{{
=> "http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
}}}
MacRuby returns (note the missing leading "h" and the extra trailing quote):
{{{
=> "ttp://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg\""
}}}
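Now let's replace the "é" in the title attribute (the one before the match) with a plain "e". The "é" in the alt attribute, which comes after the match, is left as is: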
{{{
html = %{<p><a
href="http://www.flickr.com/people/jeanelietrujillo/">jeanelietrujillo</a>
posted a photo:</p>
<p><a href="http://www.flickr.com/photos/jeanelietrujillo/2211862262/"
title="Galgani Decoration"><img
src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
width="240" height="240" alt="Galgani Décoration" /></a></p>}
html.scan(/<img\s+src="(.+?)"/)[0][0]
}}}
MacRuby now returns the correct URL:
{{{
=> "http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
}}}
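My guess is that the non-ASCII characters throw off the offset used to
extract the matched string. The extracted substring is shifted one character
forward: it starts one character too late (the leading "h" is dropped) and
ends one character too late (the closing quote is included). One possible
cause is that a byte offset is being used as a character index, since "é"
takes two bytes in UTF-8.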
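The suspected mix-up can be reproduced in plain Ruby. This is only an illustration of the hypothesis, not MacRuby's actual code: it uses a trimmed-down version of the sample (shortened for brevity), takes the correct character offset of the capture from `MatchData#begin`, converts it to a UTF-8 byte offset, and then deliberately misuses that byte offset as a character index. The "buggy" result is identical to the output MacRuby reports for the first sample.

```ruby
# encoding: utf-8
# Illustration only: simulates the suspected bug by using a UTF-8 byte offset
# where a character index is expected. Not MacRuby's implementation.
html = %{title="Galgani Décoration"><img src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg" width="240"}

m = html.match(/<img\s+src="(.+?)"/)
char_start = m.begin(1)                      # character offset of the captured URL
char_len   = m[1].length                     # captured URL length in characters
byte_start = html[0...char_start].bytesize   # same position in UTF-8 bytes ("é" = 2 bytes)

correct = html[char_start, char_len]         # what Ruby 1.9 returns
buggy   = html[byte_start, char_len]         # byte offset misused as a character index

p correct  # => "http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
p buggy    # => "ttp://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg\""
```

With one "é" before the match, the byte offset is one larger than the character offset, which produces exactly the one-character shift seen above.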
To test this hypothesis, here is another sample, this time with two "é"
characters before the match:
{{{
html = %{<p><a
href="http://www.flickr.com/people/jeanelietrujillo/">jeanelietrujillo</a>a
posté une photo:</p>
<p><a href="http://www.flickr.com/photos/jeanelietrujillo/2211862262/"
title="Galgani Décoration"><img
src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
width="240" height="240" alt="Galgani Décoration" /></a></p>}
html.scan(/<img\s+src="(.+?)"/)[0][0]
}}}
MacRuby returns (the leading "ht" is now missing, and the closing quote and the following space are included):
{{{
=> "tp://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg\" "
}}}
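If the hypothesis holds, the shift should equal the number of extra UTF-8 bytes that precede the capture, and characters after the match should not matter. The helper below (`predicted_shift` is a hypothetical name; the sample strings are trimmed-down versions of the three samples in this ticket, with a shortened URL) computes that number for each case. It gives 1, 0 and 2, consistent with the three MacRuby outputs reported here.

```ruby
# encoding: utf-8
PATTERN = /<img\s+src="(.+?)"/

# Hypothetical helper: number of extra bytes (beyond one per character) that
# precede the captured URL, i.e. the shift expected if a byte offset were
# used as a character index.
def predicted_shift(html)
  m = html.match(PATTERN)
  return nil unless m
  prefix = html[0...m.begin(1)]      # text before the capture
  prefix.bytesize - prefix.length    # each two-byte character such as "é" adds one
end

# Trimmed-down versions of the three samples (URL shortened for brevity).
samples = {
  one_e: %{<a title="Galgani Décoration"><img src="http://x/a.jpg" alt="Galgani Décoration" />},
  no_e:  %{<a title="Galgani Decoration"><img src="http://x/a.jpg" alt="Galgani Décoration" />},
  two_e: %{a posté une photo <a title="Galgani Décoration"><img src="http://x/a.jpg" alt="Galgani Décoration" />}
}

samples.each do |name, html|
  puts "#{name}: predicted shift #{predicted_shift(html)}"
end
```

The "é" in the alt attribute contributes nothing in any of the three cases because it comes after the capture, which matches the observation that only the title "é" had to be replaced for MacRuby to return the right URL.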
--
Ticket URL: <http://www.macruby.org/trac/ticket/225>
MacRuby <http://macruby.org/>