[MacRuby] #1012: Inconsistent Regex behaviour
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: new Priority: minor | Milestone: Component: MacRuby | Keywords: -------------------------------------------+-------------------------------- I've been trying to use Ruby's built in Net::IMAP but haven't got beyond the first hurdle. require 'net/imap' imap = Net::IMAP.new("imap.gmail.com", 25, false) // crashes with 'can't convert nil into String (TypeError)' I've traced it to an inconsistency in the Regex handling between vanilla Ruby and MacRuby (tested on the latest nightly build of 0.8 as of the 25th of November): $ macruby --version MacRuby 0.8 (ruby 1.9.2) [universal-darwin10.0, x86_64] $ macruby regex-test.rb 0 '220' versus $ ruby --version ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0] $ ruby regex-test.rb 0 '220' 3 ' ' 4 'mx.google.com' 17 ' ' 18 'ESMTP' 23 ' ' 24 'n40sm3793094weq.29' -- Ticket URL: <http://www.macruby.org/trac/ticket/1012> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: new Priority: minor | Milestone: Component: MacRuby | Keywords: -------------------------------------------+-------------------------------- Description changed by martinlagardette@…: Old description:
I've been trying to use Ruby's built in Net::IMAP but haven't got beyond the first hurdle.
require 'net/imap' imap = Net::IMAP.new("imap.gmail.com", 25, false) // crashes with 'can't convert nil into String (TypeError)'
I've traced it to an inconsistency in the Regex handling between vanilla Ruby and MacRuby (tested on the latest nightly build of 0.8 as of the 25th of November):
$ macruby --version MacRuby 0.8 (ruby 1.9.2) [universal-darwin10.0, x86_64] $ macruby regex-test.rb 0 '220'
versus
$ ruby --version ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0] $ ruby regex-test.rb 0 '220' 3 ' ' 4 'mx.google.com' 17 ' ' 18 'ESMTP' 23 ' ' 24 'n40sm3793094weq.29'
New description: I've been trying to use Ruby's built in Net::IMAP but haven't got beyond the first hurdle. {{{ #!ruby require 'net/imap' imap = Net::IMAP.new("imap.gmail.com", 25, false) # crashes with 'can't convert nil into String (TypeError)' }}} I've traced it to an inconsistency in the Regex handling between vanilla Ruby and MacRuby (tested on the latest nightly build of 0.8 as of the 25th of November): {{{ $ macruby --version MacRuby 0.8 (ruby 1.9.2) [universal-darwin10.0, x86_64] $ macruby regex-test.rb 0 '220' }}} versus {{{ $ ruby --version ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0] $ ruby regex-test.rb 0 '220' 3 ' ' 4 'mx.google.com' 17 ' ' 18 'ESMTP' 23 ' ' 24 'n40sm3793094weq.29' }}} -- -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:1> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: new Priority: minor | Milestone: Component: MacRuby | Keywords: -------------------------------------------+-------------------------------- Comment(by watson1978@…): It seems that handling of Regexp's ¥G is broken. {{{ #!ruby str = "123456789" str.scan(/\G\d\d\d/) {|m| p m } }}} {{{ $ ruby t.rb "123" "456" "789" $ macruby t.rb "123" }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:2> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: new Priority: minor | Milestone: Component: MacRuby | Keywords: -------------------------------------------+-------------------------------- Comment(by watson1978@…): It seems to have to use uregex_findNext() to support \G. {{{ #!diff diff --git a/re.c b/re.c index afbed36..49cbdb8 100644 --- a/re.c +++ b/re.c @@ -785,10 +785,13 @@ rb_reg_matcher_search(VALUE re, VALUE matcher, int pos, bool reverse) return -1; } } - else if (!uregex_find(re_matcher->pattern, pos, &status)) { - // No match. - rb_backref_set(Qnil); - return -1; + else { + uregex_setRegion(re_matcher->pattern, pos, chars_len, &status); + if (!uregex_findNext(re_matcher->pattern, &status)) { + // No match. + rb_backref_set(Qnil); + return -1; + } } // Match found. }}} Test Script: {{{ require 'tempfile' require 'test/unit/assertions.rb' include Test::Unit::Assertions str = "0123456789" assert_equal(["01", "23", "45", "67", "89"], str.scan(/\G\d\d/)) assert_equal("x23456789", str.sub(/\G\d\d/, "x")) assert_equal("xxxxx", str.gsub(/\G\d\d/, "x")) assert_equal(2, str.index(/\G\d\d/, 2)) # not yet #assert_equal(2, str.rindex(/\G\d\d/, 2)) puts :ok }}} I think that many methods can support \G by an above change. but does not supported yet with String#rindex and String#rpartition. -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:3> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: new Priority: minor | Milestone: Component: MacRuby | Keywords: -------------------------------------------+-------------------------------- Comment(by lsansonetti@…): Good job :) Maybe it doesn't work for #rindex since it applies the regex backward? (ICU does not support that, so we currently mimic it, as you can see in re.c). -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:4> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: new Priority: minor | Milestone: Component: MacRuby | Keywords: -------------------------------------------+-------------------------------- Comment(by watson1978@…):
Maybe it doesn't work for #rindex since it applies the regex backward? (ICU does not support that, so we currently mimic it, as you can see in re.c). I see. It seems to be necessary for me to learn more ICU X(
[[BR]] I modified the patch because I create the another bug.[[BR]] String#gsub, #scan and #split, those behavior has changed. Fortunately, for those methods, should use only uregex_findNext, not use uregex_setRegion. {{{ #!diff diff --git a/re.c b/re.c index afbed36..22514a2 100644 --- a/re.c +++ b/re.c @@ -747,8 +747,8 @@ rb_reg_matcher_destroy(VALUE matcher) xfree((void *)matcher); } -int -rb_reg_matcher_search(VALUE re, VALUE matcher, int pos, bool reverse) +static int +rb_reg_matcher_search_find(VALUE re, VALUE matcher, int pos, bool reverse, bool findFirst) { rb_regexp_matcher_t *re_matcher = (rb_regexp_matcher_t *)matcher; @@ -763,7 +763,7 @@ rb_reg_matcher_search(VALUE re, VALUE matcher, int pos, bool reverse) if (chars_len < 0) { chars_len = 0; } - + if (pos > chars_len || pos < 0) { rb_backref_set(Qnil); return -1; @@ -785,10 +785,15 @@ rb_reg_matcher_search(VALUE re, VALUE matcher, int pos, bool reverse) return -1; } } - else if (!uregex_find(re_matcher->pattern, pos, &status)) { - // No match. - rb_backref_set(Qnil); - return -1; + else { + if (findFirst) { + uregex_setRegion(re_matcher->pattern, pos, chars_len, &status); + } + if (!uregex_findNext(re_matcher->pattern, &status)) { + // No match. + rb_backref_set(Qnil); + return -1; + } } // Match found. @@ -839,6 +844,18 @@ rb_reg_matcher_search(VALUE re, VALUE matcher, int pos, bool reverse) return res[0].beg; } +int +rb_reg_matcher_search_first(VALUE re, VALUE matcher, int pos, bool reverse) +{ + return rb_reg_matcher_search_find(re, matcher, pos, reverse, true); +} + +int +rb_reg_matcher_search_next(VALUE re, VALUE matcher, int pos, bool reverse) +{ + return rb_reg_matcher_search_find(re, matcher, pos, reverse, false); +} + static long reg_match_pos(VALUE re, VALUE *strp, long pos) { diff --git a/re.h b/re.h index 595ee85..2006118 100644 --- a/re.h +++ b/re.h @@ -25,13 +25,15 @@ VALUE rb_regexp_source(VALUE re); VALUE rb_reg_matcher_new(VALUE re, VALUE str); void rb_reg_matcher_destroy(VALUE matcher); -int rb_reg_matcher_search(VALUE re, VALUE matcher, int pos, bool reverse); +int rb_reg_matcher_search_first(VALUE re, VALUE matcher, int pos, bool reverse); +int rb_reg_matcher_search_next(VALUE re, VALUE matcher, int pos, bool reverse); +#define rb_reg_matcher_search rb_reg_matcher_search_next static inline int rb_reg_search(VALUE re, VALUE str, int pos, bool reverse) { VALUE matcher = rb_reg_matcher_new(re, str); - const int res = rb_reg_matcher_search(re, matcher, pos, reverse); + const int res = rb_reg_matcher_search_first(re, matcher, pos, reverse); rb_reg_matcher_destroy(matcher); return res; } }}} Test Script: {{{ require 'test/unit/assertions.rb' include Test::Unit::Assertions str = "hello homely world. hah!" assert_equal("huh? homely world. hah!", str.gsub(/\Ah\S+\s*/, "huh? ")) assert_equal(["hello "], str.scan(/\Ah\S+\s*/)) assert_equal(["", "homely world. hah!"], str.split(/\Ah\S+\s*/)) assert_equal(" Text\n", "Text\n".gsub(/^/, ' ')) assert_equal(" Text\n Foo", "Text\nFoo".gsub(/^/, ' ')) str = "0123456789" assert_equal(["01", "23", "45", "67", "89"], str.scan(/\G\d\d/)) assert_equal("x23456789", str.sub(/\G\d\d/, "x")) assert_equal("xxxxx", str.gsub(/\G\d\d/, "x")) assert_equal(2, str.index(/\G\d\d/, 2)) # not yet #assert_equal(2, str.rindex(/\G\d\d/, 2)) puts :ok }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:5> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: closed Priority: minor | Milestone: Component: MacRuby | Resolution: fixed Keywords: | -------------------------------------------+-------------------------------- Changes (by watson1978@…): * status: new => closed * resolution: => fixed Comment: Fixed with r4956. -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:6> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: closed Priority: minor | Milestone: MacRuby 0.8 Component: MacRuby | Resolution: fixed Keywords: | -------------------------------------------+-------------------------------- Changes (by watson1978@…): * milestone: => MacRuby 0.8 -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:7> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Changes (by lsansonetti@…): * status: closed => reopened * resolution: fixed => * milestone: MacRuby 0.8 => Comment: Thanks for the patch :) I confirm \G works better now, and the spec suite still passes for me, so no regression in sight. However, it looks like the problem reported above is still valid: {{{ $ /usr/local/bin/macruby t.rb /Users/lrz/src/macruby-trunk/t.rb:2:in `<main>': can't convert nil into String (TypeError) }}} I'm reopening the ticket. -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:8> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Comment(by watson1978@…): hmm. With Trunk HEAD, when I executed an above test code, MacRuby displayed no message in my environment. {{{ #$ cat tg.rb require 'net/imap' Thread.start { imap = Net::IMAP.new('imap.gmail.com', 25, false) } sleep 30 puts :Timeout }}} {{{ $ macruby tg.rb Timeout $ ruby19 tg.rb Timeout }}} So I can't examine it what is the cause of this issue X( -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:9> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Comment(by lsansonetti@…): I believe you don't see any message as the exception is being eaten by the thread exception handler. If you remove the Thread.start {} block and run the Net::IMAP.new call in the main thread, you should see the exception. -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:10> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Comment(by watson1978@…): I execute an original test code of this issue and waited for several minutes, but displayed no message and did not finish.[[BR]] CRuby too. So I added a Timeout. MacRuby display an error message when I changed an argument as follows.[[BR]] {{{ #cat tg.rb require 'net/imap' imap = Net::IMAP.new('imap.gmail.com', 993, true) }}} {{{ $ macruby tg.rb /Users/watson/tmp/tg.rb:2:in `<main>': undefined method `upcase' for nil:NilClass (NoMethodError) }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:11> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Comment(by lsansonetti@…): As of trunk: {{{ $ DYLD_LIBRARY_PATH=. ./macruby -I./lib -r net/imap -e "Net::IMAP.new('imap.gmail.com', 993, true)" -e:1:in `<main>': undefined method `upcase' for nil:NilClass (NoMethodError) $ DYLD_LIBRARY_PATH=. ./macruby -I./lib -r net/imap -e "Net::IMAP.new('imap.gmail.com', 25, true)" -e:1:in `<main>': SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: unknown protocol (OpenSSL::SSL::SSLError) $ DYLD_LIBRARY_PATH=. ./macruby -I./lib -r net/imap -e "Net::IMAP.new('imap.gmail.com', 25, false)" -e:1:in `<main>': can't convert nil into String (TypeError) }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:12> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Comment(by lsansonetti@…): I found out the problem in "lib/net/imap.rb" line 3113, here is a reduction: {{{ $ cat t2.rb str = "220 mx.google.com ESMTP v56sm7014509eeh.14\r\n" re = /\G(?:(?# 1: SPACE )( +)|(?# 2: NIL )(NIL)(?=[\x80-\xff(){ \x00-\x1f\x7f%*"\\\[\]+])|(?# 3: NUMBER )(\d+)(?=[\x80-\xff(){ \x00-\x1f\x7f%*"\\\[\]+])|(?# 4: ATOM )([^\x80-\xff(){ \x00-\x1f\x7f%*"\\\[\]+]+)|(?# 5: QUOTED )"((?:[^\x00\r\n"\\]|\\["\\])*)"|(?# 6: LPAR )(\()|(?# 7: RPAR )(\))|(?# 8: BSLASH )(\\)|(?# 9: STAR )(\*)|(?# 10: LBRA )(\[)|(?# 11: RBRA )(\])|(?# 12: LITERAL )\{(\d+)\}\r\n|(?# 13: PLUS )(\+)|(?# 14: PERCENT )(%)|(?# 15: CRLF )(\r\n)|(?# 16: EOF )(\z))/i pos = 0 str.index(re, pos) p $+ $ ruby t2.rb "220" $ ./miniruby t2.rb nil }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:13> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: minor | Milestone: Component: MacRuby | Resolution: Keywords: | -------------------------------------------+-------------------------------- Comment(by watson1978@…): I attach a patch: {{{ #!diff diff --git a/re.c b/re.c index 22514a2..3ff54c7 100644 --- a/re.c +++ b/re.c @@ -1829,7 +1829,18 @@ rb_reg_match_last(VALUE rcv) return Qnil; } assert(RMATCH(rcv)->results_count > 0); - return rb_reg_nth_match(RMATCH(rcv)->results_count - 1, rcv); + + int nth = RMATCH(rcv)->results_count - 1; + while(nth > 0) { + if (RMATCH(rcv)->results[nth].beg != -1) { + break; + } + nth--; + } + if (nth == 0) { + return Qnil; + } + return rb_reg_nth_match(nth, rcv); } /* }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:14> MacRuby <http://macruby.org/>
#1012: Inconsistent Regex behaviour -------------------------------------------+-------------------------------- Reporter: harry@… | Owner: lsansonetti@… Type: defect | Status: closed Priority: minor | Milestone: MacRuby 0.8 Component: MacRuby | Resolution: fixed Keywords: | -------------------------------------------+-------------------------------- Changes (by lsansonetti@…): * status: reopened => closed * resolution: => fixed * milestone: => MacRuby 0.8 Comment: Wow, that was a quick fix :) I confirm that the commands pasted above now run without exception as of r4962 (your latest commit) and the spec suite still runs. So, let's close the bug. Thanks for contributing these fixes :) -- Ticket URL: <http://www.macruby.org/trac/ticket/1012#comment:15> MacRuby <http://macruby.org/>
participants (1)
-
MacRuby