[MacRuby] #1019: String#gsub performance
#1019: String#gsub performance
----------------------------------+-----------------------------------------
 Reporter:  yasuimao@…            |       Owner:  lsansonetti@…
     Type:  defect                |      Status:  new
 Priority:  major                 |   Milestone:
Component:  MacRuby               |    Keywords:
----------------------------------+-----------------------------------------
 String#gsub with block performance improved dramatically with r4964, but
 String#gsub with or without block is still about 4-5 times slower than
 that of Ruby 1.8.7.

 Test 1 - gsub
 {{{
 10.times do
   text = File.read("test.txt")
   text.gsub!(/\w+/,"test")
 end
 }}}

 Test 2 - gsub with block
 {{{
 10.times do
   text = File.read("test.txt")
   text.gsub!(/\w+/){|x| x.upcase}
 end
 }}}

 The original text is in English with 8154 words.

 Results - Test 1 (sec.)
 {{{
 Ruby 1.8.7             0.126   0.144   0.136
 MacRuby 0.8 nightly    0.760   0.830   0.855
 }}}

 Results - Test 2 (sec.)
 {{{
 Ruby 1.8.7             0.369   0.336   0.303
 MacRuby 0.8 nightly    1.377   1.552   1.465
 }}}

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019>
MacRuby <http://macruby.org/>
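For anyone re-running these measurements, the two tests can be wrapped in a small timing harness. This is only a minimal sketch, not the reporter's script: the `time_gsub` helper and the generated sample text are hypothetical stand-ins for `test.txt`, and it assumes Ruby's standard `benchmark` library is available.

```ruby
require 'benchmark'

# Hypothetical helper (not from the ticket): runs the block `runs` times,
# each time on a fresh copy of `text`, and returns elapsed wall-clock seconds.
def time_gsub(text, runs = 10)
  Benchmark.realtime do
    runs.times do
      copy = text.dup   # stands in for re-reading the file on each pass
      yield copy
    end
  end
end

# Assumed stand-in for File.read("test.txt"): 8000 short English words.
sample = (["the quick brown fox"] * 2000).join(" ")

test1 = time_gsub(sample) { |t| t.gsub!(/\w+/, "test") }            # Test 1
test2 = time_gsub(sample) { |t| t.gsub!(/\w+/) { |x| x.upcase } }   # Test 2

puts format("Test 1: %.3f s", test1)
puts format("Test 2: %.3f s", test2)
```

Timing the whole loop rather than a single call mirrors the `10.times` structure of the original tests, so the numbers should be roughly comparable in shape, though not in absolute value, since the sample text differs.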
Changes (by pthomson@…):

  * milestone:  => MacRuby 1.0

Comment:

 Not sure how much we're going to be able to improve this, since the bulk
 of the processing time seems to be spent in ICU's regex methods. Still,
 worth checking out for 1.0, as gsub is important.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:1>
Comment (by yasuimao@…):

 With the latest nightly, String performance seems to be very slow, so I
 tested this again.

 MacRuby 0.9 2010/12/24 nightly
   Test 1 - about 120 seconds
   Test 2 - about 75 seconds

 Ruby 1.8.7 (essentially the same as the original test)
   Test 1 - about 0.1 seconds
   Test 2 - about 0.25 seconds

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:2>
Comment (by lsansonetti@…):

 I think it's much slower because of the new String changes Vincent
 committed. We need to revisit that and at least get the same numbers as
 before the changes.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:3>
Comment (by lsansonetti@…):

 I can't seem to reproduce that huge difference (0.1s -> 120/75s) here.

 {{{
 $ time ruby -e 'txt=File.read("GPL"); 1000.times { txt.gsub!(/\w+/, "test") }'

 real    0m2.320s
 user    0m2.307s
 sys     0m0.007s

 $ time ./miniruby -e 'txt=File.read("GPL"); 1000.times { txt.gsub!(/\w+/, "test") }'

 real    0m6.319s
 user    0m6.780s
 sys     0m0.097s
 }}}

 Maybe the problem was fixed in r5081.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:4>
Comment (by yasuimao@…):

 I tested this with the 2011/01/04 nightly. This time, I varied the length
 of the text: I used the same original text and deleted later paragraphs
 to shorten it.

 {{{
 MacRuby 0.9 2011/01/04 nightly - 1548 words
 Test 1 - 1.32 seconds
 Test 2 - 2.69 seconds

 MacRuby 0.9 2011/01/04 nightly - 2705 words
 Test 1 - 3.88 seconds
 Test 2 - 7.99 seconds

 MacRuby 0.9 2011/01/04 nightly - 3970 words
 Test 1 - 8.23 seconds
 Test 2 - 16.61 seconds

 MacRuby 0.9 2011/01/04 nightly - 6247 words
 Test 1 - 20.22 seconds
 Test 2 - 40.57 seconds

 MacRuby 0.9 2011/01/04 nightly - 8092 words
 Test 1 - 34.16 seconds
 Test 2 - 68.39 seconds
 }}}

 This build is faster than the one I tested last time (2010/12/24), but
 still much slower than before.

 Just to compare, I ran the same test with the 2010/12/17 nightly.

 {{{
 MacRuby 0.9 2010/12/17 nightly - 8092 words
 Test 1 - 0.69 seconds
 Test 2 - 1.23 seconds
 }}}

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:5>
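The 2011/01/04 figures in this comment grow much faster than the word count. As a rough illustration (an added reading of the data, not something stated in the ticket), the apparent exponent k in "time ~ words**k" can be estimated from the shortest and longest runs. The `scaling_exponent` helper is hypothetical.

```ruby
# Word counts and timings (seconds) copied from the 2011/01/04 results.
words = [1548, 2705, 3970, 6247, 8092]
test1 = [1.32, 3.88, 8.23, 20.22, 34.16]
test2 = [2.69, 7.99, 16.61, 40.57, 68.39]

# Hypothetical helper: estimates k in time ~ size**k using only the
# first and last data points (a two-point log-log slope).
def scaling_exponent(sizes, times)
  Math.log(times.last / times.first) /
    Math.log(sizes.last.to_f / sizes.first)
end

k1 = scaling_exponent(words, test1)
k2 = scaling_exponent(words, test2)

puts format("Test 1 exponent: %.2f", k1)
puts format("Test 2 exponent: %.2f", k2)
```

Both exponents come out close to 2, which suggests roughly quadratic growth with text length in that build. This is only a two-point estimate, so it should be treated as a hint rather than a measurement.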
Comment (by yasuimao@…):

 With 2011/01/05, the test results are the same.

 {{{
 MacRuby 0.9 2011/01/05 nightly - 8092 words
 Test 1 - 34.35 seconds
 Test 2 - 69.64 seconds
 }}}

 With force_encoding, this process took much less time than with 0.8 and
 was only slightly slower than 1.8.7.

 Test 1 - gsub
 {{{
 10.times{File.read("test.txt").force_encoding("UTF-16BE").gsub!(/\w+/,"test".force_encoding("UTF-16BE"))}
 }}}

 Test 2 - gsub with block
 {{{
 10.times{File.read("test.txt").force_encoding("UTF-16BE").gsub!(/\w+/){|x| x.upcase}}
 }}}

 '''Results'''
 {{{
 MacRuby 0.9 2011/01/05 nightly - 8092 words with force_encoding
 Test 1 - 0.16 seconds
 Test 2 - 0.40 seconds
 }}}

 The problem with this is that the script file must also be encoded in
 UTF-16BE, unless force_encoding (or encode) is added to each string
 object created in the script. (Or is there a better way?)

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:6>
Comment (by yasuimao@…):

 Oh, 0.8 is MacRuby 0.8 and 1.8.7 is CRuby 1.8.7.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:7>
Comment (by yasuimao@…):

 I also ran this test with the same file encoded in UTF-8 but containing
 only ASCII characters (lossy conversion).

 {{{
 MacRuby 0.9 2011/01/05 nightly - 8092 words, UTF-8 encoded but only with ASCII characters
 Test 1 - 0.21 seconds
 Test 2 - 0.57 seconds
 }}}

 This is slightly slower than with force_encoding, but much faster than
 with the UTF-8 file that contains multi-byte characters.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:8>
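Since the timings differ so much between ASCII-only and multi-byte UTF-8 input, it can help to check which kind of text a test file actually contains before benchmarking. The sketch below is an added illustration, assuming a Ruby 1.9-compatible String API (`encoding`, `ascii_only?`, `bytesize`). The `describe_text` helper and the sample strings are hypothetical.

```ruby
# Hypothetical helper (not from the ticket): summarizes whether a string
# holds only ASCII characters or also multi-byte ones.
def describe_text(text)
  {
    :encoding   => text.encoding.name,
    :ascii_only => text.ascii_only?,
    :chars      => text.length,    # number of characters
    :bytes      => text.bytesize   # number of bytes; larger than :chars
                                   # when multi-byte characters are present
  }
end

plain  = "cafe menu"
accent = "caf\u00e9 menu"   # one character here takes two bytes in UTF-8

p describe_text(plain)
p describe_text(accent)
```

For a real run, the same helper could be applied to `File.read("test.txt")`. A byte count above the character count indicates the multi-byte case that was slow in the 2011/01/05 nightly.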
Comment (by lsansonetti@…):

 Good to know this is also related to slow access to multi-byte UTF-8
 strings. The optimization we plan to implement for #1077 should also help
 here, killing two birds with one stone.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:9>
Comment (by vincent.isambart@…):

 String#gsub should now be much faster starting from r5136. Could you give
 it a try?

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:10>
Comment (by yasuimao@…):

 I ran the test with the 2011/01/08 nightly. I used a different machine,
 so the numbers are slightly different from the previous posts. The test
 file has 8092 words and is UTF-8 encoded with multi-byte characters.

 Results - Test 1 (sec.)
 {{{
 MacRuby 0.9 2011/01/08    0.10    0.10    0.10
 MacRuby 0.9 2011/01/08    0.097   0.097   0.096   (w/ force_encoding)
 CRuby 1.8.7               0.078   0.078   0.078
 CRuby 1.9.1               0.062   0.061   0.061
 }}}

 Results - Test 2 (sec.)
 {{{
 MacRuby 0.9 2011/01/08    0.40    0.40    0.40
 MacRuby 0.9 2011/01/08    0.27    0.27    0.27    (w/ force_encoding)
 CRuby 1.8.7               0.19    0.19    0.19
 CRuby 1.9.2               0.099   0.099   0.099
 }}}

 As with String#scan, it is much faster than before, but still slower than
 CRuby. Yet, this is a great improvement!!

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:11>
Changes (by lsansonetti@…):

  * status:  new => closed
  * resolution:  => fixed
  * milestone:  MacRuby 1.0 => MacRuby 0.9

Comment:

 I think that trunk should now be as fast as, if not faster than, MacRuby
 0.8, and that performance regressions have been fixed. Therefore, I'm
 closing this bug for the 0.9 milestone.

 We do not intend to improve performance over CRuby for 1.0, as we prefer
 to focus on stability and compatibility instead. We will look at
 performance very seriously after 1.0.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1019#comment:12>