[MacRuby] #1077: Performance of String
#1077: Performance of String
----------------------------------+-----------------------------------------
 Reporter:  yasuimao@…            |       Owner:  lsansonetti@…
     Type:  defect                |      Status:  new
 Priority:  blocker               |   Milestone:
Component:  MacRuby               |    Keywords:
----------------------------------+-----------------------------------------
The performance of the following script improved dramatically with r4964, but the changes made on 12/17 seem to have hurt performance again.

'''Script'''
{{{
freq = Hash.new(0)
10.times{File.read("test.txt").scan(/\w+/){|word| freq[word] += 1}}
}}}

'''File''': 1553 English words; times are all in seconds

MacRuby 0.9 nightly 2010/12/17
{{{
0.54 0.52 0.52
}}}

MacRuby 0.9 nightly 2010/12/24
{{{
34.42 34.55 34.50
}}}

Ruby 1.8.7
{{{
0.28 0.26 0.26
}}}

I ran the script 4 times and dropped the first run to eliminate the effect of MacRuby start-up time. My Mac is a Mac mini (Core 2 Duo, 2.0 GHz).

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077>
MacRuby <http://macruby.org/>
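For anyone who wants to repeat the measurement, here is a minimal timing sketch built around the script above. It is not the reporter's harness: the helper name `time_word_count`, its `runs` and `warmup` parameters, and the ASCII sample text are all assumptions made for illustration, and it times the runs inside one process rather than launching the interpreter four times as described in the report.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical helper (not from the ticket): runs the word-frequency script
# warmup + runs times and returns the timings of the non-warm-up runs.
def time_word_count(path, runs = 3, warmup = 1)
  timings = []
  (warmup + runs).times do |i|
    elapsed = Benchmark.realtime do
      freq = Hash.new(0)
      10.times { File.read(path).scan(/\w+/) { |word| freq[word] += 1 } }
    end
    timings << elapsed if i >= warmup # discard the warm-up run(s)
  end
  timings
end

# Stand-in input: the reporter's test.txt is not attached to the ticket.
sample = Tempfile.new("words")
sample.write("the quick brown fox " * 500)
sample.close

timings = time_word_count(sample.path)
puts timings.map { |t| format("%.3f", t) }.join(" ")
sample.unlink
```

To compare builds, the same file would be run under each interpreter, replacing the stand-in input with the real text file.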
Changes (by lsansonetti@…):

  * milestone:  => MacRuby 1.0

Comment:

Vincent changed the string system recently, so that is the likely cause. We need to address this issue for the upcoming release.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:1>
Comment (by yasuimao@…):

I also added test results with 2010/12/24 nightly to #1019.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:2>
Comment (by lsansonetti@…):

In my environment, trunk is a bit slower than before, but not as significantly.

Before:
{{{
$ time macruby t.rb

real    0m1.433s
user    0m1.694s
sys     0m0.085s
}}}

trunk:
{{{
$ time ./miniruby t.rb

real    0m1.963s
user    0m2.371s
sys     0m0.093s
}}}

So, I suspect Vincent committed improvements since you last checked. Let's improve more.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:3>
Comment (by yasuimao@…):

I tested this with 2011/01/04 nightly and got the same result. I also found an error in the original post: the file I used had 8092 words. As in the original, I ran the script 10 times on each file.

MacRuby 2011/01/04 nightly
{{{
1548 words - 1.50 1.48 1.53
8092 words - 34.87 34.89 34.84
}}}

MacRuby 2010/12/17 nightly
{{{
1548 words - 0.087 0.093 0.092
8092 words - 0.57 0.56 0.54
}}}

Do I have to wait one more day for the changes to be reflected in the nightly?

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:4>
Comment (by lsansonetti@…):

Normally you should have the changes, so I guess my test is not the same as yours.

Anyway, I committed an optimization as r5114. Can you re-test with the next build? Also, can you attach the words files somewhere? (I use /usr/share/dict/words, but apparently it doesn't give the same runtime numbers as yours.)

Thanks!

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:5>
Comment (by lsansonetti@…):

After investigating with Watson, we realized that this test runs slower than on 0.8 for UTF-8 files containing non-ASCII characters.

Indeed, if the file contains Unicode (multibyte) characters, MacRuby cannot identify string lengths and boundaries in constant time, which results in a performance loss. 0.8 used to automatically convert these strings to UTF-16 internally for better performance, but in trunk we removed the Unicode datastore (for many other reasons, including multi-threading problems).

We are discussing adding an optimization in trunk, basically caching the boundaries. In the meantime, try forcing your file data object to UTF-16, using #encode, and see if you get the same performance numbers as 0.8.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:6>
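To make the "caching the boundaries" idea above concrete, here is a small sketch in plain Ruby. It is only an illustration of the general technique and makes no claim about how MacRuby implements strings: the class name `BoundaryCache` and its methods are hypothetical. The constructor walks the string once and records the byte offset where each character starts; after that, the length and any character lookup are array accesses rather than a rescan of the UTF-8 bytes.

```ruby
# Hypothetical illustration of boundary caching; not MacRuby's implementation.
class BoundaryCache
  def initialize(str)
    @encoding = str.encoding
    @bytes = str.dup.force_encoding(Encoding::BINARY) # raw bytes, byte-indexed
    @offsets = []
    offset = 0
    str.each_char do |ch|      # single O(n) pass over the characters
      @offsets << offset       # byte offset where this character starts
      offset += ch.bytesize
    end
    @offsets << offset         # sentinel: byte offset of the end of the string
  end

  # Character count, available without rescanning the string.
  def length
    @offsets.length - 1
  end

  # Character at a character index, found via the cached byte offsets.
  def char_at(index)
    return nil if index < 0 || index >= length
    start = @offsets[index]
    stop = @offsets[index + 1]
    @bytes[start, stop - start].force_encoding(@encoding)
  end
end

text = "caf\u00e9 na\u00efve"   # two 2-byte characters in UTF-8
cache = BoundaryCache.new(text)
puts cache.length
puts cache.char_at(3)
```

The trade-off is memory: the cache stores one integer per character, which is why an implementation might build it lazily or only for strings known to contain multibyte characters.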
Comment (by yasuimao@…):

As suggested, I added .force_encoding("UTF-16BE") and ran the test with 2011/01/05 nightly.

{{{
freq = Hash.new(0)
10.times{File.read("test.txt").force_encoding("UTF-16BE").scan(/\w+/){|word| freq[word] += 1}}
}}}

'''Results'''
{{{
MacRuby 0.9 2011/01/05
1548 words - 0.082 0.092 0.081
8092 words - 0.43 0.44 0.47
}}}

Now it is slightly faster than 0.8 (or 0.9 2010/12/17 nightly), though it's still twice as slow as 1.8.7.

I also tested this with NSString#stringWithContentsOfFile:encoding:error:

{{{
freq = Hash.new(0)
10.times{NSString.stringWithContentsOfFile("test.txt", encoding: NSUTF8StringEncoding, error: nil).scan(/\w+/){|word| freq[word] += 1}}
}}}

'''Results'''
{{{
MacRuby 0.9 2011/01/05
1548 words - 1.39 1.38 1.42
8092 words - 34.81 34.47 34.54
}}}

By applying force_encoding, this process also took less time.

{{{
freq = Hash.new(0)
10.times{NSString.stringWithContentsOfFile("test.txt", encoding: NSUTF8StringEncoding, error: nil).UTF8String.force_encoding("UTF-16BE").scan(/\w+/){|word| freq[word] += 1}}
}}}

'''Results'''
{{{
MacRuby 0.9 2011/01/05
1548 words - 0.096 0.084 0.081
8092 words - 0.42 0.42 0.45
}}}

This workaround works, but it would be desirable not to have to add extra methods.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:7>
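One detail worth noting for readers: comment 6 suggested #encode, while the test above uses force_encoding. In CRuby 1.9 and later these are different operations: encode transcodes the bytes so the characters stay the same, whereas force_encoding keeps the bytes and only changes the encoding label. The sketch below shows that difference in plain Ruby. How MacRuby 0.9 treated the relabeled bytes is not stated in the ticket, so this says nothing about whether the words counted in the test above were the same as in the original script.

```ruby
text = "caf\u00e9"   # UTF-8: 4 characters, 5 bytes

# encode converts the byte sequence; the characters are preserved.
transcoded = text.encode("UTF-16BE")

# force_encoding leaves the bytes alone and only relabels them.
relabeled = text.dup.force_encoding("UTF-16BE")

puts transcoded.bytesize                  # size after transcoding
puts transcoded.encode("UTF-8") == text   # round-trips back to the original
puts relabeled.bytesize                   # unchanged from the UTF-8 original
puts relabeled.bytes == text.bytes        # same raw bytes, different label
puts relabeled.valid_encoding?            # whether the bytes are valid UTF-16BE
```

When benchmarking a workaround of this kind, it may be worth also comparing the resulting `freq` hash against the one from the unmodified script, to confirm that both variants count the same words.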
Comment (by yasuimao@…):

I forgot to add the results of another test. Here, I saved the same file as ASCII (lossy conversion) and then converted it to UTF-8, so the encoding is UTF-8 but the file contains only ASCII characters.

{{{
MacRuby 0.9 2011/01/05 - UTF-8 encoded but only with ASCII characters
8092 words - 0.58 0.63 0.56
}}}

This is slightly slower than the run with force_encoding, but much faster than the original test (a UTF-8 encoded file with multi-byte characters).

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:8>
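Since the result above depends on whether the UTF-8 data contains any multibyte characters, a quick check of a given input can help when reproducing the numbers. The following is a minimal sketch using the standard String#ascii_only? method from Ruby 1.9; the helper name `multibyte?` and the sample strings are made up for illustration.

```ruby
# Hypothetical helper (not from the ticket): true if the string contains
# at least one non-ASCII character.
def multibyte?(str)
  !str.ascii_only?
end

ascii_text = "plain words only"
mixed_text = "caf\u00e9 words"   # one 2-byte character in UTF-8

puts multibyte?(ascii_text)
puts multibyte?(mixed_text)
# Bytes beyond one per character, i.e. the extra bytes of multibyte characters.
puts mixed_text.bytesize - mixed_text.length
```

On a real input this could be applied as `multibyte?(File.read("test.txt"))`, assuming the file is read with a UTF-8 external encoding.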
Comment (by lsansonetti@…):

We will make sure that accessing multibyte UTF-8 strings becomes faster. Thank you for verifying this; at least we now know where the bottleneck is.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:9>
Comment (by lsansonetti@…):

Also, I think that Ruby 1.9 may have the same problem as us. Ruby 1.8.7 doesn't have encoding support, but in 1.9, I believe UTF-8 will also be used by default here.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:10>
Comment (by yasuimao@…):

I installed Ruby 1.9.2 via MacPorts on another machine and ran the script.

{{{
freq = Hash.new(0)
10.times{File.read("test.txt").scan(/\w+/){|word| freq[word] += 1}}
}}}

The file contains 8092 words encoded in UTF-8 with multi-byte characters; times are in seconds. This machine is slightly faster than the one I used for previous tests.

{{{
MacRuby 0.9 2011/01/05    27.26 28.55 27.26
CRuby 1.8.7               0.21 0.21 0.21
CRuby 1.9.2 (MacPorts)    0.11 0.11 0.11
}}}

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:11>
Comment (by lsansonetti@…):

Interesting, so CRuby 1.9.2 has an optimization for this too. It would be interesting to look at what they did.

In the meantime, Vincent committed an optimization as r5130. In my environment, with a UTF-8 test file that contains multibyte characters, trunk now runs significantly (20x) faster than before. Please try the upcoming nightly build and let us know how it works for you :)

Note that only #scan has been optimized so far.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:12>
Comment (by yasuimao@…):

I ran the original test (the one without force_encoding) on 2011/01/08 nightly. Now it is faster than 2010/12/17 nightly.

{{{
MacRuby 0.9 nightly 2011/01/08
1548 words - 0.069 0.065 0.068 (File#read)
8092 words - 0.34 0.34 0.35 (File#read)
8092 words - 0.27 0.27 0.27 (File#read w/ force_encoding)
8092 words - 0.38 0.38 0.39 (NSString)
8092 words - 0.28 0.28 0.29 (NSString w/ force_encoding)
}}}

It is still slower than CRuby 1.8.7 or 1.9.2, but this is a great improvement. Thanks for the hard work!

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:13>
Changes (by lsansonetti@…):

  * status:  new => closed
  * resolution:  => fixed
  * milestone:  MacRuby 1.0 => MacRuby 0.9

Comment:

I think that trunk should now be as fast as, if not faster than, MacRuby 0.8, and that the performance regressions have been fixed. Therefore, I'm closing this bug for the 0.9 milestone.

We do not intend to improve performance over CRuby for 1.0, as we prefer to focus on stability and compatibility instead. We will look at performance very seriously after 1.0.

--
Ticket URL: <http://www.macruby.org/trac/ticket/1077#comment:14>