[MacRuby] #1048: Performance of Hash with an Array as a key
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: new Priority: blocker | Milestone: Component: MacRuby | Keywords: ----------------------------------+----------------------------------------- I found another performance issue related to text processing using Hash. This little script is an attempt to count n-grams (n-words sequences) in text. The same script on Ruby 1.8.7 runs much faster and not affected by the number of array elements. '''Script''' {{{ n = 1 hash = Hash.new(0) words = File.open("test.txt").read.scan(/\w+/) (words.length - n).times do |i| hash[words[i..n+i]] += 1 end }}} I used a text file with about 8000 English words. I ran the test 3 times for each of 1 to 4 grams (1 to 4 array elements) to check that the results were consistent. Only the processing times of the block part are shown. [[BR]] '''Results''': MacRuby - hash with array as key (in sec.) {{{ word (n=0) 3.95 4.00 3.96 2-gram (n=1) 12.35 13.02 13.16 3-gram (n=2) 17.97 17.90 17.92 4-gram (n=3) 21.26 21.22 20.78 }}} '''Results''': Ruby 1.8.7 - hash with array as key (in sec.) {{{ word (n=0) 0.049 0.048 0.047 2-gram (n=1) 0.048 0.049 0.054 3-gram (n=2) 0.047 0.047 0.048 4-gram (n=3) 0.049 0.047 0.048 }}} [[BR]] To compare this with performance with String as a key, I joined the array and run the script. {{{ hash[words[i..n+i].join(" ")] += 1 }}} For the word count, I used this script. {{{ words.length.times do |i| hash[words[i]] += 1 end }}} '''Results''': MacRuby - hash with string as key (array joined) (in sec.) {{{ word (string) 0.030 0.029 0.027 2-gram 0.17 0.17 0.16 3-gram 0.18 0.18 0.19 4-gram 0.24 0.21 0.22 }}} '''Results''': Ruby 1.8.7 - hash with string as key (array joined) (in sec.) {{{ word (string) 0.0092 0.0091 0.0094 2-gram 0.045 0.041 0.039 3-gram 0.041 0.043 0.041 4-gram 0.048 0.048 0.049 }}} [[BR]] The second script ran much faster, but still MacRuby is approximately 2 to 3 times slower than Ruby 1.8.7. -- Ticket URL: <http://www.macruby.org/trac/ticket/1048> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: new Priority: blocker | Milestone: MacRuby 1.0 Component: MacRuby | Keywords: ----------------------------------+----------------------------------------- Changes (by lsansonetti@…): * milestone: => MacRuby 1.0 Comment: After looking at this example, I notice 2 problems: * First, because Array#hash in MacRuby returns the same hash code for same-sized arrays, in this very specific case, it means the hash lookup will always hit the same slot range, then perform comparison operations on the various keys to find the right slot. This problem is already reported in #791 and I moved its milestone to 1.0. * Second, doing comparison operations on array keys is poorly optimized, resulting in many calls to the dispatcher and other costly operations. Moving this bug to 1.0. -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:1> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: new Priority: blocker | Milestone: MacRuby 1.0 Component: MacRuby | Keywords: ----------------------------------+----------------------------------------- Comment(by lsansonetti@…): A fix for the second problem has been committed in r5041. -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:2> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: closed Priority: blocker | Milestone: MacRuby 0.9 Component: MacRuby | Resolution: fixed Keywords: | ----------------------------------+----------------------------------------- Changes (by lsansonetti@…): * status: new => closed * resolution: => fixed * milestone: MacRuby 1.0 => MacRuby 0.9 Comment: A fix for the first problem has been committed in r5042. In my environment, your test on a big file (/usr/share/dict/words) runs about the same speed as Ruby 1.8.7 (1.8.7 is a little bit faster, though). Please give it a try and re-open the ticket if necessary :) -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:3> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: blocker | Milestone: MacRuby 0.9 Component: MacRuby | Resolution: Keywords: | ----------------------------------+----------------------------------------- Changes (by yasuimao@…): * status: closed => reopened * resolution: fixed => Comment: I downloaded the latest nightly (0.9 12/17/2010) and further tested this issue. This time, I tested this with more than 1 file. The two scripts are following: '''1 File repeated 10 times''' {{{ hash = Hash.new(0) n = 1 10.times do |i| text = File.open("test.txt").read words = text.scan(/\w+/) (words.length - n).times do |i| hash[words[i..n+i].join(" ")] += 1 end end }}} '''Multiple files''' {{{ require 'find' hash = Hash.new(0) n = 1 Find.find(dir) do |file| next if File.extname(file) != ".txt" text = File.open(file).read words = text.scan(/\w+/) (words.length - n).times do |i| hash[words[i..n+i].join(" ")] += 1 end end }}} Both tests are repeated with String as a key and Array as a key (the sample scripts are with String as a key). {{{ String (Str): hash[words[i..n+i].join(" ")] += 1 Array (Ary): hash[words[i..n+i]] += 1 }}} For testing I used the following files: {{{ 1 file - 8,000 words 10 files - 79,000 words 18 files - 150,000 words }}} Also each test was repeated with different number of array elements 2 (n = 1), 3 (n = 2), and 4 (n = 3). '''Results - Ruby 1.8.7''' {{{ n 1 2 3 1 F rep - Str 0.51 0.54 0.58 1 F rep - Ary 0.67 0.70 0.72 10 F - Str 0.63 0.75 0.72 10 F - Ary 0.63 0.67 0.68 18 F - Str 1.19 1.39 1.43 18 F - Ary 1.36 1.37 1.41 }}} '''MacRuby 0.9 nightly 12/17/2010''' {{{ n 1 2 3 1 F rep - Str 1.73 1.84 1.96 1 F rep - Ary 1.62 2.04 2.27 10 F - Str 2.21 2.53 2.46 10 F - Ary 7.28 20.83 29.15 18 F - Str 4.35 4.64 5.10 18 F - Ary 24.68 88.95 131.77 }}} I'm wondering if this issue is still persisting or due to my poor scripts. -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:4> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: blocker | Milestone: Component: MacRuby | Resolution: Keywords: | ----------------------------------+----------------------------------------- Changes (by lsansonetti@…): * milestone: MacRuby 0.9 => -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:5> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: blocker | Milestone: Component: MacRuby | Resolution: Keywords: | ----------------------------------+----------------------------------------- Comment(by lsansonetti@…): The numbers are too distant, there is surely a problem remaining. -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:6> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: blocker | Milestone: Component: MacRuby | Resolution: Keywords: | ----------------------------------+----------------------------------------- Comment(by lsansonetti@…): Yasu, is forcing the data as utf16 fixing the problem here too? -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:7> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: reopened Priority: blocker | Milestone: Component: MacRuby | Resolution: Keywords: | ----------------------------------+----------------------------------------- Comment(by yasuimao@…): I used a different machine, so the numbers are different from the previous posts, but here's the results of the tests. I ran the test sequentially this time rather than running the test one by one, so the results might have been affected, yet it looks like force_encoding("UTF-16BE") does fix the problem. Still, MacRuby is much slower than CRuby, exp. 1.9.2. {{{ MacRuby 0.9 2011/01/06 n 1 2 3 1 F rep - Str 28.43 28.84 28.09 1 F rep - Ary 27.82 28.39 28.23 10 F - Str 27.10 26.08 25.55 10 F - Ary 28.68 32.55 37.19 18 F - Str 54.21 53.66 53.97 18 F - Ary 68.74 104.04 117.75 MacRuby 0.9 2011/01/06 - force_encoding n 1 2 3 1 F rep - Str 0.59 0.75 0.86 1 F rep - Ary 0.47 0.56 0.57 10 F - Str 0.72 0.78 0.87 10 F - Ary 1.02 1.38 1.41 18 F - Str 1.45 1.58 1.86 18 F - Ary 3.35 6.21 6.08 MacRuby 0.9 2010/12/17 n 1 2 3 1 F rep - Str 1.30 1.51 1.75 1 F rep - Ary 1.21 1.58 1.99 10 F - Str 1.55 1.62 1.81 10 F - Ary 5.18 9.73 13.91 18 F - Str 3.33 3.18 3.42 18 F - Ary 14.86 49.34 75.62 MacRuby 0.9 2010/12/17 - force_encoding n 1 2 3 1 F rep - Str 0.52 0.63 0.64 1 F rep - Ary 0.45 0.53 0.53 10 F - Str 0.66 0.68 0.74 10 F - Ary 1.04 1.43 1.45 18 F - Str 1.32 1.48 1.62 18 F - Ary 3.58 5.42 6.30 CRuby 1.8.7 n 1 2 3 1 F rep - Str 0.38 0.61 0.92 1 F rep - Ary 0.51 0.83 0.93 10 F - Str 0.56 0.72 0.90 10 F - Ary 0.51 0.70 0.82 18 F - Str 1.12 1.39 1.75 18 F - Ary 1.34 1.64 1.67 CRuby 1.9.2 n 1 2 3 1 F rep - Str 0.22 0.26 0.31 1 F rep - Ary 0.43 0.47 0.52 10 F - Str 0.28 0.30 0.30 10 F - Ary 0.42 0.42 0.44 18 F - Str 0.52 0.62 0.64 18 F - Ary 0.86 0.88 0.90 }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:8> MacRuby <http://macruby.org/>
#1048: Performance of Hash with an Array as a key ----------------------------------+----------------------------------------- Reporter: yasuimao@… | Owner: lsansonetti@… Type: defect | Status: closed Priority: blocker | Milestone: MacRuby 0.9 Component: MacRuby | Resolution: fixed Keywords: | ----------------------------------+----------------------------------------- Changes (by lsansonetti@…): * status: reopened => closed * resolution: => fixed * milestone: => MacRuby 0.9 Comment: I think that trunk should now be as fast, if not faster, than MacRuby 0.8, and that performance regressions have been fixed. Therefore, I'm closing this bug for the 0.9 milestone. We do not intend to improve performance over CRuby for 1.0, as we prefer to focus on stability and compatibility instead. We will look at performance very seriously after 1.0. -- Ticket URL: <http://www.macruby.org/trac/ticket/1048#comment:9> MacRuby <http://macruby.org/>
participants (1)
-
MacRuby