[MacRuby] #1048: Performance of Hash with an Array as a key
MacRuby
ruby-noreply at macosforge.org
Wed Dec 15 09:36:51 PST 2010
----------------------------------+-----------------------------------------
Reporter: yasuimao@… | Owner: lsansonetti@…
Type: defect | Status: new
Priority: blocker | Milestone:
Component: MacRuby | Keywords:
----------------------------------+-----------------------------------------
I found another performance issue related to text processing using Hash.
This small script counts n-grams (sequences of n words) in a text. The
same script runs much faster on Ruby 1.8.7, and its run time there is not
affected by the number of array elements.
'''Script'''
{{{
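n = 1  # each key is n + 1 words: n = 0 counts words, n = 1 counts 2-grams
hash = Hash.new(0)
words = File.read("test.txt").scan(/\w+/)
(words.length - n).times do |i|
  hash[words[i..n+i]] += 1  # the Array slice itself is the Hash key
end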
end
}}}
I used a text file with about 8000 English words and ran each test three
times for 1- to 4-grams (arrays of 1 to 4 elements) to check that the
results were consistent. Only the time spent in the counting block is shown.
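One way to time only the counting block, leaving out file reading and tokenizing, is the standard `Benchmark.realtime`. This is a sketch, not the harness used for the numbers below: the 8000-token word list is synthetic (an assumption, drawn from a 500-word vocabulary) and `time_ngram_count` is a hypothetical helper name.

```ruby
require 'benchmark'

# Synthetic stand-in for the ~8000-word test.txt (assumption:
# 8000 tokens drawn from a 500-word vocabulary).
words = Array.new(8000) { |i| "w#{(i * 7919) % 500}" }

# Hypothetical helper: returns one wall-clock time (seconds) per run,
# measuring only the n-gram counting loop.
def time_ngram_count(words, n, runs = 3)
  Array.new(runs) do
    hash = Hash.new(0)
    Benchmark.realtime do
      (words.length - n).times { |i| hash[words[i..n + i]] += 1 }
    end
  end
end

(0..3).each do |n|
  times = time_ngram_count(words, n)
  puts format("%d-gram (n=%d): %s", n + 1, n,
              times.map { |t| format("%.3f", t) }.join(" "))
end
```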
[[BR]]
'''Results''': MacRuby - hash with array as key (in sec.)
{{{
word (n=0) 3.95 4.00 3.96
2-gram (n=1) 12.35 13.02 13.16
3-gram (n=2) 17.97 17.90 17.92
4-gram (n=3) 21.26 21.22 20.78
}}}
'''Results''': Ruby 1.8.7 - hash with array as key (in sec.)
{{{
word (n=0) 0.049 0.048 0.047
2-gram (n=1) 0.048 0.049 0.054
3-gram (n=2) 0.047 0.047 0.048
4-gram (n=3) 0.049 0.047 0.048
}}}
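To put the two array-key tables side by side, the sketch below averages each row's three runs and divides the MacRuby time by the Ruby 1.8.7 time. The numbers are copied from the tables above; the script is only a helper for the arithmetic.

```ruby
# Array-key timings from the two tables above (seconds, three runs each).
macruby = {
  "word"   => [3.95, 4.00, 3.96],
  "2-gram" => [12.35, 13.02, 13.16],
  "3-gram" => [17.97, 17.90, 17.92],
  "4-gram" => [21.26, 21.22, 20.78]
}
ruby187 = {
  "word"   => [0.049, 0.048, 0.047],
  "2-gram" => [0.048, 0.049, 0.054],
  "3-gram" => [0.047, 0.047, 0.048],
  "4-gram" => [0.049, 0.047, 0.048]
}

# Arithmetic mean of the runs in one row.
def mean(xs)
  xs.inject(:+) / xs.size
end

# Slowdown factor of MacRuby relative to Ruby 1.8.7 for each row.
ratios = {}
macruby.each_key do |row|
  ratios[row] = mean(macruby[row]) / mean(ruby187[row])
  puts format("%-7s MacRuby %.2f s, Ruby 1.8.7 %.3f s, ratio ~%.0fx",
              row, mean(macruby[row]), mean(ruby187[row]), ratios[row])
end
```

With these averages MacRuby comes out about 83 times slower for single words and about 440 times slower for 4-grams, so the gap grows with the number of array elements while the Ruby 1.8.7 times stay flat.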
[[BR]]
To compare this with the performance of a String key, I joined each array
into a string and ran the script again:
{{{
hash[words[i..n+i].join(" ")] += 1
}}}
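Joining with a space should be a safe substitute key here: tokens from `scan(/\w+/)` never contain whitespace, so two different word arrays cannot join to the same string, and `split(" ")` recovers the original array. A minimal sketch checking that both key types give the same counts, on a made-up sentence (an assumption, standing in for test.txt):

```ruby
# Illustrative text; the report uses the contents of test.txt.
text  = "to be or not to be that is the question to be"
words = text.scan(/\w+/)
n = 1 # 2-grams, as in the report's n = 1 case

array_keys  = Hash.new(0)
string_keys = Hash.new(0)
(words.length - n).times do |i|
  gram = words[i..n + i]
  array_keys[gram] += 1              # Array key, as in the first script
  string_keys[gram.join(" ")] += 1   # joined String key, as in the second
end

# \w+ tokens contain no spaces, so split(" ") turns each String key back
# into the original Array and the counts line up one for one.
converted = {}
string_keys.each { |key, count| converted[key.split(" ")] = count }
puts converted == array_keys # prints true
```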
For the single-word count, I used each word String directly as the key:
{{{
words.length.times do |i|
  hash[words[i]] += 1
end
}}}
'''Results''': MacRuby - hash with string as key (array joined) (in sec.)
{{{
word (string) 0.030 0.029 0.027
2-gram 0.17 0.17 0.16
3-gram 0.18 0.18 0.19
4-gram 0.24 0.21 0.22
}}}
'''Results''': Ruby 1.8.7 - hash with string as key (array joined) (in sec.)
{{{
word (string) 0.0092 0.0091 0.0094
2-gram 0.045 0.041 0.039
3-gram 0.041 0.043 0.041
4-gram 0.048 0.048 0.049
}}}
[[BR]]
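The string-key version ran far faster than the array-key version on
MacRuby (for example, about 0.17 s instead of about 12.8 s for 2-grams),
but MacRuby is still roughly 3 to 4.6 times slower than Ruby 1.8.7 when
the averages of the three runs in each row are compared.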
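The slowdown factors for the string-key tables can be checked the same way: average each row's three runs and divide MacRuby by Ruby 1.8.7. The numbers are copied from the two string-key tables above.

```ruby
# String-key (array joined) timings from the tables above (seconds).
macruby = {
  "word"   => [0.030, 0.029, 0.027],
  "2-gram" => [0.17, 0.17, 0.16],
  "3-gram" => [0.18, 0.18, 0.19],
  "4-gram" => [0.24, 0.21, 0.22]
}
ruby187 = {
  "word"   => [0.0092, 0.0091, 0.0094],
  "2-gram" => [0.045, 0.041, 0.039],
  "3-gram" => [0.041, 0.043, 0.041],
  "4-gram" => [0.048, 0.048, 0.049]
}

# Arithmetic mean of the runs in one row.
def mean(xs)
  xs.inject(:+) / xs.size
end

# Slowdown factor of MacRuby relative to Ruby 1.8.7 for each row.
ratios = {}
macruby.each_key do |row|
  ratios[row] = mean(macruby[row]) / mean(ruby187[row])
  puts format("%-7s ratio ~%.1fx", row, ratios[row])
end
```

The ratios run from about 3.1 (single words) to about 4.6 (4-grams).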
--
Ticket URL: <http://www.macruby.org/trac/ticket/1048>
MacRuby <http://macruby.org/>
More information about the macruby-tickets mailing list