[MacRuby] #1048: Performance of Hash with an Array as a key

MacRuby ruby-noreply at macosforge.org
Wed Dec 15 09:36:51 PST 2010


#1048: Performance of Hash with an Array as a key
----------------------------------+-----------------------------------------
 Reporter:  yasuimao@…            |       Owner:  lsansonetti@…        
     Type:  defect                |      Status:  new                  
 Priority:  blocker               |   Milestone:                       
Component:  MacRuby               |    Keywords:                       
----------------------------------+-----------------------------------------
 I found another performance issue related to text processing using Hash.

 This little script is an attempt to count n-grams (n-word sequences) in
 text.  The same script runs much faster on Ruby 1.8.7, where the run time
 is not affected by the number of array elements.


 '''Script'''

 {{{
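 # Each key is an array of n + 1 consecutive words (n = 1 counts 2-grams).
 n = 1
 hash = Hash.new(0)   # missing keys default to a count of 0
 # File.read closes the file, unlike File.open(...).read.
 words = File.read("test.txt").scan(/\w+/)
 (words.length - n).times do |i|
   hash[words[i..n+i]] += 1
 end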
 }}}

 I used a text file with about 8000 English words.  I ran the test 3 times
 for each n-gram size from 1 to 4 (1 to 4 array elements) to check that the
 results were consistent.  Only the time spent in the block is shown.
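
 The measurement described above can be sketched as follows.  This is a
 minimal sketch, not the reporter's actual harness: `count_ngrams` and
 `time_ngrams` are hypothetical helper names, `Benchmark.realtime` from the
 standard library is an assumed choice of timer, and a short inline sample
 replaces `test.txt` so the snippet runs on its own.

```ruby
require "benchmark"

# Hypothetical helper (not from the ticket): counts word sequences using
# Array keys, slicing n + 1 words per key as in the ticket's script.
def count_ngrams(words, n)
  hash = Hash.new(0)
  (words.length - n).times do |i|
    hash[words[i..n + i]] += 1
  end
  hash
end

# Times only the counting block, `runs` times, and returns the elapsed
# wall-clock seconds of each run.
def time_ngrams(words, n, runs = 3)
  Array.new(runs) { Benchmark.realtime { count_ngrams(words, n) } }
end

# Small inline sample (an assumption) so the sketch runs without test.txt.
sample = "the quick brown fox jumps over the quick brown dog".scan(/\w+/)
(0..3).each do |n|
  times = time_ngrams(sample, n)
  puts format("n=%d  %s", n, times.map { |t| format("%.6f", t) }.join("  "))
end
```

 With a sample this small the timings are dominated by noise; a file of
 several thousand words, as in the ticket, is needed to see the effect.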

 [[BR]]


 '''Results''': MacRuby - hash with array as key (in sec.)

 {{{
 word   (n=0)     3.95    4.00    3.96
 2-gram (n=1)    12.35   13.02   13.16
 3-gram (n=2)    17.97   17.90   17.92
 4-gram (n=3)    21.26   21.22   20.78
 }}}

 '''Results''': Ruby 1.8.7 - hash with array as key (in sec.)

 {{{
 word   (n=0)    0.049   0.048   0.047
 2-gram (n=1)    0.048   0.049   0.054
 3-gram (n=2)    0.047   0.047   0.048
 4-gram (n=3)    0.049   0.047   0.048
 }}}


 [[BR]]

 To compare this with the performance of a String as a key, I joined each
 array into a string and ran the script again.

 {{{
 hash[words[i..n+i].join(" ")] += 1
 }}}

 For the word count, I used this script.

 {{{
 words.length.times do |i|
   hash[words[i]] += 1
 end
 }}}
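
 For reference, the two string-keyed variants above can be folded into one
 helper.  This is a sketch under assumptions: `count_ngrams_joined` is a
 hypothetical name, and for n = 0 it still slices and joins a one-word
 array, so its timing may differ from the plain word-count loop even though
 the resulting keys are the same.

```ruby
# Hypothetical helper (not from the ticket): same counting loop, but each
# window of n + 1 words is joined into a single String before it is used
# as the Hash key.
def count_ngrams_joined(words, n)
  hash = Hash.new(0)
  (words.length - n).times do |i|
    hash[words[i..n + i].join(" ")] += 1
  end
  hash
end

# Usage on a tiny inline sample (an assumption, not the ticket's test.txt).
sample = "to be or not to be".scan(/\w+/)
count_ngrams_joined(sample, 1).each do |key, count|
  puts "#{key.inspect} => #{count}"
end
```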


 '''Results''': MacRuby - hash with string as key (array joined) (in sec.)

 {{{
 word (string)   0.030   0.029   0.027
 2-gram          0.17    0.17    0.16
 3-gram          0.18    0.18    0.19
 4-gram          0.24    0.21    0.22
 }}}

 '''Results''': Ruby 1.8.7 - hash with string as key (array joined) (in sec.)

 {{{
 word (string)   0.0092  0.0091  0.0094
 2-gram          0.045   0.041   0.039
 3-gram          0.041   0.043   0.041
 4-gram          0.048   0.048   0.049
 }}}

 [[BR]]

 The second script ran much faster, but by the timings above MacRuby is
 still roughly 3 to 5 times slower than Ruby 1.8.7.
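
 The slowdown factors can be checked directly from the reported timings.
 The sketch below copies the numbers from the four results tables and
 divides the mean MacRuby time by the mean Ruby 1.8.7 time for each row;
 the constant and method names (`ARRAY_KEY`, `STRING_KEY`, `mean`,
 `slowdown`) are hypothetical and chosen only for illustration.

```ruby
# Timings in seconds, copied from the results tables in this ticket.
ARRAY_KEY = {
  "word"   => { :macruby => [3.95, 4.00, 3.96],    :ruby187 => [0.049, 0.048, 0.047] },
  "2-gram" => { :macruby => [12.35, 13.02, 13.16], :ruby187 => [0.048, 0.049, 0.054] },
  "3-gram" => { :macruby => [17.97, 17.90, 17.92], :ruby187 => [0.047, 0.047, 0.048] },
  "4-gram" => { :macruby => [21.26, 21.22, 20.78], :ruby187 => [0.049, 0.047, 0.048] }
}

STRING_KEY = {
  "word"   => { :macruby => [0.030, 0.029, 0.027], :ruby187 => [0.0092, 0.0091, 0.0094] },
  "2-gram" => { :macruby => [0.17, 0.17, 0.16],    :ruby187 => [0.045, 0.041, 0.039] },
  "3-gram" => { :macruby => [0.18, 0.18, 0.19],    :ruby187 => [0.041, 0.043, 0.041] },
  "4-gram" => { :macruby => [0.24, 0.21, 0.22],    :ruby187 => [0.048, 0.048, 0.049] }
}

# Arithmetic mean of a list of timings.
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.length
end

# Returns { row label => mean MacRuby time / mean Ruby 1.8.7 time }.
def slowdown(table)
  result = {}
  table.each do |label, runs|
    result[label] = mean(runs[:macruby]) / mean(runs[:ruby187])
  end
  result
end

[["array keys", ARRAY_KEY], ["string keys", STRING_KEY]].each do |title, table|
  puts title
  slowdown(table).each { |label, ratio| puts format("  %-7s %.1fx", label, ratio) }
end
```

 Averaging three runs is a crude summary, but the gap between the two
 tables is large enough that the choice of statistic does not change the
 picture.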

-- 
Ticket URL: <http://www.macruby.org/trac/ticket/1048>
MacRuby <http://macruby.org/>


