[MacRuby-devel] Regular expression related performance

Yasu Imao yimao.ml at gmail.com
Wed Dec 1 16:29:48 PST 2010


Hi Laurent,

This is great! I think I read something about object allocation in the discussion of StringScanner performance (though I didn't understand exactly what was happening behind the scenes), so I guessed the issue was related to using a block with regular expression match data.

For the word frequency count feature I could use the Test 2 script, but other parts of the app need match information ($` and $', to be exact), so this performance improvement means a lot to my app.
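For reference, a minimal sketch of two ways to get the text before and after each match in Ruby: reading $` and $' inside a scan block, and walking the string with StringScanner#scan_until, whose pre_match and post_match return the same values. The sample sentence and the method names are illustrative assumptions, and nothing here says which approach is faster under MacRuby.

```ruby
require 'strscan'

# Block form: scan sets $` (pre-match) and $' (post-match) before each yield.
def contexts_with_block(text)
  result = []
  text.scan(/\w+/) { |word| result << [$`, word, $'] }
  result
end

# Alternative: StringScanner. After each scan_until, pre_match is the text from
# the start of the string up to the match, and post_match is everything after it.
def contexts_with_scanner(text)
  result = []
  scanner = StringScanner.new(text)
  while scanner.scan_until(/\w+/)
    result << [scanner.pre_match, scanner.matched, scanner.post_match]
  end
  result
end

text = "the cat, the hat"
p contexts_with_block(text).first # prints ["", "the", " cat, the hat"]
```

Both methods return the same triples, so code that needs $` and $' can be moved between the two forms without changing its output.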

Is this going to be in 0.8? If so, I'll test it with my app.

By the way, the regular expression implementation itself seems to have a bug (unrelated to this one; it involves negative look-ahead), and I filed a ticket (though I'm not sure I did it properly).

Best,
Yasu

On 2010/12/02, at 8:50, Laurent Sansonetti wrote:

> I spoke too soon: on a second look, I found that it was possible to make the Match strings point to a unique object. I committed this optimization in r4964 and verified that it introduces no regressions.
> 
> Before:
> 
> $ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
> 
> real	0m2.430s
> user	0m1.628s
> sys	0m1.030s
> 
> After :)
> 
> $ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
> 
> real	0m0.121s
> user	0m0.100s
> sys	0m0.015s
> 
> Laurent
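The r4964 change is described only as making the Match strings point to a unique object instead of giving every Match its own copy of the string. The following is a hypothetical Ruby model of that idea, not MacRuby's actual implementation (which lives in the runtime, not in Ruby code): each lightweight match record keeps a reference to one shared, frozen source string plus two offsets, and derives the matched text, pre-match and post-match only when asked. SharedMatch and shared_matches are made-up names, and Regexp#match still allocates a MatchData per step here; the sketch is only about what the kept records hold.

```ruby
# Hypothetical model (not MacRuby internals): many cheap match records
# that all reference one shared source string instead of copying it.
SharedMatch = Struct.new(:source, :start, :finish) do
  def to_s
    source[start...finish]
  end

  def pre_match
    source[0...start]
  end

  def post_match
    source[finish..-1]
  end
end

def shared_matches(text, pattern)
  source = text.dup.freeze # one frozen snapshot shared by every record
  matches = []
  pos = 0
  while (m = pattern.match(source, pos))
    matches << SharedMatch.new(source, m.begin(0), m.end(0))
    # Step past the match; advance one character on an empty match to avoid looping forever.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  matches
end

ms = shared_matches("the cat, the hat", /\w+/)
p ms.map(&:to_s) # prints ["the", "cat", "the", "hat"]
```

Every record's source is the same object, so memory grows with the number of matches (two integers each) rather than with matches times string length.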
> 
> On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:
> 
>> Hi Yasu,
>> 
>> I ran your tests in Shark. Tests 1 and 3 are significantly slower because #scan and #gsub are called with a block, which means MacRuby has to create a new Match object for every yield, to conform to the Ruby specs. Each Match object contains a copy of the original string.
>> 
>> MacRuby has a slow memory allocator (much slower than the original Ruby's), so one must be careful not to allocate too many objects. This is something we are working on; unfortunately, MacRuby doesn't fully control the object allocator, as it resides in the libauto library (the Objective-C garbage collector).
>> 
>> In your case, I recommend the approach in Test 2, which does not pass a block.
>> 
>> It may be possible to reduce memory usage when doing regexps in MacRuby; however, after a quick look at the source code, I am not sure anything can be done for 0.8 :(
>> 
>> Laurent
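A minimal sketch of that recommendation, using a made-up sample sentence: the block-less form from Test 2 builds exactly the same frequency table as the block form from Test 1, so switching changes only how the matches are handed back (an Array of strings instead of one yield per match), not the result.

```ruby
# Test 1 style: scan yields each match to a block.
def freq_with_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2 style: scan without a block returns an Array of matched strings,
# which is then iterated separately.
def freq_without_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

sample = "the cat and the hat and the bat"
p freq_with_block(sample) == freq_without_block(sample) # prints true
p freq_without_block(sample)["the"]                     # prints 3
```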
>> 
>> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
>> 
>>> Hello,
>>> 
>>> I'm rewriting an app for text analysis in MacRuby, which I originally wrote in RubyCocoa. However, I have run into a serious performance issue in MacRuby related to processing text with regular expressions.
>>> 
>>> I'm wondering whether this will be addressed in the near future (or has it already been addressed in 0.8?).
>>> 
>>> Below are my simple tests. The first two do essentially the same thing with slightly different approaches: both simply count the frequency of each word. I want to use the first approach not for counting word frequencies but for other processing. The third one tests the speed of String#gsub with a regular expression. String#gsub felt slow in my app, so I wanted to see how slow it is compared to RubyCocoa.
>>> 
>>> 
>>> Test 1 - scan-block
>>> 
>>> freq = Hash.new(0)
>>> text.scan(/\w+/) do |word|
>>>  freq[word] += 1
>>> end
>>> 
>>> 
>>> Test 2 - scan array.each
>>> 
>>> freq = Hash.new(0)
>>> text.scan(/\w+/).each do |word|
>>>  freq[word] += 1
>>> end
>>> 
>>> 
>>> Test 3 - gsub upcase
>>> 
>>> text.gsub!(/\w+/){|x| x.upcase}  
>>> 
>>> 
>>> The results are in seconds. The original text is in English and contains 8154 words. Each timing covers 10 repetitions of the process, and each test was run 3 times.
>>> 
>>> Implementation  Test                      Run 1  Run 2  Run 3
>>> Ruby 1.8.7      Test1 - scan-block        0.542  0.502  0.518
>>> Ruby 1.8.7      Test2 - scan array.each   0.399  0.392  0.399
>>> Ruby 1.8.7      Test3 - gsub upcase       0.384  0.349  0.390
>>> 
>>> MacRuby 0.7.1   Test1 - scan-block       27.612 27.707 27.453
>>> MacRuby 0.7.1   Test2 - scan array.each   3.556  3.616  3.554
>>> MacRuby 0.7.1   Test3 - gsub upcase      27.613 26.826 27.327
>>> 
>>> 
>>> Thanks,
>>> Yasu
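For anyone wanting to reproduce numbers like these, a minimal timing harness following the method described in the message: each test body runs 10 times per measurement, and each measurement is taken 3 times. It uses Ruby's standard Benchmark.realtime. The text is a synthetic 8154-word placeholder (an assumption; the original English text is not part of the thread), so absolute times will not match the table.

```ruby
require 'benchmark'

# Synthetic stand-in for the original 8154-word English text (assumption).
WORDS = %w[the quick brown fox jumps over lazy dog]
TEXT = Array.new(8154) { |i| WORDS[i % WORDS.size] }.join(' ')

TESTS = {
  'Test1 - scan-block' => ->(text) {
    freq = Hash.new(0)
    text.scan(/\w+/) { |word| freq[word] += 1 }
    freq
  },
  'Test2 - scan array.each' => ->(text) {
    freq = Hash.new(0)
    text.scan(/\w+/).each { |word| freq[word] += 1 }
    freq
  },
  'Test3 - gsub upcase' => ->(text) {
    copy = text.dup # gsub! mutates its receiver, so work on a fresh copy
    copy.gsub!(/\w+/) { |x| x.upcase }
    copy
  }
}

REPEATS = 10 # runs of each test body per measurement
TRIALS = 3   # measurements reported per test

# Wall-clock seconds for REPEATS runs of one test body.
def measure(test, text)
  Benchmark.realtime { REPEATS.times { test.call(text) } }
end

TESTS.each do |name, test|
  times = Array.new(TRIALS) { measure(test, TEXT) }
  puts format('%-24s%s', name, times.map { |t| format('%8.3f', t) }.join)
end
```

Because Test 3 copies the string with dup on every run, its timings include that copy; the thread does not say whether the original measurements reused one string across repetitions.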
>>> _______________________________________________
>>> MacRuby-devel mailing list
>>> MacRuby-devel at lists.macosforge.org
>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
>> 
> 
