[MacRuby-devel] Regular expression related performance

Yasu Imao yimao.ml at gmail.com
Thu Dec 2 05:20:58 PST 2010


Hi Laurent,

Thank you for your prompt work.  I tried the latest nightly build and it's much faster than 0.7.1.  Tests 1 and 2 are now only 2 to 2.5 times slower than on Ruby 1.8.7, and Test 3 is about 5 times slower.  I also tried my app on this nightly build, and I can now say the MacRuby version of my app is quite usable.  From now on, I'll be more serious about rewriting my RubyCocoa apps in MacRuby.

I was also curious about the difference between String#scan and String#gsub, so I tested String#gsub without a block as well:

text.gsub!(/\w+/,"test")

This was also about 5 times slower on MacRuby than on Ruby 1.8.7.  Could this be made a bit faster?  It is not part of the main processing in my apps (it is used for pre-processing text), so the performance of String#gsub doesn't matter as much, though.
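
A minimal sketch of how the block and non-block forms of String#gsub can be timed side by side, using Benchmark from the standard library.  The sample text, the repetition count and the helper name time_gsub are assumptions made for illustration; this is not the script that produced the numbers in this thread.

```ruby
require 'benchmark'

# Hypothetical sample text; the real test used an English text of 8154 words.
SAMPLE = ("the quick brown fox jumps over the lazy dog " * 200).freeze

# Run the given block `reps` times on a fresh copy of SAMPLE and
# return the elapsed wall-clock time in seconds.
def time_gsub(reps = 10)
  Benchmark.realtime do
    reps.times { yield SAMPLE.dup }  # dup so every run starts from the same text
  end
end

# String replacement: no block is yielded to for each match.
without_block = time_gsub { |text| text.gsub!(/\w+/, "test") }

# Block replacement: the block is called once per match.
with_block = time_gsub { |text| text.gsub!(/\w+/) { |x| x.upcase } }

puts format("gsub without block: %.3f s", without_block)
puts format("gsub with block:    %.3f s", with_block)
```

Comparing the two timings on each Ruby implementation shows how much of the cost comes from the per-match block call rather than from the regular expression matching itself.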

And thanks for looking into the regexp bug.  I guess I'll have to wait and see if Apple updates ICU on OS X.

Best,
Yasu

On 2010/12/02, at 9:39, Laurent Sansonetti wrote:

> Hi Yasu,
> 
> It's committed to trunk and should be available in tonight's nightly build, so feel free to grab it :) http://www.macruby.org/files/nightlies. It will also be in the upcoming 0.8 release.
> 
> I saw your ticket about the look-ahead regexp bug and will have a look later today. Thanks for reporting the problem. Hopefully it can also be fixed for 0.8.
> 
> Laurent
> 
> On Dec 1, 2010, at 4:29 PM, Yasu Imao wrote:
> 
>> Hi Laurent,
>> 
>> This is great!  I think I read about object allocation in the discussion of StringScanner performance (though I didn't understand exactly what was happening behind the scenes), so I guessed the problem was related to using a block with regular expression match data.
>> 
>> For a word frequency count feature, I could use the Test 2 script, but for other parts of the app I need match information ($` and $', to be exact), so this performance improvement means a lot to my app.
>> 
>> Is this going to be in 0.8?  If so, I'll test it with my app.
>> 
>> By the way, the regular expression engine itself seems to have a bug (not related to this issue, but to negative look-ahead), and I filed a ticket (though I'm not sure I did it properly).
>> 
>> Best,
>> Yasu
>> 
>> On 2010/12/02, at 8:50, Laurent Sansonetti wrote:
>> 
>>> I spoke too fast. Having a second look, I found that it was possible to make the Match strings point to a unique object. I committed this optimization in r4964 and verified that no regression is introduced.
>>> 
>>> Before:
>>> 
>>> $ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
>>> 
>>> real	0m2.430s
>>> user	0m1.628s
>>> sys	0m1.030s
>>> 
>>> After :)
>>> 
>>> $ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
>>> 
>>> real	0m0.121s
>>> user	0m0.100s
>>> sys	0m0.015s
>>> 
>>> Laurent
>>> 
>>> On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:
>>> 
>>>> Hi Yasu,
>>>> 
>>>> I ran your tests in Shark. Tests 1 and 3 are significantly slower because #scan and #gsub are called with a block, which means MacRuby has to create a new Match object for every yield, to conform to the Ruby specs. Each Match object contains a copy of the original string.
>>>> 
>>>> MacRuby has a slow memory allocator (much slower than the original Ruby), so one must be careful not to allocate too many objects. This is something we are working on; unfortunately, MacRuby doesn't fully control the object allocator, as it resides in the libauto library (the Objective-C garbage collector).
>>>> 
>>>> In your case, I recommend using the method in Test 2, which is to not pass a block. 
>>>> 
>>>> It is possible that we can reduce memory usage when doing regexps in MacRuby; however, after a quick look at the source code, I am not sure anything can be done for 0.8 :(
>>>> 
>>>> Laurent
>>>> 
>>>> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> I'm rewriting an app for text analysis in MacRuby, which I originally wrote in RubyCocoa.  But I encountered a serious performance issue in MacRuby, which is related to processing text using regular expressions.  
>>>>> 
>>>>> I'm wondering if this will be taken care of in the near future (or already done in 0.8?).
>>>>> 
>>>>> Below are my simple tests.  The first two do essentially the same thing with slightly different approaches: both simply count the frequency of each word.  I want to use the first approach not for counting word frequencies, but in other parts of the processing.  The third one tests the speed of String#gsub with a regular expression.  I felt String#gsub was slow in my app, so I just wanted to test how slow it is compared to RubyCocoa.
>>>>> 
>>>>> 
>>>>> Test 1 - scan-block
>>>>> 
>>>>> freq = Hash.new(0)
>>>>> text.scan(/\w+/) do |word|
>>>>>   freq[word] += 1
>>>>> end
>>>>> 
>>>>> 
>>>>> Test 2 - scan array.each
>>>>> 
>>>>> freq = Hash.new(0)
>>>>> text.scan(/\w+/).each do |word|
>>>>>   freq[word] += 1
>>>>> end
>>>>> 
>>>>> 
>>>>> Test 3 - gsub upcase
>>>>> 
>>>>> text.gsub!(/\w+/){|x| x.upcase}  
>>>>> 
>>>>> 
>>>>> The results are in seconds.  The original text is in English and contains 8154 words.  Each process was repeated 10 times to measure the processing time, and each test was run 3 times.
>>>>> 
>>>>> Runtime        Test                       Run 1   Run 2   Run 3
>>>>> Ruby 1.8.7     Test 1 - scan-block        0.542   0.502   0.518
>>>>> Ruby 1.8.7     Test 2 - scan array.each   0.399   0.392   0.399
>>>>> Ruby 1.8.7     Test 3 - gsub upcase       0.384   0.349   0.390
>>>>> 
>>>>> MacRuby 0.7.1  Test 1 - scan-block       27.612  27.707  27.453
>>>>> MacRuby 0.7.1  Test 2 - scan array.each   3.556   3.616   3.554
>>>>> MacRuby 0.7.1  Test 3 - gsub upcase      27.613  26.826  27.327
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Yasu
>>>>> _______________________________________________
>>>>> MacRuby-devel mailing list
>>>>> MacRuby-devel at lists.macosforge.org
>>>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel

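The two scan-based tests discussed in this thread could be reproduced with a harness along the following lines.  This is only a sketch under stated assumptions: the sample text, the constants TEXT and REPS, and the helper names count_with_block, count_with_each and pre_matches are hypothetical, and the thread does not show the timing code that was actually used.

```ruby
require 'benchmark'

# Hypothetical stand-in for the 8154-word English text mentioned in the thread.
TEXT = ("to be or not to be that is the question " * 100).freeze
REPS = 10  # the thread repeats each process 10 times

# Test 1: scan with a block; the block is called once per match.
def count_with_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2: scan without a block, then iterate over the returned array.
def count_with_each(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

# Only the block form exposes match data: collect the text that
# precedes each match ($`).
def pre_matches(text)
  result = []
  text.scan(/\w+/) { result << $` }
  result
end

{
  "Test 1 - scan-block"      => -> { count_with_block(TEXT) },
  "Test 2 - scan array.each" => -> { count_with_each(TEXT) },
}.each do |label, test|
  seconds = Benchmark.realtime { REPS.times { test.call } }
  puts format("%-26s %.3f s", label, seconds)
end
```

Both counting helpers return the same frequencies, so the array form is a drop-in replacement when only the matched words are needed.  The block form remains necessary when match data such as $` and $' is required, which is the case pre_matches illustrates.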

