[MacRuby-devel] Regular expression related performance

Wed Dec 1 15:50:29 PST 2010

I spoke too fast. On a second look, I found that it was possible to make the Match strings point to a unique object. I committed this optimization in r4964 and verified that it introduces no regressions.

Before:

$ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"

real	0m2.430s
user	0m1.628s
sys	0m1.030s

After :)

$ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"

real	0m0.121s
user	0m0.100s
sys	0m0.015s
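
For anyone who wants to compare the two scan approaches from the quoted message on their own build, here is a rough sketch using the standard Benchmark library. The sample text, the method names, and the REPEATS constant are assumptions made for illustration; they are not the file or harness used for the numbers in this thread.

```ruby
require 'benchmark'

# Stand-in for /tmp/foo.txt (assumption: any English text will do).
text = "the quick brown fox jumps over the lazy dog " * 1000

# Test 1 from the quoted message: scan with a block.
def count_with_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2 from the quoted message: scan without a block, then iterate
# over the returned array.
def count_with_array(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

# Hypothetical repeat count, chosen to mirror the "repeated 10 times"
# in the quoted message.
REPEATS = 10

block_time = Benchmark.realtime { REPEATS.times { count_with_block(text) } }
array_time = Benchmark.realtime { REPEATS.times { count_with_array(text) } }

puts format("scan-block:      %.3f s", block_time)
puts format("scan array.each: %.3f s", array_time)
```

Both methods should produce the same frequency table, so the array form can be swapped in wherever the block form is too slow. The timings will of course depend on the interpreter and the input.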

Laurent

On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:

> Hi Yasu,
> 
> I ran your tests in Shark. Tests 1 and 3 are significantly slower because #scan and #gsub are called with a block, which means MacRuby has to create a new Match object for every yield, to conform to the Ruby specs. Each Match object contains a copy of the original string.
> 
> MacRuby has a slow memory allocator (much slower than the original Ruby's), so one must be careful not to allocate too many objects. This is something we are working on. Unfortunately, MacRuby doesn't fully control the object allocator, as it resides in the libauto library (the Objective-C garbage collector).
> 
> In your case, I recommend using the method in Test 2, which is to not pass a block. 
> 
> It is possible that we can reduce memory usage when doing regexps in MacRuby. However, after a quick look at the source code, I am not sure anything can be done for 0.8 :(
> 
> Laurent
> 
> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
> 
>> Hello,
>> 
>> I'm rewriting an app for text analysis in MacRuby, which I originally wrote in RubyCocoa.  But I encountered a serious performance issue in MacRuby, which is related to processing text using regular expressions.  
>> 
>> I'm wondering if this will be taken care of in the near future (or already done in 0.8?).
>> 
>> Below are my simple tests.  The first two are essentially the same, with slightly different approaches.  Both simply count the frequency of each word.  In my app I want to use the first approach for other kinds of processing, not for counting word frequencies.  The third one tests the speed of String#gsub with a regular expression.  I felt String#gsub was slow in my app, so I wanted to test how slow it is compared to RubyCocoa.
>> 
>> 
>> Test 1 - scan-block
>> 
>> freq = Hash.new(0)
>> text.scan(/\w+/) do |word|
>>  freq[word] += 1
>> end
>> 
>> 
>> Test 2 - scan array.each
>> 
>> freq = Hash.new(0)
>> text.scan(/\w+/).each do |word|
>>  freq[word] += 1
>> end
>> 
>> 
>> Test 3 - gsub upcase
>> 
>> text.gsub!(/\w+/){|x| x.upcase}  
>> 
>> 
>> The results are in seconds.  The original text is in English with 8154 words.  Each process was repeated 10 times to calculate processing times.  Each test was run 3 times.
>> 
>> Ruby 1.8.7     Test1 - scan-block:         0.542,   0.502,   0.518
>> Ruby 1.8.7     Test2 - scan array.each:    0.399,   0.392,   0.399
>> Ruby 1.8.7     Test3 - gsub upcase:        0.384,   0.349,   0.390
>> 
>> MacRuby 0.7.1  Test1 - scan-block:        27.612,  27.707,  27.453
>> MacRuby 0.7.1  Test2 - scan array.each:    3.556,   3.616,   3.554
>> MacRuby 0.7.1  Test3 - gsub upcase:       27.613,  26.826,  27.327
>> 
>> 
>> Thanks,
>> Yasu
>> _______________________________________________
>> MacRuby-devel mailing list
>> MacRuby-devel at lists.macosforge.org
>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
> 
