[MacRuby-devel] Regular expression related performance
Laurent Sansonetti
lsansonetti at apple.com
Wed Dec 1 14:46:51 PST 2010
Hi Yasu,
I ran your tests in Shark. Tests 1 and 3 are significantly slower because #scan and #gsub are called with a block, which means that, to conform to the Ruby specs, MacRuby has to create a new match object (MatchData) for every yield. Each of these objects contains a copy of the original string.
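To illustrate why a per-yield match object is required, here is a minimal sketch in plain Ruby (not MacRuby-specific). It relies only on the standard behavior that `$~` refers to the MatchData of the current match inside a #scan block. The sample text and the `seen` array are made up for the example.

```ruby
# Sample input, invented for illustration.
text = "one two one"
seen = []

text.scan(/\w+/) do |word|
  # Inside the block, $~ is the MatchData for the match being yielded,
  # so the runtime must have it ready on every iteration.
  seen << [word, $~.begin(0), $~.pre_match]
end

seen.each { |word, offset, before| puts "#{word} at #{offset}, after #{before.inspect}" }
```

Because the block may inspect `$~` (or `$1`, `$&`, and so on) on any iteration, the match state has to be valid for each yield, which is where the extra allocations come from.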
MacRuby has a slow memory allocator (much slower than the original Ruby's), so one must be careful not to allocate too many objects. This is something we are working on; unfortunately, MacRuby doesn't fully control the object allocator, as it resides in the libauto library (the Objective-C garbage collector).
In your case, I recommend using the method in Test 2, which is to not pass a block.
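As a rough sketch of that recommendation, the two counting styles can be wrapped in helper methods and timed side by side. The method names, the sample text, and the use of the standard `benchmark` library are assumptions made for this example; only the #scan calls themselves come from the tests quoted below.

```ruby
require 'benchmark'

# Test 1 style: block passed directly to #scan (hypothetical helper name).
def count_with_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2 style: #scan builds an array first, then we iterate over it.
def count_without_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

# Made-up sample text; substitute the real corpus when measuring.
sample = "the quick brown fox jumps over the lazy dog " * 1000

Benchmark.bm(16) do |bm|
  bm.report("scan-block")      { 10.times { count_with_block(sample) } }
  bm.report("scan array.each") { 10.times { count_without_block(sample) } }
end
```

Both helpers return the same hash, so switching from the first form to the second should not change results. The trade-off is that the second form holds the full array of matches in memory at once.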
It is possible that we can reduce memory usage when doing regexps in MacRuby; however, after a quick look at the source code, I am not sure anything can be done for 0.8 :(
Laurent
On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
> Hello,
>
> I'm rewriting an app for text analysis in MacRuby, which I originally wrote in RubyCocoa. But I encountered a serious performance issue in MacRuby, which is related to processing text using regular expressions.
>
> I'm wondering if this will be taken care of in the near future (or already done in 0.8?).
>
> Below are my simple tests. The first two are essentially the same with a slightly different approach. Both simply count the frequency of each word. In my app I want to use the first approach for other kinds of processing, not for counting word frequencies. The third one tests the speed of String#gsub with a regular expression. I felt String#gsub was slow in my app, so I just wanted to test how slow it is compared to RubyCocoa.
>
>
> Test 1 - scan-block
>
> freq = Hash.new(0)
> text.scan(/\w+/) do |word|
>   freq[word] += 1
> end
>
>
> Test 2 - scan array.each
>
> freq = Hash.new(0)
> text.scan(/\w+/).each do |word|
>   freq[word] += 1
> end
>
>
> Test 3 - gsub upcase
>
> text.gsub!(/\w+/){|x| x.upcase}
>
>
> The results are in seconds. The original text is in English with 8154 words. Each process was repeated 10 times to calculate processing times. Each test was run 3 times.
>
> Ruby 1.8.7 Test1 - scan-block: 0.542, 0.502, 0.518
> Ruby 1.8.7 Test2 - scan array.each: 0.399, 0.392, 0.399
> Ruby 1.8.7 Test3 - gsub upcase: 0.384, 0.349, 0.390
>
> MacRuby 0.7.1 Test1 - scan-block: 27.612, 27.707, 27.453
> MacRuby 0.7.1 Test2 - scan array.each: 3.556, 3.616, 3.554
> MacRuby 0.7.1 Test3 - gsub upcase: 27.613, 26.826, 27.327
>
>
> Thanks,
> Yasu