[MacRuby-devel] String performance (yet another)

Vincent Isambart vincent.isambart at gmail.com
Sun Jan 16 20:19:51 PST 2011


Hi,

> Indeed, String#[] will now be slower on non-ASCII UTF-8 strings, because
> computing the character index can no longer be done in constant time.
> I don't believe this can be improved using the optimization we implemented
> for #gsub and #scan. Maybe 1.9.2 has a better optimization; I will let
> Vincent comment :)

> text = File.read("test.txt")
> 1000.times do |i|
>  # note: the second argument of String#[] is a length, so this takes up to i+30 characters
>  a = text[i,i+30]
> end
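
The constant-time point can be made concrete. In UTF-8 a character takes one to four bytes, so finding the byte offset of character i means walking the string from the start. The following is a minimal Ruby sketch of that walk, not MacRuby's actual code; `utf8_byte_offset` is a hypothetical helper name. It relies on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every other byte starts a new character.

```ruby
# Hypothetical helper (not MacRuby's implementation): return the byte offset
# of character +char_index+ in the UTF-8 string +str+, or nil if out of range.
# The cost is O(n) in the byte length, which is why String#[] cannot map a
# character index to a byte position in constant time.
def utf8_byte_offset(str, char_index)
  chars_seen = 0
  str.each_byte.with_index do |byte, offset|
    next if (byte & 0xC0) == 0x80 # continuation byte: not the start of a character
    return offset if chars_seen == char_index
    chars_seen += 1
  end
  # An index equal to the character count points just past the last byte.
  chars_seen == char_index ? str.bytesize : nil
end
```

With "h\u00e9llo" (where the "é" takes two bytes), character 2 starts at byte 3, so every character after the "é" is shifted by one byte.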

In fact, I already use the cache to get the offset for the end index.
I just had a look at 1.9.2, and what it does is pretty similar to what
we do. I would not be surprised if the difference were mainly due to
the object allocator being much slower in MacRuby.
I would need to profile it with Shark to be sure, but I would not expect
much improvement on String#[] soon.
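
The idea behind such a cache can be sketched as follows. text[i, n] needs the byte offsets of two characters, and once the first one is known, the walk to the second can resume from there instead of from the start of the string. The class below is a hypothetical illustration of that idea; `CachedUtf8Indexer` and `byte_offset` are invented names, not MacRuby internals. It remembers only the last resolved (character index, byte offset) pair and restarts from the beginning on a backward jump.

```ruby
# Hypothetical sketch of a "last position" cache for UTF-8 character indexing.
class CachedUtf8Indexer
  def initialize(str)
    @str = str
    @cached_char = 0 # last character index resolved
    @cached_byte = 0 # byte offset where that character starts
  end

  # Returns the byte offset of character +char_index+, or nil if out of range.
  def byte_offset(char_index)
    # Only forward moves can reuse the cache; otherwise restart from the beginning.
    if char_index < @cached_char
      @cached_char = 0
      @cached_byte = 0
    end
    chars_seen = @cached_char
    offset = @cached_byte
    while offset < @str.bytesize
      byte = @str.getbyte(offset)
      unless (byte & 0xC0) == 0x80 # lead byte: a character starts here
        if chars_seen == char_index
          @cached_char, @cached_byte = chars_seen, offset
          return offset
        end
        chars_seen += 1
      end
      offset += 1
    end
    chars_seen == char_index ? offset : nil
  end
end
```

For example, with idx = CachedUtf8Indexer.new(text), calling idx.byte_offset(i) and then idx.byte_offset(i + 30) makes the second call scan only the bytes of the 30 characters in between. A lookup is still linear in the worst case, so a cache like this helps the end index but does not make indexing constant-time.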

By the way, to try UTF-16 you should use encode rather than force_encoding,
and UTF-16LE rather than UTF-16BE:
text = text.encode(Encoding::UTF_16LE)
UTF-16LE is the fastest encoding (not BE) because the native byte order
on x86 is little-endian. Also, force_encoding does not convert anything;
it only relabels the existing bytes. On a UTF-8 string, forcing the
encoding to ASCII or BINARY (ASCII-8BIT) can make sense, as all ASCII
characters are encoded identically in UTF-8 and ASCII, but forcing it to
UTF-16 would give you a meaningless string full of strange characters.
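
The difference between the two methods is easy to check directly: String#encode transcodes the characters, while String#force_encoding reinterprets the same bytes under a new label. This small script shows both on a UTF-8 string containing one non-ASCII character; the sample string is an arbitrary choice.

```ruby
utf8 = "h\u00e9llo" # 5 characters, 6 bytes in UTF-8 ("é" takes two bytes)

# encode converts each character to UTF-16LE code units.
utf16 = utf8.encode(Encoding::UTF_16LE)
p utf16.length   # prints 5
p utf16.bytesize # prints 10

# force_encoding keeps the 6 bytes and rereads them as 16-bit units,
# producing 3 unrelated characters.
relabeled = utf8.dup.force_encoding(Encoding::UTF_16LE)
p relabeled.length # prints 3

# Transcoding round-trips cleanly.
p utf16.encode(Encoding::UTF_8) == utf8 # prints true

# Relabeling is harmless for ASCII-only text, whose bytes are the same in
# UTF-8 and US-ASCII; BINARY simply treats every byte as one character.
p "hello".dup.force_encoding(Encoding::US_ASCII) == "hello" # prints true
p utf8.dup.force_encoding(Encoding::BINARY).length           # prints 6
```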
