[MacRuby-devel] String#sub/gsub and text encodings

Yasu Imao yimao.ml at gmail.com
Fri May 20 17:16:58 PDT 2011


Hi,

I finally found time to further investigate this.

I thought the problem affected sub/gsub in general, but it turns out to be sub!/gsub! in one special case: a string read from a text file with NSMutableString#initWithContentsOfFile:encoding:error:.

I created a UTF-8 text file, test.txt, with the following content:


this is a test script.


Then here's what I tested.


#!/usr/local/bin/macruby
# -*- encoding: UTF-8 -*-
# (the encoding magic comment only takes effect on line 1, or on line 2 after a shebang)
framework 'cocoa'

a = "this is a test script."
b = NSMutableString.alloc.initWithContentsOfFile("test.txt", encoding:NSUTF8StringEncoding, error:nil)

p a.encoding
#=> #<Encoding:UTF-8>

p b.encoding
#=> #<Encoding:UTF-8>

print a.sub(/test/, "あ")
#=> this is a あ script.

print b.sub(/test/, "あ")
#=> this is a あ script.

a.sub!(/test/, "あ")
print a
#=> this is a あ script.

b.sub!(/test/, "あ")
print b
#=> This is a ￣ﾁﾂ script.   (garbled; expected "this is a あ script.")
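The garbled characters are suggestive. The reported b.sub! output contains U+FFE3 U+FF81 U+FF82 where "あ" should be, and the UTF-8 bytes of "あ" are E3 81 82. The plain-Ruby sketch below checks one hypothesis only: that each UTF-8 byte was sign-extended into a 16-bit code unit, as happens when a signed C char is assigned to a 16-bit unichar. The helper names sign_extend_byte and mangle are hypothetical, and the sketch does not show where, or whether, MacRuby performs such a conversion.

```ruby
# -*- encoding: UTF-8 -*-
# Hypothesis check only: widen each UTF-8 byte of a string to a 16-bit
# code unit by sign extension and see what text comes out.

# Sign-extend one byte (0..255): values >= 0x80 are negative as signed
# 8-bit integers, so the upper 8 bits of the 16-bit result are all ones.
def sign_extend_byte(byte)
  byte >= 0x80 ? (byte | 0xFF00) : byte
end

# Map every byte to a sign-extended code unit and rebuild a UTF-8 string,
# treating each 16-bit value as a BMP code point.
def mangle(str)
  str.bytes.to_a.map { |byte| sign_extend_byte(byte) }.pack("U*")
end

replacement = "あ"
puts replacement.bytes.to_a.map { |byte| format("%02X", byte) }.join(" ")
# prints E3 81 82
puts mangle(replacement)
# prints ￣ﾁﾂ (U+FFE3 U+FF81 U+FF82)
```

ASCII bytes are below 0x80 and pass through unchanged, which would be consistent with only the non-ASCII part of the replacement being garbled.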



Am I doing something wrong?  If not, I'll file a ticket.
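As a point of comparison for the ticket, here is a rough CRuby 1.9 counterpart of the test. CRuby has no NSMutableString, so File.read with an encoding: option stands in for initWithContentsOfFile:encoding:error:; that substitution is an assumption, and this script does not exercise the Cocoa code path at all. The Tempfile only makes the sketch self-contained.

```ruby
# -*- encoding: UTF-8 -*-
require 'tempfile'

# Recreate the one-line test file in a temporary location.
tmp = Tempfile.new('test')
tmp.write("this is a test script.\n")
tmp.close

# Read it back as UTF-8 (stand-in for the NSMutableString initializer).
b = File.read(tmp.path, encoding: "UTF-8").chomp
p b.encoding

# Compare the non-destructive and in-place forms on the same string.
non_destructive = b.sub(/test/, "あ")
b.sub!(/test/, "あ")

puts non_destructive
puts b
puts(b == non_destructive ? "sub and sub! agree" : "sub and sub! differ")

tmp.unlink
```

In CRuby both puts lines print "this is a あ script."; if the NSMutableString version under MacRuby prints something else for sub!, that difference is what the ticket would describe.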


Best,
Yasu


On 2011/05/16, at 7:37, Laurent Sansonetti wrote:

> If the script behaves differently in CRuby 1.9, then a ticket will be helpful too, as it is likely something we need to fix. I don't know offhand whether it's a well-known issue, but we will figure it out later. Filing duplicates is always a good idea, as it helps us prioritize work.
> 
> Thanks,
> Laurent
> 
> On May 15, 2011, at 8:10 AM, Caio Chassot wrote:
> 
>> Hi,
>> 
>> Can you post some sample code?
>> 
>> Thanks
>> 
>> On Sun, May 15, 2011 at 11:50, Yasu Imao <yimao.ml at gmail.com> wrote:
>>> Hi,
>>> 
>>> I just wrote a simple script for text processing and encountered a problem with String#sub/gsub.
>>> 
>>> Original text: UTF-8 encoded text containing only ASCII characters
>>> Replacement text: UTF-8 encoded text with both ASCII and non-ASCII characters (including Japanese characters)
>>> 
>>> Result: all the non-ASCII characters were garbled.
>>> 
>>> When I split the original text at the strings to be replaced and inserted the replacement text at those places, the resulting string object was fine; all the characters were kept as they should be in UTF-8 encoding.
>>> 
>>> I checked the tickets but couldn't find anything like this. Is this a known issue?
>>> 
>>> Best,
>>> Yasu
>>> _______________________________________________
>>> MacRuby-devel mailing list
>>> MacRuby-devel at lists.macosforge.org
>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
>>> 
> 
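The workaround from the original report (splitting the text at each match and inserting the replacement between the pieces) can be sketched in plain Ruby for comparison with gsub. The helper name replace_by_split is hypothetical, and the sketch assumes a pattern without capture groups, because String#split includes captured text in its result.

```ruby
# -*- encoding: UTF-8 -*-
# Hypothetical helper: replace every match of `pattern` in `text` by
# splitting at the matches and joining the pieces with `replacement`.
def replace_by_split(text, pattern, replacement)
  # A negative limit keeps trailing empty fields, so a match at the very
  # end of the text still gets the replacement inserted after it.
  text.split(pattern, -1).join(replacement)
end

original = "this is a test script. another test."
puts replace_by_split(original, /test/, "あ")
puts original.gsub(/test/, "あ")
# both lines print: this is a あ script. another あ.
```

One design difference: gsub interprets backslash sequences such as \1 in a replacement string, while Array#join inserts the replacement literally, so the two only agree when the replacement contains no backslashes.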


