[MacRuby-devel] String#sub/gsub and text encodings
Yasu Imao
yimao.ml at gmail.com
Fri May 20 17:16:58 PDT 2011
Hi,
I finally found time to further investigate this.
I thought it was sub/gsub in general, but it turns out to be sub!/gsub! in one special case: when the string comes from reading a text file with NSMutableString#initWithContentsOfFile:encoding:error:
I created a UTF-8 text file whose content is:
this is a test script.
Then here's what I tested.
#!/usr/local/bin/macruby
# -*- encoding: UTF-8 -*-
framework 'cocoa'
a = "this is a test script."
b = NSMutableString.alloc.initWithContentsOfFile("test.txt",encoding:NSUTF8StringEncoding,error:nil)
p a.encoding
#=> #<Encoding:UTF-8>
p b.encoding
#=> #<Encoding:UTF-8>
print a.sub(/test/,"あ")
#=> this is a あ script.
print b.sub(/test/,"あ")
#=> this is a あ script.
a.sub!(/test/,"あ")
print a
#=> this is a あ script.
b.sub!(/test/,"あ")
print b
#=> This is a ￣ﾁﾂ script.
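In the meantime, here is a possible workaround, sketched in plain Ruby. It makes no Cocoa call, so it does not reproduce the problem itself; it only shows the two approaches that gave correct results: reassigning the return value of the non-destructive sub instead of calling sub!, and splitting at the matches and joining with the replacement text. The helper names replace_first and replace_all_by_split are made up for illustration, and "\u3042" stands in for any non-ASCII replacement.

```ruby
# -*- encoding: UTF-8 -*-
# Hypothetical helpers, not part of the original test script.

# Approach 1: use the non-destructive sub and let the caller reassign,
# instead of mutating the receiver with sub!.
def replace_first(text, pattern, replacement)
  text.sub(pattern, replacement)
end

# Approach 2: split at every match (limit -1 keeps trailing empty
# pieces) and join the pieces with the replacement text.
# Assumes the pattern has no capture groups, because String#split
# would otherwise include the captured text in its result.
def replace_all_by_split(text, pattern, replacement)
  text.split(pattern, -1).join(replacement)
end

original    = "this is a test script."
replacement = "\u3042" # HIRAGANA LETTER A, a non-ASCII character

fixed = replace_first(original, /test/, replacement)
p fixed.encoding
p fixed.valid_encoding?
puts fixed

puts replace_all_by_split("test one, test two", /test/, replacement)
```

With a string read through initWithContentsOfFile, the same idea would be b = b.sub(...) in place of b.sub!(...), on the assumption that the non-destructive call keeps behaving as it did in the test above.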
Am I doing something wrong? If not, I'll file a ticket.
Best,
Yasu
On 2011/05/16, at 7:37, Laurent Sansonetti wrote:
> If the script works differently in CRuby 1.9, then a ticket will be helpful too, as it is likely something we need to fix. I don't know offhand whether it's a well-known issue, but we will figure that out later. Filing dups is always a good idea, as it helps us prioritize work.
>
> Thanks,
> Laurent
>
> On May 15, 2011, at 8:10 AM, Caio Chassot wrote:
>
>> Hi,
>>
>> Can you post some sample code?
>>
>> Thanks
>>
>> On Sun, May 15, 2011 at 11:50, Yasu Imao <yimao.ml at gmail.com> wrote:
>>> Hi,
>>>
>>> I just wrote a simple script for text processing and encountered a problem with String#sub/gsub.
>>>
>>> Original text: UTF-8 encoded ASCII character only text
>>> Replacing text: UTF-8 encoded text with ASCII and non-ASCII characters (including Japanese characters)
>>>
>>> The resulting text: all the non-ASCII characters were garbage.
>>>
>>> When I split the original text at the strings to be replaced and inserted the replacing text at these places, the resulting string object was fine; all the characters were kept as they should be in UTF-8 encoding.
>>>
>>> I checked the tickets, but couldn't find something like this. Is this a known issue?
>>>
>>> Best,
>>> Yasu
>>> _______________________________________________
>>> MacRuby-devel mailing list
>>> MacRuby-devel at lists.macosforge.org
>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel