String#sub/gsub and text encodings

15 May 2011

Hi,

I just wrote a simple script for text processing and encountered a problem with String#sub/gsub.

Original text: UTF-8 encoded, containing only ASCII characters
Replacement text: UTF-8 encoded, containing both ASCII and non-ASCII characters (including Japanese characters)

Resulting text: all the non-ASCII characters were garbled.

When I split the original text at the strings to be replaced and inserted the replacing text at these places, the resulting string object was fine; all the characters were kept as they should be in UTF-8 encoding.
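To make the comparison concrete, here is a minimal sketch of the two approaches described above. The sample strings and variable names are made up for illustration and are not from my actual script. On a Ruby where gsub handles encodings correctly, I would expect both results to be identical, valid UTF-8 strings.

```ruby
# encoding: utf-8

# Hypothetical sample data, standing in for the real script's input.
original    = "Hello NAME, welcome to PLACE."  # ASCII-only text, UTF-8 encoded
replacement = "日本語 text"                      # ASCII plus Japanese, UTF-8 encoded

# Approach 1: direct replacement with String#gsub (the case that went wrong).
via_gsub = original.gsub("NAME", replacement)

# Approach 2: the workaround. Split at the string to be replaced, then
# join the pieces with the replacement text. The -1 limit keeps trailing
# empty pieces, so a match at the very end of the text is not lost.
via_split = original.split("NAME", -1).join(replacement)

# Inspect the results: encoding, validity, and whether the two agree.
puts via_gsub.encoding
puts via_gsub.valid_encoding?
puts via_gsub == via_split
```

If the last line prints false, or the encoding of the gsub result is not UTF-8, that would reproduce what I am seeing.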

I checked the tickets, but couldn't find anything like this. Is this a known issue?

Best,
Yasu

Yasu Imao
