Hi,

I made a testbed patch to deal with multi-byte characters. I'm not sure if using NSUTF16LittleEndianStringEncoding for UniChar is appropriate on PPC. What do you think?

-- Satoshi Nakagawa
> I made a testbed patch to deal with multi-byte characters.
In imp_rb_string_characterAtIndex, you can just return -characterAtIndex: from the initialized string instead of jumping through the data step. The value returned should be in the native encoding.

@@ -2444,26 +2446,41 @@
 static UniChar
 imp_rb_string_characterAtIndex(void *rcv, SEL sel, NSUInteger idx)
 {
-    if (idx >= RARRAY_LEN(rcv))
+    VALUE rstr;
+    NSString* ocstr;
+    NSData* data;
+    int length = NUM2INT(rb_str_length((VALUE)rcv));
+    UniChar c;
+
+    if (idx >= length)
         [NSException raise:@"NSRangeException"
-            format:@"index (%d) beyond bounds (%d)", idx, RARRAY_LEN(rcv)];
-    /* FIXME this is not quite true for multibyte strings */
-    return (UniChar)RSTRING_PTR(rcv)[idx];
+            format:@"index (%d) beyond bounds (%d)", idx, length];
+
+    rstr = rb_str_substr((VALUE)rcv, idx, 1);
+    ocstr = [NSString stringWithCString:RSTRING_PTR(rstr) encoding:NSUTF8StringEncoding];
+    return [ocstr characterAtIndex:0];
 }

-Ben
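On the PPC question, here is a minimal plain-C sketch of why copying bytes from a fixed little-endian UTF-16 buffer straight into a 16-bit value is only safe on a little-endian host, while a value assembled explicitly (or handed back by -characterAtIndex:, as suggested above) is already in native order. This is not MacRuby code: unichar16, decode_utf16le, copy_raw and host_is_little_endian are hypothetical names made up for illustration, and unichar16 merely stands in for UniChar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for UniChar: one 16-bit UTF-16 code unit. */
typedef uint16_t unichar16;

/* Decode one UTF-16LE code unit by assembling the two bytes explicitly.
 * The result is a native (host-order) integer on any CPU. */
static unichar16 decode_utf16le(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return (unichar16)(bytes[0] | (bytes[1] << 8));
}

/* Copy two bytes straight into a unichar16, as a raw buffer copy would.
 * Correct only when the buffer's byte order matches the host's. */
static unichar16 copy_raw(const unsigned char *bytes)
{
    unichar16 c;
    assert(bytes != NULL);
    memcpy(&c, bytes, sizeof c);
    return c;
}

/* Report whether the host stores the low byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

For the bytes 0x42 0x30 (U+3042 in UTF-16LE), decode_utf16le gives 0x3042 on every host, whereas copy_raw gives 0x3042 only on a little-endian machine and the byte-swapped 0x4230 on big-endian PPC.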
participants (2)
- Benjamin Stiglitz
- Satoshi Nakagawa