[MacRuby] #159: "%d" with large integer argument gives conversion error
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@37signals.com | Owner: lsansonetti@apple.com Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ The following script reproduces the error easily: {{{ N = 34363859232 p N p N.class puts "%d" % N }}} This results in the following exception: {{{ 01.rb:5:in `%': integer 34363859232 too big to convert to `int' (RangeError) }}} When run in ruby 1.9, the output shows that N is a Bignum. When run in macruby, the output shows that N is a fixnum. This appears to be due to bignum.c line 523: {{{ if (len <= (sizeof(long)*CHAR_BIT)) { }}} Apparently, in macruby, sizeof(long) is 8, which means bignums are only instantiated for integers larger than 64 bits or so, and in the above example, the integer is only 44 bits. Even though the bignums are only created based on sizeof(long), the rb_str_format function in sprintf.c uses FIX2LONG for converting the VALUE to a long int, and FIX2LONG uses INT_MIN and INT_MAX to check the bounds of the integer. I'm not sure what the safest way to fix this is. Should check_int (and others?) in numeric.c be comparing against larger boundary numbers, instead of INT_MIN/MAX? -- Ticket URL: <http://www.macruby.org/trac/ticket/159> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@37signals.com | Owner: lsansonetti@apple.com Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Resolution: Keywords: | ---------------------------------+------------------------------------------ Comment(by lsansonetti@apple.com): At a glance it looks like the problem is in the way we convert the argument to a C_INT type, to pass to the CFStringCreateWithFormat function, in the objc.m file - rb_str_format(). Instead of trying to convert big fixnums or bignums to C_INT we should instead take a different code path and print every digit out of the numeric type. This easily reproduces when trying to print a bignum: {{{ $ ./miniruby -e "p '%d' % (42**12)" -e:1:in `%': bignum too big to convert into `long' (RangeError) from -e:1:in `<main>' $ }}} (sprintf.c is mostly unused in MacRuby, because we rely on CF for this.) -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:1> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): (Sorry, I had typed up a long message, attached my test program, and now my text is all gone. I will retype it again after some sleep, grumble, grumble) -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:2> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): (Splitting into 3 parts) Part 1 of 3 The problem is that the MacRuby version of sprintf eventually calls CFStringCreateWithFormat (which is actually wrong, since the resulting string is immutable, but that is a secondary issue). Since CFStringCreateWithFormat know nothing of Bignums, it can never convert them directly. So the most immediate fix would be to do the integer convertion (handling either Fixnums or Bignums) and pass the resulting string to CFStringCreateWithFormat for output. Well, it turns out to not be so simple. Doing this means that we have to interpret most of the format specifier flags (all except '-') and deal with the precision (and with width when precision is unspecified and certain flags are used). So we have to parse the entire format specifier, but this has the advantage that we can generate appropriate error exceptions (the existing sprintf does minimal error checking, and the call to CFStringCreateWithFormat provides no further error diagnosis). Since the goal is to make the MacRuby sprintf work like a standard ruby one, the error messages should likewise be consistent. This turned out to be rather problematic, since that consistency imposes lots of ordering constraints on what gets checked and when. Another tedious issue was the support of the infinite precision representations, like: printf "%x", -1 #=> ..f I hooked into the Fixnum and Bignum code to generate hex, octal or binary representations, then manipulated the string to get the correct result. But after many iterations, and using a test program that reproduced the 68 examples from the sprintf source file, plus a few hundred other test cases (many error cases), I finally had a version that produced the same (if not sometimes confusing or poorly worded) error messages as in ruby 1.9.1. OK, not quite the same. Some error message mentioned NSMutableString or NSArray instead of just plain String or Array. I've always thought that the principle of least astonishment should mean that: % macruby -e 'puts String.new.class' should be "String", but it is currently "NSMutableString". It is a four line change, one line for each of array.c, hash.c, object.c and string.c to get them to print the standard ruby type, and not the Objective-C type (I attach those diffs as pola.diff). I also noticed that the compiler uses StringValueCStr() to convert from ruby object to c-string, but StringValueCStr() calls to_str, which is technically incorrect. So I changed it to call rb_obj_as_string(), which uses to_s. However, some inconsistencies remained. Some are fundament to the underlying CoreFoundation/Objective-C infrastructure, like no null characters in strings. But because we still call CFStringCreateWithFormat, any issues due to type conversion is necessarily handled after other error checking, so: printf "%s %z\n", BasicObject.new, 2 produces: undefined method `to_s' for #<BasicObject:0x0000010083d900> (NoMethodError) by ruby 1.9, but produces: malformed format string - %z (ArgumentError) in MacRuby, because the attempt to call to_s is delayed until the rest of the string is error checked. (As another side issue, replacing %z with the legal%d will cause MacRuby to crash, since BasicObject doesn't have respond_to? either, which the lack of causes infinite recursion and eventual stack exhaustion. This is fixed with a small change to dispatcher.cpp, using a new variable that gets defined in vm_method.c.) To extend test coverage, I modified my test program to automatically generate more cases, expanding to character, float and string conversion, and using all combinations and orders of flags, plus different widths, precisions and values. Then I discovered some new issues. The '0' flag applies to all conversion in C-standard printf routines like CFStringCreateWithFormat, but in ruby, '0' only applies to numeric conversion. I could go ahead and special case that, but I couldn't help expecting more such special casing in the volumes test output differences. (Continued...) -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:3> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): Part 2 of 3 So I wondered if it was possible to port the ruby 1.9.1 version of sprintf to MacRuby. I decided to remove all the multibyte and encoding stuff, and just convert the format string to UTF-16, so I didn't have to worry about encoding. Then I just used a NSMutableString to accumulate the output. After dealing with padding and floating point conversion (which I do in ASCII and then append to the NSMutableString), and merging some code from the previous sprintf.cpp, it finally compiled. Then more bug diagnosis and iteration, and I finally got it to produce the same output as ruby 1.9.1, in all 148901 test cases (well, actually there was one difference, but that was due to String#inspect and String@dump, and is a different issue). As it always seems to turn out, there was yet-another issue that I had to fix. Floating point conversion seemed to produce slightly different numbers than ruby 1.9 (and 1.8). This turned out to be due to the fact that in MacRuby, only Fixfloat format, so the last two bits of the value are stomped on. The struct RFloat structure that wraps floating point numbers in ruby 1.x, isn't being used. So I changed the code to use Fixfloat if it doesn't use those last two bits, otherwise switch to struct RFloat (in 32-bit mode, it is worse, since Fixfloat uses 32-bit floating point numbers, which seriously limits precision, so I always use struct RFloat in 32-bit). Then I run into a problem in the compiler; it doesn't know how to deal with the struct RFloat when creating floating point literals. So I add code to numeric.c, that converts a floating point number to a string (using the %a hex format of the C-based sprintf, which is a exact translation to and from). Then I duplicate the Bignum code literal code, and change it to do floating point. The compiler is now happy. I also looked for occurrences of DBL2FIXFLOAT and FIXFLOAT2DBL, and where appropriate, replaced them with DOUBLE2NUM and RFLOAT_VALUE, respectively. FIXFLOAT_P(x) gets replaced with TYPE(x) == T_FLOAT. marshal.c needed to know about the struct RFloat instances, and even rb_type (aka TYPE()) needed to know about struct RFloat (and I added T_BIGNUM support, which was missing). '''Note: I have found a few other places in the compiler that assume Fixfloat, and stomps on those lower two bits. What it should do is call the appropriate rb_float_* routines, but I'm new to LLVM, so I haven't figure out how to change it yet. If anyone has any pointers (other than RTFM, which I'm now doing), I'd be appreciative.''' (Continued...) -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:4> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Old description:
The following script reproduces the error easily:
{{{ N = 34363859232 p N p N.class
puts "%d" % N }}}
This results in the following exception:
{{{ 01.rb:5:in `%': integer 34363859232 too big to convert to `int' (RangeError) }}}
When run in ruby 1.9, the output shows that N is a Bignum. When run in macruby, the output shows that N is a fixnum. This appears to be due to bignum.c line 523:
{{{ if (len <= (sizeof(long)*CHAR_BIT)) { }}}
Apparently, in macruby, sizeof(long) is 8, which means bignums are only instantiated for integers larger than 64 bits or so, and in the above example, the integer is only 44 bits.
Even though the bignums are only created based on sizeof(long), the rb_str_format function in sprintf.c uses FIX2LONG for converting the VALUE to a long int, and FIX2LONG uses INT_MIN and INT_MAX to check the bounds of the integer.
I'm not sure what the safest way to fix this is. Should check_int (and others?) in numeric.c be comparing against larger boundary numbers, instead of INT_MIN/MAX?
New description: [[BR]] The following script reproduces the error easily: {{{ N = 34363859232 p N p N.class puts "%d" % N }}} This results in the following exception: {{{ 01.rb:5:in `%': integer 34363859232 too big to convert to `int' (RangeError) }}} When run in ruby 1.9, the output shows that N is a Bignum. When run in macruby, the output shows that N is a fixnum. This appears to be due to bignum.c line 523: {{{ if (len <= (sizeof(long)*CHAR_BIT)) { }}} Apparently, in macruby, sizeof(long) is 8, which means bignums are only instantiated for integers larger than 64 bits or so, and in the above example, the integer is only 44 bits. Even though the bignums are only created based on sizeof(long), the rb_str_format function in sprintf.c uses FIX2LONG for converting the VALUE to a long int, and FIX2LONG uses INT_MIN and INT_MAX to check the bounds of the integer. I'm not sure what the safest way to fix this is. Should check_int (and others?) in numeric.c be comparing against larger boundary numbers, instead of INT_MIN/MAX? -- Comment(by emoy@…): Part 3 of 3 While I was at it, I added %@ to be compatible with CF/Objective-C. I even added %a for hex float conversion. I noticed that marshal.c uses %g, plus a save_mantissa()/load_mantissa() routine; that should someday be replaced with just using %a for exact conversion. Once we have %a: {{{ % macruby -e 'puts "%a" % 1.23456789' 0x1.3c0ca4283de18p+0 }}} but: {{{ % macruby -e 'puts "0x1.3c0ca4283de18p+0".to_f' 0.0 }}} This is because util.c has an old version of David M. Gay's strtod that doesn't know about hex floats. But since Mac OS X has a recent version of Gay's strtod, if we just use the libSystem version (and save code space not having our own implementation), it now works: {{{ % macruby -e 'puts "0x1.3c0ca4283de18p+0".to_f' 1.23456789 }}} Even more: {{{ % macruby -e 'puts "inf".to_f' Infinity % macruby -e 'puts "nan".to_f' NaN }}} One other benefit of using the ruby sprintf code, is that it supports named hash arguments, like: {{{ % macruby -e 'x={"a"=>[1,"foo"],"b"=>2};puts "%<b>d %{a}" % x' 2 [1, "foo"] }}} So I attached my test program, as test-sprintf.rb and the diffs, as ticket159.diff. I've also create a SVN branch: http://svn.macosforge.org/repository/ruby/MacRuby/branches/ticket159 with the changes (made from yesterday's trunk). -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:5> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): Part 4 of 3 (oops) I suppose I should show that the original problem is really fix: {{{ % ./miniruby -e "p '%d' % (42**12)" "30129469486639681536" % ./miniruby -e "p '%#x' % (42**12)" "0x1a22160dd86211000" % ./miniruby -e "p '%#x' % -(42**12)" "0x..fe5dde9f2279def000" }}} which are the same results as ruby 1.9.1. -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:6> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): Further testing of my changes showed there was a random crasher, so I've been trying to track it down. Turns out to be a garbage collector problem that I fixed with just retaining the floating point object. But the code that was being generated was causing the optimization code in RoxorCompiler::compile_optimized_dispatch_call to not work, so I made some additional changes there, that caused a bug in llvm to surface (bug #5026, which was fixed in llvm-83656). I tried TOT LLVM, and verified that the bug went away, but other things weren't working, so I went back and changed the code yet again. I finally had a version that didn't tickle bug #5026, didn't cause a crash, and allowed the optimization code to work. The code (or at least the IR) being generated, no longer uses the %a representation to store floating point, but uses native LLVM floating point values. I've added a compile time option to generate more descriptive IR, with variables previous just showing as %12, now appear as %"[sel:inspect]" or %"[float:1.19999999999p0]". This helped be track down another crasher that was due to the same RoxorCompiler::compile_optimized_dispatch_call routine generating code that allowed the divisor to be zero and cause an (uncaught) arithmetic exception. So I added code to check the divisor being zero, and call a ruby exception. While running the spec tests many times, I saw some other problems with floating point handling, and fixed them. I also got an error due to ruby previously using an old, internal version of strtod that doesn't recognize "infinity" or "nan" and convert them to the corresponding floating point value. Since I changed macruby to use libSystem's strtod, it does know how to do the conversion, but one of the spec test specifically tests that "nan".to_f produces 0.0 instead of a real NaN. So I added a strict option to macruby (as well as an environment variable) that enforces more strictness, so "nan".to_f produces 0.0, and that %a in sprintf is no longer recognized. Then I noticed that even before my changes, the RoxorCompiler::compile_optimized_dispatch_call optimizations don't work under the AOT compiler, because it is storing literal values in LLVM GlobalVariables, and the code in RoxorCompiler::unbox_ruby_constant doesn't know how to deal with it. Since my changes to create more descriptive IR also uses GlobalVariables, I thought I could come up with a fix for both. I added code to detect the LoadInst, then the GlobalVariable that it referred to, then the actual value of the GlobalVariable. That works for JIT, but the AOT compiler doesn't store the real value in the GlobalVariable at this time, but later adds code to the top of the compiled code to load values into the GlobalVariables. So I add another std::map variable to RoxorCompiler that could map GlobalVariables to the real values (either for JIT or AOT) and now the optimization in RoxorCompiler::compile_optimized_dispatch_call works for both. I got sidetracked, and it is late, so it will have to wait until tomorrow for me to check in my changes to the branch and post a diff file as well.... -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:7> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): OK, I have checked in my changes to the ticket159 branch, and attached ticket159-incremental-2.diff and ticket159-full-2.diff. Besides the above mentioned changes, I have also added a fix to avoid temp file collisions by bin/rubyc (was building two versions of MacRuby at the same time, one with the IR debugging changes and one without, and one failed, due to temp file name collisions). As for enabling the IR debugging changes, they are controlled by the DEBUG_IR macro. I just hacked -DDEBUG_IR into rakelib/builder.rb, but I should patch it more formally to support command-line and environment variable setting. I am also working on a debug mode to disable (gcc) compiler optimization and inlining, so gdb works better. -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:8> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by lsansonetti@…): Hi Ed, sorry for the late reply on this. I had a quick look at the changes you made in the branch. I can see that you fixed several problems there, not only #159. We can definitely cherry- pick some of the changes and merge them into trunk. Regarding the float changes, I will have to read more about what you did, I'm afraid it would break some of the performance optimizations we wrote earlier. -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:9> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by emoy@…): I believe I managed to maintain the optimizations, even under DEBUG_IR (though there is additional overhead with new GlobalVariables, but not unlike the output of the AOT compiler). At least those that stem from unbox_ruby_constant(). If there are other optimizations I missed, I can certainly take a look. -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:10> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by pthomson@…): Hi, Ed! Firstly, let me say that overall your changes seem really good. The bugfixes, descriptive IR patch, and changes to sprintf (especially the addition of %@) are really awesome. As far as printing out the Ruby class rather than the ObjC class (pola.diff), I don't really think that's necessary - actually, I kind of like how it prints out ObjC classes, as it's a good demonstration to skeptics that string literals are NSMutableStrings. Laurent, do you have any opinions on this? Are there any Rubyspecs that clarify this issue? The strictness option is kind of cool, but I'm not sure whether or not it's a big enough issue to require the introduction of another environment variable. Thoughts? I am slightly concerned about forcing 32-bit to use RFloat structs instead of FixFloats - yes, there is a huge loss in precision, but the speed boost we get from not having to call malloc() every time we need to create a new Float is really compelling. Perhaps we could have a command-line flag (something like {{{--force-fixfloats}}}) on 32-bit so that applications that need speed over precision (games, mobile apps) can get speed boosts when necessary. But for the time being I think it's worth integrating these FP changes, as most people are on 64-bit. Additionally, this would fix the problem we're having with the non-representable nature of negative powers of 10, which is pretty embarrassing: {{{ p 0.1 # prints 0.0999999940395355 }}} (I've been thinking about whether we could use some sort of slab- allocation technique for Floats. Any ideas?) Anyway, your changes look really good. I need to talk with Laurent about the best way to fix some merge conflicts in dispatcher.cpp and compiler.h, but other than that I think we should be able to merge these changes soon (probably within a couple of days). -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:11> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: new Priority: major | Milestone: Component: MacRuby | Keywords: ---------------------------------+------------------------------------------ Comment(by lsansonetti@…): Looking at the changes, I have the following comments: * The "strict" code paths should be enabled by default, because we must be compatible with the upstream implementation of Ruby. At the very least, "nan".to_f should (unfortunately) return 0.0, like Ruby 1.9. I believe the %a and %A string formatters can be callable by default, since these are additions and do not break existing behavior. * In RoxorCompiler::compile_conversion_to_ruby(), the optimization that was inlining the code to convert a C float into a Ruby float disappeared. It is now compiling a call to rb_vm_float_to_rval() every time. This isn't good performance-wise, we must bring the inline optimization back. * On some simple benchmarks (such as a "i=0.0; while i<100000000.0; i+=1; end" loop), the new code is about 3 times slower. It's because the compiler generate C calls when creating floats, according to Shark, such as rb_vm_float_value(), rb_float_new_retained(), and rb_vm_effective_immediate_bits() (which should be inlined). Also, probably because of rb_float_new_retained(), boxed floats are apparently never released, which leads to memory leaks (a simple loop demonstrated that). I am attaching a patch of your changes against MacRuby trunk (as of r3193) so that others can try them. But at this point I'm not confident to merge all of it. -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:12> MacRuby <http://macruby.org/>
#159: "%d" with large integer argument gives conversion error ---------------------------------+------------------------------------------ Reporter: jamis@… | Owner: lsansonetti@… Type: defect | Status: closed Priority: major | Milestone: MacRuby 0.6 Component: MacRuby | Resolution: fixed Keywords: | ---------------------------------+------------------------------------------ Changes (by lsansonetti@…): * status: new => closed * resolution: => fixed * milestone: => MacRuby 0.6 Comment: In the meantime, this bug seems to be fixed as of r3487 (which does more or less what Ed did, a new sprintf implementation). {{{ $ ./miniruby -e "p '%d' % 100000000000000000000000000000000000000" "100000000000000000000000000000000000000" }}} -- Ticket URL: <http://www.macruby.org/trac/ticket/159#comment:13> MacRuby <http://macruby.org/>
participants (1)
-
MacRuby