MacRuby... suitable for my project?
I have a data aggregation project. It's fairly sizable; it wants to use several GB of RAM (easily consuming 16GB or more during processing), and it would benefit from using multiple CPU cores. I'm wondering if MacRuby as it stands now could overcome some limitations of MRI, and I'm hoping this is the right place to ask whether MacRuby makes sense for the project.

Regarding the RAM, I'm working on some techniques to make the whole thing work in a smaller footprint, but they have drawbacks. There are enough machines involved here that paying for programming is a viable alternative to just paying for lots of RAM.

One of the problems I have with MRI is rather slow performance when reconstituting serialized data. I was hoping I could pre-process a bunch of data, serialize it, store it to disk, and recall it as needed. One example is 11MB of text data that I load and reorganize into an array of hashes (45,000 rows, about 20 fields). Marshaled to disk, that's about 11MB of data as well. It takes several seconds (depending on platform) to rebuild that, and on reloading, MRI consumes vastly more RAM than just the 11MB. I'm wondering if MacRuby would perform any better? That file is one of hundreds ranging from 1MB to 120MB (with 20-30MB being average) that would be loaded during processing, and many of them need to be resident in RAM at the same time along with the WIP data structures.

In a perfect world, I would like to have one large read-only data pool in RAM, and multiple "tasks" (we'll be generic for now) reading that data, each assembling its own new data set for output that does not need to be shared among tasks. A final process would stitch the results together. I either need something that handles VM paging efficiently so I can pretend I have "unlimited" RAM, or something that will perform better if I manually manage marshaling. And I need something that does OS-native threads well.

I'm about halfway done with the plain Ruby version of the code in 1.8.6.
I haven't tried 1.9 yet, so I know I would have to start with that. I'm wondering if using MacRuby, and creating a very simple Cocoa app that functions more or less as a VM container for the Ruby code, would provide:

a) better general performance (one would assume so)
b) better performance relative to garbage collection, VM paging, etc.
c) access to better multi-threaded capabilities -- the ability for multiple threads to use a common read-only data pool in RAM and build their own data sets

Personally, I'm not a Cocoa dev (yet... it's on my list), but I have access to one, and of course I would go ahead and learn what is needed in order to do this. I know this isn't really a mainstream application for what MacRuby is trying to achieve, but does it "make sense" to use MacRuby in this way to gain access to performance capabilities not in MRI? Essentially a Mac OS-specific replacement for JRuby.

Thoughts? Many thanks for your time.

-- greg willits
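The Marshal round-trip described above (pre-process, dump to disk, reload on demand) can be sketched as follows. This is a minimal, reduced-size illustration, not Greg's actual code; the file name and row counts are placeholders:

```ruby
require 'benchmark'

# Build a small stand-in for the real data set: an array of hashes.
# (The real data is ~45,000 rows of ~20 fields; sizes are reduced here
# so the sketch runs quickly.)
rows = Array.new(5_000) do |i|
  (0...20).each_with_object({}) { |f, h| h["field_#{f}"] = "row#{i}-value#{f}" }
end

# Serialize the whole structure to disk in one shot.
File.open("rows.marshal", "wb") { |f| f.write(Marshal.dump(rows)) }

# Time the reload -- the step reported as slow in MRI.
reloaded = nil
elapsed = Benchmark.realtime do
  reloaded = Marshal.load(File.open("rows.marshal", "rb") { |f| f.read })
end

puts "reloaded #{reloaded.size} rows in #{'%.3f' % elapsed}s"
File.delete("rows.marshal")
```

The round-trip is lossless, but both the dump and the load materialize every Ruby object again, which is where the time and the extra RAM beyond the on-disk size go.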
Just so you know, MacRuby is not production-stable (yet). If you need something for production, I don't think MacRuby is a good idea right now. That said, I wanted to respond to this part of your message: On Jul 30, 2008, at 8:08 PM, Greg Willits wrote:
> One of the problems I have with the MRI is rather slow performance when reconstituting serialized data. I was hoping I could pre-process a bunch of data, serialize it, store to disk, and recall it as needed.
> One example is 11MB of text data that I load and reorganize into an array of hashes (45,000 rows, about 20 fields). Marshaled to disk that's about 11MB of data as well. It takes several seconds (depending on platform) to rebuild that. Also reloading that, MRI consumes vastly more RAM than just the 11MB. I'm wondering if MacRuby would perform any better? That file is one of hundreds ranging from 1MB to 120MB (with 20-30MB being average) that would be loaded during processing, and many of them need to be resident in RAM at the same time along with the WIP data structures.
Having done a lot of this type of stuff, I would recommend that you don't Marshal your data but "serialize to source." I usually have to_ruby methods on my objects that write out source that would reconstitute them. I found that it's WAY faster (like 10x) to parse in generated source than Marshal.load. If you want to talk more about this, feel free to email me directly off-list. Best, Rich
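Rich's "serialize to source" idea can be sketched like this: instead of Marshal, write out Ruby source text that rebuilds the data, then reconstitute it by loading that source as plain Ruby code. The method name to_ruby matches his description, but the implementation, constant, and file names below are illustrative assumptions, not his actual code:

```ruby
# Turn an array of hashes into Ruby source text that evaluates back
# to an equal array. Hash#inspect emits valid Ruby literal syntax.
def to_ruby(rows)
  body = rows.map { |h| "  #{h.inspect}," }.join("\n")
  "[\n#{body}\n]\n"
end

rows = [
  { "name" => "alpha", "count" => 1 },
  { "name" => "beta",  "count" => 2 },
]

# Write the generated source to disk, assigned to a constant so the
# loading script can reach it...
File.open("rows.rb", "w") { |f| f.write("ROWS = " + to_ruby(rows)) }

# ...and reconstitute by loading it as ordinary Ruby code.
load "rows.rb"
raise "round-trip failed" unless ROWS == rows
File.delete("rows.rb")
```

The win comes from letting the Ruby parser, which is heavily optimized, do the reconstruction instead of Marshal's object-graph decoder; the trade-off is that this only works cleanly for data built from literal-representable types (strings, numbers, arrays, hashes).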
participants (2)
- Greg Willits
- Richard Kilmer