[MacRuby-devel] MacRuby... suitable for my project?

Wed Jul 30 17:08:01 PDT 2008

I have a data aggregation project. It's fairly sizable: it wants
several GB of RAM (easily consuming 16GB or more during processing).
Additionally, it would benefit from using multiple CPU cores.

I'm wondering if MacRuby as it stands now could overcome some  
limitations with MRI, and hoping this is the right place to ask  
whether MacRuby makes sense for the project...

Regarding RAM, I'm working on some techniques to make the whole
thing work in a smaller footprint, but there are some drawbacks.
There are enough machines involved here that paying for programming is
a viable alternative to just paying for lots of RAM.

One of the problems I have with MRI is rather slow performance
when reconstituting serialized data. I was hoping I could pre-process
a bunch of data, serialize it, store it to disk, and recall it as needed.

One example is 11MB of text data that I load and reorganize into an
array of hashes (45,000 rows, about 20 fields). Marshaled to disk
that's about 11MB of data as well. It takes several seconds
(depending on platform) to rebuild that. Also, after reloading it, MRI
consumes vastly more RAM than just the 11MB. I'm wondering if MacRuby
would perform any better? That file is one of hundreds ranging from
1MB to 120MB (with 20-30MB being average) that would be loaded during
processing, and many of them need to be resident in RAM at the same
time along with the WIP data structures.
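
To make the scenario concrete, here is a minimal sketch of the kind of
Marshal round trip described above. The row contents, the field names,
and the helper names build_rows and marshal_round_trip are made up for
illustration; only the shape (45,000 rows of about 20 fields) comes
from the description. It times just the reload step, which is the slow
part in question.

```ruby
require 'benchmark'
require 'tempfile'

# Synthetic stand-in for the real data: an array of hashes.
# Field names and values here are hypothetical.
def build_rows(rows = 45_000, fields = 20)
  Array.new(rows) do |r|
    row = {}
    fields.times { |f| row["field_#{f}"] = "value_#{r}_#{f}" }
    row
  end
end

# Dump the data to disk with Marshal, then time how long it takes
# to load it back. Returns the restored data and the elapsed seconds.
def marshal_round_trip(data, path)
  File.open(path, 'wb') { |io| Marshal.dump(data, io) }
  restored = nil
  seconds = Benchmark.realtime do
    restored = File.open(path, 'rb') { |io| Marshal.load(io) }
  end
  [restored, seconds]
end

if __FILE__ == $PROGRAM_NAME
  file = Tempfile.new('rows')
  restored, seconds = marshal_round_trip(build_rows, file.path)
  puts "reloaded #{restored.size} rows in #{format('%.2f', seconds)}s"
  file.close
  file.unlink
end
```

Running the same script under different Ruby implementations would
give a rough like-for-like comparison of reload time, though synthetic
rows may not behave the same as the real files.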

In a perfect world, I would like to have one large read-only data
pool in RAM, and multiple "tasks" (we'll be generic for now), each
reading that data to assemble its own new data set for output that
does not need to be shared among tasks. A final process would stitch
the results together.
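
The arrangement above can be sketched with plain Ruby threads. This is
only an illustration of the structure, not a claim about how any
particular implementation schedules it: the pool contents, the 'value'
field, the doubling step, and the helper name run_tasks are all
hypothetical. Each thread reads its own index range of the shared
frozen pool, builds a private array, and the results are concatenated
at the end.

```ruby
# Shared pool: loaded once, then frozen to signal "read-only".
# The rows and the 'value' field are hypothetical.
POOL = Array.new(1_000) { |i| { 'id' => i, 'value' => i }.freeze }.freeze

# Split the pool into index ranges, one per task. Each task reads the
# shared pool and builds its own private output array; nothing written
# by one task is visible to another until the final stitch.
def run_tasks(pool, task_count)
  slice = (pool.size.to_f / task_count).ceil
  threads = (0...task_count).map do |n|
    Thread.new(n) do |index|
      local = []                                # private to this thread
      start = index * slice
      stop  = [start + slice, pool.size].min
      (start...stop).each { |i| local << pool[i]['value'] * 2 }
      local                                     # becomes Thread#value
    end
  end
  # Final "stitch" step: join each thread and concatenate in task order.
  threads.inject([]) { |all, t| all + t.value }
end

if __FILE__ == $PROGRAM_NAME
  result = run_tasks(POOL, 4)
  puts "stitched #{result.size} results from 4 tasks"
end
```

Note that under MRI 1.8 these are green threads, and 1.9 runs native
threads under a global lock, so this shows the shape of the design
rather than actual multi-core use.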

I either need something that will handle VM efficiently so I can
pretend I have "unlimited" RAM, or something that will perform better
if I manually manage Marshaling. And I need something that handles
OS-native threads well.

I'm about halfway done with the plain Ruby version of the code in
1.8.6. I haven't tried 1.9 yet, so I know I would have to start with that.

I'm wondering if using MacRuby, and creating a very simple Cocoa app
that functions more or less as a VM container for the Ruby code, would
provide:

a) better general performance (one would assume so)
b) better performance relative to garbage collection, VM paging, etc
c) access to better multi-threaded capabilities -- the ability for
multiple threads to use a common read-only data pool in RAM and build
their own data sets.

Personally, I'm not a Cocoa dev (yet... it's on my list), but I have  
access to one, and of course would go ahead and learn what is needed  
in order to do this.

I know this isn't really a mainstream application for what MacRuby
is wanting to achieve, but does it "make sense" to use MacRuby in
this way to gain access to performance capabilities not in MRI?
Essentially a MacOS-specific replacement for JRuby.

Thoughts?

Many thanks for your time.

-- greg willits