OpenCL experiment (request for comments)
Hi @all!
I promised Laurent I would start a discussion on OpenCL and MacRuby on the ML, and here I am :)
I wrote a small hack for MacRuby that adds basic (and hacky) support for OpenCL kernels running on your GPUs or CPUs. You can read about it on my blog post here http://blog.0x82.com/2010/1/23/opencl-in-macruby-hack-not-very-useful
The branch with the code is located here on github http://github.com/rubenfonseca/macruby . If you are interested in the actual hacky implementation, please look at the “opencl.c” file.
Now there are a couple of things I need help with. It was my very first (Mac)Ruby C extension, and I’m not really sure about many details of the implementation (I basically copy-pasted code from other modules eheh). I’ll raise a couple of questions here, I’m sure you can answer some of them :)
- How and where to store primitive values?
For instance, OpenCL::Device has somewhere inside a “cl_device_id” pointer. However, in other classes (think OpenCL::Context#new) I’ll need to have a reference to that “cl_device_id” pointer somewhere.
What’s the best way to store the pointer? Inside the Object struct? As an instance var? As an accessor? These latter options don’t make sense to me, 'cause I’m never interested in getting the “cl_device_id” pointer in an IRB shell, for instance... Hope I’m making myself clear.
- How should the memory be managed?
As I said, I never wrote a MacRuby extension before. When writing the extension, I needed to do a couple of memory allocations. I used “xmalloc” (discovered by looking at other MacRuby *.c files). However, when I called “free” once I didn’t need the memory anymore, all sorts of warnings happened at runtime.
After I deleted the “free” calls, it all worked, but I’m not sure if I’m leaking memory somehow. On the other hand, maybe the memory is automatically GC’ed :) Can you clear this up for me and show me the best practices?
- How to make the OpenCL API more “Rubyish”
I have no clue on this one. OpenCL seems like a huge API, and I don’t have any real background knowledge of GPU programming. There are all sorts of variations on each call, and a number of different entities (classes). Any suggestion on how to make this more pleasant to write in Ruby would be very very welcome.
I even saw Laurent talk about a “Ruby -> OpenCL direct compiler via LLVM bitcode”, but I’m definitely not qualified to even consider that hypothesis. I tried twice to create a very simple compiler with LLVM and failed completely :P
Anyhow, sorry for the long email, but I would be very thankful for any help I can get :)
Cheers,
Rúben Fonseca
Hi Ruben,
Sorry for the late response!
On Jan 26, 2010, at 4:26 AM, Ruben Fonseca wrote:
Hi @all!
I promised Laurent I would start a discussion on OpenCL and MacRuby on the ML, and here I am :)
I wrote a small hack for MacRuby that adds basic (and hacky) support for OpenCL kernels running on your GPUs or CPUs. You can read about it on my blog post here http://blog.0x82.com/2010/1/23/opencl-in-macruby-hack-not-very-useful
The branch with the code is located here on github http://github.com/rubenfonseca/macruby . If you are interested in the actual hacky implementation, please look at the “opencl.c” file.
Now there are a couple of things I need help with. It was my very first (Mac)Ruby C extension, and I’m not really sure about many details of the implementation (I basically copy-pasted code from other modules eheh). I’ll raise a couple of questions here, I’m sure you can answer some of them :)
- How and where to store primitive values?
For instance, OpenCL::Device has somewhere inside a “cl_device_id” pointer. However, in other classes (think OpenCL::Context#new) I’ll need to have a reference to that “cl_device_id” pointer somewhere.
What’s the best way to store the pointer? Inside the Object struct? As an instance var? As an accessor? These latter options don’t make sense to me, 'cause I’m never interested in getting the “cl_device_id” pointer in an IRB shell, for instance... Hope I’m making myself clear.
The best way is to use the RData structure, as you would do with the upstream Ruby implementation. For this, you use the following macros:
Data_Wrap_Struct
Data_Make_Struct
Data_Get_Struct
A few notes:
1) The mark callback will not be honored (our GC does not use it).
2) The free callback will not be honored either (this is a current limitation). In order to free resources upon GC cycles, one must implement a -finalize method on the class. You can grep the MacRuby source code for "rb_objc_install_method2" calls using the "finalize" selector as an example.
3) If you decide to store inside the RData structure a C structure allocated with Ruby-allocated memory (xmalloc & friends), you must be careful to appropriately use write barriers when setting Ruby objects inside that structure. But I believe you should not need to do this.
However, in the next release (0.6) we intend to fully support the upstream Ruby C interface. That work has already started.
- How should the memory be managed?
As I said, I never wrote a MacRuby extension before. When writing the extension, I needed to do a couple of memory allocations. I used “xmalloc” (discovered by looking at other MacRuby *.c files). However, when I called “free” once I didn’t need the memory anymore, all sorts of warnings happened at runtime.
After I deleted the “free” calls, it all worked, but I’m not sure if I’m leaking memory somehow. On the other hand, maybe the memory is automatically GC’ed :) Can you clear this up for me and show me the best practices?
When using xmalloc(), free() should not be used. xfree() is the matching function, but you don't need to call it: the collector will reclaim the memory anyway.
- How to make the OpenCL API more “Rubyish”
I have no clue on this one. OpenCL seems like a huge API, and I don’t have any real background knowledge of GPU programming. There are all sorts of variations on each call, and a number of different entities (classes). Any suggestion on how to make this more pleasant to write in Ruby would be very very welcome.
I will have a better look when I have some time, but may I suggest the following:
1) Try to wrap as much of the OpenCL API as possible in this C module.
2) Write higher-level APIs/paradigms in a pure Ruby file that would ship with the standard library.
This is the approach we are taking for GCD. Also, I heard there is an existing OpenCL wrapper for MRI; it might be interesting to have a look at it to see how they do things :)
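To make the two-layer suggestion concrete, here is a minimal sketch of how it might look. Everything in it is an assumption for illustration: RawCL is a hypothetical stand-in for the low-level C module (stubbed in pure Ruby so the sketch runs without OpenCL), and EasyCL is a made-up name for the pure-Ruby layer. Neither reflects the actual opencl.c code.

```ruby
# Hypothetical stand-in for the low-level C module. In the real design this
# layer would wrap the OpenCL C API; here it is stubbed in pure Ruby so the
# sketch runs without OpenCL installed.
module RawCL
  Device = Struct.new(:name, :type)

  # Pretends to enumerate the available devices.
  def self.devices
    [Device.new("stub-cpu", :cpu), Device.new("stub-gpu", :gpu)]
  end

  # Pretends to run a kernel on a device; the stub simply applies a Ruby
  # callable to each element of the input.
  def self.run(device, body, input)
    input.map { |x| body.call(x) }
  end
end

# Higher-level, pure-Ruby layer: hides device selection and exposes a
# block-based interface. All names here are illustrative.
module EasyCL
  # Returns the first device of the requested type, or raises if none exists.
  def self.device(type = :gpu)
    RawCL.devices.find { |d| d.type == type } ||
      raise(ArgumentError, "no #{type} device available")
  end

  # Applies the block to every element of input on the chosen device type.
  def self.map(input, device_type = :gpu, &body)
    raise ArgumentError, "block required" unless body
    RawCL.run(device(device_type), body, input)
  end
end

squares = EasyCL.map([1, 2, 3, 4], :cpu) { |x| x * x }
# squares == [1, 4, 9, 16]
```

The point of the split is that the Ruby-facing layer can be reshaped freely (blocks, defaults, nicer errors) without touching the C code that talks to OpenCL.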
I even saw Laurent talk about a “Ruby -> OpenCL direct compiler via LLVM bitcode”, but I’m definitely not qualified to even consider that hypothesis. I tried twice to create a very simple compiler with LLVM and failed completely :P
So LLVM should be able to generate code for OpenCL, I believe. It would be awesome if in MacRuby you could just send a given block to the GPU: internally it would compile the block as LLVM IR (probably specialized for OpenCL), then JIT compile it and run it on the GPU. That would be a higher-level abstraction and would probably put OpenCL in the hands of the average Ruby programmer, who knows little (if anything) about C-based code.
Laurent
participants (2)
- Laurent Sansonetti
- Ruben Fonseca