[MacRuby-devel] Fibers and Enumerators

Fri Aug 13 08:15:27 PDT 2010

On Aug 12, 2010, at 03:33 PM, macruby at djc.net wrote:
While I am certain everyone would agree that 100% compatibility should be the 
goal, I am not convinced how "critical" this addition is:
in particular, how difficult is it to do a quick code change?
Unlike basic data types, there are quite a few reasons I don't see Fibers 
being a large percentage of any code.

I hope that folks involved with this discussion don't misinterpret the reason I originally posted.  I'm not agitating, or pushing, or cajoling, or even suggesting that we _NEED_ Fibers and must have them ASAP.

Instead I was curious to know if there was "a plan" for implementing Fibers and if so, I was curious to know what operating system technology they would be built on.  It appears that the problem has not been looked at, in-depth, yet and I am satisfied with that.  Nor does it appear that there is any OS level technology that would particularly support the creation of Fibers (i.e. Mac OS X doesn't really have any built-in support for cooperatively scheduled multitasking... outside of deprecated technologies like the Carbon Thread Manager and makecontext/swapcontext).

It seems that Mac OS X has made a fairly clean break from the cooperative multitasking roots of the "classic" Mac OS.  In some ways that's a really Good Thing(tm) but I'm not convinced that it is true in all cases.  I'm working off the assumption (with all that implies) that Languages like Python, and now Ruby have technologies like Fibers for a reason.

Does anyone have any truly compelling examples where Fibers are truly the 
only/right way to go, vs. an alternate design?

I don't see them scaling well (see ruby-doc.org) - meaning, their advantage is 
restricted in scope.

I can't answer your question other than to suggest that the language feature must have been added for a reason. (and, Yes, that is a cheap cop-out of a response).

I can tell you the particular context that kicked off this whole line of inquiry for me. I will do so here, but be forewarned that it will be a long explanation and my pursuit is purely academic. :-)  That is to say, it's something I've been studying mostly for the sake of learning something new.  If that kind of thing doesn't interest you... if you're more of a practical applications kind of person, then you might want to skip the explanation.

What got me started on the topic was a data structure from a beginning computer science class called "Streams" (not I/O streams... but something different with a confusingly homogenous name).  In Clojure they are represented using a structure called a "lazy sequence".

Basically the idea is that you can express a computation as a series of streams and processing elements.  We see this all the time in Ruby where the stream is the sequence of objects yielded up by and enumerator, one processing element is a block, and another processing element is a method.  For example:

[1, 2, 3, 4, 5].collect { |x| x* x }

Looking at this from the Streams perspective, we are taking a Stream of numbers (the sequence of the elements in the array), passing each element through a processor (represented by the block) that computes the square of the elements fed into it (creating a stream of the squares), and then collecting all the elements of that stream into an array.

This technique gets interesting when you start adding the concept of infinitely long streams.  Consider a problem where you want to compute the first 10 prime numbers that are also palindrome number (i.e. which are "spelled" the same backward and forward).  If you had a stream of the whole numbers, a filter for picking out just the primes, and a filter for picking out just the palindrome numbers you could express the whole solution as:

whole_numbers.filtered_by(:prime).filtered_by(:palindrome).take(10)

whole_numbers is an "infinite stream" which we filter out only primes (creating an infinite stream of prime numbers), then filter out only palindrome numbers (an infinite stream of primes that are also palindrome numbers), of which we only take the first 10 elements.

There's a lot of implementation detail left out (and my syntax here is only designed to illustrate the concepts, not to suggest an implementation) but at the heart of this kind of system is the need to represent an infinite stream of the whole numbers.  I've seen this called a "Generator".  In Ruby 1.9 we can create a Generator for the whole numbers as:

whole_numbers = Enumerator.new do |yielder|
	number = 0;
	loop do
		yielder.yield number
		number = number + 1
	end
end

In other words, the sequence of whole numbers HAS to be a process, not a data structure, because it is infinite (and infinitely large data structures have a hard time squeezing into finite memory).  Because it is a process, though, we also need some means of interrupting the process to retrieve the elements from it.  (and in practice we don't want the process itself to run without pause because it would consume an infinite amount of processor time).

This lead me to an exploration of Enumerators, then an exploration of Fibers (which are used to implement enumerators) then to an exploration of cooperative multitasking in Mac OS X, and finally a quest to find out why the heck my Ruby 19 enumerators didn't work in MacRuby (the only Ruby 1.9 implementation worth running... right?!)

Whew...

Now, that's not to say that Fibers are the ONLY way to implement the Generator above.  One could do this, for example:

def whole_numbers
	number = 0
	return lamba { result = number; number += 1; result; }
end

And "my_proc = whole_numbers()" will give you a Proc that will generate the next item in an infinite sequence of each successive whole number each time it's called.  However coming up with this solution for the trivial example of the sequence of whole numbers is easy.  If the sequence you are trying to generate requires more complex calculations... coming up with a simple proc may not be as straightforward.

So to try and tie it back to your original question of "What are Fiber's good for".  In this case I have an process that represents an infinite loop (an application event loop would be another example).  I want to control that infinite loop so that it doesn't consume unlimited processing resources, and I want to retrieve intermediate values computed by that loop.  Cooperative multitasking (a lá Fibers) is one solution.  Breaking the processing parts of the loop into smaller chunks (the second solution above which might fit well in a GCD implementation) would be another solution.

Is the problem space significant enough that we need Fibers to implement it?  I can't answer that question. (My guess would be "no" because Ruby 1.8 had the Generator class that does something similar).  But it is "a solution" :-)

Scott

Finally, since this is an academic discussion... here's your bibliography:

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5
http://clojure.org/sequences
http://www.infoq.com/news/2007/08/ruby-1-9-fibers
:-) http://lmgtfy.com/?q=Ruby+1.9+%22Generators%22+%22Enumerator%22
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.macosforge.org/pipermail/macruby-devel/attachments/20100813/8884988b/attachment.html>