Re: [squirrelfish] JavaScriptCore renames

15 Nov 2008

On Nov 15, 2008, at 12:07 PM, Geoffrey Garen wrote:
...
...
That being said, an overview of the renames planned would be helpful.
Okeedokee.
After much discussion, we concluded that almost all names in the  
field of virtual machines are potentially problematic, because each  
name can mean different things in different contexts, and each name  
usually fails to say precisely what it means. For example, "JIT",  
which technically just stands for "Just In Time", could refer to any  
of a million different processes on a computer, including the "just  
in time" initialization of a data member in a class. Even if we  
allow that "JIT" implicitly refers to CPU-specific translation at  
runtime, making it less vague, "just in time" is still an inaccurate  
phrase, since function-at-a-time translation will translate unlikely  
basic blocks well before they execute -- if they execute at all.  
Even the phrase "virtual machine" is problematic, since the CPU  
itself is a kind of virtual machine. It's turtles all the way down.
However, none of this changes the fact that we use certain names as  
terms of art all the time. So, we decided to shun the siren song of  
perfect, technically accurate naming, and settle on names that  
reflected our basic everyday thinking and speech.
Therefore:
* "Bytecode"
Anything in JavaScript currently called "byte code", "opcode", "op  
code", "code", "bitcode", etc. we'll rename to "bytecode". We use one  
word with no camel case to indicate that the word is just jargon,  
and not a true compound word. It's not really one byte, but we're  
over that. We use this word every day when talking about our code,  
so we'll use it in our code, too.
This sounds good in general, although as I pointed out on IRC and will  
repeat for benefit of others here, I don't think "bytecode" and  
"opcode" are really synonyms. My understanding of the word would be:

- bytecode is a mass noun, like "machine code" - you can't have "a  
bytecode"
- a singular unit of bytecode is an "instruction", or "bytecode  
instruction" if you must
- the part of the instruction that says which operation to perform,  
rather than what the operands are, is an "opcode"
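To make the three terms above concrete, here is a minimal C++ sketch. Every name in it (terms_sketch, Opcode, Instruction, Bytecode, sampleBytecode, opcodeAt) is hypothetical and chosen only for illustration; none of it is taken from JavaScriptCore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace terms_sketch {

// "opcode": the part of an instruction that says which operation to perform.
// These three operations are made up for the example.
enum class Opcode : std::uint8_t { Load, Add, Return };

// "instruction": one singular unit of bytecode, i.e. an opcode plus operands.
struct Instruction {
    Opcode opcode;
    int operands[2];
};

// "bytecode": the mass noun, here simply a sequence of instructions.
using Bytecode = std::vector<Instruction>;

// Builds a tiny, arbitrary sequence of three instructions.
inline Bytecode sampleBytecode()
{
    return {
        { Opcode::Load, { 0, 42 } },
        { Opcode::Add, { 0, 1 } },
        { Opcode::Return, { 0, 0 } },
    };
}

// Returns the opcode of the instruction at the given index.
inline Opcode opcodeAt(const Bytecode& code, std::size_t index)
{
    assert(index < code.size());
    return code[index].opcode;
}

} // namespace terms_sketch
```

Under this reading you would speak of "three instructions of bytecode", and of "the opcode of the second instruction", rather than of "three bytecodes".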
...
* "BytecodeGenerator"
The class used to generate bytecode. "Generator" clearly indicates  
that the class outputs bytecode, whereas a name like "compiler"  
might mean that the class outputs bytecode, or it might mean that  
the class takes bytecode as its input. Also, we thought that names  
like "compiler" implied a larger suite of tools not included in this  
class.
I do believe that the term Bytecompiler specifically refers to a  
compiler that outputs bytecode, and can never refer to a compiler that  
takes bytecode as input instead. It's a little shorter. But  
technically our bytecompiler encompasses not just the  
BytecodeGenerator class but also all the emit functions in Nodes.cpp,  
so I'd probably use it for a directory, not a class.
...
* "BytecodeInterpreter"
The class that executes a program in bytecode form. We liked the  
symmetry with "BytecodeGenerator". We rejected names like BytecodeVM  
because we thought the name "virtual machine" was a little too  
vague, and it implied a larger suite of functionality not limited to  
this class.
* "JIT"
The class that translates a program in bytecode form to CPU-specific  
code. We rejected "BytecodeJIT" because we couldn't tell if a  
BytecodeJIT had bytecode as its input or its output. It's not  
symmetric with BytecodeInterpreter, but oh well. We liked "JIT"  
because we thought that interpreter vs JIT was a widely used and  
understood dichotomy.
I like all of these.
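As a rough illustration of the generator/interpreter symmetry described above, here is a toy C++ sketch: one class whose output is bytecode and one whose input is bytecode. The class names, the stack-based instruction set, and the emit/execute functions are all assumptions made up for the example; the real BytecodeGenerator and BytecodeInterpreter are not shown in this thread and will look different.

```cpp
#include <cassert>
#include <vector>

namespace pipeline_sketch {

// Hypothetical three-operation instruction set for a toy stack machine.
enum class Opcode { PushConstant, Add, End };

struct Instruction {
    Opcode opcode;
    int operand;
};

using Bytecode = std::vector<Instruction>;

// Bytecode is the OUTPUT of this class.
class ToyBytecodeGenerator {
public:
    void emitPushConstant(int value) { m_code.push_back({ Opcode::PushConstant, value }); }
    void emitAdd() { m_code.push_back({ Opcode::Add, 0 }); }

    // Appends the terminating instruction and returns the generated bytecode.
    Bytecode finish()
    {
        m_code.push_back({ Opcode::End, 0 });
        return m_code;
    }

private:
    Bytecode m_code;
};

// Bytecode is the INPUT of this class.
class ToyBytecodeInterpreter {
public:
    int execute(const Bytecode& code)
    {
        std::vector<int> stack;
        for (const Instruction& instruction : code) {
            switch (instruction.opcode) {
            case Opcode::PushConstant:
                stack.push_back(instruction.operand);
                break;
            case Opcode::Add: {
                assert(stack.size() >= 2);
                int right = stack.back();
                stack.pop_back();
                int left = stack.back();
                stack.pop_back();
                stack.push_back(left + right);
                break;
            }
            case Opcode::End:
                assert(!stack.empty());
                return stack.back();
            }
        }
        return 0; // Reached only if the bytecode has no End instruction.
    }
};

// Generates bytecode for "2 + 3" and then interprets it.
inline int runSample()
{
    ToyBytecodeGenerator generator;
    generator.emitPushConstant(2);
    generator.emitPushConstant(3);
    generator.emitAdd();
    return ToyBytecodeInterpreter().execute(generator.finish());
}

} // namespace pipeline_sketch
```

In this sketch runSample() evaluates to 5. A JIT would sit where the interpreter does, taking the same bytecode as input but producing CPU-specific code instead of executing it directly.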
...
So we have this directory structure:
bytecode
  -> generator
  -> interpreter
  -> jit
  -> sampler
I'm not sure I like having a lot of subdirectories under bytecode  
though, particularly since they will each contain so few files. I'd  
propose:

- bytecodegenerator or bytecompiler at top level (Bytecompiler is a  
slightly more concise term of art for a compiler that outputs  
bytecode, with no ambiguity about whether the bytecode is going in or  
out)
- a bytecode directory at top level containing general bytecode data  
structures and the bytecode interpreter
- a jit directory at top level
- sampler stuff relegated to one of the above

That's more in line with the directory structure we all discussed  
before, and which we've barely had a chance to get used to.
...
It bears mentioning that JavaScriptCore also contains a bytecode and  
a JIT for regular expressions, so the names above might be vague. We  
decided that the best solution was to treat regular expression  
functionality as secondary, giving classes related to it an extra  
prefix, or a different namespace. We also decided not to bother  
changing the nomenclature in PCRE, because PCRE is frozen in time.  
Also, PCRE hurts me in the brain.
Many other small renames and file splittings are included in my  
patch, but they all tend to follow from these. The only substantial  
one I can think of is "__". Right now we have code like this:
m_jit.movl_i32r(...)
In this context, "m_jit" is a data member of "JIT", and an instance  
of the "X86Assembler" class. JIT::m_jit is weird, and calling an  
assembler a JIT is also weird. So, we opted to rename "m_jit" to  
"m_assembler", and then for brevity, use a macro to replace  
"m_assembler." with "__", so you get this:
__ movl_i32r(...)
There are a few cases where this looks a little weird right now, but  
they're fixable. In general, this approach has worked well for other  
projects, so it will probably work well for us.
I'm not a huge fan of this but I must admit m_jit would be wrong and  
saying m_assembler all the time would strain readability.
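For anyone who has not seen the "__" trick before, here is a minimal sketch of how such a macro could be set up. ToyAssembler and ToyJIT are hypothetical stand-ins, and the two-argument form of movl_i32r is an assumption (the message above only shows "movl_i32r(...)"); this is not the code from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace macro_sketch {

// Hypothetical stand-in for X86Assembler. Instead of emitting machine code,
// it records a textual description of each requested instruction.
class ToyAssembler {
public:
    // Assumed signature: an immediate value and a register number.
    void movl_i32r(int immediate, int reg)
    {
        assert(reg >= 0 && reg < 8);
        m_log.push_back("movl $" + std::to_string(immediate) + ", r" + std::to_string(reg));
    }

    void ret() { m_log.push_back("ret"); }

    const std::vector<std::string>& log() const { return m_log; }

private:
    std::vector<std::string> m_log;
};

// The macro from the message: "__ foo()" expands to "m_assembler. foo()".
// Note that identifiers containing a double underscore are formally reserved
// in C++, and that macros ignore namespaces, so it is #undef'd after use here.
#define __ m_assembler.

class ToyJIT {
public:
    void compile()
    {
        __ movl_i32r(42, 0);
        __ ret();
    }

    const ToyAssembler& assembler() const { return m_assembler; }

private:
    ToyAssembler m_assembler;
};

#undef __

} // namespace macro_sketch
```

The macro only works inside code where a member named m_assembler is in scope, which is part of why it reads a little oddly at first.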

Regards,
Maciej

Maciej Stachowiak