On Nov 15, 2008, at 12:07 PM, Geoffrey Garen wrote:
That being said, an overview of the renames planned would be helpful.
Okeedokee.
After much discussion, we concluded that almost all names in the field of virtual machines are potentially problematic, because each name can mean different things in different contexts, and each name usually fails to say precisely what it means. For example, "JIT", which technically just stands for "Just In Time", could refer to any of a million different processes on a computer, including the "just in time" initialization of a data member in a class. Even if we allow that "JIT" implicitly refers to CPU-specific translation at runtime, making it less vague, "just in time" is still an inaccurate phrase, since function-at-a-time translation will translate unlikely basic blocks well before they execute -- if they execute at all. Even the phrase "virtual machine" is problematic, since the CPU itself is a kind of virtual machine. It's turtles all the way down.
However, none of this changes the fact that we use certain names as terms of art all the time. So, we decided to shun the siren song of perfect, technically accurate naming, and settle on names that reflect our basic everyday thinking and speech.
Therefore:
* "Bytecode"
Anything in JavaScriptCore currently called "byte code", "opcode", "op code", "code", "bitcode", etc. we'll rename to "bytecode". We use one word with no camel case to indicate that the word is just jargon, and not a true compound word. It's not really one byte, but we're over that. We use this word every day when talking about our code, so we'll use it in our code, too.
This sounds good in general, although, as I pointed out on IRC and will repeat here for the benefit of others, I don't think "bytecode" and "opcode" are really synonyms. My understanding of the words would be:
- bytecode is a mass noun, like "machine code"
- you can't have "a bytecode"
- a single unit of bytecode is an "instruction", or a "bytecode instruction" if you must
- the part of the instruction that says what operation to perform, rather than what the operands are, is an "opcode"
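To make that terminology concrete, here is a small C++ sketch. The `Opcode`, `Instruction`, and `Bytecode` names, the three opcodes, and the fixed three-operand layout are illustrative assumptions, not JavaScriptCore's actual data structures.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes, for illustration only (not JavaScriptCore's real set).
enum class Opcode : std::uint8_t { LoadConstant, Add, Return };

// One bytecode *instruction*: the opcode says what operation to perform,
// the operands say what it operates on.
struct Instruction {
    Opcode opcode;
    std::int32_t operands[3]; // meaning depends on the opcode
};

// "Bytecode" as a mass noun: a program's whole sequence of instructions.
using Bytecode = std::vector<Instruction>;

// How many instructions in a bytecode stream use the given opcode?
std::size_t countInstructionsWithOpcode(const Bytecode& bytecode, Opcode opcode)
{
    std::size_t count = 0;
    for (const Instruction& instruction : bytecode) {
        if (instruction.opcode == opcode)
            ++count;
    }
    return count;
}
```

In this vocabulary, `countInstructionsWithOpcode` counts instructions, and "how many bytecodes" is not a meaningful question.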
* "BytecodeGenerator"
The class used to generate bytecode. "Generator" clearly indicates that the class outputs bytecode, whereas a name like "compiler" might mean that the class outputs bytecode, or it might mean that the class takes bytecode as its input. Also, we thought that names like "compiler" implied a larger suite of tools not included in this class.
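A minimal sketch of the direction "Generator" conveys: syntax tree in, bytecode out. The tiny `ExpressionNode` tree, the opcode set, and the one-new-register-per-value numbering are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical opcodes and instruction layout, for illustration only.
enum class Opcode : std::uint8_t { LoadConstant, Add, Return };

struct Instruction {
    Opcode opcode;
    std::int32_t operands[3];
};

// Hypothetical, minimal syntax tree: either a constant or a sum of two subtrees.
struct ExpressionNode {
    bool isConstant;
    std::int32_t value;                  // used when isConstant
    std::unique_ptr<ExpressionNode> lhs; // used otherwise
    std::unique_ptr<ExpressionNode> rhs;
};

// Input: a syntax tree. Output: bytecode. Intended to be used once per tree.
class BytecodeGenerator {
public:
    std::vector<Instruction> generate(const ExpressionNode& root)
    {
        std::int32_t result = emitExpression(root);
        m_instructions.push_back({Opcode::Return, {result, 0, 0}});
        return m_instructions;
    }

private:
    // Emits code for `node` and returns the virtual register holding its value.
    std::int32_t emitExpression(const ExpressionNode& node)
    {
        if (node.isConstant) {
            std::int32_t dst = m_nextRegister++;
            m_instructions.push_back({Opcode::LoadConstant, {dst, node.value, 0}});
            return dst;
        }
        std::int32_t left = emitExpression(*node.lhs);
        std::int32_t right = emitExpression(*node.rhs);
        std::int32_t dst = m_nextRegister++;
        m_instructions.push_back({Opcode::Add, {dst, left, right}});
        return dst;
    }

    std::vector<Instruction> m_instructions;
    std::int32_t m_nextRegister = 0;
};
```

For the tree `2 + 3`, this emits LoadConstant r0, 2; LoadConstant r1, 3; Add r2, r0, r1; Return r2.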
I do believe that the term Bytecompiler specifically refers to a compiler that outputs bytecode, and can never refer to a compiler that takes bytecode as input instead. It's a little shorter. But technically our bytecompiler encompasses not just the BytecodeGenerator class but also all the emit functions in Nodes.cpp, so I'd probably use it for a directory, not a class.
* "BytecodeInterpreter"
The class that executes a program in bytecode form. We liked the symmetry with "BytecodeGenerator". We rejected names like BytecodeVM because we thought the name "virtual machine" was a little too vague, and it implied a larger suite of functionality not limited to this class.
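For contrast, a hypothetical BytecodeInterpreter in its simplest form: a loop that switches on each instruction's opcode and executes it directly. The opcodes and register model are assumptions made for illustration. A production interpreter would use faster dispatch and a far richer instruction set.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes and instruction layout, for illustration only.
enum class Opcode : std::uint8_t { LoadConstant, Add, Return };

struct Instruction {
    Opcode opcode;
    std::int32_t operands[3];
};

// Executes bytecode directly, one instruction at a time ("switch dispatch").
class BytecodeInterpreter {
public:
    std::int32_t execute(const std::vector<Instruction>& bytecode, std::size_t registerCount)
    {
        std::vector<std::int32_t> registers(registerCount, 0);
        for (const Instruction& instruction : bytecode) {
            const std::int32_t* op = instruction.operands;
            switch (instruction.opcode) {
            case Opcode::LoadConstant: // r[op0] = op1
                registers.at(op[0]) = op[1];
                break;
            case Opcode::Add: // r[op0] = r[op1] + r[op2]
                registers.at(op[0]) = registers.at(op[1]) + registers.at(op[2]);
                break;
            case Opcode::Return: // the program's result is r[op0]
                return registers.at(op[0]);
            }
        }
        return 0; // fell off the end without a Return
    }
};
```

`registers.at()` bounds-checks register numbers, trading a little speed for safety in a sketch like this.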
* "JIT"
The class that translates a program in bytecode form to CPU-specific code. We rejected "BytecodeJIT" because we couldn't tell if a BytecodeJIT had bytecode as its input or its output. It's not symmetric with BytecodeInterpreter, but oh well. We liked "JIT" because we thought that interpreter vs JIT was a widely used and understood dichotomy.
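To illustrate the interpreter vs. JIT dichotomy, here is a hypothetical JIT for a toy three-opcode bytecode. Instead of executing each instruction, it translates it into 32-bit x86 machine-code bytes. Mapping virtual registers 0-3 directly onto eax, ecx, edx, and ebx is a simplifying assumption, and the sketch only produces bytes; it does not make them executable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes and instruction layout, for illustration only.
enum class Opcode : std::uint8_t { LoadConstant, Add, Return };

struct Instruction {
    Opcode opcode;
    std::int32_t operands[3];
};

// Translates bytecode to x86 bytes; intended to be used once per bytecode stream.
class JIT {
public:
    std::vector<std::uint8_t> compile(const std::vector<Instruction>& bytecode)
    {
        for (const Instruction& instruction : bytecode) {
            const std::int32_t* op = instruction.operands;
            switch (instruction.opcode) {
            case Opcode::LoadConstant: // mov $op1, %reg(op0)
                emitByte(0xB8 + reg(op[0]));
                emitImm32(op[1]);
                break;
            case Opcode::Add: // reg(op0) = reg(op1) + reg(op2)
                if (op[0] == op[2]) {
                    emitAdd(reg(op[0]), reg(op[1])); // addition commutes
                } else {
                    if (op[0] != op[1])
                        emitMov(reg(op[0]), reg(op[1]));
                    emitAdd(reg(op[0]), reg(op[2]));
                }
                break;
            case Opcode::Return: // result goes in eax, then ret
                if (op[0] != 0)
                    emitMov(0, reg(op[0]));
                emitByte(0xC3);
                break;
            }
        }
        return m_code;
    }

private:
    static std::uint8_t reg(std::int32_t virtualRegister)
    {
        assert(virtualRegister >= 0 && virtualRegister < 4); // eax, ecx, edx, ebx
        return static_cast<std::uint8_t>(virtualRegister);
    }
    void emitByte(int byte) { m_code.push_back(static_cast<std::uint8_t>(byte)); }
    void emitImm32(std::int32_t value) // little-endian
    {
        std::uint32_t bits = static_cast<std::uint32_t>(value);
        for (int i = 0; i < 4; ++i)
            emitByte((bits >> (8 * i)) & 0xFF);
    }
    // mov %src, %dst: opcode 0x89, ModRM with mod=11, reg=src, r/m=dst
    void emitMov(std::uint8_t dst, std::uint8_t src) { emitByte(0x89); emitByte(0xC0 | (src << 3) | dst); }
    // add %src, %dst: opcode 0x01, same ModRM layout
    void emitAdd(std::uint8_t dst, std::uint8_t src) { emitByte(0x01); emitByte(0xC0 | (src << 3) | dst); }

    std::vector<std::uint8_t> m_code;
};
```

The interpreter pays the cost of decoding each instruction every time it runs. The JIT pays that cost once, at translation time, and afterwards the CPU runs the emitted code directly.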
I like all of these.
So we have this directory structure:
bytecode
    -> generator
    -> interpreter
    -> jit
    -> sampler
I'm not sure I like having a lot of subdirectories under bytecode, though, particularly since they will each contain so few files. I'd propose:
- bytecodegenerator or bytecompiler at top level (Bytecompiler is a slightly more concise term of art for a compiler that outputs bytecode, with no ambiguity about whether the bytecode is going in or out)
- a bytecode directory at top level containing general bytecode data structures and the bytecode interpreter
- a jit directory at top level
- sampler stuff relegated to one of the above

That's more in line with the directory structure we all discussed before, and which we've barely had a chance to get used to.
It bears mentioning that JavaScriptCore also contains a bytecode and a JIT for regular expressions, so the names above might be vague. We decided that the best solution was to treat regular expression functionality as secondary, giving classes related to it an extra prefix, or a different namespace. We also decided not to bother changing the nomenclature in PCRE, because PCRE is frozen in time. Also, PCRE hurts me in the brain.
Many other small renames and file splittings are included in my patch, but they all tend to follow from these. The only substantial one I can think of is "__". Right now we have code like this:
m_jit.movl_i32r(...)
In this context, "m_jit" is a data member of "JIT", and an instance of the "X86Assembler" class. JIT::m_jit is weird, and calling an assembler a JIT is also weird. So, we opted to rename "m_jit" to "m_assembler", and then, for brevity, use a macro to replace "m_assembler." with "__", so you get this:
__ movl_i32r(...)
There are a few cases where this looks a little weird right now, but they're fixable. In general, this approach has worked well for other projects, so it will probably work well for us.
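A minimal, self-contained sketch of the "__" shorthand. `X86Assembler` here is a hypothetical stand-in that only records machine-code bytes (it is not JavaScriptCore's real assembler), and `compileReturnConstant` is an invented example method.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for X86Assembler: it only appends
// machine-code bytes to a buffer.
class X86Assembler {
public:
    // mov $imm, %reg: opcode 0xB8 + register number, then a little-endian imm32
    void movl_i32r(std::int32_t imm, int reg)
    {
        assert(reg >= 0 && reg < 8);
        m_buffer.push_back(static_cast<std::uint8_t>(0xB8 + reg));
        std::uint32_t bits = static_cast<std::uint32_t>(imm);
        for (int i = 0; i < 4; ++i)
            m_buffer.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }
    void ret() { m_buffer.push_back(0xC3); }
    const std::vector<std::uint8_t>& buffer() const { return m_buffer; }

private:
    std::vector<std::uint8_t> m_buffer;
};

class JIT {
public:
    void compileReturnConstant(std::int32_t value);
    const std::vector<std::uint8_t>& code() const { return m_assembler.buffer(); }

private:
    X86Assembler m_assembler; // formerly "m_jit"
};

// The shorthand: "__ foo(...)" expands to "m_assembler.foo(...)".
#define __ m_assembler.

void JIT::compileReturnConstant(std::int32_t value)
{
    __ movl_i32r(value, 0); // register 0 is eax
    __ ret();
}

// Keep the macro from leaking into code that follows.
#undef __
```

Undefining the macro right after the code-generation functions keeps `__` local to them. Strictly speaking, C++ reserves identifiers containing a double underscore for the implementation, so this style relies on compilers accepting such a macro in practice.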
I'm not a huge fan of this, but I must admit m_jit would be wrong, and saying m_assembler all the time would strain readability.

Regards,
Maciej