[squirrelfish] JavaScriptCore renames

Sat Nov 15 13:23:57 PST 2008

On Nov 15, 2008, at 12:07 PM, Geoffrey Garen wrote:

>> That being said, an overview of the renames planned would be helpful.
>
> Okeedokee.
>
> After much discussion, we concluded that almost all names in the  
> field of virtual machines are potentially problematic, because each  
> name can mean different things in different contexts, and each name  
> usually fails to say precisely what it means. For example, "JIT",  
> which technically just stands for "Just In Time", could refer to any  
> of a million different processes on a computer, including the "just  
> in time" initialization of a data member in a class. Even if we  
> allow that "JIT" implicitly refers to CPU-specific translation at  
> runtime, making it less vague, "just in time" is still an inaccurate  
> phrase, since function-at-a-time translation will translate unlikely  
> basic blocks well before they execute -- if they execute at all.  
> Even the phrase "virtual machine" is problematic, since the CPU  
> itself is a kind of virtual machine. It's turtles all the way down.
>
> However, none of this changes the fact that we use certain names as  
> terms of art all the time. So, we decided to shun the siren song of
> perfect, technically accurate naming, and settle on names that  
> reflected our basic everyday thinking and speech.
>
> Therefore:
>
> * "Bytecode"
>
> Anything in JavaScriptCore currently called "byte code", "opcode", "op
> code", "code", "bitcode", etc., we'll rename to "bytecode". We use one
> word with no camel case to indicate that the word is just jargon,  
> and not a true compound word. It's not really one byte, but we're  
> over that. We use this word every day when talking about our code,  
> so we'll use it in our code, too.

This sounds good in general, although as I pointed out on IRC and will  
repeat for benefit of others here, I don't think "bytecode" and  
"opcode" are really synonyms. My understanding of the word would be:

- bytecode is a mass noun, like "machine code" - you can't have "a  
bytecode"
- a singular unit of bytecode is an "instruction", or "bytecode  
instruction" if you must
- the part of the instruction that says which operation to perform,
rather than what the operands are, is an "opcode"
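To make that vocabulary concrete, here is a hypothetical C++ sketch. Opcode, Instruction, Bytecode and the three-operand layout are invented for illustration and are not JavaScriptCore's actual types: the opcode is one field of an instruction, and bytecode is the whole sequence of instructions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; not JavaScriptCore's classes.

// An "opcode" says which operation an instruction performs.
enum class Opcode { LoadConstant, Add, Return };

// An "instruction" is one unit of bytecode: an opcode plus its operands.
// Operand meaning depends on the opcode:
//   LoadConstant: r[a] = b;  Add: r[a] = r[b] + r[c];  Return: return r[a].
struct Instruction {
    Opcode opcode;
    int a;
    int b;
    int c;
};

// "Bytecode" is the whole sequence, a mass noun like "machine code".
using Bytecode = std::vector<Instruction>;

// Bytecode for "return 2 + 3": four instructions, three distinct opcodes.
Bytecode returnTwoPlusThree() {
    return {
        {Opcode::LoadConstant, 0, 2, 0},  // r0 = 2
        {Opcode::LoadConstant, 1, 3, 0},  // r1 = 3
        {Opcode::Add, 2, 0, 1},           // r2 = r0 + r1
        {Opcode::Return, 2, 0, 0},        // return r2
    };
}
```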

>
> * "BytecodeGenerator"
>
> The class used to generate bytecode. "Generator" clearly indicates  
> that the class outputs bytecode, whereas a name like "compiler"  
> might mean that the class outputs bytecode, or it might mean that  
> the class takes bytecode as its input. Also, we thought that names  
> like "compiler" implied a larger suite of tools not included in this  
> class.

I do believe that the term Bytecompiler specifically refers to a  
compiler that outputs bytecode, and can never refer to a compiler that  
takes bytecode as input instead. It's a little shorter. But  
technically our bytecompiler encompasses not just the  
BytecodeGenerator class but also all the emit functions in Nodes.cpp,  
so I'd probably use it for a directory, not a class.
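As a hypothetical illustration of both points (the generator only ever outputs bytecode, and the emit functions on syntax tree nodes are part of the bytecompiler too), here is a toy C++ sketch. BytecodeGenerator is the proposed class name; its methods, the node classes, and the instruction layout are invented for illustration and are not the contents of Nodes.cpp.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical instruction layout for this sketch only.
enum class Opcode { LoadConstant, Add, Return };
struct Instruction { Opcode opcode; int a; int b; int c; };

// Toy "BytecodeGenerator": it allocates registers and appends instructions.
// Bytecode is only ever its output, never its input.
class BytecodeGenerator {
public:
    int newRegister() { return m_nextRegister++; }
    int emitLoadConstant(int value) {
        int dst = newRegister();
        m_code.push_back({Opcode::LoadConstant, dst, value, 0});
        return dst;
    }
    int emitAdd(int lhs, int rhs) {
        int dst = newRegister();
        m_code.push_back({Opcode::Add, dst, lhs, rhs});
        return dst;
    }
    void emitReturn(int src) { m_code.push_back({Opcode::Return, src, 0, 0}); }
    const std::vector<Instruction>& code() const { return m_code; }

private:
    std::vector<Instruction> m_code;
    int m_nextRegister = 0;
};

// Syntax tree nodes carry their own emit functions, which drive the generator.
struct ExpressionNode {
    virtual ~ExpressionNode() = default;
    // Emits bytecode for this node and returns the register holding its value.
    virtual int emitBytecode(BytecodeGenerator&) const = 0;
};

struct NumberNode : ExpressionNode {
    explicit NumberNode(int v) : value(v) {}
    int emitBytecode(BytecodeGenerator& g) const override { return g.emitLoadConstant(value); }
    int value;
};

struct AddNode : ExpressionNode {
    AddNode(std::unique_ptr<ExpressionNode> l, std::unique_ptr<ExpressionNode> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    int emitBytecode(BytecodeGenerator& g) const override {
        int left = lhs->emitBytecode(g);   // operands first...
        int right = rhs->emitBytecode(g);
        return g.emitAdd(left, right);     // ...then the operation itself
    }
    std::unique_ptr<ExpressionNode> lhs, rhs;
};
```

In this sketch the "bytecompiler" is the generator plus every emitBytecode override, which is why the directory-level name covers more than the one class.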

>
>
> * "BytecodeInterpreter"
>
> The class that executes a program in bytecode form. We liked the  
> symmetry with "BytecodeGenerator". We rejected names like BytecodeVM  
> because we thought the name "virtual machine" was a little too  
> vague, and it implied a larger suite of functionality not limited to  
> this class.
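A hypothetical C++ sketch of what the name promises: the input is a program already in bytecode form, and the class executes it directly. The switch-based dispatch loop, the fixed eight-entry register file and the opcode set are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bytecode format for this sketch only:
//   LoadConstant: r[a] = b;  Add: r[a] = r[b] + r[c];  Return: return r[a].
enum class Opcode { LoadConstant, Add, Return };
struct Instruction { Opcode opcode; int a; int b; int c; };

// Toy "BytecodeInterpreter": executes bytecode one instruction at a time.
class BytecodeInterpreter {
public:
    int execute(const std::vector<Instruction>& code) const {
        std::vector<int> registers(8, 0);  // fixed-size register file; no bounds checks
        for (std::size_t pc = 0; pc < code.size(); ++pc) {
            const Instruction& insn = code[pc];
            switch (insn.opcode) {  // dispatch on the opcode
            case Opcode::LoadConstant:
                registers[insn.a] = insn.b;
                break;
            case Opcode::Add:
                registers[insn.a] = registers[insn.b] + registers[insn.c];
                break;
            case Opcode::Return:
                return registers[insn.a];
            }
        }
        return 0;  // ran off the end without a Return
    }
};
```

Given LoadConstant r0 = 2, LoadConstant r1 = 3, Add r2 = r0 + r1, Return r2, this sketch returns 5.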
>
> * "JIT"
>
> The class that translates a program in bytecode form to CPU-specific  
> code. We rejected "BytecodeJIT" because we couldn't tell if a  
> BytecodeJIT had bytecode as its input or its output. It's not  
> symmetric with BytecodeInterpreter, but oh well. We liked "JIT"  
> because we thought that interpreter vs JIT was a widely used and  
> understood dichotomy.

I like all of these.
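Where BytecodeInterpreter executes bytecode, a JIT consumes bytecode and produces CPU-specific code. Below is a hypothetical C++ sketch of that direction of translation. To stay portable and runnable it emits x86-flavoured assembly text instead of machine code bytes, and the opcode set and stack-slot layout are assumptions, not how JavaScriptCore's JIT works.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bytecode format for this sketch only:
//   LoadConstant: r[a] = b;  Add: r[a] = r[b] + r[c];  Return: return r[a].
enum class Opcode { LoadConstant, Add, Return };
struct Instruction { Opcode opcode; int a; int b; int c; };

// Toy "JIT": bytecode in, CPU-specific code (as assembly text) out.
// Virtual register rN is assumed to live at -4*(N+1)(%ebp);
// function prologue and epilogue are omitted.
class JIT {
public:
    std::vector<std::string> compile(const std::vector<Instruction>& code) const {
        std::vector<std::string> out;
        for (const Instruction& insn : code) {
            switch (insn.opcode) {
            case Opcode::LoadConstant:  // store the immediate into rA's slot
                out.push_back("movl $" + std::to_string(insn.b) + ", " + slot(insn.a));
                break;
            case Opcode::Add:           // rA = rB + rC, computed in %eax
                out.push_back("movl " + slot(insn.b) + ", %eax");
                out.push_back("addl " + slot(insn.c) + ", %eax");
                out.push_back("movl %eax, " + slot(insn.a));
                break;
            case Opcode::Return:        // return value goes in %eax
                out.push_back("movl " + slot(insn.a) + ", %eax");
                out.push_back("ret");
                break;
            }
        }
        return out;
    }

private:
    static std::string slot(int reg) { return std::to_string(-4 * (reg + 1)) + "(%ebp)"; }
};
```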

> So we have this directory structure:
>
> bytecode
> 	-> generator
> 	-> interpreter
> 	-> jit
> 	-> sampler

I'm not sure I like having a lot of subdirectories under bytecode  
though, particularly since they will each contain so few files. I'd  
propose:

- bytecodegenerator or bytecompiler at top level (Bytecompiler is a  
slightly more concise term of art for a compiler that outputs  
bytecode, with no ambiguity about whether the bytecode is going in or  
out)
- a bytecode directory at top level containing general bytecode data  
structures and the bytecode interpreter
- a jit directory at top level
- sampler stuff relegated to one of the above

That's more in line with the directory structure we all discussed  
before, and which we've barely had a chance to get used to.

> It bears mentioning that JavaScriptCore also contains a bytecode and  
> a JIT for regular expressions, so the names above might be vague. We  
> decided that the best solution was to treat regular expression  
> functionality as secondary, giving classes related to it an extra  
> prefix, or a different namespace. We also decided not to bother  
> changing the nomenclature in PCRE, because PCRE is frozen in time.  
> Also, PCRE hurts me in the brain.
>
> Many other small renames and file splittings are included in my  
> patch, but they all tend to follow from these. The only substantial  
> one I can think of is "__". Right now we have code like this:
>
> m_jit.movl_i32r(...)
>
> In this context, "m_jit" is a data member of "JIT", and an instance  
> of the "X86Assembler" class. JIT::m_jit is weird, and calling an  
> assembler a JIT is also weird. So, we opted to rename "m_jit" to
> "m_assembler", and then, for brevity, use a macro to replace
> "m_assembler." with "__", so you get this:
>
> __ movl_i32r(...)
>
> There are a few cases where this looks a little weird right now, but  
> they're fixable. In general, this approach has worked well for other  
> projects, so it will probably work well for us.

I'm not a huge fan of this, but I must admit m_jit would be wrong and
saying m_assembler all the time would strain readability.
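A hypothetical C++ sketch of the "__" shorthand under discussion. The X86Assembler here is a stand-in that records instruction text (the real class emits machine code, and the movl_i32r signature shown is assumed), and compileReturnConstant is an invented example method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in assembler (hypothetical): each method appends one line of text.
class X86Assembler {
public:
    void movl_i32r(int imm, const std::string& reg) {
        m_output.push_back("movl $" + std::to_string(imm) + ", " + reg);
    }
    void ret() { m_output.push_back("ret"); }
    const std::vector<std::string>& output() const { return m_output; }

private:
    std::vector<std::string> m_output;
};

// The JIT owns an assembler; "m_assembler" says what the member actually is.
class JIT {
public:
    void compileReturnConstant(int value);
    const std::vector<std::string>& output() const { return m_assembler.output(); }

private:
    X86Assembler m_assembler;
};

// The shorthand: "__ foo(...)" expands to "m_assembler. foo(...)".
// Defined after all #includes so it cannot rewrite code in headers.
#define __ m_assembler.

void JIT::compileReturnConstant(int value) {
    __ movl_i32r(value, "%eax");  // same as m_assembler.movl_i32r(value, "%eax");
    __ ret();                     // same as m_assembler.ret();
}

#undef __
```

One caveat: C++ reserves identifiers containing a double underscore for the implementation, so a macro named "__" strictly relies on compilers tolerating it; keeping its definition after the #includes and #undef-ing it when done limits how far it can reach.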

Regards,
Maciej