On Sep 29, 2008, at 7:21 PM, Maciej Stachowiak wrote:
StructureID:
It's more than just an ID now. Besides the renaming, it should perhaps
be merged with PropertyMap since there isn't as clearly a separate
role. As for names:
- Structure
Pro: "these two objects have the same structure" sounds good
Con:
I'd vote for this; I'd say that the object in question is analogous in its core purpose to a C struct, so the name Structure fits nicely.
VM/CTI => jit/NativeCodeGeneratorX86 or jit/MachineCodeGeneratorX86
masm/X86Assembler => jit/AssemblerX86
I'm pretty sure this is good, but "Machine" or "Native"? I like
"Native" somewhat better, now that we use "JS" and "Host" to
distinguish between things originally in JS and originally in C++.
I'm definitely biased to liking 'Native' too, since it syncs up with terminology I've worked with in the past.
Thought: will we end up with a platform-independent core for
NativeCodeGenerator or will it have to be rewritten from scratch for
every CPU architecture? We could probably make more of the code and
logic reusable by having helper functions that emit the code for high-
level operations, similar to things like emitJumpSlowCaseIfNotJSCell,
or even larger chunks like compileBinaryArithOp.
Yes, agreed.
Sorry, this is going to be a bit of a ramble.
My hope has been that we can make the bulk of the code in the NativeCodeGenerator platform independent. My plan for the first steps towards this would be to introduce a layer of abstraction between the code generator and the assembler, akin to an abstract macro assembly language. For example, the macro assembler may provide operations such as 'compare register to immediate void* and branch if not equal', or 'add immediate integer value to U32 in register', where on different architectures these calls may result in differing numbers of instructions being emitted. On a more RISC architecture it may be necessary to load or construct immediate values into registers, where on a CISC chip it may be possible to use an operation with an immediate operand.
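To make the idea a little more concrete, here is a rough sketch of the two operations mentioned above. Every name in it (MacroAssemblerCISC, MacroAssemblerRISC, add32, branchPtrNotEqual, the scratch register, the 0..255 immediate range) is made up for illustration, and instructions are recorded as text rather than encoded, so it says nothing about real X86 or ARM encodings.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// All names are hypothetical. Instructions are logged as text, not encoded.
using RegisterID = int;

class InstructionLog {
public:
    void emit(const std::string& text) { m_instructions.push_back(text); }
    const std::vector<std::string>& instructions() const { return m_instructions; }
private:
    std::vector<std::string> m_instructions;
};

inline std::string reg(RegisterID r) { return "r" + std::to_string(r); }
inline std::string pointerText(const void* p)
{
    return std::to_string(reinterpret_cast<std::uintptr_t>(p));
}

// CISC-style target: immediates can be used directly as operands.
class MacroAssemblerCISC : public InstructionLog {
public:
    void add32(std::int32_t value, RegisterID dest)
    {
        emit("add " + reg(dest) + ", #" + std::to_string(value));
    }
    void branchPtrNotEqual(RegisterID left, const void* right, const std::string& label)
    {
        emit("cmp " + reg(left) + ", #" + pointerText(right));
        emit("jne " + label);
    }
};

// RISC-style target: large immediates must first be loaded into a register.
class MacroAssemblerRISC : public InstructionLog {
public:
    void add32(std::int32_t value, RegisterID dest)
    {
        if (value >= 0 && value <= 255) { // assumed small-immediate range
            emit("add " + reg(dest) + ", " + reg(dest) + ", #" + std::to_string(value));
            return;
        }
        emit("load " + reg(scratch) + ", #" + std::to_string(value));
        emit("add " + reg(dest) + ", " + reg(dest) + ", " + reg(scratch));
    }
    void branchPtrNotEqual(RegisterID left, const void* right, const std::string& label)
    {
        emit("load " + reg(scratch) + ", #" + pointerText(right));
        emit("cmp " + reg(left) + ", " + reg(scratch));
        emit("bne " + label);
    }
private:
    static constexpr RegisterID scratch = 12; // assumed reserved register
};

// The code generator is written once against the macro-assembler interface.
template<typename Masm>
void emitExample(Masm& masm, const void* expectedPointer)
{
    masm.add32(100000, 0);
    masm.branchPtrNotEqual(1, expectedPointer, "mismatch");
}
```

With these made-up backends, the same two calls in emitExample log three instructions on the CISC-style target and five on the RISC-style one, while the generator code itself is unchanged.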
So we'd end up with (on two different platforms) a set of classes something like:
NativeCodeGenerator -> MasmX86 -> X86Assembler
NativeCodeGenerator -> MasmArm -> ArmAssembler
My expectation is that initially at least this Masm would be a fairly trivial wrapper on a platform-specific assembler, and would be separately implemented for each platform. It may make sense to move some additional work into the macro assembler, e.g. when JITting for an ARM with an in-order pipeline it may make sense to have some limited ability to peep-hole schedule instructions here.
Introducing this abstraction seems likely to be a necessary first step to get past the most obvious initial hurdles – e.g. CTI is riddled with places we reinterpret_cast pointers to 'unsigned', which isn't going to fly on x86-64 – but I'm guessing this won't be the whole solution.
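As an illustration of the pointer-width problem, a cast to 'unsigned' keeps only 32 bits on typical 64-bit targets, whereas uintptr_t is sized to hold a pointer. The sketch below is only an assumption about how a pointer-sized immediate might be written into a code buffer; CodeBuffer, putPointer and pointerAt are hypothetical names, not existing CTI code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical buffer that stores pointer-sized immediates without truncation.
class CodeBuffer {
public:
    // Store the full pointer value; uintptr_t is wide enough to hold it.
    void putPointer(const void* p)
    {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
        unsigned char bytes[sizeof(bits)];
        std::memcpy(bytes, &bits, sizeof(bits));
        m_bytes.insert(m_bytes.end(), bytes, bytes + sizeof(bits));
    }

    // Read back a pointer previously written at the given byte offset.
    const void* pointerAt(std::size_t offset) const
    {
        std::uintptr_t bits;
        std::memcpy(&bits, m_bytes.data() + offset, sizeof(bits));
        return reinterpret_cast<const void*>(bits);
    }

    std::size_t size() const { return m_bytes.size(); }

private:
    std::vector<unsigned char> m_bytes;
};
```

The immediate occupies sizeof(uintptr_t) bytes, so the same code would write 4 bytes on a 32-bit build and 8 on x86-64, and the pointer round-trips intact in both cases.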
I can imagine that we'll face a range of issues from the very micro-architectural (e.g. dependencies on flag state between operations, code scheduling differences due to limited register availability, and 'quirks' in the requirements of particular operations on given architectures – e.g. specific register requirements) – to much more macro issues based on higher-level design decisions we take on different platforms (e.g. we may want to make use of differing internal implementations of JSImmediate on different platforms). More immediately there will be a decision of where the differences in ABI in function call setup are captured, and in the longer run we'll also have to decide where we want register allocation to live.
The best place to capture some of these differences may be in the Masm, may be in the code generator, or may be in an additional delegation layer between the two. As an ideal I would like to keep all knowledge of JSC concepts such as JSImmediate and JSCell out of the Masm, so that it can provide an entirely abstract interface (potentially applicable to any code generation task). I'd also like to try to keep artifacts specific to the current code generation strategy (e.g. slow-cases) back out in the code generator rather than in the Masm, so the Masm is immediately reusable if we want to explore different code generation strategies (as one example, in case we wanted to explore something more akin to trace trees). In practice we may find it necessary to blur these lines.
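A rough sketch of that split, under some loud assumptions: the MacroAssembler interface, the branchTest32 operation, the Jump type and the tag mask value below are all invented for illustration (the mask is not the real JSImmediate encoding). Only the helper name emitJumpSlowCaseIfNotJSCell comes from the existing code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using RegisterID = int;
using Jump = std::size_t; // index of the branch in the instruction log

// Generic layer: knows about registers, masks and branches, nothing about JSC.
class MacroAssembler {
public:
    // Test the register against a mask and branch if any masked bit is set.
    Jump branchTest32(RegisterID reg, std::int32_t mask)
    {
        m_instructions.push_back("test r" + std::to_string(reg) + ", #" + std::to_string(mask));
        m_instructions.push_back("jnz <unlinked>");
        return m_instructions.size() - 1;
    }
    const std::vector<std::string>& instructions() const { return m_instructions; }
private:
    std::vector<std::string> m_instructions;
};

// JSC-specific layer: owns the value tagging scheme and the slow-case list.
class NativeCodeGenerator {
public:
    explicit NativeCodeGenerator(MacroAssembler& masm) : m_masm(masm) { }

    // Assumption for this sketch only: a value is a cell when these bits are clear.
    static constexpr std::int32_t hypotheticalTagMask = 0x3;

    void emitJumpSlowCaseIfNotJSCell(RegisterID reg)
    {
        m_slowCases.push_back(m_masm.branchTest32(reg, hypotheticalTagMask));
    }
    const std::vector<Jump>& slowCases() const { return m_slowCases; }
private:
    MacroAssembler& m_masm;
    std::vector<Jump> m_slowCases;
};
```

Here the Masm could be reused by a generator with a different strategy, since both the tag mask and the slow-case bookkeeping live entirely in NativeCodeGenerator.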
I'm rambling on far too much here, but I hope this all made some kind of sense.
My plan of action is to start abstracting out the assembler and ABI as one unit of work, with a goal that we will then be able to bring up support for a second native platform with a reasonably contained set of changes (implementing the Assembler, the Masm, and then hopefully this leaves a much reduced set of changes to the code generation).
cheers,
G.