Prototype-Based Programming
Prototype-based programming is a style of object-oriented programming in which classes are not explicitly defined, but rather derived by adding properties and methods to an instance of another class or, less frequently, by adding them to an empty object. To understand this better, let's contrast prototype-based languages with class-based languages.
Object-oriented languages like C++ are founded on two concepts:
- A class is an abstract representation of an object. It is like a template that defines the behaviour as well as the properties of an object.
- An instance is an instantiation or a member of a class. It is an object in memory.
JavaScript, in contrast, is prototype-based: nearly every object is an instance of Object, and a typical object inherits properties (including methods) from Object.prototype. This allows the creation of an object without first defining its class.
In addition, any object can be associated as the prototype of another object, allowing the second object to share the first object's properties. This also means that if you add a property to an object that is used as the prototype for a set of objects, all the objects it is the prototype of also get the new property.
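A small example (the object names here are just illustrative) shows this in practice:

```javascript
// A plain object used as a prototype.
const animal = {
  describe() {
    return `${this.name} makes a sound`;
  }
};

// Create objects that use `animal` as their prototype.
const dog = Object.create(animal);
dog.name = 'Rex';

const cat = Object.create(animal);
cat.name = 'Misty';

console.log(dog.describe()); // "Rex makes a sound"

// Adding a property to the prototype makes it visible
// on every object that inherits from it.
animal.legs = 4;
console.log(dog.legs); // 4
console.log(cat.legs); // 4
```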
So how does an engine actually run JavaScript? Broadly, it does two things with your code:
- Parsing the code to generate an AST
- Compiling the parsed code (usually done by a baseline and an optimising compiler)
Most of what you will read next is in the context of V8; however, it is not very different for other engines.
Broadly speaking, parsing involves two steps: lexical analysis and syntax analysis. Lexical analysis reads a stream of characters from our code and combines them into tokens; it also removes white-space characters, comments, and so on. In the end, the entire string of code is split into a list of tokens. The syntax analyser, also called the parser, takes that flat list of tokens and turns it into a tree representation, validating the language syntax along the way.
The following is the result of both operations for a simple function.
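The original figure isn't reproduced here, but a rough sketch of what both phases produce for a simple function looks like this (token and node names are approximate):

```javascript
// Source for a simple function:
function add(a, b) {
  return a + b;
}

// Lexical analysis produces a token stream, roughly:
//   [function] [add] [(] [a] [,] [b] [)] [{] [return] [a] [+] [b] [;] [}]
//
// Syntax analysis turns those tokens into an AST, roughly:
//   FunctionDeclaration (name: add, params: a, b)
//     └─ ReturnStatement
//          └─ BinaryExpression (+)
//               ├─ Identifier a
//               └─ Identifier b
```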
V8 had two separate parsers (currently only one, explained later) with slightly different purposes: the Parser and the PreParser. The Parser is the full, eager one; it is responsible for building the AST and scopes as well as finding syntax errors. The PreParser is the lazy one, and obviously the faster one (roughly twice as fast ⚡️). This is needed because a lot of web pages ship a lot of code they never execute. The PreParser does not build an AST; it does build scopes, but it doesn't put many references or declarations in them. It basically skips over the functions we don't want to compile right now. How does it know which ones? There are a few simple rules: all top-level code and Immediately Invoked Function Expressions (IIFEs), or any functions recognised as IIFEs, are eagerly parsed and compiled; other top-level functions are skipped over and noted by the PreParser, and are eagerly parsed later when they are called.
You can add an exclamation mark before a function expression to inform the parser that you want it parsed eagerly.
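As a sketch (the exact heuristics differ between engine versions), the following shapes are treated differently by the parser:

```javascript
// Top-level code and IIFEs are eagerly parsed and compiled:
(function init() {
  console.log('parsed and compiled right away');
})();

// A plain top-level function is only pre-parsed (skipped over)
// and fully parsed later, when it is first called:
function lazyWork() {
  console.log('fully parsed on first call');
}

// Prefixing a function expression with `!` makes it look like an
// IIFE to the parser, hinting that it should be parsed eagerly:
!function eagerWork() {
  console.log('hinted as eager');
}();
```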
Now that we have the AST and the scopes ready, it's the interpreter's turn to take over. V8 has Ignition, which generates bytecode from the syntax tree.
Ignition is a register machine with an accumulator register; it generates and interprets bytecode using the AST provided by the parser. Bytecode is an abstraction of machine code.
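As a rough illustration (the exact bytecodes and the dump format vary between V8 versions, so treat this as approximate), you can ask Node to print what Ignition generates for a function:

```javascript
// add.js
function add(a, b) {
  return a + b;
}
add(1, 2);

// Running `node --print-bytecode --print-bytecode-filter=add add.js`
// prints something along these lines (simplified):
//
//   Ldar a1        ; load argument b into the accumulator
//   Add  a0, [0]   ; add argument a to the accumulator, feedback slot 0
//   Return         ; return the value in the accumulator
//
// Note the implicit accumulator register: most bytecodes read from
// or write to it rather than naming all of their operands explicitly.
```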
Previously, when a script loaded in your browser and the engine decided to parse and compile it, the first thing it needed to do was run the top-level code of the script, so full-codegen compiled that block of code as fast as it could. Lazy parsing reduced the amount of work by skipping over the code that did not need to be compiled right away, but the lazy stub still had to be parsed by the Parser and compiled when it was called, so we effectively parsed the function twice: once by the lazy parser and a second time when it was called. And that's only part of the problem.
Now consider the following code:
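The original listing isn't reproduced here; a minimal sketch of the kind of snippet being discussed, a constructor function Student with a doWork method on its prototype, might look like this:

```javascript
var Student = function(name) {
  this.name = name;
};

Student.prototype.doWork = function() {
  return this.name + ' is working';
};

var s = new Student('Asa');
s.doWork();
```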
The function doWork is lazily parsed at first; only when the class is instantiated and doWork is called does its body get fully parsed and compiled. So the function is processed twice, once by the PreParser and once by the Parser, and the unoptimised machine code that full-codegen produces for it then sits in memory even if it only ever runs a handful of times.
Using an interpreter very neatly solves a portion of the memory problem, since the footprint of interpreted bytecode is much smaller than that of machine code. This smaller footprint also means there is less overhead, which allows us to parse the entire script in an eager fashion. No need for a PreParser! Yayyy!
This also reduces complexity: since the AST is generated for the entire script and not just pieces of it, the bytecode generated from this AST can be treated as the source of truth.
Sea of Nodes
All computations are expressed as nodes in the sea of nodes, and the edges represent dependencies between computations. This approach allows better-performing JIT-compiled code. Sea of Nodes is based on SSA, or Static Single Assignment. This is a way of structuring the intermediate representation of a code block/program so that every variable is assigned exactly once, which is useful for redundancy elimination.
Some common forms of redundancy elimination are value numbering, conditional constant propagation, common-subexpression elimination (CSE), and partial-redundancy elimination. Each comes with its own suite of benefits as well as problems.
Static single-assignment form represents use-def information explicitly and arranges for every value computed by a program to have a unique assignment/definition. A method is in SSA form if every variable has (statically) exactly one definition.
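Here is a tiny, informal illustration of SSA; the renamed x1/x2/y1 values below are just notation for the compiler's internal representation, not JavaScript you would write:

```javascript
// Original code: `x` is assigned twice.
var x = 1;
x = x + 2;
var y = x * 3;

// The same computation in SSA form, written informally:
//   x1 = 1
//   x2 = x1 + 2
//   y1 = x2 * 3
//
// Because every value now has exactly one definition, the compiler
// can easily see that x2 is the constant 3 and fold the whole
// chain down to y1 = 9.
```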
So in a sea of nodes, each SSA data value is represented as a node in a graph. A node produces a value (e.g., Add 3 + 6) and points to its operands (the constants 3 and 6). There is no other extra data.
Consider the following code:
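The original snippet isn't reproduced here; a minimal example of the kind being discussed is a function that simply adds its two arguments:

```javascript
function add(x, y) {
  return x + y;
}

// As this runs, Ignition collects type feedback for the Add bytecode:
// so far it has only ever seen numbers.
add(1, 2);
add(3, 4);
```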
Here we know that in the case of x + y, the + operator can act on many different kinds of operands. In this particular case, we collect information about the input operands and the resulting value of the + operation (the Add bytecode). When we optimise this code with TurboFan and we've only ever seen numbers so far, we put checks in place to verify that both x and y are numbers (in which case we know that the result is going to be a number as well). If either of these checks fails, we go back to interpreting the bytecode instead, a process called Deoptimization. Thus TurboFan doesn't need to worry about all the other cases of the + operator and doesn't even need to emit machine code to handle them; it can focus on the case for numbers, which translates well to machine instructions.
The Execution Pipeline
This diagram gives a simplified (over-simplified, really) representation of the complete JS execution pipeline. There is much more to read beyond what is explained here; be sure to follow the V8 team to learn more about the project and how it all works.
P.S. Most developers need not worry about choosing the best algorithms and data structures, and instead can focus on the application design. You can admire the engineering though.