Ejscript Architecture
This document describes the Ejscript Language Architecture and how it processes program scripts. It describes the flow of execution and the main processing components. Sometimes architectural overviews can be a bit dry. To liven it up, you may wish to see the Ejscript Language Tour first.
See also the Ejscript Web Framework architecture which describes the Ejscript Web Framework.
Ejscript Language Architecture
Ejscript consists of an optimizing compiler, an embeddable virtual machine, a system library and a set of tools. All were designed and implemented to provide a high performance and compact server-side JavaScript programming environment.
The main components of the Ejscript language environment:
- Command Tools - Programs to compile, interpret and manage Ejscript programs
- Virtual Machine - Core execution engine for Ejscript
- Compiler - Integrated or stand-alone compiler
- Modules - Cached, pre-compiled Ejscript programs
- Core Class Library - Foundation core types and classes
Command Tools
Ejscript provides a set of commands to interpret, compile and manage ejscript programs and documentation. These commands are invoked from the command line and are used manually and also automatically by the Ejscript web framework and other embedded programs.
Ejscript provides the following command tools:
- ejs - Ejscript command shell
- ejsc - Ejscript compiler
- ejsmod - Module manager and documentation generator
- mvc - Web framework generator
For full details about the command line options, please see the Manual Pages for these commands.
Virtual Machine
The Ejscript virtual machine is a portable, cross-platform virtual machine optimized for embedded use outside browsers. It has a compact byte code that minimizes memory use and supports cached compilation of scripts for repeated execution.
It is a direct-threaded, stack-based, early-binding, high-level, type-less byte code, cloning, composite native classed VM. Each of these attributes is described below.
Direct Threaded
Direct-threaded refers not to multi-threading, but to an approach in VM design where the VM inner loop will jump directly from opcode to opcode without a central dispatch loop. Direct threading can sometimes have an adverse effect on CPU instruction caching, but this can be offset by having a high level byte code that can invoke operations directly on properties without having a fetch-do-store cycle. We aim to further mitigate cache depletion by the use of inline native code generation in a future release.
Stack Based
Ejscript uses a stack-based expression stack with bypass opcodes. This is a blend of the stack-based and register-based approaches to VM design. By using high-level byte codes that can access, manipulate and call properties directly, and bypass the expression stack, Ejscript avoids the overhead of a stack-based VM and gains many of the benefits of a register-based approach.
Early Binding
The VM makes extensive use of early binding to resolve property references into direct references and has many dedicated byte codes for fast property access. The Ejscript compiler tries wherever possible to bind property references to property slot offsets and uses these optimized opcodes for such accesses. This binding is used not just for load/store operations but also for function calls and object creation.
Object Shaping
Ejscript optimizes the creation of classes and objects. It detects when objects are being created using prototype based inheritance and then transparently creates backing classes. These classes allow very fast property access and quick repeat object allocations.
High Level, Type-Less Byte Code
The Ejscript byte code learns from the best of thinking in recent VM byte code design, and has a high-level, type-less byte code with out-of-band exception handling.
What is somewhat unusual about the Ejscript VM design, is the lack of dedicated type byte codes. There are no string or number byte codes -- all types are treated equally. This has freed up large swathes of byte codes to be allocated to optimized property access byte codes.
The compiler uses the optimized early binding byte codes to streamline data access and the VM ensures valid binding at run-time. The net result is a very compact byte code and tiny program footprints. For embedded web applications this is ideal and translates to fast loading and response.
Cloning
Ejscript supports multiple interpreters with define data interchange mechanisms. This ensures security where the data from one interpreter is not visible from another. Ejscript also supports rapid cloning of interpreters from a master interpreter. This is used in the Ejscript web framework to quickly create an interpreter to service a web request. Memory footprint and garbage collection overhead is minimized by read-only sharing of system types and classes between interpreters.
Composite Native Classes
Ejscript supports composite native types and classes that can be created in native C code. A composite native class is one where the properties of the class are not represented as discrete Ejscript properties that exist in the VM in their own right, but rather as native system types. This can dramatically compress native classes down to the minimum storage required to represent the objects state. Composite native classes have several benefits:
- Native classes can directly access properties without going via the VM.
- The memory footprint is much smaller, typically more than a 10x reduction in size.
For example: A video class may have height and width coordinate properties. In a normal scripted or native class, these would be discrete Ejscript numeric objects with their own storage. In a composite native class, the class will store these internally as packed, native integers.
Ejscript can support composite native types because the VM and its native type interface delegates operations to the native types themselves. This is in keeping with the high-level byte code philosophy.
Garbage Collector
The Ejscript garbage collector is a generational, non-compacting, mark and sweep collector. It is optimized for use in embedded applications - particularly when it is embedded inside web servers.
The garbage collector is generational in that it understands the ages of objects and focuses on collecting more recent garbage first and most aggressively. This results in short, quick collection cycles. This works especially well for web applications where the per-request interpreters are cloned from a master interpreter. In this case, most of the long lived generation of system classes and types reside in a master interpreter and are best not considered for collection.
The collector is also integrated into the event mechanism so that collections will run during idle periods for longer running or event based applications.
The garbage collector is non-compacting because for short duration applications the overhead of compacting memory outweighs the benefit. Furthermore, by using direct object references, early-binding and direct-threading, the VM can have very optimized code paths when accessing object properties.
For web applications, more generalized garbage collectors are sometimes less suited as they are often optimized for long running applications, whereas web applications are typified by short running requests. For such short applications, the collector should not waste time doing unnecessary collections and may, in fact, not need to be run at all. The Ejscript memory allocator frees entire arenas of memory at the completion of web requests and thus avoids garbage collection in many cases.
When objects are collected, they are recycled via type-specific object pools that provide very rapid allocation of new object instances.
Future VM Enhancements
This VM is a good platform for future enhancements. The byte code was designed to support inline native code interleaving where the compiler can selectively generate native code blocks. The compiler will be able to generate this native code either when pre-compiling for maximum speed or on-the-fly as a just-in-time compiler.
Compiler
The Ejscript compiler is an optimizing script compiler that generates optimized byte code. It has up to 4 passes depending on the mode of compilation:
- Parse
- Conditional compilation and hoisting
- Early binding
- Code generation
The compiler supports conditional compilation based upon the evaluation of constant time expressions.
The Ejscript compiler can run either stand-alone or be embedded in a shell or other program.
Embedded Compilation
When the Ejscript compiler is embedded into another application, it provides a one-step compile and execute cycle. The byte code output of the compiler can optionally be cached as module files or can be simply created, executed, and then discarded.
The ejs command shell uses this approach to provide an interactive Ejscript command shell. The Appweb and Apache web servers also use this approach to host the Ejscript web framework for web applications.
Stand Alone Compilation
The Ejscript compiler can also operate in a stand-alone mode where scripts are compiled into optimized byte code in the form of module files. These compact module files can be distributed and executed by the VM without requiring further use of the compiler. This is often ideal for embedded applications where saving the memory footprint normally occupied by the compiler is a compelling advantage.
Compiling scripts into byte code and module files at development time has other advantages as well. The compiler can spend more time optimizing the program and thus generate better byte code. If the program is expected to run many, many times - spending some development cycles to produce the best code possible is well worth the effort. The code may also be more secure as the source code for the program is not being supplied — only the byte code.
Modules
Ejscript provides a module language directive and module loading mechanism that permits like sections of a program to be grouped into a single loadable module.
The Ejscript module directive solves several problems for the script developer
- How to package script files into a convenient package.
- How to provides a unique, safe namespace for declarations that is guaranteed not to clash with other modules.
- How to specify and resolve module dependencies.
- How to version and permit side-by-side execution of different versions of the same module.
Module Packaging
The Ejscript module format provides a compressed archive of module declarations in a single physical file. The module file contains byte code, constant pool, debug code, exception handlers and class and method signatures. A single module file may actually contain multiple logical modules. This way, an application containing more than one module may still be packaged as a single module file.
The compiler has the ability to compile multiple source files and generate one file per module or optionally merge input files (or modules) into a single output module. The compiler thus functions as a link editor resolving module dependencies and ensuring the final module contains all the necessary dependency records.
When debug information is enabled (via the --debug switch) the compiler will store debug information in the module file to enable symbolic debugging. Ejscript currently does not (yet) have a symbolic debugger, but trace output and C level debugging can inspect script source statements and line numbers.
Module Namespaces
Ejscript provides a safe namespace by using namespaces to qualify module names. The namespace for a module ensures that two declarations of the same name, but in different modules can be correctly identified by the compiler and VM.
Namespaces are mostly transparent to the user and module writer - except that the user imports the module by using a "use module" directive, and the module writer publishes a module by bracketing the code in a "module" directive.
Module Dependencies
The Ejscript module file format specifies any and all dependent modules that are required by the module. A module dependency is specified in the Ejscript source program via the "use module" directive. This instructs the compiler to load (or compile) that module and perform various compile time and run time checks. The VM module loader will load all dependent modules and initialize them in the appropriate order.
Module names are opaque, but by convention must uniquely qualify a module name. You can use a reverse domain naming convention to ensure uniqueness. It is anticipated that in the future, some form of module registry will exist to reserve unique names.
Side by Side Execution
Because modules load in dedicated and isolated namespaces, a module is guaranteed not to clash with other names. Module writers can version packages by adding a version namespace in addition to the module namespace. The user of the module can then put a "use namespace" directive in their code to request a specific version of the module.
Core Class Library
The core Ejscript class library provides an extensive suite of classes that cover common programming tasks and requirements. These are divided into the following modules:
- intrinsic - Core system types
- ejs.db - Database
- ejs.web - Web Framework
See the API reference for full details regarding the script programming API.
Intrinsic
Some of the classes in the intrinsic module: Array, Boolean, ByteArray, Date, Emitter, Error, Function, Http, Number, Object, Reflect, RegExp, Socket, Stream, String, TextStream and XML
ejs.db
The ejs.db module provides the Database and Record classes which provide a low level SQLite access API and also a higher level Object Relational Mapping layer that is used by the Web Framework.
ejs.web
The web framework provides a complete web application framework. It includes the classes: Controller, Cookie, HttpServer, Request, Session, UploadFile, and View.