\chapter{Background}
Below we give an overview of
the process by which the ECMAScript programming language evolves,
abstract syntax trees,
source code querying,
domain-specific languages,
and language workbenches.
These concepts are instrumental to the implementation
of the tool described in this thesis.

\section{Evolution of the JavaScript Programming Language}
\emph{Technical Committee 39} (TC39) is a technical committee within
Ecma International,
whose main goal is to develop the language standard
for the ECMAScript programming language
(informally known as JavaScript);
this standard is known as ECMA-262~\cite{ecma262}.
Apart from this standard,
the committee is also responsible for maintaining
related standards:
the ECMAScript Internationalization API Specification (ECMA-402),
the JSON Data Interchange Syntax (ECMA-404),
and
the ECMAScript Specification Suite (ECMA-414).
The members of the committee are representatives of companies,
academic institutions,
and other organizations
interested in developing and maintaining the ECMAScript language.
The delegates include
experts in JavaScript engines,
tooling for JavaScript,
and other areas of the JavaScript ecosystem.

\paragraph{ECMA-262 Proposals}
We now explain what a proposal is
and how proposals for the ECMA-262 language standard are developed within TC39.

A \emph{proposal} is a suggested change to the ECMA-262 language standard.
Every proposal has to address a concrete problem with the current version of ECMAScript.
Such problems can come in many forms
and can apply to any part of the language.
Examples include a feature that is missing from the language,
inconsistencies between parts of the language,
and common patterns that are verbose and could be simplified.
The proposal development process is defined in the \emph{TC39 Process Document}~\cite{TC39Process},
which describes each stage a proposal has to go through
in order to be accepted into the ECMA-262 language standard.

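As an example of a proposal that simplifies a common pattern, consider the optional chaining operator (\texttt{?.}), which went through this process and became part of ECMAScript 2020. The sketch below compares the manually guarded property access that was common before with the shorter form introduced by the proposal. The function names and the \texttt{users} data are hypothetical.

\begin{lstlisting}[language={JavaScript}]
// Before ECMAScript 2020: every step of the access is guarded manually.
function cityWithGuards(user) {
  return user && user.address && user.address.city;
}

// With optional chaining: evaluation short-circuits to undefined
// as soon as a link of the chain is null or undefined.
function cityWithOptionalChaining(user) {
  return user?.address?.city;
}

// Hypothetical inputs: one user without and one with an address.
const users = [{ name: "Ada" }, { name: "Alan", address: { city: "London" } }];
for (const user of users) {
  console.log(cityWithGuards(user), cityWithOptionalChaining(user));
}
\end{lstlisting}

The two forms are not fully equivalent. \texttt{\&\&} returns the first falsy operand, for instance \texttt{null} or an empty string. \texttt{?.}, by contrast, short-circuits only on \texttt{null} and \texttt{undefined}, and then yields \texttt{undefined}.
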
The purpose of \emph{stage 0} of the process is to allow
for exploration and ideation around which parts of the current version
of ECMAScript can be improved,
and then to define a problem space for the committee to focus on.

At \emph{stage 1}, the committee starts the development
of a proposal.
Several requirements have to be fulfilled for a proposal to enter this stage.
First, a \emph{champion}, that is, a committee delegate who is responsible for advancing the proposal, has to be identified.
In addition, a rough outline of the problem
and the general shape of a solution must be provided.
The key algorithms,
abstractions, and semantics of the proposal must have been discussed,
and potential implementation challenges and cross-cutting concerns must have been explored.
The final requirement is for all parts of the proposal
to be captured in a public repository.
Once all these requirements are met,
the proposal is accepted into stage 1.
During this stage, the committee works on the design of a solution
and resolves any cross-cutting concerns discovered previously.

At \emph{stage 2}, a preferred solution has been identified.
The requirements for a proposal to enter this stage are as follows:
all high-level APIs and syntax must be described in the proposal document,
illustrative examples have to be worked out,
and an initial specification text must be drafted.
During this stage,
the committee refines the identified solution,
decides on minor details,
and creates experimental implementations.

At \emph{stage 2.7},
the proposal is approved in principle
and has to be tested and validated.
To enter this stage,
the major sections of the proposal must be complete:
the specification text should be finished,
and all reviewers of the specification must have approved it.
Once a proposal has entered this stage,
it is tested and validated
using the prototype implementations created at stage 2.

Once a proposal has been sufficiently tested and validated,
it is moved to \emph{stage 3}.
During this stage,
the proposal should be implemented in at least two major JavaScript engines,
and it should be tested for web compatibility issues
and for integration issues in those engines.

At \emph{stage 4}, the proposal is complete and will be included in the next revision of ECMA-262.

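The stages described above form an ordered sequence. The sketch below encodes this ordering as a small JavaScript data structure, including the intermediate stage 2.7. \texttt{STAGES} and \texttt{nextStage} are hypothetical names, and the summaries paraphrase the descriptions above. The sketch assumes that a proposal advances one stage at a time and ignores demotion and withdrawal.

\begin{lstlisting}[language={JavaScript}]
// Ordered TC39 proposal stages (hypothetical model; entrance criteria
// are only summarized as strings, not checked).
const STAGES = [
  { stage: 0, summary: "exploration of a problem space" },
  { stage: 1, summary: "champion identified; problem and solution outlined" },
  { stage: 2, summary: "preferred solution; initial specification text" },
  { stage: 2.7, summary: "approved in principle; testing and validation" },
  { stage: 3, summary: "implementation in JavaScript engines" },
  { stage: 4, summary: "included in the next ECMA-262 revision" },
];

// Returns the stage that follows `current`, or null after stage 4.
// Assumes one-step advancement; demotion and withdrawal are not modelled.
function nextStage(current) {
  const index = STAGES.findIndex((s) => s.stage === current);
  if (index === -1) {
    throw new RangeError(`Unknown stage: ${current}`);
  }
  return index + 1 < STAGES.length ? STAGES[index + 1].stage : null;
}

console.log(nextStage(2)); // 2.7
console.log(nextStage(4)); // null
\end{lstlisting}

Because of stage 2.7, the next stage cannot be computed as \texttt{current + 1}. The sketch therefore looks up a stage's position in the ordered list.
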
\section{Abstract Syntax Trees}
\label{sec:backgroundAST}
An \emph{abstract syntax tree} (AST)
is a tree representation of source code.
ASTs omit syntactic details that are not needed to represent the program,
such as whitespace, punctuation, and grouping parentheses,
while maintaining the \emph{structure} of the program.
Each node represents a construct of the programming language,
such as a statement, an expression, or a declaration.
Thus, every node type corresponds to a grammatical construct of the language the AST was built from.

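The sketch below illustrates which syntactic details an AST discards. It implements a deliberately tiny recursive-descent parser for integer arithmetic with \texttt{+}, \texttt{*}, and parentheses. This parser is a hypothetical toy, not Babel; it only borrows Babel's node type names \texttt{BinaryExpression} and \texttt{NumericLiteral}. Two differently formatted inputs produce the same tree. Grouping parentheses create no node of their own, because their effect is captured entirely by the shape of the tree.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical toy parser for integer arithmetic with + and *,
// written only to illustrate ASTs; it is not how Babel is implemented.
// Grammar (recursive descent, * binds tighter than +):
//   expr   := term ("+" term)*
//   term   := factor ("*" factor)*
//   factor := NUMBER | "(" expr ")"
function parse(source) {
  if (/[^\d+*()\s]/.test(source)) {
    throw new SyntaxError("Unsupported character in input");
  }
  // Tokenize; whitespace is not part of any token and is thus discarded.
  const tokens = source.match(/\d+|[+*()]/g) ?? [];
  let pos = 0;

  const peek = () => tokens[pos];
  const expect = (token) => {
    if (tokens[pos] !== token) {
      throw new SyntaxError(`Expected "${token}" at token ${pos}`);
    }
    pos++;
  };

  function factor() {
    if (peek() === "(") {
      expect("(");
      const inner = expr();
      expect(")");
      return inner; // Parentheses shape the tree but create no node.
    }
    const token = tokens[pos++];
    if (token === undefined || !/^\d+$/.test(token)) {
      throw new SyntaxError(`Unexpected token: ${token}`);
    }
    return { type: "NumericLiteral", value: Number(token) };
  }

  // Parses a left-associative chain of `operand`s separated by `operator`.
  function binary(operator, operand) {
    let left = operand();
    while (peek() === operator) {
      pos++;
      left = { type: "BinaryExpression", operator, left, right: operand() };
    }
    return left;
  }

  function term() {
    return binary("*", factor);
  }

  function expr() {
    return binary("+", term);
  }

  const ast = expr();
  if (pos !== tokens.length) {
    throw new SyntaxError(`Unexpected token: ${peek()}`);
  }
  return ast;
}

// Two differently formatted inputs yield structurally identical ASTs.
const spaced = parse("(1 + 2) * 3");
const compact = parse("(1+2)*3");
console.log(JSON.stringify(spaced) === JSON.stringify(compact)); // true
\end{lstlisting}

Precedence and grouping are both encoded in the nesting. In \texttt{(1 + 2) * 3}, the addition is a child of the multiplication. In \texttt{1 + 2 * 3}, the nesting is reversed.
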
ASTs are central to the automated manipulation of source code;
they are used by a variety of tools
that need a structured representation of source code in order to operate on it~\cite{AST3}.
ASTs are favored over raw text because of their structured nature,
which is particularly important for tools such as compilers,
interpreters, and code transformation tools.
ASTs are produced by language \emph{parsers}.
For JavaScript, one popular parsing library is \emph{Babel}~\cite{Babel}.
Babel is a JavaScript toolchain
whose main use is converting source code written in ECMAScript 2015 or newer into older versions of JavaScript.
This conversion is done to improve compatibility
with older execution environments.
Babel consists of a suite of libraries for working with JavaScript source code,
each of which relies on Babel's AST definition~\cite{BabelAST}.
The AST specification used by Babel aims to stay
as close as possible to the ECMAScript standard~\cite{BabelSpecCompliant}.
This has made Babel a recommended parser
for implementing proposal transpilers~\cite{TC39RecommendBabel}.
A simple example of source code parsed into an AST with Babel
is shown in Figure~\ref{ex:srcToAST}.

\begin{figure}[H]
\noindent\begin{minipage}{.30\textwidth}
\begin{lstlisting}[language={JavaScript}]
let name = f(100);
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.65\textwidth}
\begin{center}
\begin{tikzpicture}[
squarednode/.style={rectangle, draw=red!60, fill=black!5, very thick, minimum size=2mm}, node distance=10mm and 5mm
]
\node[squarednode] (VarDecl) {VariableDeclaration};
\node[squarednode] (VarDeclarator) [below=of VarDecl] {VariableDeclarator};
\node[squarednode] (id) [below left= 10mm and -10mm of VarDeclarator] {Identifier: name};
\node[squarednode] (callExpr) [below right= 10mm and -10mm of VarDeclarator] {CallExpression};
\node[squarednode] (cid) [below left= 10mm and -10mm of callExpr] {Identifier: f};
\node[squarednode] (arg) [below right= 10mm and -15mm of callExpr] {NumericLiteral: 100};

\draw[] (VarDecl.south) -- (VarDeclarator.north);
\draw[] (VarDeclarator.south) -- (id.north);
\draw[] (VarDeclarator.south) -- (callExpr.north);
\draw[] (callExpr.south) -- (cid.north);
\draw[] (callExpr.south) -- (arg.north);
\end{tikzpicture}
\end{center}
\end{minipage}\hfil
\caption{\label{ex:srcToAST} Example of source code parsed into a Babel AST.}
\end{figure}

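In memory, the tree in Figure~\ref{ex:srcToAST} consists of nested JavaScript objects, each carrying a \texttt{type} property. The sketch below writes out a simplified version of this node by hand. The real output of \texttt{@babel/parser} also contains location information and wraps the declaration in \texttt{File} and \texttt{Program} nodes; both are omitted here. The \texttt{walk} function is a hypothetical helper, not part of Babel's API. It visits every node depth-first, which is the basic operation underlying AST-based tools.

\begin{lstlisting}[language={JavaScript}]
// Simplified, hand-written Babel-style AST for: let name = f(100);
// (location data and the enclosing File/Program nodes are omitted)
const ast = {
  type: "VariableDeclaration",
  kind: "let",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "name" },
      init: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "f" },
        arguments: [{ type: "NumericLiteral", value: 100 }],
      },
    },
  ],
};

// Hypothetical helper: depth-first, pre-order traversal that calls `visit`
// on every object with a string `type`. `depth` counts object nesting,
// which in this simplified AST equals the level in the tree.
function walk(node, visit, depth = 0) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    for (const child of node) walk(child, visit, depth);
    return;
  }
  if (typeof node.type === "string") visit(node, depth);
  for (const value of Object.values(node)) {
    walk(value, visit, depth + 1);
  }
}

walk(ast, (node, depth) => console.log("  ".repeat(depth) + node.type));
\end{lstlisting}

The sketch prints the node types indented by depth, mirroring the levels of the tree in Figure~\ref{ex:srcToAST}.
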
To compile newer versions of JavaScript into older ones,
Babel uses a \emph{plugin} system that allows
a wide range of features to be enabled or disabled individually.
This makes Babel versatile enough to fit different ways
of working with JavaScript source code.
In particular,
Babel can parse experimental JavaScript features.
These features are usually proposals under development by TC39,
and the development of the corresponding plugins is part of the proposal deliberation process.
This allows for experimentation as early as \emph{stage 1} of the proposal development process.
Examples of proposals that were first supported by Babel's plugin system
are ``Do Expressions''~\cite{Proposal:DoProposal}
and ``Pipeline Operator''~\cite{Pipeline}.
These proposals are currently at \emph{stage 1} and \emph{stage 2}, respectively.

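In practice, experimental syntax is enabled on each parse by listing parser plugins. The sketch below assumes the \texttt{parse} function exported by the \texttt{@babel/parser} package, together with its \texttt{doExpressions} and \texttt{pipelineOperator} plugins. The plugin options shown reflect Babel~7 and may change between versions. The parser is passed in as an argument, so the sketch runs without Babel installed. \texttt{parserOptions}, \texttt{proposalSources}, and \texttt{parseProposals} are hypothetical names.

\begin{lstlisting}[language={JavaScript}]
// Parser options that enable two proposal plugins.
// Assumption: plugin names and options as accepted by @babel/parser (Babel 7).
const parserOptions = {
  sourceType: "module",
  plugins: [
    "doExpressions",
    ["pipelineOperator", { proposal: "hack", topicToken: "%" }],
  ],
};

// Sources using the two proposals; neither is standard JavaScript yet.
const proposalSources = [
  "let sign = do { if (x > 0) { 'positive'; } else { 'non-positive'; } };",
  "let result = value |> double(%) |> String(%);",
];

// Parses every source with the injected `parse` function,
// e.g. require("@babel/parser").parse, using the options above.
function parseProposals(parse) {
  return proposalSources.map((source) => parse(source, parserOptions));
}
\end{lstlisting}

With \texttt{@babel/parser} installed, \texttt{parseProposals(require("@babel/parser").parse)} should return one \texttt{File} node per source. Without these plugins, the parser rejects the same sources with a syntax error.
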
In this project,
we use Babel to parse JavaScript into abstract syntax trees.
We chose Babel because it supports proposals at very early stages.

\section{Source Code Querying}
Source code querying is the process of searching source code
to extract information or to locate specific sections of code.
Source code querying comes in many forms,
the simplest of which is text search.
Since source code is primarily text,
one can apply text search techniques to perform a query,
or take a more expressive approach using regular expressions
(e.g., with tools like \texttt{grep}).
Neither of these methods supports queries based on the structure of the code,
as both operate solely on its textual representation.
AST-based queries, in contrast, can be written
based on both syntax and structure,
and are therefore generally more powerful than text-based queries.
A further class of techniques queries code based on its semantics,
for example using type or data-flow information.

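The difference between text-based and AST-based queries can be illustrated by searching for all calls of a function \texttt{f}. The sketch below compares a regular expression with a structural query over a simplified Babel-style AST of the same snippet. The AST is written by hand and omits comments; a real tool would obtain it from a parser such as Babel. The snippet, the node-building helpers, and \texttt{findCalls} are hypothetical. The regular expression also matches the text \texttt{f(} inside a comment and inside a string literal. The structural query reports only \texttt{CallExpression} nodes whose callee is the identifier \texttt{f}.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical snippet: only two of the four occurrences of "f(" are calls.
const source = [
  "// TODO: remove f(1) later",
  "let a = f(1);",
  'let s = "f(2)";',
  "let b = g(f(3));",
].join("\n");

// Text-based query: a regular expression over the raw source.
const textMatches = (source.match(/\bf\(/g) ?? []).length;

// Hypothetical helpers building simplified Babel-style nodes.
const num = (value) => ({ type: "NumericLiteral", value });
const call = (name, args) => ({
  type: "CallExpression",
  callee: { type: "Identifier", name },
  arguments: args,
});
const decl = (name, init) => ({
  type: "VariableDeclaration",
  kind: "let",
  declarations: [
    { type: "VariableDeclarator", id: { type: "Identifier", name }, init },
  ],
});

// Hand-written AST of the snippet (the comment is not part of the body).
const program = {
  type: "Program",
  body: [
    decl("a", call("f", [num(1)])),
    decl("s", { type: "StringLiteral", value: "f(2)" }),
    decl("b", call("g", [call("f", [num(3)])])),
  ],
};

// Structural query: collect CallExpression nodes whose callee is `name`.
function findCalls(node, name, found = []) {
  if (node === null || typeof node !== "object") return found;
  if (Array.isArray(node)) {
    for (const child of node) findCalls(child, name, found);
    return found;
  }
  if (
    node.type === "CallExpression" &&
    node.callee.type === "Identifier" &&
    node.callee.name === name
  ) {
    found.push(node);
  }
  for (const value of Object.values(node)) findCalls(value, name, found);
  return found;
}

const astMatches = findCalls(program, "f");
console.log(textMatches, astMatches.length); // 4 2
\end{lstlisting}
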
The primary use cases of source code querying include
code comprehension, analysis, code navigation, and enforcement of coding styles.
Developers use tools supporting these activities when writing programs,
and all of these tools depend on some form of source code querying.
Integrated Development Environments (IDEs) are one such kind of tool:
since they are built for writing source code,
many of their features
rely on querying that source code.
One example of code querying in an IDE is the \emph{structural search and replace} feature of JetBrains IntelliJ IDEA~\cite{StructuralSearchAndReplaceJetbrains}, where queries are defined based on code structure
to find and replace sections of a program.

\section{Domain-Specific Languages}
Domain-specific languages (DSLs) are software languages
specialized to a particular, narrow domain~\cite{Kleppe}.
DSLs allow domain experts to take part in the software development process,
as a domain expert is expected to be able to read and write DSL code.
A DSL allows concise and expressive code
to be written that is tailored to its domain.
This expressiveness within the domain may result in faster development,
and the specificity to a domain may also increase correctness.
However, DSLs also have disadvantages:
the restrictiveness of a DSL may become a hindrance
if the DSL is not designed well to represent the domain.
DSLs may also have a learning curve,
which makes them less accessible to their target users.
Developing a DSL
is a non-trivial process~\cite{MarkusDSL},
as it requires both knowledge of the domain and knowledge of software language engineering.

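Regular expressions are a widely used DSL for describing text patterns, and they illustrate the conciseness discussed above. The sketch below checks whether a string has the shape \texttt{YYYY-MM-DD}, once with a regular expression and once with equivalent hand-written JavaScript. Both functions are hypothetical examples, and neither checks whether the date actually exists.

\begin{lstlisting}[language={JavaScript}]
// DSL version: the pattern language of regular expressions.
function isDateShapedRegex(s) {
  return /^\d{4}-\d{2}-\d{2}$/.test(s);
}

// General-purpose version: the same check written out step by step.
function isDateShapedManual(s) {
  if (s.length !== 10) return false;
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (i === 4 || i === 7) {
      if (c !== "-") return false; // Separators at fixed positions.
    } else if (c < "0" || c > "9") {
      return false; // Every other position must be an ASCII digit.
    }
  }
  return true;
}

for (const input of ["2024-02-29", "2024-2-29", "2024/02/29"]) {
  console.log(input, isDateShapedRegex(input), isDateShapedManual(input));
}
\end{lstlisting}

The example also shows the restrictiveness of a DSL. A constraint outside the pattern language's sweet spot, such as whether February~29 exists in a given year, is awkward to express as a regular expression.
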
\section{Language Workbenches}
A \emph{language workbench}~\cite{LanguageWorkbenchMartinFowler}
is an integrated development environment
created to facilitate the development of a software language,
such as a domain-specific language.
The goal of a language workbench is to increase productivity during language development
and to improve the design and evolution of software languages~\cite{LanguageWorkbenchMartinFowler}.

Language workbenches commonly generate tooling for a software language.
One such tool is a parser that is generated from the language definition within the language workbench.
Another tool commonly generated by a language workbench is an integrated development environment for the language;
these IDEs provide features like syntax highlighting,
code navigation, and error highlighting.

%When working with a language workbench,
%one manipulates an abstract representation of the language one is
%creating~\cite{LanguageWorkbenchMartinFowler}. This abstract representation is
%projected into an editable form, this editable form is how we define a software
%language within a language workbench.