
\chapter{Background}

Below we give an overview of
the evolution process of the ECMAScript programming language,
abstract syntax trees,
source code querying,
domain-specific languages,
and language workbenches.
These are instrumental to the implementation
of the tool described in this thesis.

\section{Evolution of the JavaScript Programming Language}
\emph{Technical Committee 39} (TC39) is a technical committee within
Ecma International,
whose main goal is to develop the language standard
for the ECMAScript programming language
(informally known as JavaScript);
this standard is known as ECMA-262~\cite{ecma262}.
Apart from this standard,
the committee is also responsible for maintaining
related standards:
the internationalization API (ECMA-402),
the standard for JSON (ECMA-404),
and
the ECMAScript specification suite (ECMA-414).
The members of the committee are representatives of companies,
academic institutions,
and other organizations
interested in developing and maintaining the ECMAScript language.
The delegates include
experts in JavaScript engines,
tooling surrounding JavaScript,
and other areas of the JavaScript ecosystem.

\paragraph{ECMA-262 Proposals}
We now explain what a proposal is,
and how proposals are developed in TC39 for the ECMA-262 language standard.

A \emph{proposal} is a suggested change to the ECMA-262 language standard.
Such a change has to solve some problem with the current version of ECMAScript.
Such problems can come in many forms,
and can apply to any part of the language.
Examples include: a feature that is not present in the language,
inconsistent parts of the language,
simplification of common patterns, and so on.
The proposal development process is defined in the \emph{TC39 Process Document}~\cite{TC39Process},
which describes each stage a proposal has to go through
in order to be accepted into the ECMA-262 language standard.

The purpose of \emph{stage 0} of the process is to allow
for exploration and ideation around which parts of the current version
of ECMAScript can be improved,
and then to define a problem space for the committee to focus on improving.

At \emph{stage 1}, the committee will start development
of a proposal.
In order for a proposal to enter this stage, several requirements have to be fulfilled.
First, a champion---a delegate of the committee who will be responsible for the advancement of the proposal---has to be identified.
In addition, a rough outline of the problem must be provided,
and a general shape of a solution must be given.
There must have been a discussion around key algorithms,
abstractions and semantics of the proposal.
Exploration of potential implementation challenges and cross-cutting concerns must have been done.
The final requirement is for all parts of the proposal
to be captured in a public repository.
Once all these requirements are met,
a proposal is accepted into stage 1.
During this stage, the committee will work on the design of a solution,
and resolve any cross-cutting concerns discovered previously.

At \emph{stage 2}, a preferred solution has been identified.
Requirements for a proposal to enter this stage are as follows:
all high level APIs and syntax must be described in the proposal document,
illustrative examples have to be worked out,
and an initial specification text must be drafted.
During this stage,
the following areas of the proposal are explored:
refining the identified solution,
deciding on minor details,
and creating experimental implementations.

At \emph{stage 2.7},
the proposal is approved in principle,
and has to be tested and validated.
To enter this stage,
the major sections of the proposal must be complete.
The specification text should be finished,
and all reviewers of the specification must have approved it.
Once a proposal has entered this stage,
testing and validation will be performed.
This is done using the prototype implementations created at stage 2.

Once a proposal has been sufficiently tested and verified,
it is moved to \emph{stage 3}.
During this stage,
the proposal should be implemented in at least two major JavaScript engines.
The proposal should be tested for web compatibility issues,
and integration issues in the major JavaScript engines.

At \emph{stage 4}, the proposal is completed and will be included in the next revision of ECMA-262.

\section{Abstract Syntax Trees}
\label{sec:backgroundAST}
An \emph{abstract syntax tree} (AST)
is a tree representation of source code.
Every node of such a tree represents a construct from the source code.
ASTs remove syntactic details while maintaining the
\emph{structure} of the program.
Each node represents a construct of the programming language,
such as statements, expressions, declarations, and so on.
Thus, every node type represents a grammatical construct in the language the AST was built from.

ASTs are important for manipulating source code;
they are used by various tools
that need to represent source code in some way to perform operations with it~\cite{AST3}.
Using ASTs is favored over raw text due to their structured nature;
this especially manifests when considering tools like compilers,
interpreters, or code transformation tools.
ASTs are produced by language \emph{parsers}.
For JavaScript, one of the popular libraries used for parsing is \emph{Babel}~\cite{Babel}.
Babel is a JavaScript toolchain,
and its main usage is converting source code written in ECMAScript 2015 or newer into older versions of JavaScript.
This conversion is done to increase the compatibility of JavaScript
in older execution environments.
Babel has a suite of libraries used to work with JavaScript source code.
Each library relies on Babel's AST definition~\cite{BabelAST}.
The AST specification Babel uses tries to stay
as close as possible to the ECMAScript standard~\cite{BabelSpecCompliant}.
This fact has made Babel a recommended parser to use
for proposal transpiler implementations~\cite{TC39RecommendBabel}.
A simple example of how source code is parsed into an AST with Babel
can be seen in Figure~\ref{ex:srcToAST}.

\begin{figure}[H]
\noindent\begin{minipage}{.30\textwidth}
\begin{lstlisting}[language={JavaScript}]
let name = f(100);
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.65\textwidth}
\begin{center}
\begin{tikzpicture}[
    squarednode/.style={rectangle, draw=red!60, fill=black!5, very thick, minimum size=2mm}, node distance=10mm and 5mm
]
\node[squarednode] (VarDecl)        {VariableDeclaration};
\node[squarednode] (VarDeclarator)  [below=of VarDecl] {VariableDeclarator};
\node[squarednode] (id)             [below left= 10mm and -10mm of VarDeclarator] {Identifier: name};
\node[squarednode] (callExpr)       [below right= 10mm and -10mm of VarDeclarator] {CallExpression};
\node[squarednode] (cid)            [below left= 10mm and -10mm of callExpr] {Identifier: f};
\node[squarednode] (arg)            [below right= 10mm and -15mm of callExpr] {NumericLiteral: 100};

\draw[] (VarDecl.south) -- (VarDeclarator.north);
\draw[] (VarDeclarator.south) -- (id.north);
\draw[] (VarDeclarator.south) -- (callExpr.north);
\draw[] (callExpr.south) -- (cid.north);
\draw[] (callExpr.south) -- (arg.north);
\end{tikzpicture}
\end{center}
\end{minipage}\hfil
\caption{\label{ex:srcToAST} Example of source code parsed to Babel AST.}
\end{figure}
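
The tree in Figure~\ref{ex:srcToAST} can also be written down as a plain data structure.
The listing below is a minimal sketch rather than actual Babel output:
the object is written by hand and keeps only the node types and fields shown in the figure
(real parser output carries additional fields, such as source locations),
and the helper \texttt{collectTypes} is a hypothetical name introduced here for illustration.
It visits the tree depth-first and lists the node types it encounters.

\begin{lstlisting}[language={JavaScript}]
// Hand-written, simplified AST for: let name = f(100);
// Only the fields shown in the figure are kept (assumption: real
// parser output contains more, e.g. source locations).
const ast = {
  type: "VariableDeclaration",
  kind: "let",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "name" },
      init: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "f" },
        arguments: [{ type: "NumericLiteral", value: 100 }],
      },
    },
  ],
};

// Hypothetical helper: visit every node depth-first and record its type.
function collectTypes(node, out = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectTypes(child, out);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.type === "string") out.push(node.type);
    for (const value of Object.values(node)) collectTypes(value, out);
  }
  return out;
}

console.log(collectTypes(ast).join(", "));
\end{lstlisting}

The traversal never inspects the original text;
it relies only on the node types and their nesting,
which is what makes ASTs convenient for tools that analyze or transform code.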

To achieve compilation of newer versions into older versions,
Babel uses a \emph{plugin} system that allows
a myriad of features to be enabled or disabled.
This makes the parser versatile to fit different ways
of working with JavaScript source code.
Because of this,
Babel allows parsing of JavaScript experimental features.
These features are usually proposals that are under development by TC39,
and the development of these plugins is a part of the proposal deliberation process.
This allows for experimentation as early as \emph{stage 1} of the proposal development process.
Some examples of proposals that were first supported by Babel's plugin system
are ``Do Expression''~\cite{Proposal:DoProposal}
and ``Pipeline''~\cite{Pipeline}.
These proposals are currently at \emph{stage 1} and \emph{stage 2}, respectively.

In this project,
we will use Babel to parse JavaScript into abstract syntax trees.
This choice was made because of Babel's support of very early stage proposals.


\section{Source Code Querying}
Source code querying is the action of searching source code
to extract some information or find specific sections of code.
Source code querying comes in many forms,
the simplest of which is text search.
Since source code is primarily text,
one can apply text search techniques to perform a query,
or a more complex approach using regular expressions
(e.g., tools like \texttt{grep}).
Neither of these methods allows for queries based on the structure of the code;
both rely solely on its syntax.
AST-based queries allow queries to be written
based on both syntax and structure,
and are generally more powerful than regular text-based queries.
Another technique for code querying is based on semantics of code.
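
The difference between text-based and AST-based queries can be illustrated with a small sketch.
The example below is only an illustration under stated assumptions:
the AST is written by hand with simplified, Babel-like node types instead of being produced by a parser,
and \texttt{findCalls} is a hypothetical helper, not part of any library.
Both queries look for calls to a function \texttt{f} in a program
that also mentions \texttt{f(} inside a string literal.

\begin{lstlisting}[language={JavaScript}]
// Toy program: one real call to f, one mention of "f(" in a string.
const source = 'let name = f(100); let s = "f(1)";';

// Text-based query: the regular expression sees only characters,
// so it also matches the occurrence inside the string literal.
const textMatches = source.match(/\bf\(/g) || [];

// Hand-written, simplified AST of the same program (assumption:
// Babel-like node types, most fields omitted).
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [{
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "name" },
        init: {
          type: "CallExpression",
          callee: { type: "Identifier", name: "f" },
          arguments: [{ type: "NumericLiteral", value: 100 }],
        },
      }],
    },
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [{
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "s" },
        init: { type: "StringLiteral", value: "f(1)" },
      }],
    },
  ],
};

// AST-based query (hypothetical helper): collect every CallExpression
// whose callee is an identifier with the given name.
function findCalls(node, calleeName, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => findCalls(child, calleeName, found));
  } else if (node !== null && typeof node === "object") {
    if (node.type === "CallExpression" &&
        node.callee.type === "Identifier" &&
        node.callee.name === calleeName) {
      found.push(node);
    }
    Object.values(node).forEach((v) => findCalls(v, calleeName, found));
  }
  return found;
}

console.log(textMatches.length);         // text search: 2
console.log(findCalls(ast, "f").length); // AST query: 1
\end{lstlisting}

The text search reports two matches because it cannot tell a call from a string that happens to contain the same characters,
whereas the structural query reports only the actual call.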

Use cases for source code querying include
code understanding, analysis, code navigation, and enforcement of styles.
All of these are important activities for developers writing programs,
and they all rely on some form of source code queries.
Tools that support them include integrated development environments (IDEs),
which are created for writing source code
and, therefore,
rely on querying the source code for many of their features.
One example of code querying in an IDE is \emph{structural search and replace}~\cite{StructuralSearchAndReplaceJetbrains} in JetBrains IntelliJ IDEA, where queries are defined based on code structure
to find and replace sections of a program.

\section{Domain-Specific Languages}
Domain-specific languages (DSLs) are software languages
specialized to a specific narrow domain~\cite{Kleppe}.
DSLs allow domain experts to get involved in the software development process,
as it is expected that a domain expert would have the capabilities to read and write DSL code.
A domain-specific language allows for very concise and expressive code
to be written that is specifically designed for the domain.
Using a DSL might result in faster development because of this expressiveness within the domain;
this specificity to a domain might also increase correctness.
However, there are also some disadvantages to DSLs:
the restrictiveness of a DSL might become a hindrance
if it is not well designed to represent the domain.
Domain-specific languages also might have a learning curve,
which makes these languages less accessible for the target users.
Developing a domain-specific language
is a non-trivial process~\cite{MarkusDSL},
as implementing a DSL requires both knowledge of the domain and knowledge of software language engineering.
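
As a small illustration of these trade-offs,
the following sketch implements a toy DSL for renaming identifiers.
Everything in it is hypothetical and unrelated to the tool described in this thesis:
the rule syntax (\texttt{rename foo -> bar}) and the functions \texttt{parseRules} and \texttt{applyRules} are assumptions made for the example.
It shows both sides of the argument above:
the rules are concise and readable within their narrow domain,
yet even this tiny language needs a parser, error handling, and an interpreter.

\begin{lstlisting}[language={JavaScript}]
// Toy DSL (hypothetical): one rule per line, "rename <old> -> <new>".
const program = "rename foo -> bar\nrename tmp -> result";

// Parse the DSL text into a Map from old names to new names.
function parseRules(text) {
  const rules = new Map();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines
    const match = /^rename\s+(\w+)\s*->\s*(\w+)$/.exec(trimmed);
    if (match === null) {
      throw new SyntaxError("Invalid rule: " + trimmed);
    }
    rules.set(match[1], match[2]);
  }
  return rules;
}

// Interpret the rules: rename matching identifiers, keep the others.
function applyRules(rules, identifiers) {
  return identifiers.map((name) =>
    rules.has(name) ? rules.get(name) : name);
}

const rules = parseRules(program);
console.log(applyRules(rules, ["foo", "x", "tmp"]).join(", "));
// prints "bar, x, result"
\end{lstlisting}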


\section{Language Workbenches}
A \emph{language workbench}~\cite{LanguageWorkbenchMartinFowler}
is an integrated development environment
created to facilitate the development of a software language,
such as a domain-specific language.
The goal of a language workbench is to increase productivity during development,
and to enhance the design and evolution of software languages~\cite{LanguageWorkbenchMartinFowler}.

Commonly, language workbenches generate tooling for a software language.
One such tool is a language parser that is generated from the language definition within the language workbench.
Another tool commonly generated by a language workbench is an integrated development environment;
such IDEs provide functionality such as syntax highlighting,
code navigation, and error highlighting, among others.


%When working with a language workbench,
%one manipulates an abstract representation of the language one is
%creating~\cite{LanguageWorkbenchMartinFowler}. This abstract representation is
%projected into an editable form, this editable form is how we define a software
%language within a language workbench.