
\chapter{Related Work}

In this chapter,
we discuss various techniques and languages for code querying,
present approaches to tree manipulation and transformation,
and describe several JavaScript parsers.
We also discuss aspect-oriented programming and model-driven language engineering.

\section{Source code query languages}

Many query languages have been designed for querying source code, with the aim of simplifying code analysis and refactoring.
These languages follow different paradigms,
such as logical, declarative, or SQL-like queries.

\subsection{CodeQL}
\emph{CodeQL}~\cite{CodeQL} is an object-oriented query language, previously known as \emph{.QL}.
CodeQL is used to semantically analyze code to discover vulnerabilities~\cite{CodeQLStuff}.
According to~\cite{CodeQLStuff}, the language is inspired by SQL~\cite{SQL}, Datalog~\cite{Datalog}, the Eindhoven Quantifier Notation~\cite{EindhovenQuantifierNotation}, and the idea that classes are predicates~\cite{Predicates}.

An example~\cite{CodeQLStuff} of how queries are written in CodeQL is as follows.
\begin{lstlisting}
from Class c
where c.declaresMethod("equals") and
    not(c.declaresMethod("hashCode")) and
    c.fromSource()
select c.getPackage(), c
\end{lstlisting}
This query finds all classes defined in the analyzed source code that declare a method \texttt{equals}
but do not declare a method \texttt{hashCode}.

As this example shows, the SQL-like query syntax of CodeQL differs substantially from that of \DSL{}, which aims at a more declarative syntax. As a result, the experience of writing queries differs considerably between the two languages:
writing a CodeQL query is similar to querying a database, while writing a query in \DSL{} is similar to giving an example of the structure one wishes to search for.
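
To illustrate the database-like flavour of such a query, the following Python sketch expresses a comparable check imperatively. It is not CodeQL and does not use any CodeQL API; as an assumption made for illustration, it analyses Python source with the standard \texttt{ast} module and uses \texttt{\_\_eq\_\_} and \texttt{\_\_hash\_\_} as stand-ins for \texttt{equals} and \texttt{hashCode}. The names \texttt{SOURCE}, \texttt{declared\_methods}, and \texttt{eq\_without\_hash} are hypothetical.
\begin{lstlisting}
import ast

# Hypothetical input: three small classes to run the query against.
SOURCE = '''
class Point:
    def __eq__(self, other): return True

class Key:
    def __eq__(self, other): return True
    def __hash__(self): return 0

class Plain:
    pass
'''

def declared_methods(cls):
    """Names of the functions defined directly in a class body."""
    return {n.name for n in cls.body if isinstance(n, ast.FunctionDef)}

def eq_without_hash(source):
    """Names of classes that define __eq__ but not __hash__."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        and "__eq__" in declared_methods(node)
        and "__hash__" not in declared_methods(node)
    ]

print(eq_without_hash(SOURCE))  # ['Point']
\end{lstlisting}
The explicit traversal and filtering in this sketch correspond to what the \texttt{from}, \texttt{where}, and \texttt{select} clauses of the CodeQL query state declaratively.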

\subsection{PMD XPath}

PMD XPath is a language for querying Java source code.
It supports querying all Java constructs~\cite{ProgrammingLanguageEcolutionViaSourceCodeQueryLanguages}.
This wide support stems from the fact that the AST of the entire codebase is represented in XML format, and queries are then performed on the corresponding XML.
These queries are performed using XPath expressions that define matching on XML trees.
This makes the query language versatile for static code analysis,
and it is used in the \emph{PMD} static code analysis tool~\cite{PMDAnalyzer}.

Examples~\cite{PMDXPathRule} of PMD XPath queries are as follows.
\begin{lstlisting}
//VariableId[@Name = "bill"]
//VariableId[@Name = "bill" and ../../Type[@TypeImage = "short"]]
\end{lstlisting}
These queries can be applied, for example, to the following Java code~\cite{PMDXPath}:
\begin{lstlisting}
public class KeepingItSerious{
    Delegator bill; // FieldDeclaration

    public void method(){
        short bill; // LocalVariableDeclaration
    }
}
\end{lstlisting}
If we execute the queries on this code, the first query matches both the field declaration \texttt{Delegator bill} and the local variable declaration \texttt{short bill}, while the second query returns only \texttt{short bill}.
The second query is more restrictive
because it additionally constrains the type of the declaration to \texttt{short}.
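
The idea of querying an XML rendering of the AST can be sketched in Python with the standard \texttt{xml.etree.ElementTree} module. The XML below is a hypothetical, heavily simplified tree for the Java class above and is not PMD's actual output. Since \texttt{ElementTree} supports only a subset of XPath (no \texttt{and} and no parent steps inside predicates), the second query is emulated with an explicit parent lookup rather than evaluated as XPath.
\begin{lstlisting}
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML form of the AST (not PMD's real output).
AST_XML = """
<ClassDeclaration SimpleName="KeepingItSerious">
  <FieldDeclaration>
    <Type TypeImage="Delegator"/>
    <VariableDeclarator>
      <VariableId Name="bill"/>
    </VariableDeclarator>
  </FieldDeclaration>
  <MethodDeclaration Name="method">
    <LocalVariableDeclaration>
      <Type TypeImage="short"/>
      <VariableDeclarator>
        <VariableId Name="bill"/>
      </VariableDeclarator>
    </LocalVariableDeclaration>
  </MethodDeclaration>
</ClassDeclaration>
"""

root = ET.fromstring(AST_XML)

# First query: every VariableId named "bill", anywhere in the tree.
all_bills = root.findall(".//VariableId[@Name='bill']")

# Emulation of the "../../Type" step: map each node to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

def declaration_of(variable_id):
    """Return the node two levels above a VariableId."""
    return parent_of[parent_of[variable_id]]

# Second query: keep only matches whose declaration has type "short".
short_bills = [
    v for v in all_bills
    if declaration_of(v).find("Type[@TypeImage='short']") is not None
]

print(len(all_bills))                                # 2
print([declaration_of(v).tag for v in short_bills])
# ['LocalVariableDeclaration']
\end{lstlisting}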

\DSL{} uses JavaScript code \emph{templates} to specify queries;
this presumably makes writing such queries simpler for users, as they write JavaScript. PMD XPath, in contrast, uses XPath expressions to define structural queries, which are quite verbose and require extensive knowledge of the AST being queried.

\subsection{XSL Transformations}

XSLT~\cite{XSLT} is a language for performing transformations of XML documents,
either to other XML documents, or to different formats altogether (such as HTML or plain text).

XSLT is part of the Extensible Stylesheet Language (XSL) family.
A transformation in XSLT is expressed in the form of a stylesheet~\cite[Sect.~1.1]{XSLT},
whose syntax is defined in XML.
The language uses a template-based approach: each template defines a pattern that is matched against the source document to find the sections to transform.
Each template also contains a transformation declaration
that describes how the output for the match should look.

The following example XML document represents a program, where each \texttt{variable} node has an attribute \texttt{name}.
\begin{lstlisting}
<program>
    <variable name="a"/>
    <variable name="b"/>
    <variable name="c"/>
</program>
\end{lstlisting}

To transform the example above, we define the XSLT transformation shown below. This transformation contains two match templates. The first template matches \texttt{program} nodes; it copies the node with \texttt{xsl:copy} and applies the second template to all \texttt{variable} child nodes. The second template matches \texttt{variable} elements and replaces each of them with a \texttt{const} element that keeps the original \texttt{name} attribute.

\begin{lstlisting}
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/program">
        <xsl:copy>
            <xsl:apply-templates select="variable"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="variable">
        <const name="{@name}"/>
    </xsl:template>
</xsl:stylesheet>

\end{lstlisting}

The result of running the XSLT transformation above on the XML we defined is shown below.
\begin{lstlisting}
<program>
   <const name="a"/>
   <const name="b"/>
   <const name="c"/>
</program>
\end{lstlisting}
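
The template-dispatch idea behind this stylesheet can be sketched in Python. This is not an XSLT processor: as a simplifying assumption, templates are plain functions selected by element name, and the names \texttt{TEMPLATES}, \texttt{apply\_template}, \texttt{program\_template}, and \texttt{variable\_template} are hypothetical.
\begin{lstlisting}
import xml.etree.ElementTree as ET

SOURCE = """<program>
    <variable name="a"/>
    <variable name="b"/>
    <variable name="c"/>
</program>"""

def program_template(node):
    # Mirrors xsl:copy followed by apply-templates on "variable".
    copy = ET.Element(node.tag)
    for child in node.findall("variable"):
        copy.append(apply_template(child))
    return copy

def variable_template(node):
    # Mirrors <const name="{@name}"/>.
    return ET.Element("const", {"name": node.get("name")})

# Assumption: one template per element name, looked up by tag.
TEMPLATES = {
    "program": program_template,
    "variable": variable_template,
}

def apply_template(node):
    """Dispatch a node to the template registered for its tag."""
    return TEMPLATES[node.tag](node)

result = apply_template(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
\end{lstlisting}
The printed document has a \texttt{program} root with three \texttt{const} children named \texttt{a}, \texttt{b}, and \texttt{c}, matching the structure of the XSLT output above (whitespace aside).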

Though XSLT defines matching in a manner similar to \DSL{},
its approach to defining transformations is different: \DSL{} allows the user to specify a code fragment interspersed with wildcards,
while XSLT requires specifying a transformation (written in a functional style).
Moreover, \DSL{}'s implementation is tailored for use by the TC39 committee, while XSLT's expressive power allows specifying arbitrarily complex transformations of tree-like data structures.

\subsection{Jackpot}

\emph{Jackpot}~\cite{Jackpot}
(also known as \emph{Java Declarative Hints Language})
is a query language
that uses declarative patterns to define source code queries:
these queries are used in conjunction with multiple rewrite definitions.
The language is used in the Apache NetBeans~\cite{ApacheNetBeans}
suite of tools to allow for declarative refactoring of code.

The example below defines a query and a transformation: it queries the code for variable declarations with an initial value of 1,
and then changes them into declarations with an initial value of 0.
\begin{lstlisting}
"change declarations of 1 to declarations of 0":
    int $1 = 1;
=>  int $1 = 0
\end{lstlisting}


Jackpot is quite similar to \DSL{},
as both languages define queries
using a similar structure.
In Jackpot,
one defines a \textit{pattern},
and then every match of that pattern can be re-written
to a \textit{fix-pattern}.
Each fix-pattern can have a condition attached to it.
This is quite similar to the \textit{applicable to} and \textit{transform to} sections of \DSL{}.
Jackpot also supports a feature
which is similar to the wildcards in \DSL{}---one can define variables
in the \textit{pattern} definition and transfer them over to the
\textit{fix-pattern} definition.
In contrast to \DSL{}, wildcard type restrictions and notation for matching more than one AST node are not supported in Jackpot.
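
The mechanism of binding a variable in the pattern and reusing it in the fix-pattern can be sketched as follows. This is a minimal illustration and not Jackpot's implementation: as an assumption, declarations are modelled as hypothetical tuples consisting of a node kind, a type, a name, and an initial value, and strings starting with \texttt{\$} act as pattern variables.
\begin{lstlisting}
def match(pattern, tree, bindings):
    """Match tree against pattern, recording $-variables."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            return bindings[pattern] == tree
        bindings[pattern] = tree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return len(pattern) == len(tree) and all(
            match(p, t, bindings) for p, t in zip(pattern, tree)
        )
    return pattern == tree

def substitute(template, bindings):
    """Replace $-variables in template by their bound subtrees."""
    if isinstance(template, str) and template.startswith("$"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template

def rewrite(tree, pattern, fix):
    """Rewrite every subtree that matches pattern into fix."""
    bindings = {}
    if match(pattern, tree, bindings):
        return substitute(fix, bindings)
    if isinstance(tree, tuple):
        return tuple(rewrite(t, pattern, fix) for t in tree)
    return tree

# Counterpart of: int $1 = 1;  =>  int $1 = 0
PATTERN = ("decl", "int", "$1", 1)
FIX = ("decl", "int", "$1", 0)

program = (
    "block",
    ("decl", "int", "x", 1),
    ("decl", "int", "y", 2),
    ("decl", "long", "z", 1),
)
print(rewrite(program, PATTERN, FIX))
\end{lstlisting}
Only the declaration of \texttt{x} is rewritten; the declaration of \texttt{y} has a different initial value and that of \texttt{z} a different type, so neither matches the pattern.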


\section{IntelliJ structural search}
JetBrains IntelliJ-based Integrated Development Environments
have a feature that allows for structural search and replace~\cite{StructuralSearchAndReplaceJetbrains}.
This feature is intended for large code bases
where a developer wishes to perform a search and replace
based on syntax and semantics,
and not a (regular) text-based search and replace.

When doing structural search in IntelliJ-based IDEs,
templates are used to describe the query used in the search.
These templates use variables written as \texttt{\$variable\$};
these allow matched code to be transferred to the replacement template.

In the figure below, we perform a structural search for a method declaration with three parameters of type \texttt{int}, and replace it with a method declaration where all parameters are of type \texttt{double} and the return type is \texttt{double}.
\begin{figure}[H]
\begin{center}
\includegraphics[width={.85\textwidth}]{figures/image.psd.png}
\end{center}
\caption{Example of IntelliJ structural search and replace}
\end{figure}

This tool is interactive,
and every match is listed in the \emph{Find} tool.
There,
a developer can decide which matches to apply the replace template to.
This helps avoid errors, since the matches of a search
are verified by a human before they are replaced.
If the developer so wishes,
they can skip verifying each match and replace all matches at once.

IntelliJ structural search and replace and \DSL{} have similarities:
they are both template-based.
In both approaches, templates can contain variables and wildcards
to allow for matching against arbitrary code.
Both tools also support matching multiple code parts against a single variable or a wildcard.
A core difference between the two tools is the variable type system:
when performing a match and transformation in \DSL{},
the types are used extensively to limit the match against the wildcards,
while this limitation is not possible in IntelliJ.


\section{JavaScript parsers}

This section will explore other JavaScript parsers that could have been used in this project.
We will give a brief introduction of each of them,
and discuss why they were not chosen.

\subsection*{Speedy Web Compiler}
Speedy Web Compiler~\cite{SpeedyWebCompiler} (SWC) is a library created for parsing and compiling JavaScript and related dialects (such as JSX and TypeScript).
It is written in Rust and is known for its high performance.
SWC is used by large organizations creating applications and tooling for the web platform.

Speedy Web Compiler supports various features, such as:
\emph{compilation} (used for TypeScript and other languages that are compiled down to JavaScript),
\emph{bundling} (which takes multiple JavaScript/TypeScript files and bundles them into a single output file, while handling naming collisions),
\emph{minification} (which makes the bundle size of a project smaller),
\emph{transformation} for use with WebAssembly,
as well as \emph{custom plugins} (to change the specification of the languages parsed by SWC).

SWC was considered for use in this project; however, because SWC only supports proposals once they reach stage 3, it was not possible to use this parser.

\subsection*{Acorn}

Acorn~\cite{AcornJS} is a parser written in JavaScript for parsing JavaScript
and related languages.
Acorn has a plugin system that supports extending and redefining
how its internal parser works.
It aims to be a small and performant JavaScript parser,
and has its own tree traversal library, Acorn Walk.
Babel was originally a fork of Acorn,
and while Babel has since had a full rewrite,
it is still heavily based on Acorn~\cite{BabelAcornBased}.

Acorn was considered as a parser for this project;
however, it does not have as large a community as Babel,
and it is not recommended by TC39 in the way Babel is~\cite{TC39RecommendBabel}.
Even though its plugin system is powerful,
fewer pre-made plugins
for early-stage proposals exist for Acorn than for Babel.

\section{Model-to-Model Transformations}
Model-to-Model transformations are an integral part of model-driven engineering (MDE), which is a methodology that focuses on the creation and modification of abstract models rather than focusing on executable code~\cite{MDE}. This methodology provides a higher-level approach to developing large software systems.

A model-to-model transformation converts one model into another,
while preserving or adapting its underlying semantics and structure~\cite{ModelToModelTransformations}.
This is usually done by traversing the structure of the source model,
extracting its data, and transforming the format of that data
to fit the target model.
This allows a model described within one domain
to be transformed automatically into a model of another domain.
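
As a minimal sketch of such a transformation, the following Python code converts a hypothetical class model into a hypothetical relational table model. All model classes, the type mapping, and the added \texttt{id} column are assumptions made for illustration and are not taken from a particular MDE tool.
\begin{lstlisting}
from dataclasses import dataclass

# Source metamodel (hypothetical): classes with typed attributes.
@dataclass
class Attribute:
    name: str
    type: str

@dataclass
class ClassModel:
    name: str
    attributes: list

# Target metamodel (hypothetical): tables with typed columns.
@dataclass
class Column:
    name: str
    sql_type: str

@dataclass
class TableModel:
    name: str
    columns: list

# Assumed mapping from source types to target types.
TYPE_MAP = {"int": "INTEGER", "str": "TEXT"}

def class_to_table(cls):
    """Traverse the class model and build the matching table model."""
    columns = [Column("id", "INTEGER")]  # assumed primary key column
    columns += [
        Column(a.name, TYPE_MAP[a.type]) for a in cls.attributes
    ]
    return TableModel(cls.name.lower(), columns)

person = ClassModel(
    "Person", [Attribute("name", "str"), Attribute("age", "int")]
)
table = class_to_table(person)
print(table.name, [(c.name, c.sql_type) for c in table.columns])
\end{lstlisting}
The structure of the source model (a class with attributes) is preserved, while its format is adapted to the target domain (a table with columns).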


\section{Aspect-Oriented Programming}

Aspect-Oriented Programming~\cite{AOP} (AOP)
is a programming paradigm that enables modularity by allowing
for a high degree of separation of concerns,
specifically focusing on cross-cutting concerns.
Cross-cutting concerns are aspects of a software program or a system
that have an effect at multiple levels,
cutting across the main functional requirements.
Such aspects are often related to security,
logging, or error handling,
but could be any concern that is shared across an application.

In AOP,
one creates an \textit{aspect},
which is a module that contains some
cross-cutting concern the developer wants to achieve.
An aspect contains \emph{advices},
which are the specific code fragments executed when certain conditions of the program are met
(for example, a \textit{before advice} is executed before a method executes, an \textit{after advice} is executed after a method regardless of the method's outcome, and an \textit{around advice} surrounds a method execution).
The aspect also contains a \textit{pointcut},
which is the set of criteria determining
when the advice is meant to be executed
(for example, at specific methods
or when specific constructors are called).
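
These concepts can be sketched in Python with a decorator that plays the role of the weaver. This is an illustration under simplifying assumptions and not an API of an existing AOP framework: the pointcut is a hypothetical predicate on function names, and the before and after advice only append to a list.
\begin{lstlisting}
import functools

log = []

def before_advice(name, args):
    log.append(f"before {name}{args}")

def after_advice(name):
    log.append(f"after {name}")

def pointcut(name):
    # Hypothetical criterion: functions whose name starts with "save".
    return name.startswith("save")

def weave(func):
    """Attach the advice to func if the pointcut selects it."""
    if not pointcut(func.__name__):
        return func

    @functools.wraps(func)
    def wrapper(*args):
        before_advice(func.__name__, args)
        try:
            return func(*args)
        finally:
            # Runs regardless of the outcome of func.
            after_advice(func.__name__)

    return wrapper

@weave
def save_user(name):
    return f"saved {name}"

@weave
def load_user(name):
    return f"loaded {name}"

save_user("ada")
load_user("ada")
print(log)  # ["before save_user('ada',)", 'after save_user']
\end{lstlisting}
The logging concern lives in one place, and only \texttt{save\_user} is affected because it is the only function selected by the pointcut.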

One can see a similarity between \DSL{} and aspect-oriented programming:
to define where a \textit{pointcut} applies, one has to describe some structure; the AOP weaver then has to search the program's execution for events that trigger the pointcut, and run the advice defined in the aspect of that pointcut. This resembles how a template in \DSL{} describes the code structure to search for.