master/chapter/ch4.tex

\chapter{Implementation}

In this chapter,
the implementation of the tool utilizing the \DSL{} and \DSL{}-SH will be presented.\footnote{The source code for this implementation can be found \url{https://github.com/polsevev/JSTQL-JS-Transform}}
It will describe the overall architecture of the tool,
the flow of data throughout,
and how the different stages of transforming user code are completed.

\section{Architecture of the solution}

As was presented in Section~\ref{cha:3},
there are two ways to specify a proposal:
either using a custom domain-specific language JSTQL,
or by using the corresponding JavaScript API.
Figure~\ref{fig:architecture} demonstrates the architecture of the implementation of these two approaches.
In the figure, ellipse nodes represent data passed into the tool,
and rectangular nodes represent specific components of the tool.

In the JSTQL approach (the ``left-side'' path in the figure),
the initial step is to parse a proposal specification and then to extract the wildcard \emph{declarations} and \emph{references} from code templates.
A corresponding step in the API-based approach (the ``right-side'' path)
is to build the prelude, where the wildcard definitions are ``extracted'' from JavaScript code.

For both of the approaches, the second step (Section~\ref{sec:WildcardExtraction}) is to parse wildcard type expressions used in the templates' specifications.
After that,
at step 3 (Section~\ref{sec:BabelParse}), Babel is used to parse and build abstract syntax trees for the \texttt{applicable to} templates and the \texttt{transform to} templates in a proposal specification, and the user's code to which the proposal will be applied.
At step 4 (Section~\ref{sec:customTree}),
we process the abstract syntax trees produced by Babel and produce a custom tree data structure for simpler traversal.
At step 5 (Section~\ref{sec:Matching}),
we match the user's AST against the templates in the \texttt{applicable to} blocks.
Once all matches have been found, we incorporate the wildcard matches into the
\texttt{transform to} template at step 6 (Section~\ref{sec:transform}), and insert it back into the users code.
At this point, the AST of the user's code has been transformed,
and the final step 7 (Section~\ref{sec:generate})
then pretty-prints the transformed AST into a JavaScript source code.

\iffalse
\begin{description}
    \item[\DSL{} Code] is the raw text definition of proposals
    \item[Self-Hosted Object] is the self-hosted version in \DSL{}SH format
    \item[1a. Langium Parser] takes raw \DSL{} source code, and parses it into a DSL
    \item[2. Wildcard parsing] extracts the wildcards from the raw template definition in \DSL{}, and parse
    \item[1b. Prelude-builder] translates JavaScript prelude into array of wildcard strings
    \item[3. Babel] parses the templates and the users source code into an AST
    \item[4. Custom Tree Builder] translates the Babel AST structure into our tree structure
    \item[5. Matcher] finds matches with \texttt{applicable to} template in user code
    \item[6. Transformer] performs transformation defined in \texttt{transform to} template to each match of the users AST
    \item[7. Generator] generates source code from the transformed user AST
\end{description}
\fi

\begin{figure}[H]
\begin{center}
\begin{tikzpicture}[
    roundnode/.style={ellipse, draw=red!60, fill=red!5, very thick, minimum size=7mm},
    squarednode/.style={rectangle, draw=red!60, fill=red!5, very thick, minimum size=5mm}
]
\node[squarednode] (preParser)                                              {2. Type Expression Parser};
\node[squarednode] (preludebuilder)         [above right=of preParser]      {1. Prelude Builder};
\node[roundnode]   (selfhostedjsoninput)    [above=of preludebuilder]       {Self-Hosted Object};
\node[squarednode] (extraction)             [above left=of preParser]       {1.2. Extract wildcards};
\node[squarednode] (langium)                [above =of extraction]       {1.1. Parse JSTQL code};

\node[roundnode]   (jstqlcode)              [above=of langium]              {JSTQL Code};
\node[squarednode] (babel)                  [below=of preParser]            {3. Babel parsing};
\node[roundnode]   (usercode)               [left=of babel]                 {User source code};
\node[squarednode] (treebuilder)            [below=of babel]                {4. Custom Tree builder};
\node[squarednode] (matcher)                [below=of treebuilder]          {5. Matcher};
\node[squarednode] (transformer)            [below=of matcher]              {6. Transformer};
\node[squarednode] (joiner)                 [below=of transformer]          {7. Generator};


\draw[->] (jstqlcode.south) -- (langium.north);
\draw[->] (langium.south) -- (extraction.north);
\draw[->] (extraction.south) |- (preParser.west);
\draw[->] (preParser.south) |- (babel.north);
\draw[->] (babel.south) -- (treebuilder.north);
\draw[->] (treebuilder.south) -- (matcher.north);
\draw[->] (matcher.south) -- (transformer.north);
\draw[->] (transformer.south) -- (joiner.north);
\draw[->] (selfhostedjsoninput.south) -- (preludebuilder.north);
\draw[->] (preludebuilder.south) |- (preParser.east);
\draw[->] (usercode.east) -- (babel.west);
\end{tikzpicture}
\end{center}

\caption[Tool architecture]{Overview of tool architecture}
\label{fig:architecture}
\end{figure}


\section{Parsing \DSL{} using Langium}

In this section,
we describe the implementation of the parser for \DSL{}.
We start with outlining the language workbench which we used to generate a parser for \DSL{}.


\emph{Langium}~\cite{Langium} is a language workbench~\cite{LanguageWorkbench}
that can be used to generate parsers for software languages, in addition to producing a tailored Integrated Development Environment for the language.

A parser generated by Langium produces abstract syntax trees which are TypeScript objects.
These objects and their structure are used as definitions for the tool
to do matching and transformation of user code.

To generate a parser,
Langium requires a definition of a grammar.
A grammar is a specification that describes syntax a valid programs in a language.
The grammar for \DSL{} describes the structure of \DSL{} specifications.
The starting symbol of the grammar represents valid specifications:
\begin{lstlisting}
grammar Jstql

entry Model:
    (proposals+=Proposal)*;
\end{lstlisting}

In its turn, a proposal's specification includes its name and a specification of at least one \emph{transformation case}.
\begin{lstlisting}
Proposal:
    'proposal' name=ID "{"
        (case+=Case)+
    "}";
\end{lstlisting}

A transformation case specification is comprised of a code template to match a JavaScript code to which the case is applicable,
and a code template that specifies how a match should be transformed.
\begin{lstlisting}
Case:
    "case" name=ID "{"
        aplTo=ApplicableTo
        traTo=TransformTo
    "}";
\end{lstlisting}
Case specifications are designed in this way in order to separate different transformation definitions within a single proposal.

An \texttt{applicable to} block specifies a JavaScript code template
with wildcard declarations. This code template is represented in the grammar
using the terminal symbol \texttt{STRING},
and will be thus parsed as a raw string of characters.
\begin{lstlisting}
ApplicableTo:
    "applicable" "to" "{"
        apl_to_code=STRING
    "}";
\end{lstlisting}
The decision to use the \texttt{STRING} terminal,
rather than a designated nonterminal symbol that would represent valid JavaScript programs with wildcards,
is motivated by two reasons:
(i) we separate parsing of the JSTQL specification structure (which is done by Langium) and parsing of JavaScript code (for which we use Babel\footnote{See Sections~\ref{sec:backgroundAST} and \ref{sec:BabelParse}.});
and (ii) we use a custom processor of wildcards to enable reuse of such a processor for both JSTQL and JSTQL-SH\footnote{See Section~\ref{sec:wildcardExtractionAndParsing}.}.

A \texttt{transform to} block is specified in a similar manner:
\begin{lstlisting}
TransformTo:
    "transform" "to" "{"
        transform_to_code=STRING
    "}";
\end{lstlisting}

Notwithstanding the fact that the code templates in \texttt{applicable to} and \texttt{transform to} blocks are treated as strings by Langium---and thus by the Visual Studio Extension for JSTQL generated by Langium---we perform validation of the wildcard declarations and references, as explained below.

\subsection*{Langium Validator}

A Langium validator allows for further check to be applied to DSL code,
a validator allows for the implementation of specific checks on specific parts of the code.

\DSL{} does not allow empty wildcard type expression definitions in \texttt{applicable to} blocks.
This is not defined within the grammar, and needs to be enforced with a validator. Concretely, we have implemented a specific \texttt{Validator} for the \texttt{Case} rule of the grammar. This means every time anything contained within a \texttt{Case} is updated, Langium will perform the validation step and report any errors. The validator implemented for our tool checks for the following errors: empty wildcard type expressions, undeclared wildcards in \texttt{transform to} block, and wildcards used multiple times in \texttt{transform to block}.

In the listing below is the validator, it performs checks on the \texttt{applicable to} block and \texttt{transform to} block of the \texttt{case}. If any errors are found it reports them with the function \texttt{accept}.

\begin{lstlisting}[language={JavaScript}]
export class JstqlValidator {
    validateWildcards(case_: Case, accept: ValidationAcceptor): void {
        try {
            let validationResultAplTo = validateWildcardAplTo(
                collectWildcard(case_.aplTo.apl_to_code.split(""))
            );
            if (validationResultAplTo.errors.length != 0) {
                accept("error", validationResultAplTo.errors.join("\n"), {
                    node: case_.aplTo,
                    property: "apl_to_code",
                });
            }

            let validationResultTraTo = validateWildcardTraTo(
                collectWildcard(case_.traTo.transform_to_code.split("")),
                validationResultAplTo.env
            );

            if (validationResultTraTo.length != 0) {
                accept("error", validationResultTraTo.join("\n"), {
                    node: case_.traTo,
                    property: "transform_to_code",
                });
            }
        } catch (e) {}
    }
}
\end{lstlisting}

\subsection*{Interfacing with Langium}

To use the parser generated by Langium, we have give our tool a way to interface with Langium. To do this, we create a custom function that calls the generated parser on some \DSL{} code and transforms the AST into a JavaScript object compatible with our tool.


\section{Wildcard extraction and parsing}
\label{sec:wildcardExtractionAndParsing}
To refer to internal DSL variables defined in \texttt{applicable to} and \texttt{transform to} blocks of the transformation,
we need to extract this information from the template definitions.

\subsection*{Why not use Langium for wildcard extraction?}
\label{sec:langParse}
Langium supports creating a generator to output an artifact,
which is some transformation applied to the AST built by the Langium parser.
This would facilitate extraction of the wildcards, however this would make \DSL{}-SH dependent on Langium. This is not preferred as that would mean both ways of defining a proposal are reliant on Langium.
The reason for using our own extractor is to allow for an independent way to define transformations using our tool.


\subsection*{Extracting wildcards from \DSL{}}
\label{sec:WildcardExtraction}
To parse the templates in \texttt{applicable to} blocks and \texttt{transform to} blocks, we have to make the templates valid JavaScript.  This is done by using a wildcard extractor that extracts the information from the wildcards and inserts an \texttt{Identifier} in their place.

To extract the wildcards from the template,
we look at each character in the template.
If a wildcard opening token is encountered, everything after that until the closing token
is treated as a wildcard definition or reference and will parsed using the wildcard parser.

Once the wildcard is parsed,
and we know it is valid,
we insert the identifier into the JavaScript template
where the wildcard would reside.
This introduces a problem of \emph{collisions} between
the wildcard identifiers inserted and identifiers present in the users code.
In order to avoid this, we prepend and append every identifier inserted in place of a wildcard with the sequence of characters \texttt{\_\$\$\_}.

\newpage
In the Listing~\ref{sec:WildcardExtraction} the function used to extract the wildcards declarations can be seen. This function iterates through each character of \texttt{applicable to} template (line 2). When an opening token for a wildcard is encountered (line 3), we collect each character into a separate variable until the closing token is encountered. This separate variable is passed to the wildcard parser to create the type expression AST of the wildcard (lines 14-16). We insert the collision avoidance characters into the wildcard (line 18), and insert the identifier into \texttt{cleanedJS} (line 19).
\begin{lstlisting}[language={JavaScript}, caption={Extracting wildcard from template.}, label={lst:extractWildcard}]
export function parseInternal(code: string): InternalParseResult {
     for (let i = 0; i < code.length; i++) {
        if (code[i] === "<" && code[i + 1] === "<") {
            // From now in we are inside of the DSL custom block
            flag = true;
            i += 1;
            continue;
        }

        if (flag && code[i] === ">" && code[i + 1] === ">") {
            // We encountered a closing tag
            flag = false;
            try{
                let wildcard = new WildcardParser(
                    new WildcardTokenizer(temp).tokenize()
                ).parse();
                wildcard.identifier.name =
                    "_$$_" + wildcard.identifier.name + "_$$_";
                cleanedJS += wildcard.identifier.name;


                prelude.push(wildcard);
                i += 1;
                temp = "";
                continue;
            }
            catch (e){
                // We probably encountered a bitshift operator, append temp to cleanedJS
            }

        }
        if (flag) {
            temp += code[i];
        } else {
            cleanedJS += code[i];
        }
    }
    return { prelude, cleanedJS };
}
\end{lstlisting}

\paragraph*{Parsing wildcards}

Once a wildcard has been extracted from  definitions inside \DSL{},
it has to be parsed into a simple AST to be used when matching against the wildcard. This is accomplished by using a tokenizer and a recursive descent parser~\cite{RecursiveDescent}.

Our tokenizer takes the contents of wildcard block template and splits it into tokens.
Given the straighforward grammar type expressions, there is no ambiguity is present with the tokens,
thus making is easy to identify which character corresponds to which token.
The tokenizer adds a \textit{token type} to the tokens, this is later used by the parser to determine which nonterminal to use.

A recursive descent parser mimics the grammar of the language the parser is implemented for: we define functions for handling each of the nonterminals.

\begin{lstlisting}[caption={Grammar of type expressions}, label={ex:grammarTypeExpr}]
Wildcard:
    Identifier ":" MultipleMatch

MultipleMatch:
    GroupExpr "+"
    | TypeExpr

TypeExpr:
    BinaryExpr
    | UnaryExpr
    | PrimitiveExpr

BinaryExpr:
    TypeExpr { Operator TypeExpr }*

UnaryExpr:
    UnaryOperator TypeExpr

PrimitiveExpr:
    GroupExpr | Identifier

GroupExpr:
    "(" TypeExpr ")"
\end{lstlisting}

The grammar of the type expressions used by the wildcards can be seen in \figFull[ex:grammarTypeExpr].


\paragraph*{Building prelude in \DSL{}-SH}

The self-hosted version \DSL{}-SH also requires some form of parsing to prepare the internal DSL environment.

To use JavaScript as the meta language,
we define a \texttt{prelude} on the object
used to define the transformation case.
This prelude is required to consist of several
\texttt{Variable declaration} statements,
where the variable names are used as the internal DSL variables
and right-hand side expressions are strings that contain
the type expression used to determine a match for that specific wildcard.

We use Babel to generate the AST of the \texttt{prelude} definition;
this allows us to get a JavaScript object structure.
Since the structure is strictly defined,
we can expect every statement of the program
to be a variable declaration; otherwise we throw an error for invalid prelude.
Then the string value of each of the variable declarations is
passed to the same parser used for \DSL{} wildcards.

The reason this is preferred is it allows us to avoid having to extract the wildcards and inserting an \texttt{Identifier} into the template.

\section{Using Babel to parse}
\label{sec:BabelParse}

Allowing the tool to perform transformations of code
requires the generation of an abstract syntax trees from the user's code,
as well as the \texttt{applicable to} and \texttt{transform to} blocks.
This means parsing JavaScript into an AST;
to do this we use Babel~\cite{Babel}.

The reason for choosing to use Babel is the fact that it supports very early-stage JavaScript language proposals. Babel's maintainers collaborate closely with the TC39 Committee in order to provide extensive support of experimental syntax~\cite{BabelProposalSupport} through its plugin system. This allows the parsing of JavaScript code that uses language features which are not yet part of the language standard.


\subsection*{Custom Tree Structure}
\label{sec:customTree}
The AST structure used by Babel does not suit traversing multiple trees at the same time, which is a requirement for matching.
Therefore, based on Babel's AST, we produce our own custom tree structure that allows for simple traversal of multiple trees at once.

As can be seen in  Figure~\ref{def:TreeStructure},
we use a recursive definition of a \texttt{TreeNode},
where a node's parent either exists or is \texttt{null} (the root),
and a node can have any number of child elements.
This definition allows for simple traversal both up and down the tree.
This means traversing two trees at the same time can be done when searching for matches in the user's code.


\begin{lstlisting}[language={JavaScript}, label={def:TreeStructure}, caption={Simple definition of a Tree structure in TypeScript}]
export class TreeNode<T> {
    public parent: TreeNode<T> | null;
    public element: T;
    public children: TreeNode<T>[] = [];

    constructor(parent: TreeNode<T> | null, element: T) {
        this.parent = parent;
        this.element = element;
        if (this.parent) this.parent.children.push(this);
    }
}
\end{lstlisting}

To place the AST into our tree structure,
we use \texttt{@babel/traverse}~\cite{BabelTraverse}
to visit each node of the AST in a \textit{depth first} manner.
We implement a \textit{visitor} for each of the nodes in the AST and when a specific node is encountered, the corresponding visitor of that node is used to visit it.
When transferring the AST into our simple tree structure,
we use a generic visitor that applies to every kind of AST node,
and place that node into the tree.

Visiting a node using the \texttt{enter()}
function means we traversed from a parent node to its child node. When we then initialize the \texttt{TreeNode} of the current child, we add the parent previously visited as its parent node.
Whenever leaving a node the function \texttt{exit()} is called,
this means we are moving back up the tree,
and we have to update what node was the \textit{last} visited to keep track of the correct parent.

In the example below is the algorithm that transforms the Babel AST into our custom tree structure.
We start by defining a variable \texttt{last} (line 1), this variable will keep track of the previous node we visited. When the visitor enters a new node, the function \texttt{enter} is called (line 3). This function creates a new node in our custom tree structure (lines 4-6), and sets its parent to the previous node visited (line 5). Once our new node has been created, we update \texttt{last} to point to our new node (line 8).

Every time we walk back up the tree, the function \texttt{exit} (line 10) is called. Whenever this happens, we have to update \texttt{last} such that it will always contain the parent of a node when we visit it (line 11).

\begin{lstlisting}[language={JavaScript}]
let last = 0;
traverse(ast, {
        enter(path: any) {
            let node: TreeNode<t.Node> = new TreeNode<t.Node>(
                last,
                path.node as t.Node
            );
            last = node;
        },
        exit(path: any) {
                last = last.parent;
        },
    });
    if (first != null) {
        return first;
    }

\end{lstlisting}

One important nuance of the way we place the nodes into the tree
is that we still have the same underlying data structure as Babel.
Because of this,
the nodes can still be used with Babel APIs,
and we can still access every field of each node.
Transforming it into a tree only creates an easy way to traverse up and down the tree by references.
We perform no changes of the underlying data structure.

\section{Outline of transforming user code}

Below is an outline of every major step performed,
and how data is passed through the program.

\begin{algorithm}[H]
\caption{An outline of the steps to perform the transformation.
Here:
$A$ denotes the \texttt{applicable to} template with wildcards extracted,
$B$ denotes the \texttt{transform to} template with wildcards extracted,
$W$ denotes extracted wildcards,
$C$ denotes the abstract syntax tree of the \textttt{applicable to} template,
$D$ denotes the abstract syntax tree of the \texttt{transform to} template,
$E$ denotes the abstract syntax tree of the user's code,
$F$ denotes the \texttt{applicable to} template in our custom tree structure,
$G$ denotes the \texttt{transform to} template in our custom tree structure,
$H$ denotes the user code in our custom tree structure,
$J$ denotes an array of all the found matches,
$K$ denotes an array that contains all \texttt{transform to} templates with context from user code inserted,
$L$ denotes the abstract syntax tree of the transformed user code,
and $SourceCode$ is the transformed user code pretty-printed as JavaScript.}

\label{lst:outline}
\begin{algorithmic}[1]
\State $A, B, W \gets extractWildcards()$

\State $C, D, E \gets babel.parse(A, B, UserCode)$

\State $F, G, H \gets Tree(C, D, E)$

\If{$F.length > 1$}
    \State $J \gets multiMatcher(F, E, W)$
\Else
    \State $J \gets singleMatcher(F, E, W)$
\EndIf

\State $K \gets []$ \Comment{Array of transformed code}
\For{\textbf{each} $m$ \textbf{in} $J$}
    \State $K.insert \gets buildTransform(m, G, W)$
\EndFor

\State $L \gets  insertTransformations(K)$
\State $SourceCode$$ \gets  babel.generate(L)$
\end{algorithmic}
\end{algorithm}


Each part of Algorithm \ref{lst:outline} is a step to transform user code based on a proposal specification in our tool.

In the initial of the algorithm (line 1), the wildcards are extracted from the templates \texttt{applicable to} and \texttt{transform to}, and replaced by identifiers. The extracted wildcards are then parsed into ASTs using a parser built into the tool.

We  parse the \texttt{applicable to} template, \texttt{transform to} template and the user's code into ASTs with Babel (line 2). These ASTs are immediately translated into our custom tree structure (line 3). This ensures simple traversal of multiple trees.

To decide which matching algorithm we apply, the length of the \texttt{applicable to} template is checked (line 5), if it is more than one statement long we use \texttt{multiMatcher} (line 5), if it is a single statement we use \texttt{singleMatcher} (line 7). These algorithms will find all matching parts of the user AST to the \texttt{applicable to} template.

We use these matches to prepare the \texttt{transform to} templates (lines 9-12). The AST nodes from the user code that was matched with a wildcard is inserted into the wildcard references present in the \texttt{transform to} template (line 11). All the transformed \texttt{transform to} templates are stored in a list(line 9, 11).

Once all transformations are prepared, we traverse the user AST (line 13), and insert the transformations where their corresponding match originated. The final step, is to generate JavaScript from the transformed AST (line 14).


\section{Matching}
\label{sec:Matching}
This section discusses how we find matches in the users code;
this is the step described in lines 4-8 of Listing~\ref{lst:outline}.
We will discuss how individual nodes are compared, how the two traversal algorithms are implemented, and how matches are discovered using these algorithms.


\subsection{Determining if AST nodes match}

To determine if two nodes are a match, we need some method to compare AST nodes of the \texttt{applicable to} template to AST nodes of the user code. This step also has to take into account comparisons with wildcards and pass that information back to the AST matching algorithms.

When comparing two AST nodes in this tool, we use the function \texttt{checkCodeNode}, which will give the following values based on what kind of match these two nodes produce.
\begin{description}
    \item[NoMatch] The nodes do not match.
    \item[Matched] The nodes are a match, and the node of \texttt{applicable to} is not a wildcard.
    \item[MatchedWithWildcard] The nodes are a match, and the node of \texttt{applicble to} is a wildcard.
    \item[MatchedWithPlussedWildcard] The nodes are a match, and the node of \texttt{applicable to} is a wildcard with the Kleene plus.
\end{description}

To compare two AST nodes, we start by comparing their types, if the types are not the same, the result is \texttt{NoMatch}. If the types are the same, further checks are required.

Firstly we need to determine if the current AST node of \texttt{applicable to} is a wildcard. To do this, we check if its type is either an \texttt{Identifier} or an \texttt{ExpressionStatement} with an \texttt{Identifier} as its expression. During the wildcard extraction step, we replace the wildcard with an identifier. As a result, an identifier might be placed as a statement. When this occurs, the identifier will be wrapped inside an \texttt{ExpressionStatement}. If we encounter either of these two types, we must then check if the name of the identifier matches the name of a wildcard. If it does, we evaluate the type of the user AST node against the wildcards type expression.

In the example below we determine if the node of \texttt{applicable to} might be a wildcard
\begin{lstlisting}
if((aplToNode.type === "ExpressionStatement" &&
    aplToNode.expression.type === "Identifier") ||
    aplToNode.type === "Identifier"){

    // Check if aplToNode is a wildcard
}
\end{lstlisting}

If we have determined the node of \texttt{applicable to} is not a wildcard, we then compare the two nodes to see if they match. For certain nodes, like \texttt{Identifier}, this involves explicitly checking specific fields, such as comparing the name field. For most nodes, however, we compare their types. Based on this comparison, the result will be either \texttt{Match} or \texttt{NoMatch}.

When comparing an AST node type against a wildcard type expression, we evaluate the wildcard type expression relative to the type of the node being compared. This evaluation employs the visitor pattern to traverse the AST, where each leaf node is checked against the type of the node being compared, yielding a Boolean result. All expressions are subsequently evaluated, with the values passed through the visitors until the entire expression is evaluated, producing a final result. If the evaluation result is \texttt{false}, we return \texttt{NoMatch}. If the evaluation result is \texttt{true}, we have to check if the wildcard uses a Kleene plus, if it does we return \texttt{MatchedWithPlussedWildcard}, if it does not we retrun \texttt{MatchedWithWildcard}.

\subsection{Matching a single Expression/Statement template}
\label{sec:singleMatcher}

In this section, we discuss how matching is performed when the \texttt{applicable to} template is a single expression/statement. This section will cover line 7 of Listing \ref{lst:outline}.

Determining if we are currently matching with a template that is only a single expression/statement, we must verify that the program body of the template has the length of one, if this is the case we use the single length traversal algorithm.

There is a special case when the template is a single expression. When this is the case, the first node of the AST generated by \texttt{@babel/generate}~\cite{BabelGenerate} will be of type \texttt{ExpressionStatement}. This will miss many applicable parts of the users code, because expressions within other statements are not wrapped in an \texttt{ExpressionStatement}. This makes the template incompatible with otherwise applicable expressions. This means the statement has to be removed, and the search has to be done with the expression as the top node of the template.


\paragraph{Discovering Matches Recursively}

The matching algorithm used with single expression/statement templates is based on depth-first search to traverse the trees. The algorithm can be split into two steps. The first step is to start a new search on each child of the current node explored, and the second is the check the current node for a match.


The first step is to ensure we search for a match at all levels of the code AST. This is done by starting a new search on every child node of the code AST if the current node of the \texttt{applicable to} AST is the root node. This ensures we have explored a match at every level of the tree. As an added benefit of this approach is it ensures we have no partial matches, as we store a match only if it was called with the root node of the \texttt{applicable to} AST. This behaviour can be seen in the listing below.
\begin{lstlisting}[language={JavaScript}]
if(aplTo.element === this.aplToRoot){
    // Start a search from root of aplTo on all child nodes
    for(let codeChild of code.children){
        let childMatch = singleMatcher(codeChild, aplTo);

        // If it is a match, we know it is a full match and store it.
        if(childMatch){
            this.matches.push(childMatch);
        }
    }
}
\end{lstlisting}

The second step is to compare the nodes of the current search. This means the current code AST node is compared against the current \texttt{applicable to} AST node. Based on this comparison, different steps must be performed, these steps can be seen below.

\begin{description}
    \item[NoMatch:] If a comparison between the nodes return a \texttt{NoMatch} result, we perform an early return of undefined, as no match was discovered.
    \item[Matched:] The current code node matches against the current node of the template, and we have to perform a search on each of the child nodes.
    \item[MatchedWithWildcard:] When a comparison results in a wildcard match, we immediately pair the current code node with the template wildcard and return early. This is possible because, once a wildcard matches, the child nodes are irrelevant and will be included in the transformation regardless.
    \item[MatchedWithPlussedWildcard:] This is a special case for a wildcard match. When a match occurs against a wildcard with a Kleene plus we do the same as \texttt{MatchedWithWildcard}, but give a different comparison result as this necessitates a special traversal of the current nodes siblings.
\end{description}

A comparison result of \texttt{Matched} means the two nodes match, but the \texttt{applicable to} node is not a wildcard. If this is the case, we perform a search on each child nodes of \texttt{applicable to} AST and the user AST. This can be seen in Listing~\ref{lst:pseudocodeChildSearch}.

When checking the child nodes, we have to check for a special case when the comparison of the child nodes result in \texttt{MatchedWithPlussedWildcard}. If this result is encountered, we have to continue matching the same \texttt{applicable to} node against each subsequent sibling node of the code node. This is because, a wildcard with a Kleene plus can match against multiple sibling nodes.

In the Listing~\ref{lst:pseudocodeChildSearch} below, we search the children of a comparison that returned the result \texttt{Match}. For this, we use a two pointer technique with \texttt{codeI} and \texttt{aplToI} (lines 1,2). This search continues until one of the pointers reaches the end of the list of children for its respective node (line 4). If any of the child nodes to not return a match the entire match is discarded (lines 8-10). We prepare the paired tree by appending the current child search to the parent pair (line 14,15). We handle the special case with a Kleene plus (line 18), by continuing the search with the same \texttt{aplToI} pointer while incrementing \texttt{codeI} (lines 19-22). As long as the result is \texttt{MatchedWithPlussedWildcard} we add the node matched with the wildcard to the pair of matches, meaning the pair will contain multiple nodes from the user AST matched with the same wildcard (line 28).  If the result is not \texttt{MatchedWithPlussedWildcard}, we decrement \texttt{codeI}, stop the comparisons against the wildcard, and continue searching all the child nodes as normal(lines 23-26). When one of the child lists is completely searched, we check if it is a full match of all the child nodes of the current code AST parent by verifying that we reached the end of the code AST children (lines 39-41). Once all these searches have been completed, and we confirm a \texttt{Match}, we return the paired tree structure along with match result.

\begin{lstlisting}[language={JavaScript}, caption={Pseudocode of child node matching}, label={lst:pseudocodeChildSearch}]
let codeI = 0;
let aplToI = 0;

while (aplToI < aplTo.children.length && codeI < code.children.length){
    let [pairedChild, childResult] = singleMatcher(code.children[codeI], aplTo.children[aplToI]);

    // If a child does not match, the entire match is discarded
    if(childResult === NoMatch){
        return [undefined, NoMatch];
    }


    // Add the match to the current Paired Tree structure
    pairedChild.parent = currentPair;
    currentPair.children.push(pairedChild);

    // Special case for Kleene plus wildcard match
    if(childResult === MatchedWithPlussedWildcard){
        codeI += 1;
        while(codeI < code.children.length){
            let [nextChild, plusChildResult] = singleMatcher(code.children[codeI], aplTo.children[aplToI]);

            if(plusChildResult !== MatchedWithPlussedWildcard){
                codeI -= 1;
                break;
            }

            pairedChild.element.codeNode.push(
                ...nextChild.element.codeNode);

            codeI += 1;
        }
    }

    codeI += 1;
    aplToi += 1;
}

if(codeI !== code.children.length){
    return [undefined, NoMatch]
}

return [currentPair, Match];
\end{lstlisting}

\subsection{Matching multiple statements}

Using multiple statements in the template of \texttt{applicable to} means the tree of \texttt{applicable to} as multiple root nodes, to perform a match with this kind of template, we use a sliding window~\cite{SlidingWindow} with size equal to the amount statements in the template. This window is applied at every \textit{BlockStatement} and \texttt{Program} of the code AST, as that is the only placement statements can reside in JavaScript~\cite[14]{ecma262}.

The initial step of this algorithm is to search through the AST for nodes that contain a list of \textit{Statements}. Searching the tree is done by depth-first search, at every level of the AST, we check the type of the node. Once a node of type \texttt{BlockStatement} or \texttt{Program} is encountered, we start the trying to match the statements.

\begin{lstlisting}[language={JavaScript}]
multiStatementMatcher(code, aplTo) {
    if (
        code.element.type === "Program" ||
        code.element.type === "BlockStatement"
    ) {
        matchMultiHead(code.children, aplTo.children);
    }

    for (let code_child of code.children) {
        multiStatementMatcher(code_child, aplTo);
    }
}

\end{lstlisting}

\texttt{matchMultiHead} uses a sliding window~\cite{SlidingWindow}. The sliding window will try to match every statement of the code AST against its corresponding statement in the \texttt{applicable to} AST. For every statement, we perform a DFS recursion algorithm is applied, similar to algorithm used in Section \ref{sec:singleMatcher}, however this search is not applied to all levels, and if it matches it has to match fully and immediately. If a match is not found, the current iteration of the sliding window is discarded and we move on to the next iteration by moving the window one further.

One important case here is we might not know the width of the sliding window, this is due to wildcards using the Kleene plus, as they can match one or more nodes against the wildcard. These wildcards might match against \texttt{(Statement)+}. Therefore, we use a similar technique to the one described in Section \ref{sec:singleMatcher}, where we have two pointers and perform a search based on the value of these pointers.

\subsection*{Output of the matcher}

The matches discovered have to be stored such that we can easily find all the nodes that were matched against wildcards and transfer them into the transformation later. To make this simpler, we make use an object  \texttt{PairedNodes}. This object allows us to easily find exactly what nodes were matched against each other. The matcher will place this object into the same tree structure described in \ref{sec:BabelParse}. This means the result of running the matcher on the user code is a list of \texttt{TreeNode<PairedNode>}.
\begin{lstlisting}[language={JavaScript}]
interface PairedNode{
    codeNode: t.Node[],
    aplToNode: t.Node
}
\end{lstlisting}

Since a match might be multiple statements, we use an interface \texttt{Match}, that contains separate tree structures of \texttt{PairedNodes}. This allows storage of a match with multiple root nodes. This is used by \texttt{matchMultiHead}.

\begin{lstlisting}[language={JavaScript}]
export interface Match {
    // Every matching Statement in order with each pair
    statements: TreeNode<PairedNodes>[];
}
\end{lstlisting}


\section{Transforming}
\label{sec:transform}
To perform the transformation and replacement on each of the matches, we take the resulting list of matches, the template from the \texttt{transform to}, and the Babel AST~\cite{BabelAST} version of original code. All the transformations are then applied to the code and we use \texttt{@babel/generate}~\cite{BabelGenerate} to generate JavaScript code from the transformed AST.

An important note is we have to ensure we transform the leaves of the AST first, this is because if the transformation was applied from top to bottom, it might remove transformations done using a previous match. This means if we transform from top to bottom on the tree, we might end up with \texttt{a(b) |> c(\%)} in stead of \texttt{b |> a(\%) |> c(\%)} in the case of the ``Pipeline'' proposal. This is quite easily solved in our case, as the matcher looks for matches from the top of the tree to the bottom of the tree, the matches it discovers are always in that order. Therefore when transforming, all that has to be done is reverse the list of matches, to get the ones closest to the leaves of the tree first.

\newpage
\subsubsection{Building the transformation}

Before we can start to insert the \texttt{transform to} section into the user's code AST. We have to insert all nodes matched against a wildcard in \texttt{applicable to} into their reference locations.

The first step to achieve this is to extract the wildcards from the match tree. This is done by recursively searching the match tree for an \texttt{Identifier} or \texttt{ExpressionStatement} containing an \texttt{Identifier}. To do this, we have a function \texttt{extractWildcardPairs}, which takes a single match, and extracts all wildcards and places them into a \texttt{Map<string, t.Node[]>}. Where the key of the map is the identifier used for the wildcard, and the value is the AST nodes the wildcard was matched against in the users code.

\begin{lstlisting}[language={JavaScript}, caption={Extracting wildcard from match}, label={lst:extractWildcardFromMatch}]
function extractWildcardPairs(match: Match): Map<string, t.Node[]> {
    let map: Map<string, t.Node[]> = new Map();

    function recursiveSearch(node: TreeNode<PairedNodes>) {
        let name: null | string = null;
        if (node.element.aplToNode.type === "Identifier") {
            name = node.element.aplToNode.name;
        } else if (
           // Node is ExpressionStatement with Identifier
        ) {
            name = node.element.aplToNode.expression.name;
        }

        if (name) {
            // Store in the map
            map.set(name, node.element.codeNode);
        }
        // Recursively search the child nodes
        for (let child of node.children) {
            recursiveSearch(child);
        }
    }
    // Start the initial search
    for (let stmt of match.statements) {
        recursiveSearch(stmt);
    }
    return map;
}
\end{lstlisting}

Once the full map of all wildcards has been built, we have to insert the node matched with the wildcard into the \texttt{transform to} template. To do this, we traverse the template with \texttt{@babel/traverse}~\cite{BabelTraverse}, as this provides us with a powerful API for modifying the AST. \texttt{@babel/traverse} allows us to define visitors, that are executed when traversing specific types of AST nodes. In this traversal, we define a visitor for \texttt{Identifier}, and a visitor for \texttt{ExpressionStatement}.

When we visit a node with these visitors, we check if that nodes name is in the map of wildcards. If the name of the identifier is a key in the wildcard map, we replace the identifier with the value in the map, which is AST nodes from the user's code that matched with that wildcard. See Listing \ref{lst:traToTransform}

\begin{lstlisting}[language={JavaScript}, caption={Traversing \texttt{transform to} AST and inserting user context}, label={lst:traToTransform}]
traverse(transformTo, {
        Identifier: (path) => {
            if (wildcardMatches.has(path.node.name)) {
                let toReplaceWith = wildcardMatches.get(path.node.name);
                if (toReplaceWith) {
                    path.replaceWithMultiple(toReplaceWith);
                }
            }
        },
        ExpressionStatement: (path) => {
            if (path.node.expression.type === "Identifier") {
                let name = path.node.expression.name;
                if (wildcardMatches.has(name)) {
                    let toReplaceWith = wildcardMatches.get(name);
                    if (toReplaceWith) {
                        path.replaceWithMultiple(toReplaceWith);
                    }
                }
            }
        },
    });
\end{lstlisting}

Due to some wildcards allowing matching of multiple sibling nodes, we have to use \texttt{replaceWithMultiple} when performing the replacement. This can be seen on line 6 and 16 of Listing \ref{lst:traToTransform}.

\subsubsection*{Inserting the template into the AST}

We have now created the \texttt{transform to} template with the user's context. This has to be inserted into the full AST definition of the users code. To do this we have to locate exactly where in the user AST this match originated. To perform this efficiently, we use this top node as the key to a \texttt{Map}, so if a node in the user AST exists in that map, we know it was matched and should be replaced.

\begin{lstlisting}[language={JavaScript}]
transformedTransformTo.set(
    match.statements[0].element.codeNode[0],
    [
        transformMatchFaster(wildcardMatches, traToWithWildcards),
        match,
    ]
);
\end{lstlisting}


We now traverse the AST generated from the users code with \texttt{@babel/traverse}. In this case we cannot use a specific visitor, and therefore we use a generic visitor that applies to every node type of the AST. If the current node we are visiting is a key to the map of transformations created earlier, we know we have to insert the transformed code. This is done similarly to before where we use \texttt{replaceWithMultiple}.

Some matches have multiple root nodes. This is likely when matching was done with multiple statements as top nodes. This means we have to remove n-1 following sibling nodes. Removal of these sibling nodes can be seen on lines 12-15 of Listing \ref{lst:insertingIntoUserCode}.

\begin{lstlisting}[language={JavaScript}, caption={Inserting transformed matches into user code}, label={lst:insertingIntoUserCode}]
traverse(codeAST, {
        enter(path) {
            if (transformedTransformTo.has(path.node)) {
                let [traToWithWildcards, match] =
                    transformedTransformTo.get(path.node) as [
                        t.File,
                        Match
                    ];
                path.replaceWithMultiple(
                    traToWithWildcards.program.body);

                let siblings = path.getAllNextSiblings();

                // For multi line applicable to
                for (let i = 0; i < match.statements.length - 1; i++) {
                    siblings[i].remove();
                }

                // When we have matched top statements with +, we might have to remove more siblings
                for (let matchStmt of match.statements) {
                    for (let codeStmt of matchStmt.element
                        .codeNode) {
                        let siblingnodes = siblings.map((a) => a.node);
                        if (siblingnodes.includes(codeStmt)) {
                            let index = siblingnodes.indexOf(codeStmt);
                            siblings[index].remove();
                        }
                    }
                }
            }
        },
    });
\end{lstlisting}

There is a special case when a wildcard with a Kleene plus, allowing the match of multiple siblings, means we might have more siblings to remove. In this case, it is not so simple to know exactly how many we have to remove. Therefore, we have to iterate over all statements of the match, and check if that statement is still a sibling of the current one being replace. This behavior can be seen on lines 20-29 of Listing \ref{lst:insertingIntoUserCode}.

After one full traversal of the user AST. All matches found have been replaced with their respective transformation. All that remains is generating JavaScript from the transformed AST.

\subsection{Generating source code from transformed AST}
\label{sec:generate}
To generate JavaScript from the transformed AST created by this tool, we use a JavaScript library titled \texttt{@babel/generator}~\cite{BabelGenerate}. This library is specifically designed for use with Babel to generate JavaScript from a Babel AST. The transformed AST definition of the users code is transformed, while being careful to apply all Babel plugins the current proposal might require.