\chapter{Implementation}
|
|
|
|
In this chapter, the implementation of the tool utilizing \DSL and \DSLSH is presented. It describes the overall architecture of the tool, the flow of data through it, and how the different stages of transforming user code are completed.
|
|
|
|
\section{Architecture}
|
|
|
|
The architecture of the work described in this thesis is illustrated in \figFull[fig:architecture].
|
|
|
|
In this tool, there are two ways to define a proposal. Both provide the same functionality and differ only in syntax and writing method. One can either write the definition in \DSL, which utilizes Langium to parse the language, or use a JSON definition, which is friendlier as an API or for people more familiar with JSON.
|
|
|
|
\begin{figure}
|
|
\begin{center}
|
|
\begin{tikzpicture}[
|
|
roundnode/.style={ellipse, draw=red!60, fill=red!5, very thick, minimum size=7mm},
|
|
squarednode/.style={rectangle, draw=red!60, fill=red!5, very thick, minimum size=5mm}
|
|
]
|
|
\node[roundnode] (jstqlcode) {JSTQL Code};
|
|
\node[roundnode] (selfhostedjsoninput) [right=of jstqlcode] {Self-Hosted JSON};
|
|
|
|
\node[squarednode] (langium) [below=of jstqlcode] {Langium Parser};
|
|
\node[squarednode] (jsonparser) [below=of selfhostedjsoninput] {Self-Hosted JSON parser};
|
|
\node[squarednode] (preludebuilder) [below=of jsonparser] {Prelude Builder};
|
|
\node[squarednode] (preParser) [below=of langium] {Pre-parser};
|
|
\node[squarednode] (babel) [below right=of preParser] {Babel};
|
|
\node[squarednode] (treebuilder) [below=of babel] {Custom AST builder};
|
|
\node[squarednode] (matcher) [below=of treebuilder] {Matcher};
|
|
\node[squarednode] (transformer) [below=of matcher] {Transformer};
|
|
\node[squarednode] (joiner) [below=of transformer] {Generator};
|
|
|
|
|
|
\draw[->] (jstqlcode.south) -- (langium.north);
|
|
\draw[->] (langium.south) -- (preParser.north);
|
|
\draw[->] (preParser.south) |- (babel.west);
|
|
\draw[->] (babel.south) -- (treebuilder.north);
|
|
\draw[->] (treebuilder.south) -- (matcher.north);
|
|
\draw[->] (matcher.south) -- (transformer.north);
|
|
\draw[->] (transformer.south) -- (joiner.north);
|
|
\draw[->] (selfhostedjsoninput.south) -- (jsonparser.north);
|
|
\draw[->] (jsonparser.south) -- (preludebuilder.north);
|
|
\draw[->] (preludebuilder.south) |- (babel.east);
|
|
|
|
\end{tikzpicture}
|
|
\end{center}
|
|
|
|
\caption[Tool architecture]{Overview of tool architecture}
|
|
\label{fig:architecture}
|
|
\end{figure}
|
|
|
|
|
|
\section{Parsing \DSL using Langium}
|
|
|
|
In this section, the implementation of the parser for \DSL is described. It outlines Langium, the parser generator used to create the AST that the tool later uses to perform the transformations.
|
|
|
|
\subsection{Langium}
|
|
|
|
Langium \cite{Langium} is primarily used to create parsers for Domain Specific Languages. These parsers output an Abstract Syntax Tree that is later used to create interpreters or other tooling. In the case of \DSL, we use Langium to generate an AST definition in the form of TypeScript objects; these objects and their relations are used as definitions for the tool to perform matching and transformation of user code.
|
|
|
|
In order to generate this parser, Langium requires the definition of a grammar. A grammar is a set of instructions that describe a valid program. In our case, this is a description of a proposal together with its \texttt{applicable to} and \texttt{transform to} sections. A grammar in Langium starts by describing the \texttt{Model}. The model is the top entry of the grammar, where all valid top-level statements are described.
|
|
|
|
In \DSL, the only valid top-level statement is the definition of a proposal. This means our language grammar model contains only one list, a list of zero or many \texttt{Proposal} definitions. A \texttt{Proposal} definition is denoted by a block, written as \texttt{\{...\}}, containing some valid definition. In the case of \DSL, this block contains one or many definitions of \texttt{Case}.
|
|
|
|
\texttt{Case} is defined very similarly to \texttt{Proposal}, as it contains only a block with a definition of a \texttt{Section}.
|
|
|
|
The \texttt{Section} is where a single case of some applicable code and its corresponding transformation is defined. This definition contains specific keywords to describe each of them: \texttt{applicable to} denotes the definition of the template \DSL uses to perform the matching algorithm, while \texttt{transform to} contains the definition of the code used to perform the transformation.
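To illustrate the structure, a hypothetical proposal definition might look as follows. The proposal name, wildcard names, and templates are illustrative only and not taken from an actual proposal specification:

\begin{lstlisting}
proposal awaitToPromise {
    case single {
        applicable to {
            "let <<ident:Identifier>> = await <<expr:Expression>>;"
        }
        transform to {
            "<<expr>>.then((res) => {
                let <<ident>> = res;
            });"
        }
    }
}
\end{lstlisting}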
|
|
|
|
In order to define exactly what characters and tokens are legal in a specific definition, Langium uses terminals defined with regular expressions. These allow a very specific character set to be legal in specific keys of the AST produced by the generated parser. In the definitions of \texttt{Proposal} and \texttt{Pair}, the terminal \texttt{ID} is used; this terminal allows only words that begin with a letter of the alphabet or an underscore. In \texttt{Section}, the terminal \texttt{STRING} is used; this terminal is meant to allow any valid JavaScript code as well as the custom DSL language described in \ref{sec:DSL_DEF}. Together, these terminals let Langium determine exactly which characters are legal in each location.
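As a small sanity check, the two terminal expressions can be exercised directly in TypeScript. The snippet is illustrative and not part of the tool; the patterns are anchored here so they test whole strings:

\begin{lstlisting}[language={JavaScript}]
// The two terminals from the grammar, anchored for whole-string tests.
const ID = /^[_a-zA-Z][\w_]*$/;
const STRING = /^("[^"]*"|'[^']*')$/;

ID.test("myProposal");         // a word starting with a letter
ID.test("1stCase");            // rejected: starts with a digit
STRING.test("\"let x = a;\""); // a double-quoted code template
\end{lstlisting}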
|
|
|
|
\begin{lstlisting}[caption={Definition of \DSL in Langium}, label={def:JSTQLLangium}]
grammar Jstql

entry Model:
    (proposals+=Proposal)*;

Proposal:
    'proposal' name=ID "{"
        (case+=Case)+
    "}";

Case:
    "case" name=ID "{"
        aplTo=ApplicableTo
        traTo=TraTo
    "}";

ApplicableTo:
    "applicable" "to" "{"
        apl_to_code=STRING
    "}";

TraTo:
    "transform" "to" "{"
        transform_to_code=STRING
    "}";

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal STRING: /"[^"]*"|'[^']*'/;
\end{lstlisting}
|
|
|
|
In the case of \DSL, we are not actually implementing a programming language meant to be executed. We are using Langium to generate an AST that is used as a markup language, similar to YAML, JSON or TOML. The main reason for using Langium in such an unconventional way is that it provides support for Visual Studio Code integration, and it solves the issue of parsing the definition of each proposal manually. However, with the grammar alone we cannot verify that the wildcards placed in \texttt{apl\_to\_code} and \texttt{transform\_to\_code} are correctly written. This is done using a feature of Langium called a \texttt{Validator}.
|
|
|
|
|
|
\subsection*{Langium Validator}
|
|
|
|
A Langium validator allows for further checks on the templates written within \DSL; a validator enables the implementation of specific checks on specific parts of the grammar.
|
|
|
|
\DSL does not allow empty typed wildcard definitions in \texttt{applicable to}; this means a wildcard cannot be untyped or allow any AST type to match against it. This is not possible to verify with the grammar, since there the code is simply defined as a \texttt{STRING} terminal, so further checks have to be implemented in code. To do this, we have a specific \texttt{Validator} implemented on the \texttt{Pair} definition of the grammar. This means every time anything contained within a \texttt{Pair} is updated, the language server shipped with Langium performs the validation step and reports any errors.
|
|
|
|
The validator uses \texttt{Pair} as its entry point, as this allows for checking wildcards in both \texttt{applicable to} and \texttt{transform to}, and in particular for checking whether a wildcard identifier used in \texttt{transform to} exists in the definition of \texttt{applicable to}.
|
|
|
|
\begin{lstlisting}[language={JavaScript}]
export class JstqlValidator {
    validateWildcardAplTo(pair: Pair, accept: ValidationAcceptor): void {
        try {
            // validationResultAplTo and validationResultTraTo are produced
            // by the wildcard parser; their computation is elided here.
            if (validationResultAplTo.errors.length != 0) {
                accept("error", validationResultAplTo.errors.join("\n"), {
                    node: pair.aplTo,
                    property: "apl_to_code",
                });
            }
            if (validationResultTraTo.length != 0) {
                accept("error", validationResultTraTo.join("\n"), {
                    node: pair.traTo,
                    property: "transform_to_code",
                });
            }
        } catch (e) {}
    }
}
\end{lstlisting}
|
|
|
|
\subsection*{Using Langium as a parser}
|
|
|
|
Langium \cite{Langium} is designed to automatically generate a large amount of tooling for the language specified by its grammar. In our case, however, we have to parse the \DSL definition using Langium and then extract the generated abstract syntax tree in order to use the information it contains.
|
|
|
|
To use the parser generated by Langium, we created a custom function \texttt{parseDSLtoAST()} within our Langium project. This function takes a string as input, the raw \DSL code, and outputs the pure AST in the format described by the grammar in \figFull[def:JSTQLLangium]. This function is exposed as a custom API for our tool to interface with, which also means our tool depends on the implementation of the Langium parser to function with \DSL. The implementation of \DSLSH is entirely independent.
|
|
|
|
When interfacing with the Langium parser to get the Langium-generated AST, the exposed API function is imported into the tool. When this API is run, the output is in the form of the Langium \textit{Model}, which follows the same shape as the grammar. This is then transformed into an internal object structure used by the tool, called \texttt{TransformRecipe}, which is then passed on to perform the actual transformation.
|
|
|
|
\section{Pre-parsing}
|
|
|
|
In order to refer to internal DSL variables defined in \texttt{applicable to} in the transformation, we need to extract this information from the template definitions and pass it on to the matcher.
|
|
|
|
\subsection*{Why not use Langium?}
|
|
|
|
Langium has support for creating a generator that produces an artifact. This actually suits the needs of \DSL quite well and could be used to extract the wildcards from each \texttt{pair} and create the \texttt{TransformRecipe}. As a consequence, however, \DSLSH would no longer be entirely independent, and the entire tool would rely on Langium. This is not preferred, as it would mean both ways of defining a proposal are reliant on Langium rather than separated. The reason for writing our own pre-parser is to allow for an independent way to define transformations using our tool.
|
|
|
|
\subsection*{Extracting wildcards from \DSL}
|
|
|
|
In order to allow the use of Babel \cite{Babel}, the wildcards present in the blocks of \texttt{applicable to} and \texttt{transform to} have to be parsed and replaced with some valid JavaScript. This is done by a pre-parser that extracts the information from the wildcards and inserts an \texttt{Identifier} in their place.
|
|
|
|
To pre-parse the text, we look at every character in the code section. When the start token of a wildcard is discovered, denoted by \texttt{<<}, everything up to the closing token, denoted by \texttt{>>}, is treated as an internal DSL variable and stored by the tool. A variable \texttt{flag} is used: when its value is false, we know we are currently not inside a wildcard block, so the character is passed straight through to the variable \texttt{cleanedJS}. When \texttt{flag} is true, we are inside a wildcard block and collect every character of the wildcard into \texttt{temp}. Once we hit the end of the wildcard block and have consumed the entirety of the wildcard, it is passed to a tokenizer and then to a recursive descent parser.
|
|
|
|
\begin{lstlisting}[language={JavaScript}]
export function parseInternal(code: string): InternalParseResult {
    let cleanedJS = "";
    let temp = "";
    let flag = false;
    let prelude: InternalDSLVariable = {};

    for (let i = 0; i < code.length; i++) {
        if (code[i] === "<" && code[i + 1] === "<") {
            // From now on we are inside of the DSL custom block
            flag = true;
            i += 1;
            continue;
        }

        if (flag && code[i] === ">" && code[i + 1] === ">") {
            // We encountered a closing tag
            flag = false;

            // parseInternalString (defined elsewhere) tokenizes and parses
            // the collected wildcard into an identifier and its types.
            let { identifier, types } = parseInternalString(temp);

            cleanedJS += identifier;
            prelude[identifier] = types;
            i += 1;
            temp = "";
            continue;
        }

        if (flag) {
            temp += code[i];
        } else {
            cleanedJS += code[i];
        }
    }
    return { prelude, cleanedJS };
}
\end{lstlisting}
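Since \texttt{parseInternalString} is defined elsewhere in the tool, the behaviour of the scan can be illustrated with a self-contained sketch, where a simplified stand-in just splits the wildcard on the colon. The stand-in is an assumption; the real function runs the type-expression tokenizer and parser:

\begin{lstlisting}[language={JavaScript}]
type InternalDSLVariable = Record<string, string[]>;
interface InternalParseResult {
    prelude: InternalDSLVariable;
    cleanedJS: string;
}

// Simplified stand-in for the real wildcard parser.
function parseInternalString(raw: string): { identifier: string; types: string[] } {
    const [identifier, type] = raw.split(":").map((s) => s.trim());
    return { identifier, types: type ? [type] : [] };
}

function parseInternal(code: string): InternalParseResult {
    let cleanedJS = "";
    let temp = "";
    let flag = false;
    const prelude: InternalDSLVariable = {};
    for (let i = 0; i < code.length; i++) {
        if (code[i] === "<" && code[i + 1] === "<") { flag = true; i += 1; continue; }
        if (flag && code[i] === ">" && code[i + 1] === ">") {
            flag = false;
            const { identifier, types } = parseInternalString(temp);
            cleanedJS += identifier;   // replace the wildcard with an Identifier
            prelude[identifier] = types;
            i += 1;
            temp = "";
            continue;
        }
        if (flag) temp += code[i];
        else cleanedJS += code[i];
    }
    return { prelude, cleanedJS };
}
\end{lstlisting}

For example, \texttt{parseInternal("await <<expr: Expression>>;")} produces the cleaned code \texttt{"await expr;"} and a prelude mapping \texttt{expr} to \texttt{["Expression"]}.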
|
|
|
|
\subsection*{Parsing wildcard}
|
|
|
|
Once a wildcard has been extracted from the \texttt{pair} definitions inside \DSL, it has to be parsed into a simple tree to be used when matching against the wildcard. This is accomplished using a simple tokenizer and a recursive descent parser \cite{RecursiveDescent}.
|
|
|
|
Our tokenizer takes the raw stream of input characters extracted from the wildcard block within the template and determines which part is what token. Due to the very simple nature of the type expressions, there is no ambiguity between the tokens, so determining which token comes at what point is quite trivial.
|
|
|
|
A recursive descent parser is created to closely mimic the grammar of the language it is implemented for, where we define functions for handling each of the non-terminals and ways to determine which non-terminal each of the tokens results in. In the case of this parser, the language is a very simple boolean expression language. We use boolean combinatorics to determine whether or not the node type of a specific Babel parser \cite{BabelParser} AST node is a match against a specific wildcard. This means we have to create a very simple AST that can be evaluated using the AST node type as input.
|
|
|
|
\begin{lstlisting}[caption={Grammar of type expressions}, label={ex:grammarTypeExpr}]
Wildcard:
    Identifier ":" MultipleMatch

MultipleMatch:
    GroupExpr "*"
    | TypeExpr

TypeExpr:
    BinaryExpr
    | UnaryExpr
    | PrimitiveExpr

BinaryExpr:
    TypeExpr { Operator TypeExpr }*

UnaryExpr:
    {UnaryOperator}? TypeExpr

PrimitiveExpr:
    GroupExpr | Identifier

GroupExpr:
    "(" TypeExpr ")"
\end{lstlisting}
|
|
|
|
The grammar of the type expressions used by the wildcards can be seen in \figFull[ex:grammarTypeExpr]. The grammar is written in a notation similar to Extended Backus-Naur form, where we define the terminals and non-terminals in a way that makes the entire grammar \textit{solvable} by the recursive descent parser.
|
|
|
|
Our recursive descent parser produces a very simple AST \cite{AST1,AST2}, which is later used to determine when a wildcard can be matched against a specific AST node; the full definition of this AST can be seen in \ref*{ex:typeExpressionTypes}. We use this AST by traversing it with a visitor pattern \cite{VisitorPattern}, comparing each \textit{Identifier} against the specific AST node we are currently checking, and evaluating all subsequent expressions to produce a boolean value. If this value is true, the node is matched against the wildcard; if not, we do not have a match.
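The parse-then-evaluate scheme can be sketched as follows. This is a minimal illustration, not the tool's implementation: the operator spellings \texttt{||}, \texttt{\&\&} and \texttt{!} are assumptions, since the grammar only names \texttt{Operator} and \texttt{UnaryOperator} abstractly, and the one-or-many match is omitted:

\begin{lstlisting}[language={JavaScript}]
type Expr =
    | { kind: "id"; name: string }
    | { kind: "unary"; expr: Expr }
    | { kind: "binary"; op: "||" | "&&"; left: Expr; right: Expr };

function tokenize(src: string): string[] {
    return src.match(/[A-Za-z_]\w*|\|\||&&|!|\(|\)/g) ?? [];
}

// Recursive descent: one function per non-terminal of the expression grammar.
function parseTypeExpr(tokens: string[]): Expr {
    let pos = 0;
    const peek = () => tokens[pos];
    function primary(): Expr {
        if (peek() === "(") { pos++; const e = binary(); pos++; return e; } // GroupExpr
        if (peek() === "!") { pos++; return { kind: "unary", expr: primary() }; }
        return { kind: "id", name: tokens[pos++] };
    }
    function binary(): Expr {
        let left = primary();
        while (peek() === "||" || peek() === "&&") {
            const op = tokens[pos++] as "||" | "&&";
            left = { kind: "binary", op, left, right: primary() };
        }
        return left;
    }
    return binary();
}

// Evaluate the expression AST against the type of a single code node.
function evaluate(expr: Expr, nodeType: string): boolean {
    switch (expr.kind) {
        case "id": return expr.name === nodeType;
        case "unary": return !evaluate(expr.expr, nodeType);
        case "binary":
            return expr.op === "||"
                ? evaluate(expr.left, nodeType) || evaluate(expr.right, nodeType)
                : evaluate(expr.left, nodeType) && evaluate(expr.right, nodeType);
    }
}
\end{lstlisting}

With this sketch, the expression \texttt{Identifier || MemberExpression} evaluates to true against a node of type \texttt{MemberExpression} and false against a \texttt{CallExpression}.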
|
|
|
|
|
|
|
|
\subsection*{Pre-parsing \DSLSH}
|
|
|
|
The self-hosted version \DSLSH also requires some form of pre-parsing in order to prepare the internal DSL environment. Compared to \DSL, this step is relatively minor, as it only parses the definition directly and performs no insertion.
|
|
|
|
In order to use JavaScript as the meta-language to define JavaScript, we define a \texttt{Prelude}. This prelude is required to consist of several \texttt{Declaration Statements}, where the variable names are used as the internal DSL variables and the right-hand-side expressions are used as the DSL types. In order to allow multiple types for a single internal DSL variable, we re-use JavaScript's list syntax.
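A hypothetical prelude following this convention could look as follows; the variable names and the listed AST types are illustrative, not taken from an actual proposal definition:

\begin{lstlisting}[language={JavaScript}]
// Each variable name becomes an internal DSL variable, and the
// right-hand side gives the AST type(s) it may match against.
let ident = "Identifier";
let anyExpr = ["Identifier", "MemberExpression", "CallExpression"];
\end{lstlisting}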
|
|
|
|
We use Babel to generate the AST of the \texttt{prelude} definition, which gives us a JavaScript object structure. Since the structure is very strictly defined, we can expect every \texttt{stmt} of \texttt{stmts} to be a variable declaration, and otherwise throw an error for an invalid prelude. Continuing through the object, we have to determine whether the prelude definition supports multiple types, that is, whether the right-hand side is an \texttt{ArrayExpression} or just an \texttt{Identifier}. If it is an array, we initialize the prelude entry under the name field of the \texttt{VariableDeclaration} as an empty array and fill it with each element of the \texttt{ArrayExpression}; otherwise we directly insert the single \texttt{Identifier}.
|
|
|
|
\section{Using Babel to parse}
|
|
\label{sec:BabelParse}
|
|
|
|
Allowing the tool to perform transformations of code requires the generation of an Abstract Syntax Tree from the user's code as well as from \texttt{applicable to} and \texttt{transform to}. This means parsing JavaScript into an AST, and to do this we use the tool Babel \cite{Babel}.
|
|
|
|
The most important reason for choosing Babel to generate the ASTs used for transformation is the JavaScript community surrounding it. As this tool deals with proposals before they are part of JavaScript, a parser that supports early proposals is required. Babel supports most stage 2 proposals through its plugin system, which allows the parsing of code not yet part of the language.
|
|
|
|
|
|
\subsection*{Custom Tree Structure}
|
|
|
|
To allow matching and transformations to be applied to each of the sections inside a \texttt{pair} definition, they have to be parsed into an AST so the tool can match and transform accordingly. To do this, the tool uses the library Babel \cite{Babel} to generate an AST data structure. However, this structure does not suit traversing multiple trees at the same time, which is a requirement for matching and transforming. Therefore we take the Babel AST and transform it into a simple custom tree structure that allows for simple traversal.
|
|
|
|
As can be seen in \figFull[def:TreeStructure], we use a recursive definition of a \texttt{TreeNode}, where a node's parent either exists or is null (it is the top of the tree), and a node can have any number of child elements. This definition allows for simple traversal both up and down the tree, which means two trees can be traversed at the same time in the matcher and transformer sections of the tool.
|
|
|
|
|
|
\begin{lstlisting}[language={JavaScript}, label={def:TreeStructure}, caption={Simple definition of a Tree structure in TypeScript}]
export class TreeNode<T> {
    public parent: TreeNode<T> | null;
    public element: T;
    public children: TreeNode<T>[] = [];

    constructor(parent: TreeNode<T> | null, element: T) {
        this.parent = parent;
        this.element = element;
        if (this.parent) this.parent.children.push(this);
    }
}
\end{lstlisting}
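A small usage sketch makes the linking behaviour concrete; the class is repeated here so the example is self-contained, and the node labels are plain strings standing in for Babel AST nodes:

\begin{lstlisting}[language={JavaScript}]
class TreeNode<T> {
    public parent: TreeNode<T> | null;
    public element: T;
    public children: TreeNode<T>[] = [];
    constructor(parent: TreeNode<T> | null, element: T) {
        this.parent = parent;
        this.element = element;
        // The constructor links each new node into its parent.
        if (this.parent) this.parent.children.push(this);
    }
}

const root = new TreeNode<string>(null, "Program");
const stmt = new TreeNode<string>(root, "ExpressionStatement");
const expr = new TreeNode<string>(stmt, "Identifier");
// Traversal works both ways: expr.parent.parent is root,
// and root.children[0].children[0] is expr.
\end{lstlisting}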
|
|
|
|
Placing the AST generated by Babel into this structure means utilizing the library Babel Traverse \cite{BabelTraverse}. Babel Traverse uses the visitor pattern \cite{VisitorPattern} to allow traversal of the AST. While this method does not suit traversing multiple trees at the same time, it allows for very simple traversal of a single tree in order to place it into our simple tree structure.
|
|
|
|
Babel Traverse \cite{BabelTraverse} visits each node of the AST in a \textit{depth-first} manner. The idea of the visitor pattern is that one implements a \textit{visitor} for each kind of node in the AST, and when a specific node is visited, that visitor is used. In the case of transferring the AST into our simple tree structure, we simply use the same visitor for all nodes and place each node into the tree.
|
|
|
|
Visiting a node through the \texttt{enter()} function means we went from the parent to that child node, so it should be added as a child node of the parent; the node is automatically added to its parent's list of children by the constructor of \texttt{TreeNode}. Whenever a node is left, the function \texttt{exit()} is called. This means we are moving back up the tree, and we have to update which node is the \textit{last} one in order to generate the correct tree structure.
|
|
|
|
\begin{lstlisting}[language={JavaScript}]
// Excerpt from the function that builds the tree: `first` holds the
// root and `last` the most recently visited node.
let first: TreeNode<t.Node> | null = null;
let last: TreeNode<t.Node> | null = null;

traverse(ast, {
    enter(path: any) {
        let node: TreeNode<t.Node> = new TreeNode<t.Node>(
            last,
            path.node as t.Node
        );
        if (last == null) {
            first = node;
        }
        last = node;
    },
    exit(path: any) {
        if (last && last?.element?.type != "Program") {
            last = last.parent;
        }
    },
});
if (first != null) {
    return first;
}
\end{lstlisting}
|
|
|
|
|
|
\section{Matching}
|
|
|
|
|
|
Performing the match against the user's code is the most important step; if no matching code is found, the tool performs no transformations. Finding the matches depends entirely on how well the definition of the proposal is written, and how well the proposal can actually be defined within the confines of \DSL. In this section we discuss how matching is performed based on the definition of \texttt{applicable to}.
|
|
|
|
\subsection*{Determining if AST nodes match}
|
|
|
|
The initial problem we have to overcome is comparing AST nodes from the template to AST nodes from the user code. This step also has to take into account comparisons against wildcards and pass that information back to the AST matching algorithms.
|
|
|
|
|
|
In the pre-parsing step of \DSL, we replace each of the wildcards with an expression of type \texttt{Identifier}. This means we insert an \texttt{Identifier} either at a location where an expression resides or where a statement resides; in the latter case, it is wrapped in an \texttt{ExpressionStatement}. This has to be taken into account when comparing statement nodes from the template and user code: if we encounter an \texttt{ExpressionStatement}, its corresponding expression has to be checked for whether it is an \texttt{Identifier}.
|
|
|
|
Since a wildcard is replaced by an \texttt{Identifier}, when matching a node in the template we have to check whether it is an \textit{Identifier}, or an \textit{ExpressionStatement} with an identifier contained within, and if so, whether that identifier is a registered wildcard. If an \texttt{Identifier} shares a name with a wildcard, we compare the node against the type expression of that wildcard. To do this, we traverse the entirety of the wildcard expression AST and compare each of the leaves against the type of the current code node. The resulting values are then propagated through the type expression, and the final value indicates whether or not that code node can be matched against the wildcard. We differentiate matches against a wildcard with the \texttt{+} notation, as in that case we have to keep using the same wildcard until it returns false in the tree exploration algorithms.
|
|
|
|
When we are matching against an \texttt{Identifier} that is not a registered wildcard, or against any other AST node in the template, we have to perform an equality check. In the case of this template language, we can get away with some preliminary checks, such as that the names of \texttt{Identifier}s are the same; otherwise it is sufficient to perform an equality check on the types of the nodes we are currently trying to match. If the types are the same, the nodes can be validly matched against each other. This is sufficient because we are currently only determining whether a single node can be a match, not whether the entire template structure is a match; false positives are highly unlikely, since the entire structure would have to be a false positive match.
|
|
|
|
The function used for matching singular nodes gives different return values based on how the nodes were matched. The results \texttt{NoMatch} and \texttt{Matched} are self-explanatory: they are used when either no match is found, or when the node types match and the template node is not a wildcard. When matching against a wildcard, if it is a simple wildcard that cannot match multiple code nodes, the result is \texttt{MatchedWithWildcard}. If the wildcard used to match is a one-or-many wildcard, the result is \texttt{MatchedWithPlussedWildcard}, which tells the recursive traversal algorithm that this template node has to be tried against the code node's siblings.
|
|
|
|
\begin{lstlisting}
enum MatchResult {
    MatchedWithWildcard,
    MatchedWithPlussedWildcard,
    Matched,
    NoMatch,
}
\end{lstlisting}
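The per-node check described above can be sketched as follows. The helper name \texttt{matchNode} is hypothetical, and the wildcard lookup and type-expression evaluation are stubbed as a plain map from wildcard name to allowed node types:

\begin{lstlisting}[language={JavaScript}]
enum MatchResult {
    MatchedWithWildcard,
    MatchedWithPlussedWildcard,
    Matched,
    NoMatch,
}

interface SimpleNode { type: string; name?: string; }
interface Wildcard { types: string[]; plussed: boolean; }

function matchNode(
    template: SimpleNode,
    code: SimpleNode,
    wildcards: Record<string, Wildcard>
): MatchResult {
    // A template Identifier may be a registered wildcard.
    if (template.type === "Identifier" && template.name && wildcards[template.name]) {
        const wc = wildcards[template.name];
        if (!wc.types.includes(code.type)) return MatchResult.NoMatch;
        return wc.plussed
            ? MatchResult.MatchedWithPlussedWildcard
            : MatchResult.MatchedWithWildcard;
    }
    // Identifiers that are not wildcards must also agree on their name.
    if (template.type === "Identifier" && code.type === "Identifier") {
        return template.name === code.name ? MatchResult.Matched : MatchResult.NoMatch;
    }
    // Otherwise an equality check on the node types suffices at this level.
    return template.type === code.type ? MatchResult.Matched : MatchResult.NoMatch;
}
\end{lstlisting}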
|
|
|
|
\subsection*{Matching a singular Expression/Statement template}
|
|
|
|
Writing the \texttt{applicable to} section as a single simple expression or statement is by far the most versatile way of defining a matching template, because there is a higher probability of discovering applicable code with a template that is as generic and simple as possible. A very complex matching template with many statements, or an expression containing many AST nodes, results in a lower chance of finding a match in the user's code. Therefore, simple, single-root-node matching templates provide the highest probability of discovering a match within the user's code.
|
|
|
|
To determine whether we are currently matching with a template that is only a single expression or statement, we verify that the program body of the template has a length of one. If it does, we can use the singular expression matcher; if not, we have to rely on the matcher that can handle multiple statements at the head of the tree.
|
|
|
|
When matching an expression, the first statement in the program body of the AST generated by Babel \cite{BabelGenerate} will be of type \texttt{ExpressionStatement}. The reason for this is that Babel treats free-floating expressions as statements and places them inside an \texttt{ExpressionStatement}. When matching against a user's code, this would miss many applicable sections, because expressions within other statements are not inside an \texttt{ExpressionStatement}, giving a template that is incompatible with many otherwise applicable expressions. The \texttt{ExpressionStatement} therefore has to be removed, and the search has to be done with the expression as the top node of the template.
|
|
|
|
In the case where the single node in the body of the template program is a \texttt{Statement}, no removal has to be done, as a \texttt{Statement} can be used directly.
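The selection of the template's search root can be sketched as follows; the function name \texttt{templateRoot} is hypothetical, and the node shapes are simplified stand-ins for Babel AST nodes:

\begin{lstlisting}[language={JavaScript}]
interface Node { type: string; expression?: Node; body?: Node[]; }

function templateRoot(program: Node): Node | Node[] {
    const body = program.body ?? [];
    if (body.length !== 1) return body;  // multi-statement template
    const single = body[0];
    // Babel wraps a free-floating expression in an ExpressionStatement;
    // unwrap it so the expression itself is the top of the template.
    if (single.type === "ExpressionStatement" && single.expression) {
        return single.expression;
    }
    return single;                       // a Statement is used directly
}
\end{lstlisting}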
|
|
|
|
\subsubsection*{Recursively discovering matches}
|
|
|
|
The matcher used for single expression/statement templates is based upon a depth-first search, and it searches for matches from the top of the code definition. It is important that we try to match against the template at all levels of the code AST. This is done by starting a new search on every child node of the code AST whenever the current node of the template tree is the top node of the template. This ensures a match has been attempted at every level of the tree; it also means we do not get any partial matches, as we only store matches returned by the recursive call that started the search from the first node of the template tree.

This is all done before ever checking the node we are currently on. The reason for this is to avoid missing matches that reside further down the current branch, and also to ensure matches further down are placed earlier in the full match array, which makes it easier to perform the transformation when partial collisions exist.
|
|
|
|
Once we have started a search on all the child nodes of the current one using the full definition of \texttt{applicable to}, we can verify whether we are currently exploring a match. The current node is checked against the current top node of \texttt{applicable to}; if it is a match, different parts of the algorithm are invoked based on what kind of match it is, since there are different forms of matches depending on whether it is a match against a wildcard, a wildcard with \texttt{+}, or simply a node type match.
|
|
|
|
If the current node matches against a wildcard that does not use the \texttt{+} operator, we simply pair the current template node with the matched node from the user's code and return. Whatever the current user node contains, it is matched against a wildcard, which means that no matter what is below it, it is meant to be placed directly into the transformation. Therefore we can determine that this is a valid match.
|
|
|
|
When the current node is matched against a wildcard that does use the \texttt{+} operator, we have to continue trying to match that same wildcard against the sibling nodes of the current code node. This is performed in the recursive iteration above the current one; therefore we also return the paired AST nodes of the template and the code, but give the match result \texttt{MatchResult.MatchedWithPlussedWildcard} to the caller. When the caller receives this result, it continues matching against the wildcard until it receives a match result other than \texttt{MatchResult.MatchedWithPlussedWildcard}.
|
|
|
|
When the current node is matched based on the types of the current AST nodes, several conditions have to hold. Namely, all child nodes of the template and the user code also have to return some form of match; if any child node returns \texttt{MatchResult.NoMatch}, the entire match is discarded. The numbers of child nodes also have to correspond: due to wildcards, we have to be able to match every child node of the user code against either a single node of the template or a wildcard using the \texttt{+} operator.
|
|
|
|
If the current node does not match, we simply discard the current search. Since we have already started a search from the start of the template at all levels of the user code AST, we can safely end this search and rely on those to find matches further down the tree.
|
|
|
|
To allow for easier transformation, and to store exactly which part of \texttt{applicable to} was matched against which node of the code AST, we use a custom instance of the simple tree structure described in \ref*{sec:BabelParse} together with an interface \texttt{PairedNode}. This lets us hold exactly which nodes were matched together, allowing for a simpler transformation algorithm. The definition of \texttt{PairedNode} can be seen below; the reason \texttt{codeNode} is a list is that wildcards may match multiple AST nodes of the user code against a single node of the template.
|
|
\begin{lstlisting}[language={JavaScript}]
interface PairedNode {
    codeNode: t.Node[],
    aplToNode: t.Node
}
\end{lstlisting}
|
|
|
|
|
|
|
|
\subsection*{Matching multiple Statements}
|
|
|
|
Using multiple statements in the template of \texttt{applicable to} results in a much stricter matcher, which only tries to perform an exact match using a sliding window \cite{SlidingWindow} sized by the number of statements, at every \textit{BlockStatement}, as that is the only place statements can reside in JavaScript \cite{ECMA262Statement}.
|
|
|
|
The initial step of this algorithm is to search through the AST for nodes that contain a list of \textit{Statements}. This can be done by searching for the AST node types \textit{Program} and \textit{BlockStatement}, as these are the only valid places for a list of statements to reside \cite{ECMA262Statement}. Searching the tree is quite simple: all that is required is to check the type of every node recursively, and once we find a node that can contain multiple statements, we check it for matches.
|
|
|
|
Once a list of \textit{Statements} has been discovered, the function \texttt{matchMultiHead} is executed with that block and the statements of \texttt{applicable to}.
|
|
This function uses a sliding window~\cite{SlidingWindow} of the same length as the list of statements in \texttt{applicable to}, and tries to match every template statement against its corresponding statement in the current \textit{BlockStatement}. A single statement in the window is matched with a simple recursive DFS, similar to the algorithm used for a single expression/statement template; the difference is that we do not search the entire AST, and the statement has to match fully and immediately. If a match is not found, the current iteration of the sliding window is discarded and the window is moved one statement further.
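A fixed-width version of this sliding window (ignoring wildcards) can be sketched as follows; \texttt{matchStatement} stands in for the recursive statement matcher, and the function name \texttt{slideWindow} is illustrative rather than the actual implementation.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: slide a window of templates.length statements
// over the block, collecting every window where each template
// statement matches its corresponding code statement.
function slideWindow(block, templates, matchStatement) {
    const matches = [];
    for (let start = 0; start + templates.length <= block.length; start++) {
        const window = block.slice(start, start + templates.length);
        if (window.every((stmt, i) => matchStatement(templates[i], stmt))) {
            matches.push(window);
        }
    }
    return matches;
}
\end{lstlisting}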
|
|
|
|
One important caveat is that the width of the sliding window is not always known in advance. Wildcards using the \texttt{+} operator, such as \texttt{(Statement)+}, can match one or more nodes. Therefore, a two-pointer technique is used when iterating through the statements of the user's code, as the same statement of the template might have to be matched against multiple statements of the user code.
|
|
|
|
\subsection*{Output of the matcher}
|
|
|
|
The output of the matcher, after finding all available matches, is a list of matches, where each match holds an ordered list of statements in AST form, pairing nodes from \texttt{applicable to} with nodes from the user's code. This means that for every match, the transformation function might transform and replace multiple statements.
|
|
|
|
\begin{lstlisting}[language={JavaScript}]
|
|
export interface Match {
|
|
// Every matching Statement in order with each pair
|
|
statements: TreeNode<PairedNodes>[];
|
|
}
|
|
\end{lstlisting}
|
|
|
|
|
|
\section{Transforming}
|
|
|
|
To perform the transformation and replacement for each match, we take the resulting list of matches, the template from the \texttt{transform to} section of the current case of the proposal, and the Babel AST of the original code. All the transformations are applied to this AST, and Babel generate~\cite{BabelGenerate} is then used to produce JavaScript code from the transformed AST.
|
|
|
|
An important detail is to transform the leaves of the AST first: if the transformation were applied from top to bottom, it might overwrite transformations already performed for a previous match. In the case of the pipeline proposal, transforming from top to bottom could produce \texttt{a(b) |> c(\%)} instead of \texttt{b |> a(\%) |> c(\%)}. This is easily solved in our case: since the matcher searches the tree from top to bottom, the matches are discovered in that order, so all that has to be done is to reverse the list of matches before transforming, handling the matches closest to the leaves first.
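The reversed application order can be sketched as below; this is a minimal, hypothetical helper, not the tool's actual code.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: the matcher reports matches root-first, so
// applying them in reverse order transforms the matches nearest the
// leaves of the tree before their enclosing matches.
function applyMatches(matches, transformOne) {
    for (const match of [...matches].reverse()) {
        transformOne(match);
    }
}
\end{lstlisting}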
|
|
|
|
\subsubsection*{Preparing the \texttt{transform to} template}
|
|
|
|
The transformation is performed by inserting the wildcard matches from the \texttt{applicable to} template into their respective locations in the \texttt{transform to} template. The entire \texttt{transform to} template is then placed into the original code AST at the position where the match was discovered. In effect, this is a find-and-replace transformation where context is passed through the wildcards.
|
|
|
|
In order to perform the transformation, all the sections matched against a wildcard have to be transferred into the \texttt{transform to} template. We utilize Babel traverse~\cite{BabelTraverse} on the generated AST of the \texttt{transform to} template, as it gives us utility functions to replace a node, replace a node with multiple nodes, and remove nodes. We use custom visitors for \textit{Identifier} and for \textit{ExpressionStatement} with an \textit{Identifier} as its expression, in order to determine where the wildcard matches have to be placed: they belong at the locations that share a name with the wildcard. Once an identifier shared between the \texttt{transform to} and \texttt{applicable to} templates is discovered, \texttt{replaceWithMultiple} is used to insert the node or nodes found in the match in place of the wildcard.
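The statement-level case can be sketched as below, using plain statement arrays as a stand-in for Babel paths (in the real implementation Babel's \texttt{replaceWithMultiple} performs this step); the helper name is hypothetical.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: replace each ExpressionStatement whose
// expression is a bare Identifier naming a wildcard with all the
// statements that wildcard matched (a replace-with-multiple).
function insertWildcardStatements(templateBody, wildcardMatches) {
    return templateBody.flatMap((stmt) =>
        stmt.type === "ExpressionStatement" &&
        stmt.expression.type === "Identifier" &&
        wildcardMatches[stmt.expression.name]
            ? wildcardMatches[stmt.expression.name]
            : [stmt]
    );
}
\end{lstlisting}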
|
|
|
|
\subsubsection*{Inserting the template into the AST}
|
|
|
|
Once the template has been filled with the user's matched nodes, it has to be inserted into the full AST of the user's code. Again we use Babel traverse~\cite{BabelTraverse} to traverse the entire code AST with a visitor. This visitor is not restricted to any node type, as the matched section can be of any type; instead, a generic visitor with an equality check is used to find the exact part of the code the current match came from. Once found, it is replaced with the transformed \texttt{transform to} nodes. Since this might be multiple statements, the function \texttt{replaceWithMultiple} is used to insert every statement from the \texttt{transform to} body, and we are careful to remove any following sibling nodes that were part of the original match, i.e.\ the \textit{n-1} statements following the insertion point.
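The combined replace-and-remove step can be sketched on plain arrays (Babel's \texttt{replaceWithMultiple} and the sibling removal collapsed into a single splice; the helper name is illustrative):

\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: replace the n matched statements starting at
// `index` in a block body with the transformed statements. This is
// equivalent to a replace-with-multiple on the first matched
// statement followed by removing its n-1 matched siblings.
function spliceMatch(blockBody, index, n, transformed) {
    blockBody.splice(index, n, ...transformed);
    return blockBody;
}
\end{lstlisting}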
|
|
|
|
To generate JavaScript from the transformed AST produced by this tool, we use babel/generator~\cite{BabelGenerate}, a library specifically designed to generate JavaScript from a Babel AST. The transformed AST of the user's code is passed to the generator, while being careful to apply all Babel plugins the current proposal might require.
|
|
|