
\chapter{Implementation}

This chapter presents the implementation of the tool built around \DSL and \DSLSH. It describes the overall architecture of the tool, the flow of data through it, and how each stage of transforming user code is carried out.

\section{Architecture}

The architecture of the work described in this thesis is illustrated in \figFull[fig:architecture].

The tool offers two ways to define a proposal. Both provide the same functionality and differ only in syntax and in how they are written.

\begin{figure}
\begin{center}
\begin{tikzpicture}[
    roundnode/.style={ellipse, draw=red!60, fill=red!5, very thick, minimum size=7mm},
    squarednode/.style={rectangle, draw=red!60, fill=red!5, very thick, minimum size=5mm}
]
\node[roundnode]   (jstqlcode)        {JSTQL Code};
\node[roundnode]   (selfhostedjsoninput)    [right=of jstqlcode]                        {Self-Hosted JSON};

\node[squarednode] (langium)        [below=of jstqlcode]            {Langium Parser};
\node[squarednode] (jsonparser)     [below=of selfhostedjsoninput]  {Self-Hosted JSON parser};
\node[squarednode] (preludebuilder) [below=of jsonparser]           {Prelude Builder};
\node[squarednode] (preParser)      [below=of langium]              {Pre-parser};
\node[squarednode] (babel)          [below right=of preParser]      {Babel};
\node[squarednode] (treebuilder)    [below=of babel]                {Custom AST builder};
\node[squarednode] (matcher)        [below=of treebuilder]          {Matcher};
\node[squarednode] (transformer)    [below=of matcher]              {Transformer};
\node[squarednode] (joiner)         [below=of transformer]          {Joiner};


\draw[->] (jstqlcode.south) -- (langium.north);
\draw[->] (langium.south) -- (preParser.north);
\draw[->] (preParser.south) |- (babel.west);
\draw[->] (babel.south) -- (treebuilder.north);
\draw[->] (treebuilder.south) -- (matcher.north);
\draw[->] (matcher.south) -- (transformer.north);
\draw[->] (transformer.south) -- (joiner.north);
\draw[->] (selfhostedjsoninput.south) -- (jsonparser.north);
\draw[->] (jsonparser.south) -- (preludebuilder.north);
\draw[->] (preludebuilder.south) |- (babel.east);

\end{tikzpicture}
\end{center}

\caption[Tool architecture]{Overview of tool architecture}
\label{fig:architecture}
\end{figure}
\section{Parsing \DSL using Langium}

This section describes the implementation of the parser for \DSL. It outlines Langium, the parser generator used to create the AST that the tool later uses to perform the transformations.

\subsection{Langium}

Langium \cite{Langium} is primarily used to create parsers for domain-specific languages. These parsers output an abstract syntax tree (AST) that is later used to create interpreters or other tooling. In the case of \DSL, we use Langium to generate TypeScript objects that are later used as definitions for the tool to do matching and transformation of user code.

In order to generate this parser, Langium requires the definition of a grammar. A grammar is a set of rules that describe a valid program. In our case, it defines how to describe a proposal together with its \texttt{applicable to} and \texttt{transform to} descriptions. A grammar in Langium starts by describing the \texttt{Model}. The model is the top entry of the grammar, where all valid top-level statements are described.

In \DSL, the only valid top-level statement is the definition of a proposal. This means our language grammar model contains only one list, which is a list of zero or more \texttt{Proposal} definitions. A \texttt{Proposal} definition is a block, written \texttt{\{...\}}, containing some valid definition. In the case of \DSL, this block contains one or more definitions of \texttt{Pair}.

\texttt{Pair} is defined very similarly to \texttt{Proposal}, as it contains only a block containing the definition of a \texttt{Section}.

The \texttt{Section} is where a single case of some applicable code and its corresponding transformation is defined. This definition uses specific keywords to describe each of them: \texttt{applicable to} introduces the template that \DSL uses in the matching algorithm, and \texttt{transform to} introduces the code used to perform the transformation.

To define exactly which characters and tokens are legal in a given position, Langium uses terminals defined by regular expressions. These restrict the character set that may appear in specific fields of the AST produced by the generated parser. The definitions of \texttt{Proposal} and \texttt{Pair} use the terminal \texttt{ID}, which only allows word characters and must begin with a letter or an underscore. The code inside a \texttt{Section} uses the terminal \texttt{STRING}, a quoted string that is meant to hold any valid JavaScript code together with the wildcard syntax described in \ref{sec:DSL_DEF}. Together, these two terminals let Langium determine exactly which characters are legal in each location.

\begin{lstlisting}[caption={Definition of \DSL in Langium}, label={def:JSTQLLangium}]
grammar Jstql

entry Model:
    (proposals+=Proposal)*;

Proposal:
    'proposal' name=ID "{"
        (pair+=Pair)+
    "}";

Pair:
    "pair" name=ID "{"
        aplTo=ApplicableTo
        traTo=TraTo
    "}";

ApplicableTo:
    "applicable" "to" "{"
        apl_to_code=STRING
    "}";
TraTo:
    "transform" "to" "{"
        transform_to_code=STRING
    "}";
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal STRING: /"[^"]*"|'[^']*'/;
\end{lstlisting}
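
To make the effect of the two terminals concrete, the following minimal sketch applies the same regular expressions outside of Langium. The anchors that force the whole input to match, as well as the sample inputs, are assumptions added for illustration; Langium itself applies the terminals as part of tokenization.

\begin{lstlisting}[language={JavaScript}]
// Regular expressions copied from the ID and STRING terminals.
// The anchors are added here so the whole input has to match.
const ID = /^[_a-zA-Z][\w_]*$/;
const STRING = /^(?:"[^"]*"|'[^']*')$/;

// Valid identifiers start with a letter or an underscore
console.log(ID.test("optional_chaining")); // true
console.log(ID.test("_pair1"));            // true
console.log(ID.test("1stPair"));           // false

// A template is a quoted string and may contain wildcards
console.log(STRING.test('"a.b(<<x: Expression>>)"')); // true
// A double quote inside a double-quoted template ends it early
console.log(STRING.test('"say "hi""'));               // false
\end{lstlisting}

One consequence visible in the last case is that template code containing double quotes has to be wrapped in single quotes, and vice versa, since the \texttt{STRING} terminal has no escape mechanism.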

In the case of \DSL, we are not implementing a programming language meant to be executed. We use Langium to generate an AST that serves as a markup language, similar to YAML, JSON or TOML. The main reason for using Langium in such an unconventional way is that Langium provides support for Visual Studio Code integration and removes the need to parse the definition of each proposal manually. However, with the grammar alone we cannot verify that the wildcards placed in \texttt{apl\_to\_code} and \texttt{transform\_to\_code} are correctly written. This is done with a Langium feature called \texttt{Validator}.


\subsection*{Langium Validator}

A Langium validator allows for further checks on the templates written within \DSL by implementing specific checks on specific parts of the grammar.

\DSL does not allow wildcard definitions without types in \texttt{applicable to}: a wildcard cannot be untyped or allow any AST type to match against it. This cannot be verified by the grammar, since inside the grammar the code is simply defined as a \texttt{STRING} terminal, so further checks have to be implemented in code. To do this, we implement a specific \texttt{Validator} on the \texttt{Pair} definition of the grammar. Every time anything contained within a \texttt{Pair} is updated, the language server shipped with Langium performs the validation step and reports any errors.

The validator uses \texttt{Pair} as its entry point, since this allows wildcards to be checked in both \texttt{applicable to} and \texttt{transform to}. In particular, it makes it possible to check that a wildcard identifier used in \texttt{transform to} exists in the definition of \texttt{applicable to}.

\begin{lstlisting}[language={JavaScript}]
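export class JstqlValidator {
    validateWildcardAplTo(pair: Pair, accept: ValidationAcceptor): void {
        try {
            // validationResultAplTo and validationResultTraTo hold the
            // results of the wildcard checks (computation omitted here)
            if (validationResultAplTo.errors.length !== 0) {
                // Report problems on the applicable to template
                accept("error", validationResultAplTo.errors.join("\n"), {
                    node: pair.aplTo,
                    property: "apl_to_code",
                });
            }
            if (validationResultTraTo.length !== 0) {
                // Report problems on the transform to template
                accept("error", validationResultTraTo.join("\n"), {
                    node: pair.traTo,
                    property: "transform_to_code",
                });
            }
        } catch (e) {}
    }
}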

\end{lstlisting}
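
The two checks described above can be sketched independently of Langium as follows. This is only an illustration under stated assumptions: the helper names \texttt{extractWildcards} and \texttt{validatePair}, the result shape, and the error messages are hypothetical and are not taken from the tool.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical helpers, for illustration only
interface Wildcard {
    identifier: string;
    types: string[];
}

// Collect every << name : Type1 | Type2 >> block in a template
function extractWildcards(code: string): Wildcard[] {
    const found: Wildcard[] = [];
    const pattern = /<<([^<>]*)>>/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(code)) !== null) {
        const parts = match[1].replace(/\s/g, "").split(":");
        const identifier = parts[0] ?? "";
        const typeString = parts[1] ?? "";
        const types = typeString.split("|").filter((t) => t.length > 0);
        found.push({ identifier, types });
    }
    return found;
}

function validatePair(aplTo: string, traTo: string) {
    const declared = extractWildcards(aplTo);
    const names = declared.map((w) => w.identifier);
    // Check 1: every wildcard in applicable to must be typed
    const aplToErrors = declared
        .filter((w) => w.types.length === 0)
        .map((w) => `Wildcard '${w.identifier}' has no type`);
    // Check 2: transform to may only use declared wildcards
    const traToErrors = extractWildcards(traTo)
        .filter((w) => names.indexOf(w.identifier) === -1)
        .map((w) => `Wildcard '${w.identifier}' is not declared`);
    return { aplToErrors, traToErrors };
}

const result = validatePair("<<a: Identifier>>.<<b>>", "<<a>>(<<c>>)");
console.log(result.aplToErrors); // one error, for the untyped 'b'
console.log(result.traToErrors); // one error, for the undeclared 'c'
\end{lstlisting}

In the validator itself, error lists of this kind would then be reported through the \texttt{accept} callback on the matching property of the \texttt{Pair}.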

\section{Pre-parsing}

In order to refer to internal DSL variables defined in \texttt{applicable to} in the transformation, we need to extract this information from the template definitions and pass it on to the later stages of the tool.

\subsection*{Pre-parsing \DSL}

In order to allow the use of Babel \cite{Babel}, the wildcards present in the blocks of \texttt{applicable to} and \texttt{transform to} have to be parsed and replaced with some valid JavaScript. This is done by using a pre-parser that extracts the information from the wildcards and inserts an \texttt{Identifier} in their place.

To pre-parse the text, we examine every character of the code section. When the start token of a wildcard, \texttt{<<}, is discovered, everything up to the closing token, \texttt{>>}, is treated as an internal DSL variable and stored by the tool. A variable \texttt{flag} tracks whether we are inside a wildcard block. When \texttt{flag} is false, we are outside a wildcard block and pass the character straight through to the variable \texttt{cleanedJS}. When \texttt{flag} is true, we are inside a wildcard block and collect every character into \texttt{temp}. Once we reach the end of the wildcard block, we pass \texttt{temp} on to the function \texttt{parseInternalString}.

\begin{lstlisting}[language={JavaScript}]
export function parseInternal(code: string): InternalParseResult {
    let cleanedJS = "";
    let temp = "";
    let flag = false;
    let prelude: InternalDSLVariable = {};

    for (let i = 0; i < code.length; i++) {
        if (code[i] === "<" && code[i + 1] === "<") {
            // From now on we are inside of the DSL custom block
            flag = true;
            i += 1;
            continue;
        }

        if (flag && code[i] === ">" && code[i + 1] === ">") {
            // We encountered a closing tag
            flag = false;

            let { identifier, types } = parseInternalString(temp);

            cleanedJS += identifier;

            prelude[identifier] = types;
            i += 1;
            temp = "";
            continue;
        }

        if (flag) {
            temp += code[i];
        } else {
            cleanedJS += code[i];
        }
    }
    return { prelude, cleanedJS };
}
\end{lstlisting}
Every wildcard follows the same format. It begins with the opening token \texttt{<<}, followed by the name by which the variable will be referred to. This name is called an internal DSL variable and is used when transferring the matching AST node or nodes from the user's code into the transform template. After the internal DSL variable, a \texttt{:} token marks the start of the next part of the wildcard: a list of one or more DSL types that the wildcard can match against, separated by \texttt{|}.
This strict notation avoids collisions with JavaScript's existing bit-shift operators, as it is highly unlikely that any code using the bit-shift operators would fit the wildcard format.

\begin{lstlisting}
<< Variable_Name : Type1 | Keyword | Type2 | Type3 >>
\end{lstlisting}

\begin{lstlisting}[language={JavaScript}]
function parseInternalString(dslString: string) {
    // Default to an empty type string when the wildcard contains no ":"
    let [identifier, typeString = ""] = dslString
            .replace(/\s/g, "").split(":");

    return {
        identifier,
        types: typeString.length > 0 ? typeString.split("|") : [""],
    };
}
\end{lstlisting}
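
As a worked example of what the pre-parsing step produces, the following sketch condenses the same idea into a single regular expression. It is not the tool's implementation: the function name \texttt{preParse}, the result shape, and the choice to require the \texttt{:} token are assumptions made for illustration.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical, regex-based sketch of the pre-parsing step
type Prelude = Record<string, string[]>;

function preParse(code: string): { cleanedJS: string; prelude: Prelude } {
    const prelude: Prelude = {};
    // Matches << name : Type1 | Type2 >> and requires the ":" token
    const wildcard = /<<\s*([_a-zA-Z]\w*)\s*:\s*([^<>]*?)\s*>>/g;
    const cleanedJS = code.replace(
        wildcard,
        (_match: string, name: string, types: string) => {
            // Store the types and leave a plain identifier behind
            prelude[name] = types
                ? types.split("|").map((t) => t.trim())
                : [];
            return name;
        }
    );
    return { cleanedJS, prelude };
}

const template =
    "let <<name: Identifier>> = <<value: Expression | Literal>>;";
const result = preParse(template);
console.log(result.cleanedJS);
// let name = value;
console.log(result.prelude);
// { name: ["Identifier"], value: ["Expression", "Literal"] }

// Ordinary bit-shift code does not fit the wildcard format
console.log(preParse("a << b >> c").cleanedJS);
// a << b >> c
\end{lstlisting}

The cleaned code is valid JavaScript that can be handed to Babel, while the collected mapping from internal DSL variables to DSL types is kept alongside it. The last call illustrates why requiring the \texttt{:} token keeps ordinary uses of the bit-shift operators untouched.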

\subsection*{Pre-parsing \DSLSH}

The self-hosted version \DSLSH also requires some pre-parsing in order to prepare the internal DSL environment. This step is relatively minor: unlike for \DSL, the prelude is only parsed, and nothing is inserted into the code.

In order to use JavaScript as the meta language to define JavaScript, we define a \texttt{Prelude}. The prelude must consist of several \texttt{Declaration Statements}, where the variable names are used as the internal DSL variables and the right-hand side expressions are used as the DSL types. To allow multiple types for a single internal DSL variable, we reuse JavaScript's array syntax.

We use Babel to generate the AST of the \texttt{prelude} definition, which gives us a JavaScript object structure to traverse. Since the structure is strictly defined, we expect every \texttt{stmt} of \texttt{stmts} to be a variable declaration and otherwise throw an error for an invalid prelude. For each declaration we then determine whether it lists multiple types, that is, whether its initializer is an \texttt{ArrayExpression} or a single \texttt{Identifier}. For an array, we initialize the prelude entry named after the declared variable to an empty array and fill it with each element of the array. For a single \texttt{Identifier}, we insert it directly as the only type.

\begin{lstlisting}[language={JavaScript}]
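for (let stmt of stmts) {
    // Error if not a VariableDeclaration
    if (stmt.type !== "VariableDeclaration") {
        throw new Error("Invalid prelude: expected a variable declaration");
    }
    // In Babel's AST the names and initializers live on the declarators
    for (let decl of stmt.declarations) {
        if (decl.init.type === "ArrayExpression") {
            // Multiple valid types: start from an empty array
            prelude[decl.id.name] = [];
            for (let elem of decl.init.elements) {
                // Add the name of each type in the array definition
                prelude[decl.id.name].push(elem.name);
            }
        } else {
            // Single valid type
            prelude[decl.id.name] = [decl.init.name];
        }
    }
}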

\end{lstlisting}

\section{}