\chapter{Implementation}

This chapter presents the implementation of the tool that uses \DSL and \DSLSH. It describes the overall architecture of the tool, the flow of data through it, and how each stage of transforming user code is carried out.

\section{Architecture}

The architecture of the work described in this thesis is illustrated in \figFull[fig:architecture].

The tool offers two ways to define a proposal. Both provide the same functionality and differ only in syntax and in how the definition is written. A proposal can be written in \DSL, which is parsed using Langium, or as a JSON definition, which is better suited for use as an API or for users who are more familiar with JSON.

\begin{figure}
    \begin{center}
        \begin{tikzpicture}[
            roundnode/.style={ellipse, draw=red!60, fill=red!5, very thick, minimum size=7mm},
            squarednode/.style={rectangle, draw=red!60, fill=red!5, very thick, minimum size=5mm}
            ]
            \node[roundnode] (jstqlcode) {JSTQL Code};
            \node[roundnode] (selfhostedjsoninput) [right=of jstqlcode] {Self-Hosted JSON};
            \node[squarednode] (langium) [below=of jstqlcode] {Langium Parser};
            \node[squarednode] (jsonparser) [below=of selfhostedjsoninput] {Self-Hosted JSON parser};
            \node[squarednode] (preludebuilder) [below=of jsonparser] {Prelude Builder};
            \node[squarednode] (preParser) [below=of langium] {Pre-parser};
            \node[squarednode] (babel) [below right=of preParser] {Babel};
            \node[squarednode] (treebuilder) [below=of babel] {Custom AST builder};
            \node[squarednode] (matcher) [below=of treebuilder] {Matcher};
            \node[squarednode] (transformer) [below=of matcher] {Transformer};
            \node[squarednode] (joiner) [below=of transformer] {Generator};
            \draw[->] (jstqlcode.south) -- (langium.north);
            \draw[->] (langium.south) -- (preParser.north);
            \draw[->] (preParser.south) |- (babel.west);
            \draw[->] (babel.south) -- (treebuilder.north);
            \draw[->] (treebuilder.south) -- (matcher.north);
            \draw[->] (matcher.south) -- (transformer.north);
            \draw[->] (transformer.south) -- (joiner.north);
            \draw[->] (selfhostedjsoninput.south) -- (jsonparser.north);
            \draw[->] (jsonparser.south) -- (preludebuilder.north);
            \draw[->] (preludebuilder.south) |- (babel.east);
        \end{tikzpicture}
    \end{center}
    \caption[Tool architecture]{Overview of tool architecture}
    \label{fig:architecture}
\end{figure}

\section{Parsing \DSL using Langium}

This section describes the implementation of the parser for \DSL. It outlines Langium, the parser generator used to create the AST that the tool later uses to perform the transformations.

\subsection{Langium}

Langium \cite{Langium} is primarily used to create parsers for domain-specific languages. Such parsers output an Abstract Syntax Tree (AST) that is later used to create interpreters or other tooling. In the case of \DSL, we use Langium to generate TypeScript objects that the tool later uses as definitions for matching and transforming user code.

To generate this parser, Langium requires the definition of a grammar. A grammar is a set of rules that describes what constitutes a valid program. In our case, the grammar describes a proposal together with its \texttt{applicable to} and \texttt{transform to} definitions. A grammar in Langium starts by describing the \texttt{Model}. The model is the top-level entry of the grammar and describes all valid top-level statements.

In \DSL the only valid top-level statement is the definition of a proposal. This means the model of our grammar contains only one list, which holds zero or more \texttt{Proposal} definitions. A \texttt{Proposal} definition is denoted by a block, written \texttt{\{...\}}, containing some valid definition. In the case of \DSL this block contains one or more definitions of \texttt{Pair}.
\texttt{Pair} is defined very similarly to \texttt{Proposal}: it consists of a name and a block. This block holds a single case of some applicable code and its corresponding transformation, each introduced by its own keywords. \texttt{applicable to} introduces the template that \DSL uses in the matching algorithm, and \texttt{transform to} introduces the code used to perform the transformation.

To define exactly which characters are legal in a specific definition, Langium uses terminals defined by regular expressions. These restrict the character set allowed in specific fields of the AST produced by the generated parser. In the definitions of \texttt{Proposal} and \texttt{Pair} the terminal \texttt{ID} is used; it allows only word characters and must begin with a letter of the alphabet or an underscore. In \texttt{ApplicableTo} and \texttt{TraTo} the terminal \texttt{STRING} is used; it accepts any quoted text, which is meant to hold valid JavaScript code together with the custom DSL syntax described in \ref{sec:DSL_DEF}. Together, these terminals allow Langium to determine exactly which characters are legal in each location.

\begin{lstlisting}[caption={Definition of \DSL in Langium}, label={def:JSTQLLangium}]
grammar Jstql

entry Model:
    (proposals+=Proposal)*;

Proposal:
    'proposal' name=ID "{"
        (pair+=Pair)+
    "}";

Pair:
    "pair" name=ID "{"
        aplTo=ApplicableTo
        traTo=TraTo
    "}";

ApplicableTo:
    "applicable" "to" "{"
        apl_to_code=STRING
    "}";

TraTo:
    "transform" "to" "{"
        transform_to_code=STRING
    "}";

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal STRING: /"[^"]*"|'[^']*'/;
\end{lstlisting}

In the case of \DSL, we are not implementing a programming language meant to be executed. Instead, we use Langium to generate an AST that serves as a markup language, similar to YAML, JSON or TOML.
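To make the shape of that AST more concrete, the following is a minimal sketch of the objects one could expect from the grammar above. The interfaces are hand-written and simplified for illustration (the interfaces Langium generates carry additional metadata), and the proposal name, pair name, template strings and the helper \texttt{listPairs} are hypothetical placeholders.

\begin{lstlisting}[language={JavaScript}]
// Simplified, hand-written mirror of the grammar rules (an assumption,
// not the generated code): one interface per rule, one field per assignment.
interface ApplicableTo { apl_to_code: string; }
interface TraTo { transform_to_code: string; }
interface Pair { name: string; aplTo: ApplicableTo; traTo: TraTo; }
interface Proposal { name: string; pair: Pair[]; }
interface Model { proposals: Proposal[]; }

// Hypothetical model with one proposal containing a single pair.
const model: Model = {
    proposals: [
        {
            name: "Example",
            pair: [
                {
                    name: "SingleCase",
                    aplTo: { apl_to_code: "<<a: Type1 | Type2>>" },
                    traTo: { transform_to_code: "placeholderTemplate" },
                },
            ],
        },
    ],
};

// List every pair together with the proposal it belongs to.
function listPairs(m: Model): string[] {
    const names: string[] = [];
    for (const proposal of m.proposals) {
        for (const pair of proposal.pair) {
            names.push(`${proposal.name}.${pair.name}`);
        }
    }
    return names;
}
\end{lstlisting}

Read this way, a definition file is plain structured data: the tool only has to walk the lists of proposals and pairs and hand the two template strings of each pair to the later stages.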
The main reason for using Langium in such an unconventional way is that it provides Visual Studio Code integration and removes the need to parse each proposal definition manually. However, the grammar alone cannot verify that the wildcards placed in \texttt{apl\_to\_code} and \texttt{transform\_to\_code} are written correctly. This is done using a Langium feature called \texttt{Validator}.

\subsection*{Langium Validator}

A Langium validator allows further checks on the templates written within \DSL by implementing specific checks on specific parts of the grammar. \DSL does not allow untyped wildcard definitions in \texttt{applicable to}; a wildcard must always state the AST types it can match against. This cannot be verified by the grammar, since inside the grammar the code is simply a \texttt{STRING} terminal, so further checks have to be implemented in code.

To do this, a specific \texttt{Validator} is implemented on the \texttt{Pair} definition of the grammar. Every time anything contained within a \texttt{Pair} is updated, the language server shipped with Langium performs the validation step and reports any errors. The validator uses \texttt{Pair} as its entry point because this allows wildcards in both \texttt{applicable to} and \texttt{transform to} to be checked together, including checking that every wildcard identifier used in \texttt{transform to} exists in the definition of \texttt{applicable to}.
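The two checks described above could be sketched roughly as follows. This is an illustrative sketch only, not the tool's actual validation code: the helpers \texttt{extractWildcards} and \texttt{checkPair} and the error messages are hypothetical, and it assumes that wildcards in both templates are written between \texttt{<<} and \texttt{>>}.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical helper types and functions, introduced for illustration.
interface Wildcard { identifier: string; types: string[]; }

// Extract every <<name: Type1 | Type2>> wildcard from a template string.
function extractWildcards(template: string): Wildcard[] {
    const found: Wildcard[] = [];
    const pattern = /<<([^>]*)>>/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(template)) !== null) {
        const [identifier, typeString = ""] = match[1]
            .replace(/\s/g, "")
            .split(":");
        const types = typeString === "" ? [] : typeString.split("|");
        found.push({ identifier, types });
    }
    return found;
}

// Collect the error messages for the two templates of one pair.
function checkPair(aplTo: string, traTo: string) {
    const aplToErrors: string[] = [];
    const traToErrors: string[] = [];
    const declared = new Set<string>();
    for (const w of extractWildcards(aplTo)) {
        declared.add(w.identifier);
        // Wildcards in applicable to must state at least one type
        if (w.types.length === 0) {
            aplToErrors.push(`Wildcard '${w.identifier}' has no type`);
        }
    }
    for (const w of extractWildcards(traTo)) {
        // Wildcards in transform to must be declared in applicable to
        if (!declared.has(w.identifier)) {
            traToErrors.push(`Wildcard '${w.identifier}' is not declared`);
        }
    }
    return { aplToErrors, traToErrors };
}
\end{lstlisting}

Keeping the two lists of errors separate makes it possible to report each problem on the template it belongs to, so the editor can highlight the offending block.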
\begin{lstlisting}[language={JavaScript}]
export class JstqlValidator {
    validateWildcardAplTo(pair: Pair, accept: ValidationAcceptor): void {
        try {
            // validationResultAplTo and validationResultTraTo hold the
            // results of checking the wildcards of each template
            // (their computation is omitted from this listing)
            if (validationResultAplTo.errors.length != 0) {
                accept("error", validationResultAplTo.errors.join("\n"), {
                    node: pair.aplTo,
                    property: "apl_to_code",
                });
            }
            if (validationResultTraTo.length != 0) {
                accept("error", validationResultTraTo.join("\n"), {
                    node: pair.traTo,
                    property: "transform_to_code",
                });
            }
        } catch (e) {}
    }
}
\end{lstlisting}

\section{Pre-parsing}

To refer to internal DSL variables defined in \texttt{applicable to} from within the transformation, we need to extract this information from the template definitions and pass it on to the later stages of the tool.

\subsection*{Pre-parsing \DSL}

To allow the use of \cite[Babel]{Babel}, the wildcards present in the \texttt{applicable to} and \texttt{transform to} blocks have to be parsed and replaced with valid JavaScript. This is done by a pre-parser that extracts the information from each wildcard and inserts an \texttt{Identifier} in its place.

To pre-parse the text, we look at every character in the code section. When the start token of a wildcard, \texttt{<<}, is discovered, everything up to the closing token, \texttt{>>}, is treated as an internal DSL variable and stored by the tool. A variable \texttt{flag} tracks the current state. When \texttt{flag} is false, we are not inside a wildcard block and simply pass the character through to the variable \texttt{cleanedJS}. When \texttt{flag} is true, we are inside a wildcard block and collect every character of the block into \texttt{temp}.
Once we hit the end of the wildcard block, we pass \texttt{temp} on to the function \texttt{parseInternalString}.

\begin{lstlisting}[language={JavaScript}]
export function parseInternal(code: string): InternalParseResult {
    let cleanedJS = "";
    let temp = "";
    let flag = false;
    let prelude: InternalDSLVariable = {};

    for (let i = 0; i < code.length; i++) {
        if (code[i] === "<" && code[i + 1] === "<") {
            // From now on we are inside of the DSL custom block
            flag = true;
            i += 1;
            continue;
        }

        if (flag && code[i] === ">" && code[i + 1] === ">") {
            // We encountered a closing tag
            flag = false;
            let { identifier, types } = parseInternalString(temp);

            // Replace the wildcard with a plain identifier
            cleanedJS += identifier;
            prelude[identifier] = types;
            i += 1;
            temp = "";
            continue;
        }

        if (flag) {
            temp += code[i];
        } else {
            cleanedJS += code[i];
        }
    }
    return { prelude, cleanedJS };
}
\end{lstlisting}

Every wildcard follows the same format. It begins with the opening token \texttt{<<}, followed by the name by which the wildcard will be referred to. This name is called an internal DSL variable and is used when transferring the matching AST node or nodes from the user's code into the transform template. After the internal DSL variable, a \texttt{:} token marks the start of the next part of the wildcard: a list of one or more DSL types that the wildcard can match against, separated by \texttt{|}. This notation is deliberately strict. It avoids collisions with the bit-shift operators already reserved in JavaScript, as it is highly unlikely that any code using a bit-shift operator would fit this wildcard format.

\begin{lstlisting}
<< Variable_Name : Type1 | Keyword | Type2 | Type3 >>
\end{lstlisting}

\begin{lstlisting}[language={JavaScript}]
function parseInternalString(dslString: string) {
    // Strip all whitespace, then split the name from the type list.
    // typeString defaults to "" when the wildcard contains no ":"
    let [identifier, typeString = ""] = dslString
        .replace(/\s/g, "")
        .split(":");

    return {
        identifier,
        types: typeString.length > 0 ? typeString.split("|") : [""],
    };
}
\end{lstlisting}

\subsection*{Pre-parsing \DSLSH}

The self-hosted version \DSLSH also requires some pre-parsing to prepare the internal DSL environment. This step is relatively minor: compared to \DSL, it only parses the definitions and does not insert anything into the code.

To use JavaScript as the meta language for defining JavaScript, we define a \texttt{Prelude}. The prelude is required to consist of several \texttt{Declaration Statements}, where the variable names are used as the internal DSL variables and the right-hand side expressions are used as the DSL types. To allow multiple types for a single internal DSL variable, we reuse JavaScript's array literal syntax.

We use Babel to generate the AST of the \texttt{prelude} definition, which gives us a JavaScript object structure. Since the structure is very strictly defined, we expect every \texttt{stmt} of \texttt{stmts} to be a variable declaration and otherwise throw an error for an invalid prelude. We then have to determine whether the declaration specifies multiple types, that is, whether its right-hand side is an \texttt{ArrayExpression} or just an \texttt{Identifier}. If it is an array, the prelude entry for the declared name is filled with each element of the \texttt{ArrayExpression}. Otherwise, the single \texttt{Identifier} is inserted directly.
\begin{lstlisting}[language={JavaScript}]
for (let stmt of stmts) {
    // Every statement of the prelude must be a variable declaration
    if (stmt.type !== "VariableDeclaration") {
        throw new Error("Invalid prelude: expected a variable declaration");
    }
    // Babel stores the name and initializer on the declarator;
    // we assume one declarator per declaration
    const { id, init } = stmt.declarations[0];
    if (init.type === "ArrayExpression") {
        // Multiple valid types: add each type of the array definition
        prelude[id.name] = init.elements.map((elem) => elem.name);
    } else {
        // Single valid type
        prelude[id.name] = [init.name];
    }
}
\end{lstlisting}

\section{Using Babel to parse}

For the tool to transform code, an Abstract Syntax Tree has to be generated from the user's code, from \texttt{applicable to} and from \texttt{transform to}. This means parsing JavaScript into an AST, for which we use \cite[Babel]{Babel}.

The most important reason for choosing Babel to generate the ASTs used for transformation is the JavaScript community surrounding it. As this tool deals with proposals before they are part of JavaScript, it requires a parser that supports early proposals. Babel supports most Stage 2 proposals through its plugin system, which allows parsing code that is not yet part of the language.

\subsection*{Custom Tree Structure}

To allow matching and transformations to be applied to each of the sections inside a \texttt{pair} definition, the sections have to be parsed into an AST. To do this, the tool uses the library \cite[Babel]{Babel} to generate an AST data structure. However, this structure is not well suited to traversing multiple trees at the same time, which is a requirement for matching and transforming. Therefore we transform the Babel AST into a simple custom tree structure that is easy to traverse.
As can be seen in \figFull[def:TreeStructure], we use a recursive definition of a \texttt{TreeNode}, where a node's parent either exists or is null (the node is the top of the tree), and a node can have any number of child elements. This definition allows simple traversal both up and down the tree, which means that two trees can be traversed at the same time in the matcher and transformer sections of the tool.

\begin{lstlisting}[language={JavaScript}, label={def:TreeStructure}, caption={Simple definition of a Tree structure in TypeScript}]
export class TreeNode<T> {
    public parent: TreeNode<T> | null;
    public element: T;
    public children: TreeNode<T>[] = [];

    constructor(parent: TreeNode<T> | null, element: T) {
        this.parent = parent;
        this.element = element;
        // Register this node as a child of its parent
        if (this.parent) this.parent.children.push(this);
    }
}
\end{lstlisting}

Placing the AST generated by Babel into this structure means using the library \cite{BabelTraverse}{Babel Traverse}. Babel Traverse uses the \cite{VisitorPattern}{visitor pattern} to visit each node of the AST in a \textit{depth first} manner. The idea of this pattern is that one implements a \textit{visitor} for each kind of node in the AST, and when a specific node is visited, that visitor is used. While this method does not suit traversing multiple trees at the same time, it allows very simple traversal of a single tree in order to place it into our simple tree structure.

To transfer the AST into our simple tree structure, we use the same visitor for all nodes and place each node into the tree. Visiting a node using the \texttt{enter()} function means we went from the parent to that child node, so it should be added as a child of the parent. The node is automatically added to its parent's list of children by the constructor of \texttt{TreeNode}.
Whenever a node is left, the function \texttt{exit()} is called. This means we are moving back up the tree, and we have to update which node was the \textit{last} in order to generate the correct tree structure.

\begin{lstlisting}[language={JavaScript}]
// Root of the custom tree and the node visited most recently
let first: TreeNode<t.Node> | null = null;
let last: TreeNode<t.Node> | null = null;

traverse(ast, {
    enter(path: any) {
        // Entering a node: attach it as a child of the last node
        let node: TreeNode<t.Node> = new TreeNode<t.Node>(
            last,
            path.node as t.Node
        );

        if (last == null) {
            first = node;
        }
        last = node;
    },
    exit(path: any) {
        // Leaving a node: move back up to its parent
        if (last && last?.element?.type != "Program") {
            last = last.parent;
        }
    },
});
if (first != null) {
    return first;
}
\end{lstlisting}

\section{Matching}

Performing the match against the user's code is the most important step, since the tool performs no transformations if no matching code is found. Finding matches depends entirely on how well the definition of the proposal is written, and on how well the proposal can actually be expressed within the confines of \DSL. In this section we discuss how individual AST nodes are matched against each other, and how wildcard matching is performed.

\subsection*{Matching singular Expression}

Writing the \texttt{applicable to} section as a single expression is by far the most versatile way of defining a proposal, simply because a template that is as generic as possible has a much higher chance of producing matches. Matching against a single expression ensures that the matcher tries to perform a match at every level of the AST.

\subsection*{Matching Statements}

Using multiple statements in the \texttt{applicable to} template results in a much stricter matcher. It only tries to perform an exact match, using a sliding window the size of the number of statements, at every \textit{BlockStatement}, as that is where sequences of statements reside in JavaScript.

\section{Transforming}

\section{Generating}

To generate JavaScript from the transformed AST created by this tool, we use a JavaScript library titled \cite{BabelGenerate}{babel/generator}.
This library is specifically designed for use with Babel to generate JavaScript from a Babel AST.
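As a minimal illustration of this step, the sketch below parses a small program with \texttt{@babel/parser} and turns the resulting AST back into source text with \texttt{@babel/generator}. It assumes both packages are installed and that the project's module settings allow the default import shown here. The input program is a placeholder, and in the tool the AST would be the transformed one rather than a freshly parsed one.

\begin{lstlisting}[language={JavaScript}]
import { parse } from "@babel/parser";
import generate from "@babel/generator";

// Parse a small placeholder program into a Babel AST.
const ast = parse("const answer = 40 + 2;");

// Generate JavaScript source text from the AST.
const { code } = generate(ast);
console.log(code);
\end{lstlisting}

The generator returns an object whose \texttt{code} field holds the generated source, which can then be written back to the user's file.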