Finished feedback on chapter 4
parent 5e2eb91434, commit 86037398ca
2 changed files with 42 additions and 48 deletions
BIN build/report.pdf (binary file not shown)
@@ -14,9 +14,9 @@ In the architecture diagram of Figure~\ref{fig:architecture}, ellipse nodes show
\begin{description}
\item[\DSL Code] is the raw text definition of proposals
\item[Self-Hosted Object] is the self-hosted version in \DSLSH format
\item[1a. Langium Parser] takes raw \DSL source code, and parses it into a DSL
\item[1b. Prelude-builder] translates JavaScript prelude into array of wildcard strings
\item[3. Babel] parses the templates and the user's source code into an AST
\item[4. Custom Tree Builder] translates the Babel AST structure into our tree structure
\item[5. Matcher] finds matches of the \texttt{applicable to} template in the user code
@@ -64,23 +64,23 @@ In the architecture diagram of Figure~\ref{fig:architecture}, ellipse nodes show
\section{Parsing \DSL using Langium}
In this section, we describe the implementation of the parser for \DSL. We outline Langium, the parser generator used to create the AST that our tool later uses to perform the transformations.
\subsection{Langium}
Langium~\cite{Langium} is a language workbench~\cite{LanguageWorkbench} primarily used to create parsers and Integrated Development Environments for domain-specific languages. These parsers produce Abstract Syntax Trees that are later used to create interpreters or other tooling. In this project, we use Langium to generate an AST definition in the form of TypeScript objects. These objects and their structure serve as the definitions the tool uses for matching and transformation of user code.
To generate this parser, Langium requires the definition of a grammar. A grammar is a specification that describes the syntax of valid programs. The \DSL grammar describes the structure of \DSL, such as \texttt{proposals}, \texttt{cases}, \texttt{applicable to} blocks, and \texttt{transform to} blocks. A grammar in Langium starts by describing the \texttt{Model}. The model is the top entry of the grammar; this is where all valid top-level statements are described.
Contained within the \texttt{Model} rule are one or more proposals. Each proposal is defined with the rule \texttt{Proposals}, and starts with the keyword \texttt{proposal}, followed by a name and a code block. This rule is designed to contain every definition of a transformation related to a specific proposal. To hold every transformation definition, a proposal definition contains one or more cases.
The \texttt{Case} rule is created to contain a single transformation. Each case specification starts with the keyword \texttt{case}, followed by a name for the current case, then a block for that case's fields. Cases are designed in this way to separate different transformation definitions within a proposal. Each case contains a single definition used to match against user code, and a definition used to transform a match.
The rule \texttt{AplicableTo} is designed to hold a single template used for matching. It starts with the keywords \texttt{applicable} and \texttt{to}, followed by a block designed to hold the matching template definition. The template is defined as the terminal \texttt{STRING}, and is parsed as a raw string of characters by Langium~\cite{Langium}.
The rule \texttt{TransformTo} is created to contain a single template used for transforming a match. It starts with the keywords \texttt{transform} and \texttt{to}, followed by a block that holds the transformation definition. This transformation definition is declared with the terminal \texttt{STRING}, and is parsed as a string of characters, the same as the template in \texttt{applicable to}.
In order to define exactly which characters and tokens are legal in a specific definition, Langium uses terminals defined with regular expressions. These allow a very specific character set to be legal in specific keys of the AST produced by the parser Langium generates. In the definitions of \texttt{Proposal} and \texttt{Pair}, the terminal \texttt{ID} is used; this terminal allows only words, which must begin with a letter of the alphabet or an underscore. In \texttt{Section}, the terminal \texttt{STRING} is used; this terminal is meant to allow any valid JavaScript code as well as the custom DSL described in Section~\ref{sec:DSL_DEF}. Together, these terminals let Langium determine exactly which characters are legal in each location.
\begin{lstlisting}[caption={Definition of \DSL in Langium.}, label={def:JSTQLLangium}]
grammar Jstql
@@ -112,14 +112,14 @@ terminal ID: /[_a-zA-Z][\w_]*/;
terminal STRING: /"[^"]*"|'[^']*'/;
\end{lstlisting}
With \DSL, we are not implementing a programming language meant to be executed. We are using Langium to generate an AST that will be used as a markup language, similar to YAML, JSON or TOML~\cite{TOML}. The main reason for using Langium in such an unconventional way is that Langium provides support for Visual Studio Code integration, and it solves the issue of parsing the definition of each proposal manually. However, with this grammar alone we cannot verify that the wildcards placed in \texttt{apl\_to\_code} and \texttt{transform\_to\_code} are correctly written. To do this, we have implemented several validation rules.
\subsection*{Langium Validator}
A Langium validator allows further checks on DSL code; it enables the implementation of specific checks on specific parts of the grammar.
\DSL does not allow empty typed wildcard definitions in \texttt{applicable to} blocks; this means we cannot define a wildcard that allows any AST type to match against it. This cannot be enforced within the grammar, as there the template code is simply defined as a \texttt{STRING} terminal, so further checks have to be implemented in code. To do this, we have a specific \texttt{Validator} implemented on the \texttt{Case} definition of the grammar. Every time anything contained within a \texttt{Case} is updated, the language server created with Langium performs the validation step and reports any errors.
The validator uses \texttt{Case} as its entry point, as this allows checking the wildcards in both \texttt{applicable to} and \texttt{transform to}, in particular whether a wildcard identifier used in \texttt{transform to} exists in the definition of \texttt{applicable to}.
@@ -147,7 +147,7 @@ export class JstqlValidator {
\subsection*{Using Langium as a parser}
Langium is designed to automatically generate extensive tool support for the language specified by its grammar. In our case, however, we have to parse the \DSL definition using Langium and then extract the generated abstract syntax tree in order to use the information it contains.
To use the parser generated by Langium, we created a custom function \texttt{parseDSLtoAST}, which takes a string as input (the raw \DSL code) and outputs the pure AST in the format described by the grammar (see Section~\ref{sec:DSL_DEF}). This function is exposed as a custom API for our tool to interface with. This also means our tool depends on the implementation of the Langium parser to function with \DSL; the implementation of \DSLSH is entirely independent.
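Although the full implementation is not reproduced here, a minimal sketch of such a function is shown below. The module paths and the generated service name (\texttt{createJstqlServices}) follow Langium's code-generation conventions and are assumptions, not the exact names used in our code base.

\begin{lstlisting}[language={JavaScript}, caption={Sketch of \texttt{parseDSLtoAST} (module and service names assumed).}]
import { EmptyFileSystem } from "langium";
// Generated by the Langium CLI; exact paths and names are assumptions.
import { createJstqlServices } from "./jstql-module";
import { Model } from "./generated/ast";

export function parseDSLtoAST(code: string): Model {
    const services = createJstqlServices(EmptyFileSystem).Jstql;
    const result = services.parser.LangiumParser.parse<Model>(code);
    if (result.lexerErrors.length > 0 || result.parserErrors.length > 0) {
        throw new Error("Invalid JSTQL definition");
    }
    // The AST root, structured as described by the grammar.
    return result.value;
}
\end{lstlisting}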
@@ -159,15 +159,15 @@ In order to refer to internal DSL variables defined in \texttt{applicable to} an
\subsection*{Why not use Langium for wildcard parsing?}
Langium has support for creating a generator to output an artifact, which is some transformation applied to the AST built by the Langium parser. This suits the needs of \DSL quite well, and could be used to extract the wildcards and parse the type expressions. This is the way the developers of Langium intend such functionality to be implemented; however, the implementation would still be mostly the same, as the parsing of the wildcards still has to be done ``manually'' with a custom parser. Therefore, we decided to keep the parsing of the wildcards separate for this project. If we were to use Langium generators to parse the wildcards, it would make \DSLSH dependent on Langium, which is not preferred, as that would mean both ways of defining a proposal rely on Langium. The reason for using our own extractor is to allow an independent way to define transformations using our tool.
\subsection*{Extracting wildcards from \DSL}
In order to allow the use of Babel~\cite{Babel}, the wildcards present in the \texttt{applicable to} and \texttt{transform to} blocks have to be parsed and replaced with some valid JavaScript. This is done by a pre-parser that extracts the information from the wildcards and inserts an \texttt{Identifier} in their place.
To extract the wildcards from the template, we look at each character in the template. If the start token of a wildcard is discovered, denoted by \texttt{<<}, everything after it until the closing token, denoted by \texttt{>>}, is treated as an internal DSL variable and stored by the tool. A variable \texttt{flag} is used (lines 5 and 10 of Listing~\ref{lst:extractWildcard}): when \texttt{flag} is false, we know we are currently not inside a wildcard block, so we pass the character through to the variable \texttt{cleanedJS} (line 196 of Listing~\ref{lst:extractWildcard}). When \texttt{flag} is true, we know we are currently inside a wildcard block, and we collect every character of the wildcard block into \texttt{temp}. Once we hit the end of the wildcard block, having consumed the entirety of the wildcard, the contents of \texttt{temp} are passed to a tokenizer, and the tokens are parsed by a recursive descent parser (lines 10-21 of Listing~\ref{lst:extractWildcard}).
Once the wildcard is parsed and we know it is a valid wildcard, we insert an identifier into the JavaScript template where the wildcard resided. This allows for easier identification of wildcards when performing matching/transformation, as we can check whether an identifier in the code is the identifier of a wildcard. This does, however, introduce the problem of collisions between the inserted wildcard identifiers and identifiers present in the user's code. To avoid this, the tool adds \texttt{\_\-\-\_} at the beginning of every identifier inserted in place of a wildcard. This simplifies checking whether an identifier is a wildcard, and avoids collisions where a variable in the user code has the same name as a wildcard inserted into the template. This can be seen on line 17 of Listing~\ref{lst:extractWildcard}.
\begin{lstlisting}[language={JavaScript}, caption={Extracting wildcard from template.}, label={lst:extractWildcard}]
export function parseInternal(code: string): InternalParseResult {
@@ -210,11 +210,11 @@ export function parseInternal(code: string): InternalParseResult {
\paragraph*{Parsing wildcards}
Once the wildcards have been extracted from the definitions inside \DSL, they have to be parsed into a simple AST to be used when matching against the wildcard. This is accomplished using a simple tokenizer and a recursive descent parser~\cite{RecursiveDescent}.
Our tokenizer takes the raw stream of input characters extracted from the wildcard block within the template and determines which part is what token. Due to the very simple nature of the type expressions, there is no ambiguity in the tokens, so determining which token comes at what time is trivial. We use a switch case on the current character: if the token is of length one, we accept it and move on to the next character, and if the next character is an unexpected one, we produce an error. The tokenizer also groups tokens with a \textit{token type}, which allows for simpler parsing of the tokens later.
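The sketch below illustrates this structure; the token types and the assumed single-character operator set are for illustration and may differ from the implementation.

\begin{lstlisting}[language={JavaScript}, caption={Sketch of the type expression tokenizer (token names assumed).}]
type TokenType =
    | "Identifier" | "And" | "Or" | "LParen" | "RParen" | "Plus";
interface Token { type: TokenType; value: string; }

// Assumed single-character tokens of the type expression language.
const SINGLE: Record<string, TokenType> = {
    "&": "And", "|": "Or", "(": "LParen", ")": "RParen", "+": "Plus",
};

export function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    let i = 0;
    while (i < input.length) {
        const c = input[i];
        if (/\s/.test(c)) { i += 1; continue; }
        if (SINGLE[c]) {
            // Tokens of length one are accepted directly.
            tokens.push({ type: SINGLE[c], value: c });
            i += 1;
        } else if (/[A-Za-z_]/.test(c)) {
            // AST type names are grouped into Identifier tokens.
            let j = i;
            while (j < input.length && /\w/.test(input[j])) j += 1;
            tokens.push({ type: "Identifier", value: input.slice(i, j) });
            i = j;
        } else {
            throw new Error("Unexpected character in type expression: " + c);
        }
    }
    return tokens;
}
\end{lstlisting}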
A recursive descent parser closely mimics the grammar of the language it is implemented for: we define a function for handling each of the non-terminals, and ways to determine which non-terminal each token type results in. The type expression language is a very simple Boolean expression language, which makes parsing straightforward.
\begin{lstlisting}[caption={Grammar of type expressions}, label={ex:grammarTypeExpr}]
Wildcard:
@@ -242,9 +242,9 @@ GroupExpr:
"(" TypeExpr ")"
\end{lstlisting}
The grammar of the type expressions used by the wildcards can be seen in \figFull[ex:grammarTypeExpr]. The grammar is written in a form similar to Extended Backus-Naur Form, where we define the terminals and non-terminals in a way that makes the entire grammar parseable by the recursive descent parser.
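A sketch of how such a parser can follow the grammar is shown below. It covers only the Boolean core of the type expressions and reuses the \texttt{Token} type from the tokenizer sketch; the node shapes are assumptions.

\begin{lstlisting}[language={JavaScript}, caption={Sketch of a recursive descent parser for type expressions (node shapes assumed).}]
type TypeExprNode =
    | { kind: "Identifier"; name: string }
    | { kind: "BinaryExpr"; op: "&&" | "||";
        left: TypeExprNode; right: TypeExprNode };

class TypeExprParser {
    private pos = 0;
    constructor(private tokens: Token[]) {}

    private peek(): Token | undefined { return this.tokens[this.pos]; }
    private eat(type: TokenType): Token {
        const t = this.tokens[this.pos];
        if (t === undefined || t.type !== type)
            throw new Error("Expected " + type);
        this.pos += 1;
        return t;
    }

    // TypeExpr: AndExpr ("|" AndExpr)*
    parseTypeExpr(): TypeExprNode {
        let left = this.parseAndExpr();
        while (this.peek()?.type === "Or") {
            this.eat("Or");
            left = { kind: "BinaryExpr", op: "||",
                     left, right: this.parseAndExpr() };
        }
        return left;
    }

    // AndExpr: Primary ("&" Primary)*
    private parseAndExpr(): TypeExprNode {
        let left = this.parsePrimary();
        while (this.peek()?.type === "And") {
            this.eat("And");
            left = { kind: "BinaryExpr", op: "&&",
                     left, right: this.parsePrimary() };
        }
        return left;
    }

    // Primary: Identifier | GroupExpr, with GroupExpr: "(" TypeExpr ")"
    private parsePrimary(): TypeExprNode {
        if (this.peek()?.type === "LParen") {
            this.eat("LParen");
            const inner = this.parseTypeExpr();
            this.eat("RParen");
            return inner;
        }
        return { kind: "Identifier", name: this.eat("Identifier").value };
    }
}
\end{lstlisting}

Each non-terminal of the grammar corresponds to one method, and a lookahead on the current token type decides which production to take.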
Our recursive descent parser produces an AST, which is later used to determine when a wildcard can be matched against a specific AST node; the full definition of this AST can be seen in Appendix \ref{ex:typeExpressionTypes}. We use this AST by traversing it with a visitor pattern~\cite{VisitorPattern}, comparing each \texttt{Identifier} against the specific AST node we are currently comparing with, and evaluating all subsequent expressions to produce a Boolean value. If this value is true, the node is matched against the wildcard; if not, we do not have a match.
@@ -261,14 +261,14 @@ The reason this is preferred is it allows us to avoid having to extract the wild
\section{Using Babel to parse}
\label{sec:BabelParse}
Allowing the tool to perform transformations of code requires the generation of an Abstract Syntax Tree from the user's code, \texttt{applicable to}, and \texttt{transform to}. This means parsing JavaScript into an AST; to do this, we use Babel~\cite{Babel}.
The most important reason for choosing Babel to generate the ASTs used for transformation is the JavaScript community surrounding it. As this tool deals with proposals before they are part of JavaScript, a parser that supports early-stage proposals is required. Babel works closely with TC39 to support experimental syntax~\cite{BabelProposalSupport} through its plugin system, which allows the parsing of code not yet part of the language.
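For example, \texttt{@babel/parser} accepts a list of plugins enabling syntax that is not yet part of the language; the plugin selection below is illustrative.

\begin{lstlisting}[language={JavaScript}, caption={Enabling experimental syntax in \texttt{@babel/parser} (plugin choice illustrative).}]
import { parse } from "@babel/parser";

// Hack-style pipeline operator, currently a TC39 proposal.
const code = "const result = x |> double(%);";

const ast = parse(code, {
    sourceType: "module",
    plugins: [["pipelineOperator", { proposal: "hack", topicToken: "%" }]],
});
console.log(ast.program.body[0].type); // "VariableDeclaration"
\end{lstlisting}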
\subsection*{Custom Tree Structure}
To perform matching and transformation on each of the sections inside a \texttt{case} definition, they have to be parsed into an AST; for this we use Babel~\cite{Babel}. However, Babel's AST structure does not suit traversing multiple trees at the same time, which is a requirement for matching and transforming. Therefore, we take the AST and transform it into a simple custom tree structure that allows simple traversal.
As can be seen in \figFull[def:TreeStructure], we use a recursive definition of a \texttt{TreeNode}, where a node's parent either exists or is null (the node is the root of the tree), and a node can have any number of child elements. This definition allows for simple traversal both up and down the tree, which means traversing two trees at the same time can be done in the matcher and transformer sections of the tool.
@@ -287,9 +287,9 @@ export class TreeNode<T> {
}
\end{lstlisting}
Placing the AST generated by Babel into this structure means utilizing the library Babel Traverse~\cite{BabelTraverse}. Babel Traverse uses the visitor pattern~\cite{VisitorPattern} to perform traversal of the AST. While this method does not suit traversing multiple trees at the same time, it allows for very simple traversal of the tree in order to place it into our simple tree structure.
To place the AST into our tree structure, we use \texttt{@babel/traverse}~\cite{BabelTraverse} to visit each node of the AST in a \textit{depth-first} manner. The idea is that we implement a \textit{visitor} for each kind of node in the AST, and when a specific node is encountered, the corresponding visitor is used to visit it. When transferring the AST into our simple tree structure, we simply use the same visitor for every kind of AST node and place each visited node into the tree.
Visiting a node using the \texttt{enter()} function means we went from a parent to that child node, and it should be added as a child node of the parent. The node is automatically added to its parent's list of children by the constructor of \texttt{TreeNode}. Whenever we leave a node, the function \texttt{exit()} is called; this means we are moving back up the tree, and we have to update which node was the \textit{last} one in order to generate the correct tree structure.
@@ -318,8 +318,12 @@ traverse(ast, {
\end{lstlisting}
One important nuance of the way we place the nodes into the tree is that we still have the same underlying data structure from Babel. Because of this, the nodes can still be used with Babel's APIs, and we can still access every field of each node. Transforming it into a tree only creates an easy way to traverse up and down the tree by references; we perform no copying.
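For instance, a node taken out of a \texttt{TreeNode} can be handed directly to Babel's APIs. In this sketch we assume the wrapped Babel node is stored in a field called \texttt{element}; the import path of \texttt{TreeNode} is likewise an assumption.

\begin{lstlisting}[language={JavaScript}, caption={Using a wrapped node with Babel's APIs (field and path names assumed).}]
import generate from "@babel/generator";
import type * as t from "@babel/types";
import { TreeNode } from "./treeStructure";

// treeNode.element is the original Babel node, not a copy,
// so Babel's APIs accept it directly.
function printSubtree(treeNode: TreeNode<t.Node>): string {
    return generate(treeNode.element).code;
}
\end{lstlisting}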
\section{Outline of transforming user code}
Below is an outline of every major step performed, and how data is passed through the program.
\begin{algorithm}[H]
\caption{Outline of steps of algorithm}\label{lst:outline}
\begin{algorithmic}[1]
@@ -336,19 +340,17 @@
\State $M \gets singleMatcher(CT, AT, W)$
\EndIf
\State $TMap \gets $ Map()
\For{\textbf{each} m \textbf{in} M} \Comment{Build transformation templates}
\State TMap.insert $\gets$ buildTransform($m$, $TT$, $W$);
\EndFor
\For{$traverse(C)$}
\If{$TMap.has(c)$}
\State $C$.replaceMany($TMap.get(c)$);
\EndIf
\EndFor
\State \Return babel.generate($C$);
\end{algorithmic}
@@ -356,16 +358,8 @@
Each part of Algorithm \ref{lst:outline} is a step in the full algorithm for transforming user code based on a proposal specification in our tool. The initial step (line 1) is the extraction of wildcards from the template definitions; this step also parses the wildcard type expressions into an AST. The second step (lines 2-3) is to parse all templates into an AST with \texttt{@babel/parser}~\cite{BabelParser}. Once we have parsed all code into ASTs, we decide which matching algorithm to use (line 5) based on the \texttt{applicable to} template. These algorithms find all sections of the user AST that match the template. We then build the transformation templates (lines 11-13), inserting the sections of the user code that were matched with a wildcard. These transformations are stored in a \texttt{Map} (line 10). Once all transformations are prepared, we traverse the user AST (line 14) and insert the transformations if the current node is in the \texttt{Map} (line 16). The final step is to generate JavaScript from the transformed AST (line 19).
\section{Matching}
@@ -373,16 +367,16 @@ This section discusses how we find matches in the users code, this is the step d
\subsection{Determining if AST nodes match}
The initial problem we have to overcome is how to compare an AST node from the template with an AST node from the user code. This step also has to take comparisons against wildcards into account, and pass that information back to the AST matching algorithms.
When comparing two AST nodes in this tool, we use the function \texttt{checkCodeNode}, which returns one of the following values depending on what kind of match the two nodes produce (a sketch of these result kinds follows the list).
\begin{description}
\item[NoMatch] The nodes do not match.
\item[Matched] The nodes are a match, and the node of \texttt{applicable to} is not a wildcard.
\item[MatchedWithWildcard] The node of the user AST produced a match against a wildcard.
\item[MatchedWithPlussedWildcard] The node of the user AST produced a match against a wildcard that can match one or more nodes against itself.
\end{description}
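A sketch of these result kinds and of the assumed signature of \texttt{checkCodeNode} is given below; the concrete representation in the implementation may differ.

\begin{lstlisting}[language={JavaScript}, caption={Sketch of the match result kinds of \texttt{checkCodeNode} (representation assumed).}]
import type * as t from "@babel/types";

// The four kinds of result a node comparison can produce.
enum MatchKind {
    NoMatch,
    Matched,
    MatchedWithWildcard,
    MatchedWithPlussedWildcard,
}

// Assumed signature: compare a node of the applicable to template
// against a node of the user's AST.
declare function checkCodeNode(
    aplToNode: t.Node,
    codeNode: t.Node
): MatchKind;
\end{lstlisting}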
When we are comparing two AST nodes, we have to perform an equality check. Because this is a structural matching search, we can get away with performing only some preliminary checks, such as comparing the names of identifiers; otherwise it is sufficient to perform an equality check on the types of the nodes we are currently trying to match. If the types are the same, they can be validly matched against each other. This is sufficient because we are trying to determine whether a single node can be a match, not whether the entire template structure is a match. False positives are therefore highly unlikely, as the entire structure would have to be a false positive match.
@@ -398,12 +392,12 @@ if((aplToNode.type === "ExpressionStatement" &&
}
\end{lstlisting}
When comparing an AST node type against a wildcard type expression, we pass the node type into a function \texttt{WildcardEvaluator}. This evaluator traverses the AST of the wildcard type expression. Every leaf of the tree is equality-checked against the type, and the resulting Boolean value is returned. We then evaluate the expression, passing the values up through the visitors until the entire expression has been evaluated and we have a result. If the result of the evaluation is \texttt{false}, we return \texttt{NoMatch}. If the result is \texttt{true}, we know we can match the user's AST node against the wildcard: if the wildcard type expression contains a Kleene plus, the comparison returns \texttt{MatchedWithPlussedWildcard}; if not, we return \texttt{MatchedWithWildcard}.
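The sketch below shows this evaluation over the type expression AST from the parser sketch; it is written as a plain recursive function rather than the visitor used in the implementation, and it leaves out the Kleene plus handling.

\begin{lstlisting}[language={JavaScript}, caption={Sketch of evaluating a wildcard type expression (recursive form, names assumed).}]
// Evaluate a wildcard type expression against a concrete node type.
// TypeExprNode is the shape produced by the parser sketch above.
function evaluate(expr: TypeExprNode, nodeType: string): boolean {
    switch (expr.kind) {
        case "Identifier":
            // Leaves are equality-checked against the node type.
            return expr.name === nodeType;
        case "BinaryExpr":
            return expr.op === "||"
                ? evaluate(expr.left, nodeType) ||
                  evaluate(expr.right, nodeType)
                : evaluate(expr.left, nodeType) &&
                  evaluate(expr.right, nodeType);
    }
}
\end{lstlisting}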
\subsection{Matching a single Expression/Statement template}
\label{sec:singleMatcher}
In this section, we discuss how matching is performed when the \texttt{applicable to} template is a single expression/statement. A very complex matching template with many statements results in a lower chance of finding matches in the user's code; simple, single root node matching templates therefore provide the highest possibility of discovering a match within the user's code. This section covers line 11 of Algorithm \ref{lst:outline}.
To determine whether we are currently matching with a template that is only a single expression/statement, we have to verify that the program body of the template has length one; if it does, we can use the single length traversal algorithm.
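This check is a single condition on the Babel AST of the template; the helper name below is illustrative.

\begin{lstlisting}[language={JavaScript}, caption={Checking for a single-statement template (helper name illustrative).}]
import { parse } from "@babel/parser";

// A template whose program body holds exactly one statement can be
// matched with the single-node traversal algorithm.
function isSingleStatementTemplate(template: string): boolean {
    return parse(template).program.body.length === 1;
}
\end{lstlisting}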