Finished cleaning up chapters 3 and 4

This commit is contained in:
Rolf Martin Glomsrud 2024-05-20 20:23:40 +02:00
parent 454b175aba
commit 0d0c00acd9
9 changed files with 420 additions and 292 deletions

.vscode/notes/FutureWork.txt (new file)

@ -0,0 +1,12 @@
Future work:
Major:
- support for feedback
- support for other languages
Medium:
- parameterized specifications ("generics")
- fully self-hosted
- support for other JS parsers + support for new introduced syntax

Binary file not shown.


@ -5,29 +5,25 @@ syntactic proposals for EcmaScript.
\section{The core idea}
\textbf{THIS IS TOO ABRUPT OF AN INTRODUCTION, MORE GENERAL ALMOST REPEAT OF BACKGROUND GOES HERE}
\textbf{CURRENT VERSION vs FUTURE VERSION instead of old way}
\textbf{DO NOT DISCUSS TOOL HERE}
When a user of EcmaScript wants to suggest a change to the language, the idea has to be described in a proposal. A proposal is a general way of describing a change and its requirements, consisting of a language specification, a motivation for the idea, and general discussion around the proposed change. A proposal ideally also needs backing from the community of EcmaScript users, which means the proposal has to be presented to users in some way. This is currently done through many channels, such as polyfills, code examples, and beta features of the main JavaScript engines; this paper, however, wishes to showcase proposals to users through a different avenue.
Users of EcmaScript are familiar with code they themselves have written: they know how their own code works and why they might have written it a certain way. This project aims to utilize this pre-existing knowledge to showcase new proposals for EcmaScript. Showcasing proposals this way allows users to focus on what the proposal actually entails, instead of on the examples written by the proposal author.
Further in this chapter, we will be discussing the current version and a future version of EcmaScript. What we are referring to is the set of problems a proposal is trying to solve: if the proposal is accepted into EcmaScript, there will be a future way of solving those problems. The current version is the status quo, in which the proposal is not part of EcmaScript, and the future version is one in which the proposal is part of the language and we are utilizing its new features.
The program will allow users to preview proposals well before they are part of the language. This way, the committee can get useful feedback from users of the language earlier in the proposal process. Using the users' familiarity with their own code will ideally allow for a more efficient process of developing EcmaScript.
\subsection{Applying a proposal}
The way this project uses the pre-existing knowledge a user has of their own code is to use that code as the base for showcasing a proposal's features. Using the user's own code as a base requires the following steps in order to automatically create the examples that showcase the proposal inside the context of the user's own code.
The idea is to identify where the features and additions of a proposal could have been used. This means identifying the parts of the user's program that use the pre-existing EcmaScript features the proposal interacts with and tries to improve. This identifies all the different places in the user's program the proposal can be applied. This step is called \textit{matching} in the following chapters.
Once we have matched all the parts of the program the proposal could be applied to, the user's code has to be transformed to use the proposal; this means changing the code to use a possible future version of JavaScript. This step also includes keeping the context and functionality of the user's program the same, so variables and other context-related concepts have to be transferred over to the transformed code.
The output of the previous step is a set of code pairs, where the first element is a part of the user's original code and the second is the transformed code. The transformed code is ideally a perfect replacement for the original user code, should the proposal become part of EcmaScript. These pairs are presented to the user side by side, so the user can see their original code together with the transformed code. This allows for a direct comparison and makes it easier for the user to understand the proposal.
The steps outlined in this section require some way of defining the matching and transforming of code. This has to be done precisely and accurately in order to avoid examples that are wrong; an imprecise definition of the proposal might lead to transformed code that is not a direct replacement for the code it was based upon. For this we suggest two different methods: a definition written in a custom DSL, \DSL, and a definition written in a self-hosted way, using only EcmaScript itself as the definition language. Read more about this in SECTION HERE.
\section{Applicable proposals}
\label{sec:proposals}
@ -58,7 +54,7 @@ let c = 200;
See that in \ref{ex:proposal} the change is optional: it is not applied to the declaration of \textit{c}, but it is applied to the declaration of \textit{x}. Since the change is optional to use, and essentially is just \textit{syntax sugar}, this proposal does not make any changes to functionality or semantics, and can therefore be categorized as a syntactic proposal.
\subsection{\cite[Discard Bindings]{Proposal:DiscardBindings}}
The proposal \discardBindings is classified as a syntactic proposal, as it contains no change to the semantics of EcmaScript. This proposal allows discarding values when unpacking objects/arrays on the left side of an assignment. The whole idea of the proposal is to avoid declaring unused temporary variables.
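As a small, hypothetical illustration in the spirit of the proposal's README (the helper names below are made up, and the exact surface syntax is still subject to the proposal process):

\begin{lstlisting}[language={JavaScript}]
// Status quo: placeholder bindings are declared but never read
const [_ignored, port] = hostAndPort;
using _lock = acquireLock();

// With discard bindings: `void` discards the value outright,
// so no unused variable enters the scope
const [void, portNumber] = hostAndPort;
using void = acquireLock();
\end{lstlisting}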
@ -189,23 +185,16 @@ await using void = x; // via: LexicalBinding : `void` Initializer
\subsection{Pipeline Proposal}
The pipeline proposal is a syntactic proposal with no change to the functionality of EcmaScript; it focuses solely on solving problems related to nesting of function calls and other expressions that allow for a topic reference. A topic reference is a reference to some value based on the current context/topic; for example, in \texttt{value |> f(\%)}, the token \texttt{\%} is a topic reference to \texttt{value}.
The pipeline proposal aims to solve two problems with performing consecutive operations on a value. In EcmaScript there are currently two main styles of achieving this: nesting calls and chaining calls. These come with differing sets of challenges when used.
Nesting calls is mainly an issue for function calls with one or more arguments. When performing many calls in sequence, the result is a \textit{deeply nested} call expression.
Nested calls come with some specific readability challenges. The order of calls goes from right to left, which is opposite to the natural reading direction many users of EcmaScript are used to day to day; this makes it difficult to switch reading direction when working out which call happens in which order. When a function with multiple arguments appears in the middle of a nested call, it is also not intuitive to see which call its arguments belong to. This readability problem is the main challenge the proposal is trying to solve. Nested calls are not all bad, however: they can be simplified by using temporary variables, which, while introducing its own set of issues, provides some way of mitigating the readability problem. Another positive side of nested calls is that they do not require any specific design to be used; a library developer does not have to design their library around this call style.
\begin{lstlisting}[language={JavaScript}]
// Deeply nested call with single arguments
function1(function2(function3(function4(value))));
@ -213,27 +202,11 @@ Benefits of nested calls
function1(function2(function3(value2, function4)), value1);
\end{lstlisting}
\subsection{Description of Pipeline proposal}

Chaining solves some of the issues related to nesting: it allows for a more natural left-to-right reading direction when identifying the sequence of calls, arguments are naturally grouped together with their respective function call, and it provides a way of untangling deep nesting. However, solving consecutive operations using chaining has its own set of challenges. In order to use chaining, the API of the code being called has to be designed to allow for chaining; this is not always the case, which makes chaining very difficult to use with an API that has not been designed specifically for it. There are also concepts in JavaScript that are not supported when chaining, such as arithmetic operations, literals, \texttt{await}, \texttt{yield}, and so on. This is the biggest downside of chaining, as it only allows for function calls; if one wants to use other concepts, temporary variables have to be used.
\begin{lstlisting}[language={JavaScript}]
// Chaining calls
function1().function2().function3();
@ -243,105 +216,151 @@ Benefits of chaining calls
The pipeline proposal aims to combine the benefits of these two styles without all the challenges each method faces.
The main benefit of pipeline is to allow for a style similar to chaining in cases where the API has not been specifically designed for chaining. The idea is to use syntactic sugar to change the order in which the calls are written, without influencing the API of the functions. This allows each call to be read from left to right, while still maintaining the modularity of deeply nested function calls.
The way the pipeline proposal solves this is by introducing a pipe operator, which takes the result of the expression on its left and \textit{pipes} it into the expression on its right. The result is piped to the location of the \textit{topic token}. The specifics of exactly which token will be used as the topic token, and exactly which operator will be used as the pipe operator, are still subject to change.
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Status quo (example from jQuery)
var loc = Object.keys(grunt.config( "uglify.all" ))[0];
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// With pipes
var loc = grunt.config('uglify.all') |> Object.keys(%)[0];
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Status quo (example from unpublish)
const json = await npmFetch.json(
npa(pkgs[0]).escapedName, opts);
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// With pipes
const json = pkgs[0] |> npa(%).escapedName |> await npmFetch.json(%, opts);
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Status quo (example from underscore.js)
return filter(obj, negate(cb(predicate)), context);
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// With pipes
return cb(predicate) |> _.negate(%) |> _.filter(obj, %, context);
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Status quo (example from ramda.js)
return xf['@@transducer/result'](obj[methodName](bind(xf['@@transducer/step'], xf), acc));
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// With pipes
return xf
|> bind(%['@@transducer/step'], %)
|> obj[methodName](%, acc)
|> xf['@@transducer/result'](%);
\end{lstlisting}
\end{minipage}\hfil
\subsection{Do proposal}
The \cite[Do Proposal]{Proposal:DoProposal} is a proposal meant to bring \textit{expression oriented} programming to EcmaScript. Expression oriented programming is a concept taken from functional programming which allows for combining expressions in a very free manner, resulting in a highly malleable programming experience.
The motivation of the do expression proposal is to create a feature that allows for local scoping of a code block that is treated as an expression. This allows complex code requiring multiple statements to be confined inside its own scope, with the resulting value returned from the block as an expression, similar to how unnamed functions or arrow functions are currently used. The current status quo for achieving this behavior is to use an unnamed function and invoke it immediately, or to use an arrow function; these two are equivalent to a do expression.
The code block of a do expression has one major difference from these equivalent functions: it allows for an implicit return of the final expression of the block, which becomes the resulting value of the entire do expression.
The local scoping of this feature allows for a cleaner environment in the parent scope of the do expression. Temporary variables and other single-use assignments can be enclosed inside the limited scope of the do block, leaving a cleaner environment in the parent scope where the do block is defined.
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Current status quo
let x = () => {
let tmp = f();
return tmp + tmp + 1;
};
// Using an immediately invoked function
let x = function(){
let tmp = f();
return tmp + tmp + 1;
}();
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// With do expression
let x = do {
let tmp = f();
tmp + tmp + 1;
};
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Current status quo
let x = function(){
let tmp = f();
let a = g() + tmp;
return a - 1;
}();
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// With do expression
let x = do {
let tmp = f();
let a = g() + tmp;
a - 1;
};
\end{lstlisting}
\end{minipage}\hfil
This proposal has some limitations on its usage. Due to the implicit return of the final expression, a do expression cannot end with an \texttt{if} without an \texttt{else}, or with a loop.
\subsection{Await to Promise}
This section covers an imaginary proposal that was used to evaluate the program developed in this thesis. It is less of a proposal and more of a pure JavaScript transformation example. What this proposal wants to achieve is transforming a function that uses \texttt{await} into a function that uses and returns a promise.
In order to do this, an equivalent way of writing code containing \texttt{await} using promise \texttt{.then()} syntax has to be identified. In this case, the equivalent is to consume the rest of the scope following the \texttt{await} and place it inside a \texttt{then(() => \{\})} callback, with the variable the awaited value was assigned to becoming the parameter of the \texttt{.then()} callback.
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Code containing await
async function a(){
let b = 9000;
let something = await asyncFunction();
let c = something + 100;
return c + 1;
}
\end{lstlisting}
\end{minipage}\hfil
\noindent\begin{minipage}{.45\textwidth}
\begin{lstlisting}[language={JavaScript}]
// Re-written using promises
async function a(){
    let b = 9000;
    return asyncFunction()
        .then((something) => {
            let c = something + 100;
            return c + 1;
        });
}
\end{lstlisting}
\end{minipage}\hfil

In this example, the body of \texttt{a} is rewritten to return the promise chain directly, which ensures everything using the function \texttt{a} still gets the expected value.
\section{Searching user code for applicable snippets}
@ -350,42 +369,34 @@ In order to identify snippets of code in the users codebase where a proposal is
\subsection{\DSL}
\label{sec:DSL_DEF}
Showcasing a proposal using a user's code requires some way of identifying the code sections a proposal is applicable to. To do this, we have designed a DSL called \DSL, the JavaScript Template Query Language. This DSL contains the entire definition used to identify and transform user code in order to showcase a proposal.
\subsection*{Identifying applicable code}
In order to identify sections of code a proposal is applicable to, we use templates of JavaScript. A section matches a template if it produces an exactly equal AST structure, where each node of the AST contains the same information. A template containing no wildcards is therefore essentially a direct code search for snippets whose AST matches the template exactly. Exact matching alone does not provide a way of actually querying the code and performing context-based transformations; for that we use \textit{wildcards} within the template.
\textit{Wildcards} are written into the template inside a block denoted by \texttt{<< >>}. Each wildcard has to start with an identifier, which is the way of referring to that wildcard later in the definition of the transformation template. This allows the context of whatever is matched against a wildcard to be transferred into the transformed output: identifiers, parts of statements, or even entire statements can be carried over from the original user code into the transformation template. The second part of the wildcard is a wildcard type expression. A wildcard type expression defines exactly what kinds of AST nodes the wildcard can match; these type expressions use boolean logic together with the AST node types of \cite[BabelJS]{Babel} to create a very strict way of defining wildcards.
\subsubsection*{Wildcard type expressions}
Wildcard type expressions allow for writing complex boolean logic describing the kinds of nodes a wildcard can match. The type of a wildcard can be as simple as \texttt{Expression || Statement}, or as complex as \texttt{((Statement \&\& !ReturnStatement) \&\& !VariableDeclaration)*}. The operators have the following meanings: \texttt{\&\&} is logical AND, meaning both sides of the expression have to evaluate to true; \texttt{||} is logical OR, meaning the whole expression is true if either side is true; and \texttt{!}, the only unary operator, is logical NOT, so \texttt{!Statement} matches any node that is not a Statement. There is also the \texttt{+} operator, which is only valid at the top level of an expression and means the wildcard can match any number of consecutive nodes. This is useful for matching a series of statements without matching an entire BlockStatement. The final building blocks of a type expression are the AST node types themselves; for example, the type expression \texttt{ReturnStatement} will only match an AST node of type ReturnStatement. Using the power of these wildcards, we can create templates that query a user's code for the specific code sections a proposal is applicable to.
\begin{lstlisting}[caption={Example of a wildcard}, label={ex:wildcard}]
let variableName = << expr1: ((CallExpression || Identifier) && !ReturnStatement)+ >>;
\end{lstlisting}
In \ref{ex:wildcard}, a wildcard section is defined on the right-hand side of a variable declaration. This wildcard will match against one or more AST nodes classified as a CallExpression or an Identifier.
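For instance, given the template in \ref{ex:wildcard}, everything outside the wildcard block is matched exactly (the snippets below are hypothetical user code):

\begin{lstlisting}[language={JavaScript}]
// Matches: same structure and declared name as the template,
// and the right-hand side is a CallExpression
let variableName = fetchData();

// Does not match: the declared identifier differs from the template
let result = fetchData();
\end{lstlisting}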
\subsection{Transforming}
When matching sections of the user's code have been found, we need some way of defining how to transform those sections to showcase the proposal. This is done with a template similar to \texttt{applicable to}, namely \textit{transform to}; this template describes the general structure of the newly transformed code.
A transformation template defines how a match is transformed once applicable code has been found. The transformation is a general template of the code that replaces the match in the original AST. However, without transferring the context from the match, this would just be a template-based search and replace. In order to transfer the context of the match, wildcards are defined in this template as well. These wildcards use the same block notation found in the \texttt{applicable to} template, but they do not contain type expressions, as types are not needed in the transformation. The only part required in the wildcard is the identifier used in \texttt{applicable to}; this determines which wildcard match the context is taken from, and where it is placed in the transformation template.
\begin{lstlisting}[language={JavaScript}]
// Example of transform to template
const variableName = <<expr1>>;
\end{lstlisting}
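Applied to the matching snippet from the earlier example, the code matched by \texttt{expr1} is transferred into the template, so the only change is that \texttt{let} is rewritten to \texttt{const}:

\begin{lstlisting}[language={JavaScript}]
// Result of the transformation
const variableName = fetchData();
\end{lstlisting}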
@ -395,7 +406,7 @@ const variableName = <<expr1>>;
\label{sec:DSLStructure}
\DSL is designed to mimic the examples already provided by a proposal champion in the proposal's README. Such examples can be seen for each of the proposals described in \ref{sec:proposals}. The idea is to allow a notation similar to these examples for defining the transformations.
\subsubsection*{Define proposal}
@ -409,10 +420,10 @@ proposal Pipeline_Proposal{
\subsubsection*{Defining a pair of template and transformation}
Each proposal has one or more definitions of a template for code to identify in the user's codebase, together with its corresponding transformation definition. These are grouped together in order to have a simple way of identifying the corresponding cases of matching and transformation. This section of a proposal definition is denoted by the keyword \textit{case} and a block containing its related fields. A proposal contains one or more of these sections, which allows for matching many different code snippets and showcasing more of the proposal than a single concept.
\begin{lstlisting}[caption={Example of case section}]
case CASE_NAME {
}
\end{lstlisting}
@ -429,7 +440,7 @@ applicable to {
\subsubsection*{Defining the transformation}
In order to define the transformation that is applied to a specific matched code snippet, the keyword \textit{transform to} is used. This section is similar to the template section, but it uses the DSL identifiers defined in \texttt{applicable to} to transfer over the context of the matched user code; this allows us to keep the parts of the user's code that are important to the original context it was written in.
\begin{lstlisting}[caption={Example of transform to section}]
transform to{
@ -443,7 +454,7 @@ Taking all these parts of \DSL structure, defining a proposal in \DSL will look
\begin{lstlisting}[caption={\DSL definition of a proposal}]
proposal PROPOSAL_NAME {
case CASE_NAME {
applicable to {
}
@ -462,8 +473,7 @@ proposal PROPOSAL_NAME {
\section{Using the \DSL with an actual syntactic proposal}
This section contains the definitions of the proposals used to evaluate the tool created in this thesis. These definitions do not have to cover every single case where a proposal might be applicable; they just have to be general enough to produce a representative number of matches when the transformations are applied to some reasonably long user code.
\subsection{Pipeline Proposal}
@ -472,32 +482,29 @@ The Pipeline Proposal is the easiest to define of the proposals presented in \re
\begin{lstlisting}[language={JavaScript}, caption={Example of Pipeline Proposal definition in \DSL}, label={def:pipeline}]
proposal Pipeline{
case SingleArgument {
applicable to {
"<<someFunctionIdent:Identifier || MemberExpression>>(<<someFunctionParam: Expression>>);"
}
transform to {
"<<someFunctionParam>> |> <<someFunctionIdent>>(%);"
}
}
case DualArgument {
applicable to {
"<<someFunctionIdent: Identifier || MemberExpression>>(<<someFunctionParam: Expression>>, <<moreFunctionParam: Expression>>)"
}
transform to {
"<<someFunctionParam>> |> <<someFunctionIdent>>(%, <<moreFunctionParam>>)"
}
}
}
\end{lstlisting}
The first case definition, \texttt{SingleArgument}, applies to any \textit{CallExpression} with a single argument, and it will be applied to each of the deeply nested call expressions in a nested call. The second case definition, \texttt{DualArgument}, applies to any \textit{CallExpression} with exactly two arguments. One can in theory define any number of cases to cover higher numbers of arguments in a function call; however, we only need to cover enough cases to produce some matches, so a smaller definition is preferable here.
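To illustrate, applying the \texttt{SingleArgument} case twice to a hypothetical nested call might produce the following (the function names are made up, and the exact output formatting depends on the tool):

\begin{lstlisting}[language={JavaScript}]
// User code: matched by SingleArgument at both call sites
const result = parseData(readFile(path));

// After both matches are transformed
const result = path |> readFile(%) |> parseData(%);
\end{lstlisting}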
\subsection{Do Proposal}
@ -505,36 +512,63 @@ The \cite[Do Proposal]{Proposal:DoProposal} can also be defined with this tool.
\begin{lstlisting}[language={JavaScript}, caption={Definition of Do Proposal in \DSL}, label={def:doExpression}]
proposal DoExpression{
case arrowFunction {
applicable to {
"let <<ident:Identifier>> = () => {
<<statements: (Statement && !ReturnStatement)*>>
return <<returnVal : Expression>>;
}
"
}
transform to {
"let <<ident>> = do {
<<statements>>
<<returnVal>>
}"
}
}
case immediatelyInvokedUnnamedFunction {
applicable to {
"let <<ident:Identifier>> = function(){
<<statements: (Statement && !ReturnStatement)*>>
return <<returnVal : Expression>>;
}();"
}
transform to {
"let <<ident>> = do {
<<statements>>
<<returnVal>>
}"
}
}
}
\end{lstlisting}
\subsection{Await to Promises evaluation proposal}
This section covers the imaginary proposal we created in order to evaluate the tool, as described in \ref{sec:proposals}.
This proposal was created in order to evaluate the tool, as it is quite difficult to define applicable code in the current template form. The definition is limited, and only applies if the function contains a single await expression. This highlights some of the issues with the current design of \DSL, which will be described in Future Work.
\begin{lstlisting}[language={JavaScript}, caption={Definition of Await to Promise evaluation proposal in \DSL}, label={def:awaitToPromise}]
proposal awaitToPromise {
case single{
applicable to {
"let <<ident:Identifier>> = await <<awaitedExpr: Expression>>;
<<statements: (Statement && !ReturnStatement)*>>
return <<returnExpr: Expression>>
"
}
transform to{
"return <<awaitedExpr>>.then((<<ident>>) => {
<<statements>>
return <<returnExpr>>
});"
}
}
}
\end{lstlisting}


@ -53,13 +53,13 @@ In this section, the implementation of the parser for \DSL will be described. Th
\subsection{Langium}
Langium \cite{Langium} is primarily used to create parsers for Domain Specific Languages; these parsers output an Abstract Syntax Tree that is later used to create interpreters or other tooling. In the case of \DSL, we use Langium to generate an AST definition in the form of TypeScript objects; these objects and their relations are used as the definitions the tool uses to do matching and transformation of user code.
In order to generate this parser, Langium requires the definition of a grammar. A grammar is a set of rules describing what constitutes a valid program; in our case, this is a description of a proposal and its \texttt{applicable to} and \texttt{transform to} sections. A grammar in Langium starts by describing the \texttt{Model}. The model is the top entry of the grammar, where all valid top-level statements are described.
In \DSL the only valid top-level statement is the definition of a proposal. This means our grammar model contains only one list, a list of zero or more \texttt{Proposal} definitions. A \texttt{Proposal} definition contains a block, denoted by \texttt{\{...\}}, which in the case of \DSL holds one or more definitions of \texttt{Case}.
\texttt{Case} is defined very similarly to \texttt{Proposal}, as it contains only a block with a definition of a \texttt{Section}.
The \texttt{Section} is where a single case of some applicable code and its corresponding transformation is defined. This definition uses specific keywords to describe each of them: \texttt{applicable to} denotes the definition of the template \DSL uses to perform the matching algorithm, while \texttt{transform to} contains the definition of the code used to perform the transformation.
@ -73,11 +73,11 @@ entry Model:
Proposal:
'proposal' name=ID "{"
(case+=Case)+
"}";
Case:
"case" name=ID "{"
aplTo=ApplicableTo
traTo=TraTo
"}";
@ -128,15 +128,27 @@ export class JstqlValidator {
\end{lstlisting}
\subsection*{Using Langium as a parser}
Langium \cite{Langium} is designed to automatically generate a lot of tooling for the language specified by its grammar. In our case, however, we have to parse the \DSL definition using Langium and then extract the generated abstract syntax tree in order to use the information it contains.
To use the parser generated by Langium, we created a custom function \texttt{parseDSLtoAST()} within our Langium project. This function takes a string as input, the raw \DSL code, and outputs the pure AST using the format described in the grammar in \figFull[def:JSTQLLangium]. This function is exposed as a custom API for our tool to interface with. This also means our tool depends on the implementation of the Langium parser to function with \DSL; the implementation of \DSLSH is entirely independent.
When interfacing with the Langium parser to get the Langium-generated AST, the exposed API function is imported into the tool. When this API is run, the output is in the form of the Langium \textit{Model}, which follows the same structure as the grammar. This is then transformed into an internal object structure used by the tool, called \texttt{TransformRecipe}, which is then passed on to perform the actual transformation.
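As an illustration of this hand-off, the conversion might look roughly like the following sketch. The \texttt{TransformRecipe} field names, the import path, and the Model field names are assumptions made for illustration; only \texttt{parseDSLtoAST} and the grammar structure come from the text above.

\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: flattening the Langium Model into
// TransformRecipe objects, one per case of each proposal.
import { parseDSLtoAST } from "./langium/parser"; // assumed path

function toRecipes(dslSource) {
    const model = parseDSLtoAST(dslSource); // Langium Model
    const recipes = [];
    for (const proposal of model.proposals) { // field name assumed
        for (const c of proposal.case) {
            recipes.push({
                proposal: proposal.name,
                case: c.name,
                applicableTo: c.aplTo, // template to match against
                transformTo: c.traTo, // transformation template
            });
        }
    }
    return recipes;
}
\end{lstlisting}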
\section{Pre-parsing}
In order to refer to the internal DSL variables defined in \texttt{applicable to} within the transformation, we need to extract this information from the template definitions and pass it on to the matcher.
\subsection*{Why not use Langium?}
Langium has support for creating a generator that produces an artifact, which actually suits the needs of \DSL quite well and could be used to extract the wildcards from each \texttt{case} and create the \texttt{TransformRecipe}. However, this would make \DSLSH dependent on Langium as well, and the entire tool would rely on it. This is not preferred, as both ways of defining a proposal would then be reliant on Langium rather than separated. The reason for using our own pre-parser is to allow for an independent way to define transformations using our tool.
\subsection*{Extracting wildcards from \DSL}
In order to allow the use of \cite[Babel]{Babel}, the wildcards present in the blocks of \texttt{applicable to} and \texttt{transform to} have to be parsed and replaced with some valid JavaScript. This is done by a pre-parser that extracts the information from the wildcards and inserts an \texttt{Identifier} in their place.
To pre-parse the text, we look at every character in the code section. When the start token of a wildcard, denoted by \texttt{<<}, is discovered, everything until the closing token, denoted by \texttt{>>}, is treated as an internal DSL variable and stored by the tool. A variable \texttt{flag} tracks this: when \texttt{flag} is false, we are not inside a wildcard block, and the character is passed through to the variable \texttt{cleanedJS}; when \texttt{flag} is true, we are inside a wildcard block, and every character of the wildcard block is collected into \texttt{temp}. Once we hit the end of the wildcard block, having consumed the entire wildcard, it is passed to a tokenizer and then to a recursive descent parser.
\begin{lstlisting}[language={JavaScript}]
export function parseInternal(code: string): InternalParseResult {
@ -176,24 +188,46 @@ export function parseInternal(code: string): InternalParseResult {
return { prelude, cleanedJS };
}
\end{lstlisting}
Each wildcard follows the exact same format: it begins with the opening token \texttt{<<}, followed by the name this variable will be referred to by. This variable is called an internal DSL variable and is used when transferring the matching AST node/s from the user's code into the transform template. Following the internal DSL variable, a \texttt{:} token separates the name from the wildcard type expression that defines what the wildcard can match. This is a very strict notation for how wildcards can be written, which avoids collisions with the reserved bit-shift operators in JavaScript, as it is highly unlikely that any code using the bit-shift operator would fit into this format of a wildcard.

\begin{lstlisting}
<< Variable_Name : (Type1 || Type2) && !Type3 >>
\end{lstlisting}
\subsection*{Parsing wildcard}
Once a wildcard has been extracted from the \texttt{case} definitions inside \DSL, it has to be parsed into a simple tree to be used when matching against the wildcard. This is accomplished by using a simple tokenizer and a recursive descent parser \cite{RecursiveDescent}.
Our tokenizer takes the raw stream of input characters extracted from the wildcard block within the template and determines which part corresponds to which token. Due to the very simple nature of the type expressions, there is no ambiguity between the tokens, so determining which token comes at which point is quite trivial. \textbf{I need to figure out what kind of tokenization algorithm I am actually using LOL}
A recursive descent parser is written to closely mimic the grammar of the language it parses: we define functions for handling each of the non-terminals, and ways to determine which non-terminal each token belongs to. In the case of this parser, the language is a very simple boolean expression language. We use boolean combinatorics to determine whether or not the node type of a specific \cite[Babel parser]{BabelParser} AST node matches a specific wildcard. This means we have to create a very simple AST that can be evaluated using the AST node type as input.
\begin{lstlisting}[caption={Grammar of type expressions}, label={ex:grammarTypeExpr}]
Wildcard:
Identifier ":" MultipleMatch
MultipleMatch:
GroupExpr "*"
| TypeExpr
TypeExpr:
BinaryExpr
| UnaryExpr
| PrimitiveExpr
BinaryExpr:
TypeExpr { Operator TypeExpr }*
UnaryExpr:
{UnaryOperator}? TypeExpr
PrimitiveExpr:
GroupExpr | Identifier
GroupExpr:
"(" TypeExpr ")"
\end{lstlisting}
The grammar of the type expressions used by the wildcards can be seen in \figFull[ex:grammarTypeExpr]. The grammar is written in a notation similar to Extended Backus-Naur form, where we define the terminals and non-terminals in a way that makes the entire grammar \textit{solvable} by the recursive descent parser.

Our recursive descent parser produces a very simple AST \cite{AST1,AST2}, which is later used to determine whether a wildcard can be matched against a specific AST node; the full definition of this AST can be seen in \ref*{ex:typeExpressionTypes}. We use this AST by traversing it with a visitor pattern \cite{VisitorPattern}, comparing each \textit{Identifier} leaf against the specific AST node we are currently checking, and evaluating all subsequent expressions to produce a boolean value. If this value is true, the node is matched against the wildcard; if not, we do not have a match.
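As an illustration of how such a parser could look, the following is a minimal sketch, not the tool's actual implementation; the token shapes, function names, and AST object shapes are assumptions made for this example (the top-level \texttt{*}/\texttt{+} multiplicity marker is left unhandled for brevity).

\begin{lstlisting}[language={JavaScript}]
// Minimal sketch of a tokenizer and recursive descent parser for
// the type expression grammar in ex:grammarTypeExpr (names assumed).
function tokenize(src) {
    // Identifiers, "&&", "||", "!", "(", ")", "*" and "+"
    return src.match(/[A-Za-z_]\w*|&&|\|\||[!()*+]/g) ?? [];
}
// BinaryExpr: UnaryExpr { ("&&" | "||") UnaryExpr }*
function parseTypeExpr(tokens) {
    let left = parseUnaryExpr(tokens);
    while (tokens[0] === "&&" || tokens[0] === "||") {
        const op = tokens.shift();
        left = { kind: "Binary", op, left, right: parseUnaryExpr(tokens) };
    }
    return left;
}
// UnaryExpr: {UnaryOperator}? TypeExpr
function parseUnaryExpr(tokens) {
    if (tokens[0] === "!") {
        tokens.shift();
        return { kind: "Unary", op: "!", operand: parseUnaryExpr(tokens) };
    }
    return parsePrimitiveExpr(tokens);
}
// PrimitiveExpr: GroupExpr | Identifier
function parsePrimitiveExpr(tokens) {
    if (tokens[0] === "(") {
        tokens.shift(); // consume "("
        const expr = parseTypeExpr(tokens);
        tokens.shift(); // consume ")"
        return expr;
    }
    return { kind: "Identifier", name: tokens.shift() };
}
\end{lstlisting}

Evaluating the resulting tree against a node type is then a straightforward recursive traversal. Note that in the real tool, alias types such as \texttt{Statement} have to be checked with Babel's grouped type predicates rather than plain string equality, which is elided in this sketch.

\begin{lstlisting}[language={JavaScript}]
// Sketch of evaluating a parsed type expression against a node type.
function evaluate(expr, nodeType) {
    switch (expr.kind) {
        case "Identifier":
            return expr.name === nodeType; // simplification, see above
        case "Unary":
            return !evaluate(expr.operand, nodeType);
        case "Binary":
            return expr.op === "&&"
                ? evaluate(expr.left, nodeType) && evaluate(expr.right, nodeType)
                : evaluate(expr.left, nodeType) || evaluate(expr.right, nodeType);
    }
}
// evaluate(parseTypeExpr(tokenize("CallExpression || Identifier")),
//          "Identifier") === true
\end{lstlisting}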
\subsection*{Pre-parsing \DSLSH}
@ -203,25 +237,6 @@ In order to use JavaScript as the meta language to define JavaScript we define a
We use Babel to generate the AST of the \texttt{prelude} definition, which gives us a JavaScript object structure. Since the structure is very strictly defined, we can expect every \texttt{stmt} of \texttt{stmts} to be a variable declaration, and otherwise throw an error for an invalid prelude. Continuing through the object, we have to determine whether the prelude definition supports multiple types, that is, whether its initializer is an \texttt{ArrayExpression} or just a single \texttt{Identifier}. If it is an array, we initialize the prelude entry for the declared name to an empty array and fill it with each element of the \texttt{ArrayExpression}; otherwise we directly insert the single identifier.
\begin{lstlisting}[language={JavaScript}]
for (let stmt of stmts) {
    // Error if not a VariableDeclaration
    if (stmt.type === "VariableDeclaration") {
        let decl = stmt.declarations[0];
        // If multiple valid types are defined
        if (decl.init.type === "ArrayExpression") {
            prelude[decl.id.name] = []; // Empty array on declaration
            for (let elem of decl.init.elements) {
                // Add each type name of the array definition
                prelude[decl.id.name].push(elem.name);
            }
        } else {
            // Single valid type
            prelude[decl.id.name] = [decl.init.name];
        }
    }
}
\end{lstlisting}
\section{Using Babel to parse}
\label{sec:BabelParse}
@ -286,88 +301,59 @@ traverse(ast, {
\section{Matching}
Performing the match against the user's code is the most important step: if no matching code is found, the tool does no transformations. Finding matches depends entirely on how well the definition of the proposal is written, and on how well the proposal can actually be defined within the confines of \DSL. In this chapter we discuss how matching is performed based on the definition of \texttt{applicable to}.
\subsection*{Determining if AST nodes match}
Writing the \texttt{applicable to} section as a single simple expression/statement is by far the most versatile way of defining a proposal, simply because there is a much higher chance of discovering matches with a template that is as generic and simple as possible. Matching against only a single expression/statement ensures the matcher tries to perform a match at every level of the AST. This of course relies on the expression/statement used for matching being as simple as possible, in order to easily find matches in user code.
The initial problem we have to overcome is comparing AST nodes from the template against AST nodes from the user code. This step also has to take comparisons against wildcards into account, and pass that information back to the AST matching algorithms.
When the template of \texttt{applicable to} is a single expression, parsing it with \cite[Babel]{Babel} will produce an AST whose \textit{Program} body contains a single node, an \textit{ExpressionStatement}. The reason for this is that Babel treats an expression not bound to a statement as an ExpressionStatement. This is a problem, because we cannot use an ExpressionStatement to match against expressions that are already part of some statement, for example a VariableDeclaration. To solve this, we have to remove the ExpressionStatement node and match only against its sub-expression. This is not required when matching against a single statement, as then the user code is expected to be \textit{similar}.
In order to determine whether we are matching against an expression or a statement, we verify the type of the first statement in the program body of the AST generated from \texttt{applicable to} \cite{BabelGenerate}. If the first statement is of type \texttt{ExpressionStatement}, we know the matcher is supposed to match against an expression, and we have to remove the \texttt{ExpressionStatement} AST node from the tree used for matching. This is done by simply using \texttt{applicableTo.children[0].children[0].element} as the AST to match the user's code against.
In the pre-parsing step of \DSL, we replace each of the wildcards with an expression of type Identifier. This means an Identifier is inserted either at a location where an expression resides, or where a statement resides; in the latter case, it will be wrapped in an ExpressionStatement. This has to be taken into account when comparing statement nodes from the template and the user code: if we encounter an ExpressionStatement, its inner expression has to be checked for being an Identifier.
\begin{lstlisting}[language={JavaScript}]
if (applicableTo.children[0].element.type === "ExpressionStatement") {
    let matcher = new Matcher(
        internals,
        applicableTo.children[0].children[0].element
    );
    matcher.singleExprMatcher(
        code,
        applicableTo.children[0].children[0]
    );
    return matcher.matches;
}
\end{lstlisting}
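As a small illustration (the template below is a hypothetical example, not one of the definitions above), the pre-parser turns wildcards into plain identifiers before the template is handed to Babel:

\begin{lstlisting}[language={JavaScript}]
// applicable to template before pre-parsing
let << ident : Identifier >> = << expr : CallExpression >>;
<< statements : (Statement && !ReturnStatement)* >>

// after pre-parsing: each wildcard is replaced by an Identifier;
// the stand-alone one is parsed by Babel as an ExpressionStatement
// wrapping the Identifier
let ident = expr;
statements;
\end{lstlisting}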
Since a wildcard is replaced by an Identifier, when matching a node of the template we have to check whether it is an \textit{Identifier}, or an \textit{ExpressionStatement} with an identifier contained within. If there is an identifier, we check whether that identifier is a registered wildcard. If an Identifier shares a name with a wildcard, we compare the node against the type expression of that wildcard: we traverse the entire wildcard type expression AST, compare each of its leaves against the type of the current code node, and pass the resulting values through the type expression; the resulting boolean decides whether or not that code node can be matched against the wildcard. We differentiate nodes matched against a wildcard with the \texttt{+} notation, since in that case the tree exploration algorithms have to keep using that wildcard until it returns false.
When we are either matching against an Identifier that is not a registered wildcard, or any other AST node in the template, we have to perform an equality check, in the case of this template language, we can get away with just performing some preliminary checks, such as that names of Identifiers are the same. Otherwise it is sufficient to just perform an equality check of the types of the nodes we are currently trying to match. If the types are the same, they can be validly matched against each other. This is sufficient because we are currently trying to determine if a single node can be a match, and not the entire template structure is a match. Therefore false positives that are not equivalent are highly unlikely due to the fact the entire structure has to be a false positive match.
The function used for matching singular nodes will give different return values based on how they were matched. The results NoMatch and Matched are self explanatory, they are used when either no match is found, or if the nodes types match and the template node is not a wildcard. When we are matching against a wildcard, if it is a simple wildcard that cannot match against multiple nodes of the code, the result will be \texttt{MatchedWithWildcard}. If the wildcard used to match is a one or many wildcard, the result will be \texttt{MatchedWithPlussedWildcard}, as this shows the recursive traversal algorithm used that this node of the template have to be tried against the code nodes sibling.
\begin{lstlisting}[language={JavaScript}]
enum MatchResult {
MatchedWithWildcard,
MatchedWithPlussedWildcard,
Matched,
NoMatch,
}
\end{lstlisting}
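To make the node comparison concrete, the sketch below shows a simplified, hypothetical version of the single-node checker. The names \texttt{checkCodeNode} and \texttt{evaluateTypeExpr}, and the \texttt{wildcards} map from identifier names to parsed wildcards, are assumptions for illustration; the real implementation also handles identifiers wrapped in an ExpressionStatement.
\begin{lstlisting}[language={JavaScript}]
// Simplified sketch of single-node matching (hypothetical names)
checkCodeNode(code: t.Node, aplTo: t.Node): MatchResult {
    if (aplTo.type === "Identifier") {
        let wildcard = this.wildcards.get(aplTo.name);
        if (wildcard) {
            // Evaluate the wildcard's type expression against the
            // type of the current code node
            if (evaluateTypeExpr(wildcard.expr, code.type)) {
                // star flags a wildcard written with the + notation
                return wildcard.star
                    ? MatchResult.MatchedWithPlussedWildcard
                    : MatchResult.MatchedWithWildcard;
            }
            return MatchResult.NoMatch;
        }
        // A plain identifier in the template has to match by name
        return code.type === "Identifier" && code.name === aplTo.name
            ? MatchResult.Matched
            : MatchResult.NoMatch;
    }
    // For all other nodes a type equality check is sufficient here
    return code.type === aplTo.type
        ? MatchResult.Matched
        : MatchResult.NoMatch;
}
\end{lstlisting}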
\subsection*{Matching a singular Expression/Statement template}
In \figFull[code:ExprMatcher] is the full definition of the expression matcher; the following paragraphs walk through how it discovers matches.
\begin{lstlisting}[language={JavaScript}, label={code:ExprMatcher}, caption={Recursive definition of expression matcher}]
singleExprMatcher(
code: TreeNode<t.Node>,
aplTo: TreeNode<t.Node>
): TreeNode<PairedNodes> | undefined {
// If we are at start of ApplicableTo, start a new search on each of the child nodes
if (aplTo.element === this.aplToFull) {
// Perform a new search on all child nodes before trying to verify current node
let temp = [];
// If any matches bubble up from child nodes, we have to store it
for (let code_child of code.children) {
let maybeChildMatch = this.singleExprMatcher(code_child, aplTo);
if (maybeChildMatch) {
temp.push(maybeChildMatch);
}
}
this.matches.push(...temp);
}
let curMatches = this.checkCodeNode(code.element, aplTo.element);
curMatches =
curMatches && code.children.length >= aplTo.children.length;
if (!curMatches) {
return;
}
// At this point current does match
// Perform a search on each of the children of both AplTo and Code.
let pairedCurrent: TreeNode<PairedNodes> = new TreeNode(null, {
codeNode: code.element,
aplToNode: aplTo.element,
});
for (let i = 0; i < aplTo.children.length; i++) {
let childSearch = this.singleExprMatcher(
code.children[i],
aplTo.children[i]
);
if (childSearch === undefined) {
// Failed to get a full match, so early return here
return;
}
childSearch.parent = pairedCurrent;
pairedCurrent.children.push(childSearch);
}
// If we are here, a full match has been found
return pairedCurrent;
}
\end{lstlisting}
\subsubsection*{Recursively discovering matches}
The matcher used against single expression/statement templates is based upon a depth-first search, and searches for matches from the top of the code AST. It is important that we try to match the template at every level of the code AST; this is done by starting a new search on every child of the current code node whenever the current template node is the top node of the template. This ensures a match is attempted at every level of the tree, and it also means we get no partial matches, since we only store matches returned by recursive calls that started from the first node of the template tree.

All of this is done before checking the node we are currently on. The reason is to avoid missing matches that reside further down the current branch, and to ensure matches further down are placed earlier in the list of full matches, which makes transformation easier when partial collisions exist.
Once we have started a search on all child nodes using the full definition of \texttt{applicable to}, we can verify whether we are currently exploring a match. The current code node is checked against the current top node of \texttt{applicable to}; if it matches, different parts of the algorithm are invoked depending on the kind of match: a match against a wildcard, against a wildcard with \texttt{+}, or a plain node type match.

If the current node matches a wildcard without the \texttt{+} operator, we simply pair the current template node with the matched code node and return. Whatever the code node contains, it was matched by a wildcard, so regardless of what is below it, it is meant to be placed directly into the transformation; we can therefore conclude this is a valid match.

When the current node matches a wildcard that does use the \texttt{+} operator, we have to keep trying to match the same wildcard against the sibling nodes of the current code node. This happens in the recursive iteration above the current one, so we return the paired template and code nodes together with the result \texttt{MatchResult.MatchedWithPlussedWildcard}. When the caller receives this result, it continues matching siblings against the wildcard until it receives a different match result.

When the current node matches based on node types, additional conditions have to hold: every child node of the template and the user code must also return some form of match, so if any child returns \texttt{MatchResult.NoMatch} the entire match is discarded. The numbers of child nodes also have to correspond; because of wildcards this means every child of the code node has to be matched either against a single template node or against a wildcard using the \texttt{+} operator.

If the current node does not match, we simply discard the current search. Since searches are already started from the top of the template at every level of the user code AST, we can safely end this search and rely on the others to find matches further down the tree.
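The sketch below illustrates how a list of code children can be paired against a list of template children under these rules, building on the hypothetical \texttt{checkCodeNode} above. It is a simplified illustration of the idea, not the tool's exact implementation:
\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: pair code children with template children,
// letting a wildcard with + consume one or more consecutive siblings
function pairChildren(code: t.Node[], aplTo: t.Node[]): boolean {
    let i = 0;              // index into the code children
    let j = 0;              // index into the template children
    let inPlussed = false;  // currently consuming a +-wildcard
    while (i < code.length) {
        if (j >= aplTo.length) return false;
        let result = checkCodeNode(code[i], aplTo[j]);
        if (result === MatchResult.MatchedWithPlussedWildcard) {
            inPlussed = true;
            i += 1;             // same template node, next sibling
        } else if (result !== MatchResult.NoMatch) {
            inPlussed = false;
            i += 1;
            j += 1;             // advance both sides
        } else if (inPlussed) {
            inPlussed = false;  // the wildcard stops consuming here
            j += 1;             // retry this child on the next template node
        } else {
            return false;       // a plain mismatch discards the whole match
        }
    }
    // Every template node has to be consumed as well
    return j === aplTo.length || (inPlussed && j === aplTo.length - 1);
}
\end{lstlisting}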
To allow for easier transformation, and to store exactly which part of \texttt{applicable to} was matched against which node of the code AST, we use a custom instance of the simple tree structure described in \ref*{sec:BabelParse}, with the interface \texttt{PairedNode} as its element type. This lets us record precisely which nodes were matched together, which simplifies the transformation algorithm. The definition of \texttt{PairedNode} is given below; \texttt{codeNode} is a list because a wildcard using \texttt{+} may match multiple code nodes against a single template node.
\begin{lstlisting}[language={JavaScript}]
interface PairedNode {
    codeNode: t.Node[],
aplToNode: t.Node
}
\end{lstlisting}
\subsection*{Matching multiple Statements}
Using multiple statements in the template of \texttt{applicable to} requires a different approach, as the template no longer has a single root node that can be searched for at every level of the code AST.
The initial step of this algorithm is to search the AST for nodes that contain a list of \textit{Statements}. This can be done by searching for the AST node types \textit{Program} and \textit{BlockStatement}, as these are the only valid places for a list of Statements to reside \cite{ECMA262Statement}. Searching the tree is simple: we check the type of every node recursively, and once we find a node that can contain multiple Statements, we check it for matches.
\begin{lstlisting}[language={JavaScript}]
multiStatementMatcher(code: TreeNode<t.Node>, aplTo: TreeNode<t.Node>) {
if (
code.element.type === "Program" ||
code.element.type === "BlockStatement"
) {
this.matchMultiHead(code.children, aplTo.children);
}
// Recursively search the tree for Program || BlockStatement
for (let code_child of code.children) {
this.multiStatementMatcher(code_child, aplTo);
}
}
\end{lstlisting}
Once a list of \textit{Statements} has been discovered, the function \texttt{matchMultiHead} is executed with that block and the Statements of \texttt{applicable to}.

This function uses a \cite{SlidingWindow}{sliding window} technique to match a run of statements of the same length as the list of statements in \texttt{applicable to}. The sliding window tries to match every Statement against its corresponding Statement in the current \textit{BlockStatement}. When matching a single Statement inside the window, a simple DFS recursion is applied, similar to the algorithm used for single expression/statement templates; the difference is that we do not search the entire AST, and a candidate has to match fully and immediately. If a match is not found, the current iteration of the sliding window is discarded and we move the window one statement further.
One important complication is that we might not know the width of the sliding window in advance. Wildcards using \texttt{+}, such as \texttt{(Statement)+}, can match one or more nodes. We therefore use a two-pointer technique when iterating through the statements of the user's code, as the same statement of the template may have to be used multiple times.
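A minimal sketch of the outer loop is given below; \texttt{tryMatchWindow} is a hypothetical helper standing in for the two-pointer statement matching just described:
\begin{lstlisting}[language={JavaScript}]
// Hypothetical sketch: slide a window over the statements of a block
matchMultiHead(codeStmts: TreeNode<t.Node>[], aplToStmts: TreeNode<t.Node>[]) {
    for (let start = 0; start < codeStmts.length; start++) {
        // The window width is not fixed, since +-wildcards may let
        // the template consume more statements than its own length
        let match = this.tryMatchWindow(codeStmts, start, aplToStmts);
        if (match) {
            this.matches.push(match);
        }
    }
}
\end{lstlisting}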
\subsection*{Output of the matcher}
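The exact definition of \texttt{Match} is elided here; a plausible shape, assuming the transformer needs the paired template/code tree and the statements the match covers, is:
\begin{lstlisting}[language={JavaScript}]
// Hypothetical reconstruction of the matcher's output type
export interface Match {
    // Root of the paired template/code tree for this match
    paired: TreeNode<PairedNodes>;
    // The statements of the user's code covered by the match
    statements: t.Node[];
}
\end{lstlisting}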
\section{Transforming}
To perform the transformation and replacement for each match, the tool uses the function \texttt{transformer}. It takes the list of matches found by the matcher, the template from the \texttt{transform to} section of the current case of the proposal, parsed with \cite{BabelParser}{babel/parser} and built into our custom tree structure, and the AST of the original user code parsed by Babel. All transformations are applied to the code AST, and \cite{BabelGenerate}{Babel generate} is then used to generate JavaScript from the transformed AST.
An important discovery is that we have to transform the leaves of the AST first: if the transformation were applied from top to bottom, it could undo transformations made for an earlier match. Transforming top to bottom could, in the case of the pipeline proposal, produce \texttt{a(b) |> c(\%)} instead of \texttt{b |> a(\%) |> c(\%)}. This is easily solved in our case: since the matcher looks for matches from the top of the tree to the bottom, the matches are discovered in that order, so all we have to do is reverse the list of matches to transform the ones closest to the leaves first.
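Assuming \texttt{matches} holds the matches in top-down discovery order and \texttt{transformMatch} is a hypothetical per-match transformation step, this ordering amounts to:
\begin{lstlisting}[language={JavaScript}]
// Transform the innermost matches first by reversing discovery order
for (let match of this.matches.reverse()) {
    this.transformMatch(match);
}
\end{lstlisting}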
\subsubsection*{Preparing the transform to template}
The transformations are performed by inserting the wildcard matches from the \texttt{applicable to} template into their respective locations in the \texttt{transform to} template, and then placing the entire \texttt{transform to} template into the original code AST where the match was discovered. In essence, the transformation is a find-and-replace where context is passed through the wildcards.

To prepare the template, every section matched against a wildcard has to be transferred into the \texttt{transform to} template. We utilize Babel here and traverse the template's AST with \cite{BabelTraverse}{Babel traverse}, which gives us utility functions to replace, replace with multiple, and remove nodes. We use custom visitors for \textit{Identifier}, and for \textit{ExpressionStatement} whose expression is an Identifier, to determine where the wildcard matches have to be placed: a match belongs at every location that shares a name with the wildcard. Once a shared identifier between the \texttt{transform to} and \texttt{applicable to} templates is found, we perform a replace-with-multiple, inserting the node or nodes from the match in place of the wildcard.
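A sketch of the wildcard insertion, showing only the plain \textit{Identifier} case; \texttt{wildcardMatches}, an assumed map from wildcard names to the code nodes they matched, stands in for the matcher's output:
\begin{lstlisting}[language={JavaScript}]
import traverse from "@babel/traverse";
import * as t from "@babel/types";

// Insert the code nodes captured by each wildcard into the
// parsed transform to template (simplified sketch)
function insertWildcards(
    transformTo: t.File,
    wildcardMatches: Map<string, t.Node[]>
) {
    traverse(transformTo, {
        Identifier(path) {
            let matched = wildcardMatches.get(path.node.name);
            if (matched) {
                // Replace the placeholder identifier with the
                // node or nodes captured from the user's code
                path.replaceWithMultiple(matched);
            }
        },
    });
}
\end{lstlisting}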
\subsubsection*{Inserting the template into the AST}
With a transformed version of the template, it has to be inserted into the full AST of the user's code. Again we use \cite{BabelTraverse}{babel/traverse} to traverse the code AST with a visitor. This visitor is not restricted to any node type, as the matched section can be of any type; we therefore use a generic visitor and an equality check to find the exact node this match came from. Once found, we replace it with the transformed \texttt{transform to} nodes. Since this can be multiple Statements, the function \texttt{replaceWithMultiple} is used to insert every Statement from the template body, and we are careful to remove the following sibling nodes that were part of the original match, by removing the \textit{n-1} next siblings from the insertion point, where \textit{n} is the number of Statements in the match.
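A sketch of the insertion step, assuming \texttt{match.statements} holds the matched statements and \texttt{transformedStatements} the prepared template body:
\begin{lstlisting}[language={JavaScript}]
// Replace the matched section of the user's code with the template
traverse(userAst, {
    enter(path) {
        if (path.node !== match.statements[0]) return;
        // Remove the n-1 trailing statements of the match first,
        // then replace the head statement with the template body
        for (let i = 1; i < match.statements.length; i++) {
            path.getSibling(path.key + 1).remove();
        }
        path.replaceWithMultiple(transformedStatements);
        path.stop();
    },
});
\end{lstlisting}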
To generate JavaScript from the transformed AST, we use \cite{BabelGenerate}{babel/generator}, a library specifically designed for use with Babel to generate JavaScript from a Babel AST. The transformed AST of the user's code is passed to it, while being careful to apply all Babel plugins the current proposal might require.
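Assuming the transformed AST is held in \texttt{transformedAst}, the final step is simply:
\begin{lstlisting}[language={JavaScript}]
import generate from "@babel/generator";

// Produce JavaScript source text from the transformed user AST
let output = generate(transformedAst).code;
\end{lstlisting}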

\chapter{Type expression AST definitions}
This appendix lists the TypeScript definitions of the AST used for wildcard type expressions.
\begin{lstlisting}[language={JavaScript},caption={TypeScript types of the type expression AST},label={ex:typeExpressionTypes}]
export interface Identifier extends WildcardNode {
nodeType: "Identifier";
name: string;
}
export interface Wildcard {
nodeType: "Wildcard";
identifier: Identifier;
expr: TypeExpr;
star: boolean;
}
export interface WildcardNode {
nodeType: "BinaryExpr" | "UnaryExpr" | "GroupExpr" | "Identifier";
}
export type TypeExpr = BinaryExpr | UnaryExpr | PrimitiveExpr;
export type BinaryOperator = "||" | "&&";
export type UnaryOperator = "!";
export interface BinaryExpr extends WildcardNode {
nodeType: "BinaryExpr";
left: UnaryExpr | BinaryExpr | PrimitiveExpr;
op: BinaryOperator;
right: UnaryExpr | BinaryExpr | PrimitiveExpr;
}
export interface UnaryExpr extends WildcardNode {
nodeType: "UnaryExpr";
op: UnaryOperator;
expr: PrimitiveExpr;
}
export type PrimitiveExpr = GroupExpr | Identifier;
export interface GroupExpr extends WildcardNode {
nodeType: "GroupExpr";
expr: TypeExpr;
}
\end{lstlisting}
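These types support a small evaluator used when a wildcard is checked against a code node. A minimal sketch, assuming the name \texttt{evaluateTypeExpr} and that leaf identifiers are compared against the Babel node type:
\begin{lstlisting}[language={JavaScript}]
// Evaluate a wildcard's type expression against a Babel node type
function evaluateTypeExpr(expr: TypeExpr, nodeType: string): boolean {
    switch (expr.nodeType) {
        case "BinaryExpr":
            return expr.op === "||"
                ? evaluateTypeExpr(expr.left, nodeType) ||
                      evaluateTypeExpr(expr.right, nodeType)
                : evaluateTypeExpr(expr.left, nodeType) &&
                      evaluateTypeExpr(expr.right, nodeType);
        case "UnaryExpr":
            // The only unary operator is negation
            return !evaluateTypeExpr(expr.expr, nodeType);
        case "GroupExpr":
            return evaluateTypeExpr(expr.expr, nodeType);
        case "Identifier":
            return expr.name === nodeType;
    }
}
\end{lstlisting}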

\usetikzlibrary{shapes}
\renewcommand{\labelenumii}{\theenumii}
\renewcommand{\theenumii}{\theenumi.\arabic{enumii}.}
\usepackage[hidelinks]{hyperref}
\tolerance=1000
\usepackage{amsmath}

@misc{BabelParser,
note = {[Online; accessed 14. May 2024]},
url = {https://babeljs.io/docs/babel-Parser}
}
@inproceedings{AST1,
author = {Neamtiu, Iulian and Foster, Jeffrey S. and Hicks, Michael},
title = {Understanding source code evolution using abstract syntax tree matching},
year = {2005},
isbn = {1595931236},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/1083142.1083143},
doi = {10.1145/1083142.1083143},
abstract = {Mining software repositories at the source code level can provide a greater understanding of how software evolves. We present a tool for quickly comparing the source code of different versions of a C program. The approach is based on partial abstract syntax tree matching, and can track simple changes to global variables, types and functions. These changes can characterize aspects of software evolution useful for answering higher level questions. In particular, we consider how they could be used to inform the design of a dynamic software updating system. We report results based on measurements of various versions of popular open source programs, including BIND, OpenSSH, Apache, Vsftpd and the Linux kernel.},
booktitle = {Proceedings of the 2005 International Workshop on Mining Software Repositories},
pages = {1--5},
numpages = {5},
keywords = {abstract syntax trees, software evolution, source code analysis},
location = {St. Louis, Missouri},
series = {MSR '05}
}
@article{AST2,
author = {Neamtiu, Iulian and Foster, Jeffrey S. and Hicks, Michael},
title = {Understanding source code evolution using abstract syntax tree matching},
year = {2005},
issue_date = {July 2005},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {30},
number = {4},
issn = {0163-5948},
url = {https://doi.org/10.1145/1082983.1083143},
doi = {10.1145/1082983.1083143},
abstract = {Mining software repositories at the source code level can provide a greater understanding of how software evolves. We present a tool for quickly comparing the source code of different versions of a C program. The approach is based on partial abstract syntax tree matching, and can track simple changes to global variables, types and functions. These changes can characterize aspects of software evolution useful for answering higher level questions. In particular, we consider how they could be used to inform the design of a dynamic software updating system. We report results based on measurements of various versions of popular open source programs, including BIND, OpenSSH, Apache, Vsftpd and the Linux kernel.},
journal = {SIGSOFT Softw. Eng. Notes},
month = {may},
pages = {1--5},
numpages = {5},
keywords = {abstract syntax trees, software evolution, source code analysis}
}
@article{RecursiveDescent,
author = {Davis, Matthew S.},
title = {An object oriented approach to constructing recursive descent parsers},
year = {2000},
issue_date = {February 2000},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {35},
number = {2},
issn = {0362-1340},
url = {https://doi.org/10.1145/345105.345113},
doi = {10.1145/345105.345113},
abstract = {We discuss a technique to construct a recursive descent parser for a context free language using concepts found in object oriented design and implementation. A motivation for the technique is given. The technique is then introduced with snippets of a Smalltalk implementation. Some advantages and disadvantages of the technique are examined. Finally some areas of possible future work are discussed.},
journal = {SIGPLAN Not.},
month = {feb},
pages = {29--35},
numpages = {7},
keywords = {Greibach normal form, context free grammar, design patterns, object oriented, recursive descent parser, smalltalk}
}