Coming along nicely

2024-05-11 21:34:34 +02:00 · 2024-05-11 21:34:34 +02:00 · 01b826f19a
commit 01b826f19a
parent 8610e6456b
11 changed files with 294 additions and 7 deletions
--- a/.vscode/ch4_parts.txt
+++ b/.vscode/ch4_parts.txt
@ -0,0 +1,10 @@
+AST special types for custom functionality
+Custom wrapper data structure for AST
+Inserting valid JS to allow for parsing by babel. 
+Why the inserting is problematic
+JavaScript as its own Meta language (Solves the inserting)
+Parsing JS meta language into the same structure as JSTQL
+DFS search to match
+Parsing using <<>>
+
+Langium extracting JSON 
--- a/.vscode/meeting.txt
+++ b/.vscode/meeting.txt
@ -0,0 +1,19 @@
+Related work -> 6-10 pages :) 
+
+Table -> Select 3 to discuss in addition to both papers
+
+https://dl.acm.org/doi/10.1145/2414721.2414728
+
+
+https://dl.acm.org/doi/pdf/10.1145/2414721.2414728
+
+
+Aspect Oriented Programming
+
+
+Jetbrains structural search
+https://www.dropbox.com/scl/fi/q8bzwwlozn91qbrqnceen/ij-jetbrains-blog-post.pdf?rlkey=ovwk0vodrxfrn1z0f4bc3iiwj&e=1&dl=0
+
+https://sourcegraph.com/blog/going-beyond-regular-expressions-with-structural-code-search
+
+5 from this one ^
--- a/build/report.pdf
+++ b/build/report.pdf
--- a/chapter/background.tex
+++ b/chapter/background.tex
@ -0,0 +1 @@
+NEED TO DISCUSS SYNTACTIC SUGAR IN OTHER languages
--- a/chapter/ch3.tex
+++ b/chapter/ch3.tex
@ -5,6 +5,12 @@ syntactic proposals for EcmaScript.

 \section{The core idea}

+\textbf{THIS IS TOO ABRUPT OF AN INTRODUCTION, MORE GENERAL ALMOST REPEAT OF BACKGROUND GOES HERE}
+
+\textbf{CURRENT VERSION vs FUTURE VERSION instead of old way}
+
+\textbf{DO NOT DISCUSS TOOL HERE}
+
 Users of EcmaScript have a familiarity with code they themselves have written. This means they have knowledge of how their own code works and why they might have written it a certain way. This project aims to utilize this pre-existing knowledge to showcase new proposals for EcmaScript. Showcasing proposals this way will allow users to focus on what the proposal actually entails, instead of focusing on the examples written by the proposal author. 

 Further in this chapter, we will be discussing the \textit{old} and \textit{new} way of programming in EcmaScript. Each proposal tries to solve some set of problems; if the proposal is accepted into EcmaScript as part of the language, there will be a \textit{new} way of solving said problems. The \textit{old} way is the current status quo, where the proposal is not part of EcmaScript, and the \textit{new} way is when the proposal is part of EcmaScript and we are utilizing the new features of said proposal.
@ -342,6 +348,7 @@ In the example \ref*{ex:awaitToPromise} we change \texttt{a} from async to synch
 In order to identify snippets of code in the user's codebase where a proposal is applicable, we need some way to define patterns of code where we can apply the proposal. To do this, a DSL titled \DSL is used.

 \subsection{\DSL}
+\label{sec:DSL_DEF}

 To make use of the user's code, we have to identify snippets of it that some proposal is applicable to. In order to do this, we have designed a DSL called \DSL(JavaScript Template Query Language). This DSL will contain the entire definition used to identify and transform user code in order to showcase a proposal.

@ -385,6 +392,8 @@ const variableName = <<expr1>>;


 \subsection{Structure of \DSL}
+\label{sec:DSLStructure}
+

 \DSL is designed to mimic the examples already provided by a proposal champion in the proposal's README. These examples can be seen in each of the proposals described in \ref{sec:proposals}.

@ -529,5 +538,3 @@ proposal DoExpression{
 }

 \end{lstlisting}
-
-\chapter*{title}
--- a/chapter/ch4.tex
+++ b/chapter/ch4.tex
@ -1 +1,226 @@
-\chapter{ch 4 goes here}
+\chapter{Implementation}
+
+In this chapter, the implementation of the tool utilizing the \DSL and \DSLSH will be presented. It will describe the overall architecture of the tool, the flow of data throughout, and how the different stages of transforming user code are completed. 
+
+\section{Architecture}
+
+The architecture of the work described in this thesis is illustrated in \figFull[fig:architecture].
+
+In this tool, there exist two ways to define a proposal. Both provide the same functionality; they differ only in syntax and writing method.
+
+\begin{figure}
+\begin{center}
+\begin{tikzpicture}[
+    roundnode/.style={ellipse, draw=red!60, fill=red!5, very thick, minimum size=7mm},
+    squarednode/.style={rectangle, draw=red!60, fill=red!5, very thick, minimum size=5mm}
+]
+\node[roundnode]   (jstqlcode)        {JSTQL Code};
+\node[roundnode]   (selfhostedjsoninput)    [right=of jstqlcode]                        {Self-Hosted JSON};
+
+\node[squarednode] (langium)        [below=of jstqlcode]            {Langium Parser};
+\node[squarednode] (jsonparser)     [below=of selfhostedjsoninput]  {Self-Hosted JSON parser};
+\node[squarednode] (preludebuilder) [below=of jsonparser]           {Prelude Builder};
+\node[squarednode] (preParser)      [below=of langium]              {Pre-parser};
+\node[squarednode] (babel)          [below right=of preParser]      {Babel};
+\node[squarednode] (treebuilder)    [below=of babel]                {Custom AST builder};
+\node[squarednode] (matcher)        [below=of treebuilder]          {Matcher};
+\node[squarednode] (transformer)    [below=of matcher]              {Transformer};
+\node[squarednode] (joiner)         [below=of transformer]          {Joiner};
+
+
+\draw[->] (jstqlcode.south) -- (langium.north);
+\draw[->] (langium.south) -- (preParser.north);
+\draw[->] (preParser.south) |- (babel.west);
+\draw[->] (babel.south) -- (treebuilder.north);
+\draw[->] (treebuilder.south) -- (matcher.north);
+\draw[->] (matcher.south) -- (transformer.north);
+\draw[->] (transformer.south) -- (joiner.north);
+\draw[->] (selfhostedjsoninput.south) -- (jsonparser.north);
+\draw[->] (jsonparser.south) -- (preludebuilder.north);
+\draw[->] (preludebuilder.south) |- (babel.east);
+
+\end{tikzpicture}
+\end{center}
+
+\caption[Tool architecture]{Overview of tool architecture}
+\label{fig:architecture}
+\end{figure}
+\section{Parsing \DSL using Langium}
+
+In this section, the implementation of the parser for \DSL will be described. This section will outline the tool Langium, used as a parser-generator to create the AST used by the tool later to perform the transformations. 
+
+\subsection{Langium}
+
+Langium \cite{Langium} is primarily used to create parsers for Domain Specific Languages. These parsers output an Abstract Syntax Tree that is later used to create interpreters or other tooling. In the case of \DSL we use Langium to generate TypeScript objects that are later used as definitions for the tool to do matching and transformation of user code.
+
+In order to generate this parser, Langium requires the definition of a grammar. A grammar is a set of rules that describes a valid program. In our case, it describes how a proposal is defined, together with its \texttt{applicable to} and \texttt{transform to} sections. A grammar in Langium starts by describing the \texttt{Model}. The model is the entry rule of the grammar; this is where all valid top level statements are described.
+
+In \DSL the only valid top level statement is the definition of a proposal. This means our language grammar model contains only one list, which is a list of 0 or many \texttt{Proposal} definitions. A Proposal definition is denoted by a block, which is denoted by \texttt{\{...\}} containing some valid definition. In the case of \DSL this block contains 1 or many definitions of \texttt{Pair}. 
+
+\texttt{Pair} is defined very similarly to \texttt{Proposal}: it is a named block, and it contains exactly one \texttt{ApplicableTo} definition followed by one \texttt{TraTo} definition.
+
+Together, these two definitions describe a single case of some applicable code and its corresponding transformation. Each is introduced by specific keywords: \texttt{applicable to} denotes the template \DSL uses to perform the matching algorithm, and \texttt{transform to} contains the code used to perform the transformation.
+ 
+In order to define exactly what characters/tokens are legal in a specific definition, Langium uses terminals defined using regular expressions. These restrict which characters are legal in specific keys of the AST produced by the generated parser. In the definitions of \texttt{Proposal} and \texttt{Pair} the terminal \texttt{ID} is used; it only allows word characters and must begin with a letter of the alphabet or an underscore. In \texttt{ApplicableTo} and \texttt{TraTo} the terminal \texttt{STRING} is used; it holds, as a quoted string, the JavaScript code and the wildcards described in \ref{sec:DSL_DEF}. Together, these terminals allow Langium to determine exactly what characters are legal in each location.
+
+\begin{lstlisting}[caption={Definition of \DSL in Langium}, label={def:JSTQLLangium}]
+grammar Jstql
+
+entry Model:
+    (proposals+=Proposal)*;
+
+Proposal:
+    'proposal' name=ID "{"
+        (pair+=Pair)+
+    "}";
+
+Pair:
+    "pair" name=ID "{"
+        aplTo=ApplicableTo
+        traTo=TraTo
+    "}";
+
+ApplicableTo:
+    "applicable" "to" "{"
+        apl_to_code=STRING
+    "}";
+TraTo:
+    "transform" "to" "{"
+        transform_to_code=STRING
+    "}";
+hidden terminal WS: /\s+/;
+terminal ID: /[_a-zA-Z][\w_]*/;
+terminal STRING: /"[^"]*"|'[^']*'/;
+\end{lstlisting}
+
+For \DSL we are not actually implementing a programming language meant to be executed. We are using Langium in order to generate an AST that will be used as a markup language, similar to YAML, JSON or TOML. The main reason for using Langium in such an unconventional way is that Langium provides support for Visual Studio Code integration, and it saves us from parsing the definition of each proposal manually. However, with only the grammar we cannot verify that the wildcards placed in \texttt{apl\_to\_code} and \texttt{transform\_to\_code} are correctly written. This is done by using a feature of Langium called \texttt{Validator}.
+
+
+\subsection*{Langium Validator}
+
+A Langium validator allows for further checks on the templates written within \DSL by implementing specific checks on specific parts of the grammar.
+
+\DSL does not allow untyped wildcard definitions in \texttt{applicable to}: a wildcard cannot be left without a type, nor be allowed to match any AST type. This is not possible to verify with the grammar, as inside the grammar the code is simply defined as a \texttt{STRING} terminal. This means further checks have to be implemented using code. In order to do this we have a specific \texttt{Validator} implemented on the \texttt{Pair} definition of the grammar. Every time anything contained within a \texttt{Pair} is updated, the language server shipped with Langium will perform the validation step and report any errors.
+
+The validator uses \texttt{Pair} as its entry point, since this gives it access to the wildcards in both \texttt{applicable to} and \texttt{transform to}. This makes it possible to check that every wildcard identifier used in \texttt{transform to} exists in the definition of \texttt{applicable to}.
+
+\begin{lstlisting}[language={JavaScript}]
+export class JstqlValidator {
+    validateWildcardAplTo(pair: Pair, accept: ValidationAcceptor): void {
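+        try {
+            // validationResultAplTo and validationResultTraTo hold the wildcard
+            // errors found in each template; their computation is omitted here
+            if (validationResultAplTo.errors.length !== 0) {
+                accept("error", validationResultAplTo.errors.join("\n"), {
+                    node: pair.aplTo,
+                    property: "apl_to_code",
+                });
+            }
+            if (validationResultTraTo.length !== 0) {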
+                accept("error", validationResultTraTo.join("\n"), {
+                    node: pair.traTo,
+                    property: "transform_to_code",
+                });
+            }
+        } catch (e) {}
+    }
+}
+
+\end{lstlisting}
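
As a minimal sketch of the cross-check described above, and not the validator's actual implementation, the following TypeScript collects the wildcards of both templates with a regular expression and reports two kinds of errors: untyped wildcards in \texttt{applicable to}, and identifiers used in \texttt{transform to} that were never declared. The names \texttt{extractWildcards}, \texttt{checkPair} and the \texttt{Wildcard} type are hypothetical, and the error messages are illustrative only.

```typescript
// Hypothetical helper types and names; not taken from the tool's source.
interface Wildcard {
    identifier: string;
    types: string[];
}

// Collect every <<name : Type1 | Type2>> wildcard found in a template string.
function extractWildcards(template: string): Wildcard[] {
    const wildcards: Wildcard[] = [];
    const pattern = /<<([^<>]*)>>/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(template)) !== null) {
        // Strip whitespace, then split the name from the optional type list
        const [identifier, typeString = ""] = match[1]
            .replace(/\s/g, "")
            .split(":");
        const types = typeString.length > 0 ? typeString.split("|") : [];
        wildcards.push({ identifier, types });
    }
    return wildcards;
}

// Return a list of error messages; an empty list means the pair is valid.
function checkPair(applicableTo: string, transformTo: string): string[] {
    const errors: string[] = [];
    const declared: { [name: string]: boolean } = {};

    for (const wildcard of extractWildcards(applicableTo)) {
        if (wildcard.types.length === 0) {
            errors.push(`Wildcard "${wildcard.identifier}" in applicable to has no type`);
        }
        declared[wildcard.identifier] = true;
    }

    for (const wildcard of extractWildcards(transformTo)) {
        if (!declared[wildcard.identifier]) {
            errors.push(`Wildcard "${wildcard.identifier}" in transform to is not declared in applicable to`);
        }
    }
    return errors;
}
```

In a Langium validator, each returned message could be passed to \texttt{accept} in the same way as in the listing above. Keeping the check as a pure function on two strings is a design choice of this sketch; it makes the rule easy to test without a running language server.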
+
+\section{Pre-parsing}
+
+In order to refer to internal DSL variables defined in \texttt{applicable to} in the transformation, we need to extract this information from the template definitions and pass that on to 
+
+\subsection*{Pre-parsing \DSL}
+
+In order to allow the use of Babel \cite{Babel}, the wildcards present in the blocks of \texttt{applicable to} and \texttt{transform to} have to be parsed and replaced with some valid JavaScript. This is done by using a pre-parser that extracts the information from the wildcards and inserts an \texttt{Identifier} in their place.
+
+In order to pre-parse the text, we look at each character in the code section. When the start token of a wildcard, \texttt{<<}, is discovered, everything after it until the closing token, \texttt{>>}, is treated as an internal DSL variable and stored by the tool. A variable \texttt{flag} tracks this: when \texttt{flag} is false, we are not inside a wildcard block and pass the character through to the variable \texttt{cleanedJS}. When \texttt{flag} is true, we are inside a wildcard block and collect every character of it into \texttt{temp}. Once we hit the end of the wildcard block, we pass \texttt{temp} on to the function \texttt{parseInternalString}.
+
+\begin{lstlisting}[language={JavaScript}]
+export function parseInternal(code: string): InternalParseResult {
+    let cleanedJS = "";
+    let temp = "";
+    let flag = false;
+    let prelude: InternalDSLVariable = {};
+
+    for (let i = 0; i < code.length; i++) {
+        if (code[i] === "<" && code[i + 1] === "<") {
+            // From now on we are inside of the DSL custom block
+            flag = true;
+            i += 1;
+            continue;
+        }
+
+        if (flag && code[i] === ">" && code[i + 1] === ">") {
+            // We encountered a closing tag
+            flag = false;
+
+            let { identifier, types } = parseInternalString(temp);
+
+            cleanedJS += identifier;
+
+            prelude[identifier] = types;
+            i += 1;
+            temp = "";
+            continue;
+        }
+
+        if (flag) {
+            temp += code[i];
+        } else {
+            cleanedJS += code[i];
+        }
+    }
+    return { prelude, cleanedJS };
+}
+\end{lstlisting}
+Each wildcard follows the exact same format. It begins with the opening token \texttt{<<}, followed by the name this variable will be referred to by. This is called an internal DSL variable, and it is used when transferring the matching AST node/s from the user's code into the transform template. Following the internal DSL variable, a \texttt{:} token shows we are moving on to the next part of the wildcard: a list of one or more DSL types that this wildcard can match against, separated by \texttt{|}.
+This is a very strict notation for how wildcards can be written. It avoids collision with the already reserved bit-shift operators in JavaScript, as it is highly unlikely any code using the bit-shift operators would fit into this wildcard format.
+
+\begin{lstlisting}
+<< Variable_Name : Type1 | Keyword | Type2 | Type3 >>
+\end{lstlisting}
+
+\begin{lstlisting}[language={JavaScript}]
+function parseInternalString(dslString: string) {
+    // Default to an empty type string when the wildcard has no ":" part
+    let [identifier, typeString = ""] = dslString
+            .replace(/\s/g, "").split(":");
+
+    return {
+        identifier,
+        types: typeString.length > 0 ? typeString.split("|") : [""],
+    };
+}
+\end{lstlisting}
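
To make the effect of pre-parsing concrete, the sketch below runs a single template through a compact, regex-based variant of the pre-parser. This swaps the character loop shown earlier for \texttt{String.prototype.replace} with a callback, so it is an illustration under that assumption and not the tool's code. The names \texttt{preParse} and \texttt{Prelude} are hypothetical, and the pattern assumes a wildcard body never contains \texttt{<} or \texttt{>}.

```typescript
// Hypothetical type: maps each internal DSL variable to its allowed DSL types.
type Prelude = { [identifier: string]: string[] };

// Replace every <<name : Type1 | Type2>> wildcard with its bare identifier
// and record the declared types in the prelude.
function preParse(template: string): { cleanedJS: string; prelude: Prelude } {
    const prelude: Prelude = {};
    const cleanedJS = template.replace(
        /<<([^<>]*)>>/g,
        (_whole: string, body: string): string => {
            const [identifier, typeString = ""] = body
                .replace(/\s/g, "")
                .split(":");
            // Same fallback as parseInternalString: [""] when no types are given
            prelude[identifier] =
                typeString.length > 0 ? typeString.split("|") : [""];
            return identifier;
        }
    );
    return { cleanedJS, prelude };
}

const example = preParse(
    "const variableName = <<expr1: Expression | Identifier>>;"
);
// example.cleanedJS is "const variableName = expr1;", which Babel can parse
// example.prelude is { expr1: ["Expression", "Identifier"] }
```

The cleaned string is plain JavaScript, while the prelude keeps the type constraints that were removed from it.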
+
+\subsection*{Pre-parsing \DSLSH}
+
+The self-hosted version \DSLSH also requires some form of pre-parsing in order to prepare the internal DSL environment. This step is relatively minor: unlike the pre-parsing of \DSL it only parses the definition and inserts nothing.
+
+In order to use JavaScript as the meta language to define JavaScript we define a \texttt{Prelude}. This prelude is required to consist of several \texttt{Declaration Statements}, where the variable names are used as the internal DSL variables and the right-hand side expressions are used as the DSL types. To allow multiple types for a single internal DSL variable we re-use JavaScript's array literal syntax.
+
+We use Babel to generate the AST of the \texttt{prelude} definition, which gives us a JavaScript object structure. Since the structure is very strictly defined, we can expect every \texttt{stmt} of \texttt{stmts} to be a variable declaration, and otherwise throw an error for an invalid prelude. Continuing through the object, we have to determine whether the prelude definition supports multiple types, that is, whether its initializer is an \texttt{ArrayExpression} or just an \texttt{Identifier}. If it is an array, we initialize the prelude entry named after the declared variable with an empty array and fill it with each element of the \texttt{ArrayExpression}; otherwise we insert the single \texttt{Identifier} directly.
+
+\begin{lstlisting}[language={JavaScript}]
+for (let stmt of stmts) {
+    // Error if not a VariableDeclaration
+    if (stmt.type !== "VariableDeclaration") {
+        throw new Error("Invalid prelude: expected a variable declaration");
+    }
+    // Babel keeps the name and initializer on the declarator
+    const { id, init } = stmt.declarations[0];
+    // If defined multiple valid types
+    if (init.type === "ArrayExpression") {
+        prelude[id.name] = []; // Empty array on declared
+        for (let elem of init.elements) {
+            // Add each type of the array def
+            prelude[id.name].push(elem.name);
+        }
+    } else {
+        // Single valid type
+        prelude[id.name] = [init.name];
+    }
+}
+\end{lstlisting}
+
+\section{}
+
+
+
--- a/chapter/comments
+++ b/chapter/comments
@ -0,0 +1,2 @@
+MOve order -> Simple -> Await -> Pipeline -> Do proposal -> Discard bindings
+
--- a/generators/commands.sty
+++ b/generators/commands.sty
@ -1,3 +1,6 @@
 \newcommand{\DSL}{JSTQL }
 \newcommand{\exProp}{\textbf{optional let to int for declaring numerical literal variables}}
 \newcommand{\discardBindings}{Discard Bindings }
+\newcommand{\DSLSH}{JSTQL-SH }
+\newcommand{\figFull}[1][missing]{Figure \ref{#1}}
+\newcommand{\fig}[1][missing]{\ref{#1}}
--- a/generators/imports.sty
+++ b/generators/imports.sty
@ -19,7 +19,9 @@
 \usepackage{latexsym}
 \DeclareRobustCommand{\VAN}[3]{#2}
 \usepackage{amssymb}
-
+\usepackage{tikz}
+\usetikzlibrary{positioning}
+\usetikzlibrary{shapes}
 \renewcommand{\labelenumii}{\theenumii}
 \renewcommand{\theenumii}{\theenumi.\arabic{enumii}.}

--- a/generators/refs.bib
+++ b/generators/refs.bib
@ -6,7 +6,7 @@
  year    = {2024},
  month   = apr,
  note    = {[Online; accessed 25. Apr. 2024]},
-  url     = {https://github.com/tc39/proposal-discard-binding?tab=readme-ov-file#object-binding-and-assignment-patterns}
+  url     = {https://github.com/tc39/proposal-discard-binding}
 }

@misc{Proposal:DoProposal,
@ -17,3 +17,20 @@
  note    = {[Online; accessed 2. May 2024]},
  url     = {https://github.com/tc39/proposal-do-expressions}
 }
+
+@misc{Langium,
+  title   = {{Langium}},
+  journal = {Langium},
+  year    = {2024},
+  month   = apr,
+  note    = {[Online; accessed 10. May 2024]},
+  url     = {https://langium.org}
+}
+
+@misc{Babel,
+  title = {{Babel {$\cdot$} Babel}},
+  year  = {2024},
+  month = may,
+  note  = {[Online; accessed 10. May 2024]},
+  url   = {https://babeljs.io}
+}
--- a/report.tex
+++ b/report.tex
@ -17,6 +17,7 @@
 \include{chapter/introduction}
 \include{chapter/ch2}
 \include{chapter/ch3}
+\include{chapter/ch4}
 % Include more chapters as required.
 %%=========================================

@ -35,7 +36,7 @@
 \clearpage
 \DeclareRobustCommand{\VAN}[3]{#3}
 \addcontentsline{toc}{chapter}{Bibliography}
-\bibliographystyle{generators/myplainnat}
+\bibliographystyle{plain}
 \bibliography{generators/refs}
 \appendix
 \titleformat{\chapter}[display]