Finished chapter 4

2024-05-28 21:59:51 +02:00 · 2024-05-28 21:59:51 +02:00 · e6d2de6495
commit e6d2de6495
parent d66a638d0b
2 changed files with 117 additions and 6 deletions
--- a/build/report.pdf
+++ b/build/report.pdf
--- a/chapter/ch4.tex
+++ b/chapter/ch4.tex
@@ -534,23 +534,134 @@ export interface Match {

 \section{Transforming}

-To perform the transformation and replacement on each of the matches, we take the resulting list of matches, the template from the \texttt{transform to} section of the current case of the proposal, and the AST version of original code parsed by Babel. All the transformations are then applied to the code and we use~\cite{BabelGenerate}{Babel generate} to generate JavaScript code from the transformed AST. 
+To perform the transformation and replacement on each of the matches, we take the resulting list of matches, the template from the \texttt{transform to} section of the current case of the proposal, and the Babel AST~\cite{BabelAST} of the original code. All transformations are then applied to this AST, and we use \texttt{@babel/generator}~\cite{BabelGenerate} to generate JavaScript code from the transformed AST.

 An important discovery is that we must transform the leaves of the AST first. If the transformations were applied from top to bottom, a later transformation might remove transformations done using a previous match. This means that if we transform from top to bottom on the tree, we might end up with \texttt{a(b) |> c(\%)} instead of \texttt{b |> a(\%) |> c(\%)} in the case of the pipeline proposal. This is easily solved in our case: because the matcher looks for matches from the top of the tree to the bottom, the matches it discovers are always in that order. Therefore, when transforming, all that has to be done is to reverse the list of matches, which gives us the ones closest to the leaves of the tree first.
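
The ordering argument above can be illustrated with a minimal sketch. The \texttt{SketchMatch} type and the \texttt{orderForTransformation} helper are hypothetical names introduced only for this illustration; they are not part of the implementation, and the only assumption made is that the matcher returns its matches in top-down order.

```typescript
// Hypothetical, simplified stand-in for a match: only the depth of its root
// in the user AST matters for this illustration.
interface SketchMatch {
    label: string;
    depth: number;
}

// Returns the matches in transformation order: closest to the leaves first.
function orderForTransformation(matches: SketchMatch[]): SketchMatch[] {
    // Copy before reversing so the matcher's own list is left untouched.
    return [...matches].reverse();
}

// Matches for the expression c(a(b)), as discovered top-down by the matcher.
const discovered: SketchMatch[] = [
    { label: "c(a(b))", depth: 0 },
    { label: "a(b)", depth: 1 },
];
const ordered = orderForTransformation(discovered);
```

With this ordering, the inner call \texttt{a(b)} is rewritten before the outer call that contains it, so the outer rewrite operates on already transformed children.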

-\subsubsection{Inserting wildcard into transformation template}
+\subsubsection{Building the transformation}

-The transformations are performed by inserting the matched wildcards from the \texttt{applicable to} template into their respective locations in the \texttt{transform to} template. Then the entire transformed \texttt{transform to} template is placed into the original code AST where the root of the match was previously located. Doing this we are essentially doing a transformation that is a find and replace with context passed through the wildcards. 
+Before we can insert the \texttt{transform to} section into the user's code AST, we have to insert all nodes matched against a wildcard in \texttt{applicable to} into their respective locations in the \texttt{transform to} template.

+The first step to achieve this is to extract the wildcards from the match tree. This is done by recursively searching the match tree for an \texttt{Identifier}, or an \texttt{ExpressionStatement} containing an \texttt{Identifier}. To do this, we have a function \texttt{extractWildcardPairs}, which takes a single match, extracts all wildcards, and places them into a \texttt{Map<string, t.Node[]>}. The key of the map is the identifier used for the wildcard, and the value is the list of AST nodes the wildcard was matched against in the user's code.

-First we have to extract every node that was matched against the wildcards in the match. To do this we recursively search through the match until we encounter an \texttt{Identifier} that shares a name with a wildcard. 
+\begin{lstlisting}[language={JavaScript}, caption={Extracting wildcard from match}, label={lst:extractWildcardFromMatch}]
+function extractWildcardPairs(match: Match): Map<string, t.Node[]> {
+    let map: Map<string, t.Node[]> = new Map();

-To insert all nodes matched against wildcards, we use \texttt{@babel/traverse}~\cite{BabelTraverse}, and traverse the AST of the \texttt{transform to} template. We use custom visitors for \textit{Identifier} and \textit{ExpressionStatement} with an \texttt{Identifier} as expression. Each visitor checks if the identifier is a registered wildcard, if it is, we perform a replacement of the \texttt{Identifier} with the node/s the wildcard was matched with. 
+    function recursiveSearch(node: TreeNode<PairedNodes>) {
+        let name: null | string = null;
+        if (node.element.aplToNode.type === "Identifier") {
+            name = node.element.aplToNode.name;
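+        } else if (
+            // Node is an ExpressionStatement wrapping an Identifier
+            node.element.aplToNode.type === "ExpressionStatement" &&
+            node.element.aplToNode.expression.type === "Identifier"
+        ) {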
+            name = node.element.aplToNode.expression.name;
+        }

+        if (name) {
+            // Store in the map
+            map.set(name, node.element.codeNode);
+        }
+        // Recursively search the child nodes
+        for (let child of node.children) {
+            recursiveSearch(child);
+        }
+    }
+    // Start the initial search
+    for (let stmt of match.statements) {
+        recursiveSearch(stmt);
+    }
+    return map;
+}
+\end{lstlisting}
+
+Once the full map of all wildcards has been built, we have to insert the wildcards into the Babel AST of the \texttt{transform to} template. To do this, we traverse the template and insert the matched nodes of the user's code. We use \texttt{@babel/traverse}~\cite{BabelTraverse} to traverse the AST, as it provides a powerful API for modifying the AST. \texttt{@babel/traverse} allows us to define visitors that are executed when traversing specific types of AST nodes. For this, we define a visitor for \texttt{Identifier} and a visitor for \texttt{ExpressionStatement}. These visitors do exactly the same thing, except that for the \texttt{ExpressionStatement} we first have to check whether the expression is an identifier.
+
+When we visit a node that might be a wildcard, we check if that node's name is in the map of wildcards built in Listing~\ref{lst:extractWildcardFromMatch}. If the name of the identifier is a key in the map, we get the value for that key and perform a node replacement, where we replace the identifier with the nodes from the user's code that were matched against that wildcard. See Listing~\ref{lst:traToTransform}.
+
+\begin{lstlisting}[language={JavaScript}, caption={Traversing \texttt{transform to} AST and inserting user context}, label={lst:traToTransform}]
+traverse(transformTo, {
+        Identifier: (path) => {
+            if (wildcardMatches.has(path.node.name)) {
+                let toReplaceWith = wildcardMatches.get(path.node.name);
+                if (toReplaceWith) {
+                    path.replaceWithMultiple(toReplaceWith);
+                }
+            }
+        },
+        ExpressionStatement: (path) => {
+            if (path.node.expression.type === "Identifier") {
+                let name = path.node.expression.name;
+                if (wildcardMatches.has(name)) {
+                    let toReplaceWith = wildcardMatches.get(name);
+                    if (toReplaceWith) {
+                        path.replaceWithMultiple(toReplaceWith);
+                    }
+                }
+            }
+        },
+    });
+\end{lstlisting}
+
+Because some wildcards can match multiple sibling nodes, we have to use \texttt{replaceWithMultiple} when performing the replacement. This can be seen on lines 6 and 16 of Listing~\ref{lst:traToTransform}.
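
To show why a single template identifier may have to expand into several nodes, the following is a minimal sketch that does not use Babel. The \texttt{SketchNode} type and the \texttt{substitute} function are hypothetical simplifications introduced for illustration; the actual implementation works on Babel nodes through \texttt{@babel/traverse} as shown in the listing above.

```typescript
// Hypothetical, minimal node type for illustration; Babel nodes are far richer.
type SketchNode =
    | { type: "Identifier"; name: string }
    | { type: "Call"; callee: string; args: SketchNode[] };

// Returns the nodes that should stand in place of `node` after substitution.
function substitute(
    node: SketchNode,
    wildcards: Map<string, SketchNode[]>
): SketchNode[] {
    if (node.type === "Identifier") {
        const matched = wildcards.get(node.name);
        // A wildcard expands to every node it matched; other identifiers are kept.
        return matched !== undefined ? matched : [node];
    }
    const args: SketchNode[] = [];
    for (const arg of node.args) {
        // One template argument may expand into several sibling arguments.
        args.push(...substitute(arg, wildcards));
    }
    return [{ type: "Call", callee: node.callee, args }];
}

const id = (name: string): SketchNode => ({ type: "Identifier", name });
const wildcards = new Map<string, SketchNode[]>([
    ["head", [id("a")]],
    ["rest", [id("b"), id("c")]],
]);
const template: SketchNode = {
    type: "Call",
    callee: "f",
    args: [id("head"), id("rest")],
};
const [result] = substitute(template, wildcards);
```

Here the wildcard \texttt{rest} was matched against two nodes, so the template call with two arguments becomes a call with three. This is the situation that a single-node replacement could not express.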

 \subsubsection*{Inserting the template into the AST}

-Having a transformed version of the users code, it has to be inserted into the full AST definition of the users code, again we use~\cite{BabelTraverse}{babel/traverse} to traverse the entirety of the code AST using a visitor. This visitor does not apply to any node-type, as the matched section can be any type. Therefore we use a generic visitor, and use an equality check to find the exact part of the code this specific match comes from. Once we find where in the users code the match came from, we replace it with the transformed \texttt{transform to} nodes. This might be multiple statements, therefore the function \texttt{replaceWithMultiple} is used, to insert every Statement from the \texttt{transform to} body, and we are careful to remove any following sibling nodes that were part of the original match. This is done by removing the \textit{n-1} next siblings from where we inserted the transform to template.
+We have now created the \texttt{transform to} template with the user's context. This has to be inserted into the full AST of the user's code. To do this, we have to locate exactly where in the user AST the match originated. We can find this location with an equality check against the top node of the user's code stored in the match. To do this efficiently, we use this top node as the key of a \texttt{Map}, so if a node in the user AST exists in that map, we know it was matched.
+
+\begin{lstlisting}[language={JavaScript}]
+transformedTransformTo.set(
+    match.statements[0].element.codeNode[0],
+    [
+        transformMatchFaster(wildcardMatches, traToWithWildcards),
+        match,
+    ]
+);
+\end{lstlisting}
+
+
+To traverse the user AST, we use \texttt{@babel/traverse}~\cite{BabelTraverse}. In this case we cannot use a specific visitor, and therefore we use a generic visitor that applies to every node of the AST. If the current node we are visiting is a key to the map of transformations, we know we have to insert the transformed code. This is done similarly to before where we use \texttt{replaceWithMultiple}.
+
+Some matches have multiple root nodes. This occurs when matching was done with multiple statements as top nodes. For a match with $n$ root nodes, we therefore have to remove the $n-1$ following sibling nodes. The removal of these sibling nodes can be seen on lines 12--17 of Listing~\ref{lst:insertingIntoUserCode}.
+
+\begin{lstlisting}[language={JavaScript}, caption={Inserting transformed matches into user code}, label={lst:insertingIntoUserCode}]
+traverse(codeAST, {
+        enter(path) {
+            if (transformedTransformTo.has(path.node)) {
+                let [traToWithWildcards, match] =
+                    transformedTransformTo.get(path.node) as [
+                        t.File,
+                        Match
+                    ];
+                path.replaceWithMultiple(
+                    traToWithWildcards.program.body);
+                
+                let siblings = path.getAllNextSiblings();
+
+                // For multi line applicable to
+                for (let i = 0; i < match.statements.length - 1; i++) {
+                    siblings[i].remove();
+                }
+
+                // When we have matched top statements with +, we might have to remove more siblings
+                for (let matchStmt of match.statements) {
+                    for (let codeStmt of matchStmt.element
+                        .codeNode) {
+                        let siblingnodes = siblings.map((a) => a.node);
+                        if (siblingnodes.includes(codeStmt)) {
+                            let index = siblingnodes.indexOf(codeStmt);
+                            siblings[index].remove();
+                        }
+                    }
+                }
+            }
+        },
+    });
+\end{lstlisting}
+
+There is a special case when a wildcard uses a Kleene plus, which allows it to match multiple siblings: we might then have more siblings to remove. In this case, it is not simple to know exactly how many we have to remove. Therefore, we iterate over all statements of the match and check if each statement is still a sibling of the node being replaced. This behavior can be seen on lines 20--29 of Listing~\ref{lst:insertingIntoUserCode}.
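
The identity-based sibling removal can be sketched on a plain array of statements. The \texttt{replaceMatched} helper below is a hypothetical simplification written for this illustration; it assumes statements are compared by reference, as the nodes in the listing above are, and it does not use Babel paths.

```typescript
// Replaces the first matched statement with `replacement` and drops every
// other statement that belongs to the same match (compared by identity).
function replaceMatched<T>(body: T[], matched: T[], replacement: T[]): T[] {
    const result: T[] = [];
    body.forEach((stmt) => {
        if (stmt === matched[0]) {
            // The transformed template takes the place of the first root node.
            result.push(...replacement);
        } else if (matched.indexOf(stmt) === -1) {
            // Statements outside the match are kept unchanged.
            result.push(stmt);
        }
    });
    return result;
}

const s1 = { src: "let x = f();" };
const s2 = { src: "g(x);" };
const s3 = { src: "h(x);" };
const s4 = { src: "done();" };
const out = replaceMatched([s1, s2, s3, s4], [s1, s2, s3], [{ src: "pipeline();" }]);
```

Checking membership instead of counting means the number of siblings to remove does not have to be known in advance, which is what makes it suitable for wildcards that match a variable number of statements.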
+
+After one full traversal of the user AST, all matches found have been replaced with their respective transformations. All that remains is generating JavaScript from the transformed AST.

 \subsubsection*{Generating source code from transformed AST}