PARSING AND THE LANGUAGE OF EXPRESSION TREES v.06 PARSING SOURCE CODE INTO TREES The first step that almost all programming languages do when running source code is to PARSE the source code into a TREE data structure. Programming languages can only work with code that is in a completely unambiguous form, and trees provide such a form. For example, in the following line an expression is parsed into a tree structure, the ' operator (which is discussed below) holds this expression from being evaluated, and the held tree is assigned to the variable named "tree". %mathpiper tree := '((a+b)==(c*(d - e/f))); %/mathpiper %output,preserve="false" Result: a + b == c*(d - e/f) . %/output THE LANGUAGE OF TREES A parsed expression tree can be viewed if desired. %mathpiper ViewTreeParts(tree, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output Expression trees consist of NODES connected by ARCS. In the following diagram, all of the tree's nodes are highlighted. %mathpiper ViewTreeParts(tree, NodesHighlight:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output In the following diagram, all of the tree's arcs are highlighted. %mathpiper ViewTreeParts(tree, ArcsHighlight:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The node at the top of a tree is called its ROOT node. %mathpiper ViewTreeParts(tree, RootNodeHighlight:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The nodes that are connected immediately below a given node by arcs are its CHILD nodes. In the following diagram, the root node has two CHILDREN, and it is the PARENT node of these children. Children nodes of the same parent node are called SIBLING nodes. %mathpiper ViewTreeParts(tree, NodeHighlight:["1", "2"], Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output Nodes that don't have any children are called LEAF nodes. %mathpiper ViewTreeParts(tree, LeafNodesHighlight:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output Each node below the root node in a tree is located at a specific POSITION in the tree which is determined by the arcs that need to be followed from the root node to arrive at it. Each arc that connects a parent node with is children is labeled with a number. The leftmost child is labeled 1, the sibling to its immediate right is labeled 2, and so on. The following diagram shows the tree with all of its positions indicated. %mathpiper ViewTreeParts(tree, ShowPositions:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The set of all children, children of children, etc. of a given node are called its DESCENDANTS. A node along with all of its descendants and the arcs between them form a SUBTREE. The topmost node in a subtree is the DOMINANT node, and all of its descendants and arcs between them are said to be DOMINATED by the dominant node. In the following diagram, the subtree that has the node "-" as its dominant node is highlighted. %mathpiper ViewTreeParts(tree, Pattern:(x_ - y_), Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The position of node "b" is "1,2". A sequence of arcs and nodes is called a PATH, and a path that goes from the root node to a leaf node is called a BRANCH. The following diagram shows the branch from the root node to node "b". %mathpiper ViewTreeParts(tree, Path:["", "1,2"], PathNumbers:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The position of node "-" is "2,2". The following diagram shows the path from the root node to node "-". %mathpiper ViewTreeParts(tree, Path:["", "2,2"], PathNumbers:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The LENGTH of a path is the number of arcs it contains. For example, the length of the path between node "a" and node "d" is 5. %mathpiper ViewTreeParts(tree, Path:["1,1", "2,2,1"], PathNumbers:False, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The length of a path from the root node to a given node is called the DEPTH of the node, and the length of the path from a node to the farthest leaf it dominates is called its HEIGHT. The follow diagram shows the depths of all the nodes in the tree. %mathpiper ViewTreeParts(tree, ShowDepths:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output ACCESSING SUBTREES USING INDEX/POSITION NOTATION MathPiper lists can either be created using the "List" procedure or with [] square bracket notation. The following code creates a list using the "List" procedure. In> List("a", "b", "c") %output,preserve="false" Result: ["a","b","c"] . %/output The following code creates a nested list using list square bracket notation. In> list := ["a", "b", ["c", "d"], "e"]; %output,preserve="false" Result: ["a","b",["c","d"],"e"] . %/output Notice, however, that when a list in square bracket notation is parsed into a tree, the tree version uses "List" procedure notation to identify it. %mathpiper ViewTreeParts(list, Scale:1.5, FontSize:20, ShowPositions:True, Code:True); %/mathpiper %output,preserve="false" Result: . %/output Each element in a list can be accessed using square bracket index notation (which has a completely different use than square bracket list notation). Notice how the index values of the list elements in the following code match the position values of the nodes that contain these values in the tree. %mathpiper Print(list[1]); Print(list[2]); Print(list[3]); Print(list[4]); %/mathpiper %output,preserve="false" Result: True Side Effects: "a" "b" ["c","d"] "e" . %/output The elements in the nested list can also be accessed by thinking of their indexes as node position numbers. %mathpiper Print(list[3][1]); Print(list[3][2]); %/mathpiper %output,preserve="false" Result: True Side Effects: "c" "d" . %/output Up to this point MathPiper lists seem to be similar to the lists that are in most modern scripting languages, aside from their appearing to start at index 1 while most modern scripting languages start at index 0. I say "appearing to start at index 1" because MathPiper lists actually start at 0. Lets see what happens if we try to access index 0 in our example list. In> list[0] %output,preserve="false" Result: List . %/output The above code accessed the "List" procedure symbol that is in the root node of the tree. The "List" procedure symbol of the nested list can be accessed in a similar manner. In> list[3][0] %output,preserve="false" Result: List . %/output Lets go back to the mathematical expression we were using earlier and see what happens if we use list index notation on it. First, we will create a view of this expression's tree that includes node position numbers for easy reference. %mathpiper ViewTreeParts(tree, ShowPositions:True, Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output The position indicators in the view of the tree can be used with index notation to easily access any part of the tree, including all of the operators. In> tree[0] %output,preserve="false" Result: == . %/output In> tree[1] %output,preserve="false" Result: a + b . %/output In> tree[2] %output,preserve="false" Result: c*(d - e/f) . %/output In> tree[2][0] %output,preserve="false" Result: * . %/output In> tree[2][2][1] %output,preserve="false" Result: d . %/output PREVENTING EVALUATION OF A TREE Sometimes instead of evaluating a tree, one wants to prevent its evaluation and work with the tree itself as data. The most common ways to achieve this are to use the ' "hold" operator, the "Hold" procedure, the '' "freeze" operator, or underscore constants. THE ' "HOLD" OPERATOR AND THE "Hold" PROCEDURE The ' operator prevents the expression to its immediate right from being evaluated. For example, when an unassigned variable is evaluated an exception is thrown. However, an unassigned variable to the immediate right of an ' operator will be returned unevaluated. In> Unassign(x); %output,preserve="false" Result: True . %/output In> x %output,preserve="false" Result: The variable <x> does not have a value assigned to it. Error near line 472 starting at index 4. . %/output In> 'x %output,preserve="false" Result: x . %/output The ' operator has a very high precedence, so its operand is normally the smallest subexpression to its immediate right. For example, in the following expression, the operand of ' is only the variable "x". %mathpiper Unassign(x); Unassign(y); 'x + y; %/mathpiper %error,mpversion=".230",preserve="false" Result: The variable <y> does not have a value assigned to it. Error on or before line 500 starting at index 5. . %/error The tree of the following expression shows that the only part of the expression that ' considers to be is operand is the variable "x". The "Hold" procedure has the same behavior as the ' operator, and it is used in this example to prevent the expression from being evaluated so its tree can be viewed. %mathpiper ViewTreeParts(Hold(x + y), Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output Parentheses need to be used to enable the ' operator to treat larger expressions as its operand. %mathpiper ViewTreeParts('(x + y), Scale:1.5, FontSize:20); %/mathpiper %output,preserve="false" Result: . %/output THE '' "FREEZE" OPERATOR The ' operator only holds during a single evaluation and then it is removed from the expression. This means that if an expression which contains one or more ' operators (which are not holding other ' operators) is evaluated, the resulting expression will have these ' operators removed from it. If an expression needs to be held for an extended period of time, the '' "freeze" operator can be used. The '' operator consists of two single quote characters placed next to each other with no space between them. If a space is placed between these characters, then the parser treats them as two separate hold operators. Other than holding evaluation of an expression permanently, the main difference between the ' operator and the '' operator is that the '' operator has a much lower precedence than the ' operator and so it can often be used to freeze complex expressions without the need to use parentheses. UNDERSCORE CONSTANTS AND THE "ObjectoToMeta" AND "MetaToObject" PROCEDURES Any name in MathPiper code that contains an underscore (_) character is treated as a constant and is not evaluated. %mathpiper [ _x, x_, _hello, hello_, hello_there, _ ]; %/mathpiper %output,preserve="false" Result: [_x,x_,_hello,hello_,hello_there,_] . %/output A convenient way to convert all of the names in an expression into underscore constants is with the "ObjectToMeta" procedure. Operators that end with a $ are versions of their normal counterparts that will not be evaluated. In> code := ObjectToMeta('((a+b)==(c*(d - e / f)))); %output,preserve="false" Result: _a +$ _b ==$ _c *$ (_d -$ _e /$ _f) . %/output The "MetaToObject" procedure converts code that contains underscore constants and $ characters back into executable code. In> MetaToObject(code) %output,preserve="false" Result: a + b == c*(d - e/f) . %/output THE "MathParse" PROCEDURE MathPiper syntax is not the same as traditional mathematics syntax. For example, MathPiper's standard parser does not understand that 2x means 2*x. The "MathParse" procedure accepts a string which contains code that is in traditional mathematics format, and it parses it into a MathPiper tree data structure. In> MathParse("2x") %output,preserve="false" Result: 2*_x . %/output COMPUTER CODE IS ALSO PARSED INTO A TREE The MathPiper parser also parses MathPiper source code into a tree. In the following example the ' operator is used to hold the code that is in a sequence expression so its tree form can be viewed. In this tree all of the positions of the nodes are shown along with the path from the root node to the "++" operator. %mathpiper expression := ''{ result := []; index := 14; While(index <=? 33) { result := Append(result, index); index++; }; result; }; ViewTreeParts(expression, Scale:1.5, ShowPositions:True, IncludeExpression:False, Path:["", "1322"], PathNumbers:True, Code:True); %/mathpiper %output,preserve="false" Result: . %/output