parsing_and_the_language_of_expression

PARSING AND THE LANGUAGE OF EXPRESSION TREES 
v.06


PARSING SOURCE CODE INTO TREES

The first step that almost all programming
languages do when running source code is to PARSE
the source code into a TREE data structure.
Programming languages can only work with code that
is in a completely unambiguous form, and trees
provide such a form. For example, in the following
line an expression is parsed into a tree
structure, the ' operator (which is discussed
below) holds this expression from being evaluated,
and the held tree is assigned to the variable
named "tree".

%mathpiper

tree := '((a+b)==(c*(d - e/f)));

%/mathpiper

    %output,preserve="false"
      Result: a + b == c*(d - e/f)
.   %/output




THE LANGUAGE OF TREES

A parsed expression tree can be viewed if desired.

%mathpiper

ViewTreeParts(tree, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




Expression trees consist of NODES connected by
ARCS. In the following diagram, all of the tree's
nodes are highlighted.

%mathpiper

ViewTreeParts(tree, NodesHighlight:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




In the following diagram, all of the tree's arcs
are highlighted.

%mathpiper

ViewTreeParts(tree, ArcsHighlight:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The node at the top of a tree is called its ROOT
node.

%mathpiper

ViewTreeParts(tree, RootNodeHighlight:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The nodes that are connected immediately below a
given node by arcs are its CHILD nodes. In the
following diagram, the root node has two CHILDREN,
and it is the PARENT node of these children.
Children nodes of the same parent node are called
SIBLING nodes.

%mathpiper

ViewTreeParts(tree, NodeHighlight:["1", "2"], Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




Nodes that don't have any children are called LEAF
nodes.

%mathpiper

ViewTreeParts(tree, LeafNodesHighlight:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




Each node below the root node in a tree is located
at a specific POSITION in the tree which is
determined by the arcs that need to be followed
from the root node to arrive at it. Each arc that
connects a parent node with is children is labeled
with a number. The leftmost child is labeled 1,
the sibling to its immediate right is labeled 2,
and so on. The following diagram shows the tree
with all of its positions indicated.

%mathpiper

ViewTreeParts(tree, ShowPositions:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The set of all children, children of children,
etc. of a given node are called its DESCENDANTS. A
node along with all of its descendants and the
arcs between them form a SUBTREE. The topmost node
in a subtree is the DOMINANT node, and all of its
descendants and arcs between them are said to be
DOMINATED by the dominant node. In the following
diagram, the subtree that has the node "-" as its
dominant node is highlighted.

%mathpiper

ViewTreeParts(tree, Pattern:(x_ - y_), Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The position of node "b" is "1,2". A sequence of
arcs and nodes is called a PATH, and a path that
goes from the root node to a leaf node is called a
BRANCH. The following diagram shows the branch
from the root node to node "b".

%mathpiper

ViewTreeParts(tree, Path:["", "1,2"], PathNumbers:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The position of node "-" is "2,2". The following
diagram shows the path from the root node to node
"-".

%mathpiper

ViewTreeParts(tree, Path:["", "2,2"], PathNumbers:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The LENGTH of a path is the number of arcs it
contains. For example, the length of the path
between node "a" and node "d" is 5.

%mathpiper

ViewTreeParts(tree, Path:["1,1", "2,2,1"], PathNumbers:False, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The length of a path from the root node to a given
node is called the DEPTH of the node, and the
length of the path from a node to the farthest
leaf it dominates is called its HEIGHT. The follow
diagram shows the depths of all the nodes in the
tree.

%mathpiper

ViewTreeParts(tree, ShowDepths:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




ACCESSING SUBTREES USING INDEX/POSITION NOTATION

MathPiper lists can either be created using the
"List" procedure or with [] square bracket
notation. The following code creates a list using
the "List" procedure.

In> List("a", "b", "c")

    %output,preserve="false"
      Result: ["a","b","c"]
.   %/output




The following code creates a nested list using
list square bracket notation.

In> list := ["a", "b", ["c", "d"], "e"];

    %output,preserve="false"
      Result: ["a","b",["c","d"],"e"]
.   %/output




Notice, however, that when a list in square
bracket notation is parsed into a tree, the tree
version uses "List" procedure notation to identify
it.

%mathpiper

ViewTreeParts(list, Scale:1.5, FontSize:20, ShowPositions:True, Code:True);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




Each element in a list can be accessed using
square bracket index notation (which has a
completely different use than square bracket list
notation). Notice how the index values of the list
elements in the following code match the position
values of the nodes that contain these values in
the tree.

%mathpiper

Print(list[1]);
Print(list[2]);
Print(list[3]);
Print(list[4]);

%/mathpiper

    %output,preserve="false"
      Result: True
      
      Side Effects:
      "a"
      "b"
      ["c","d"]
      "e"
      
.   %/output




The elements in the nested list can also be
accessed by thinking of their indexes as node
position numbers.

%mathpiper

Print(list[3][1]);
Print(list[3][2]);

%/mathpiper

    %output,preserve="false"
      Result: True
      
      Side Effects:
      "c"
      "d"
      
.   %/output




Up to this point MathPiper lists seem to be
similar to the lists that are in most modern
scripting languages, aside from their appearing to
start at index 1 while most modern scripting
languages start at index 0. I say "appearing to
start at index 1" because MathPiper lists actually
start at 0. Lets see what happens if we try to
access index 0 in our example list.

In> list[0]

    %output,preserve="false"
      Result: List
.   %/output




The above code accessed the "List" procedure
symbol that is in the root node of the tree. The
"List" procedure symbol of the nested list can be
accessed in a similar manner.

In> list[3][0]

    %output,preserve="false"
      Result: List
.   %/output



Lets go back to the mathematical expression we
were using earlier and see what happens if we use
list index notation on it. First, we will create a
view of this expression's tree that includes node
position numbers for easy reference.

%mathpiper

ViewTreeParts(tree, ShowPositions:True, Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




The position indicators in the view of the tree
can be used with index notation to easily access
any part of the tree, including all of the
operators.

In> tree[0]

    %output,preserve="false"
      Result: ==
.   %/output


In> tree[1]

    %output,preserve="false"
      Result: a + b
.   %/output


In> tree[2]

    %output,preserve="false"
      Result: c*(d - e/f)
.   %/output


In> tree[2][0]

    %output,preserve="false"
      Result: *
.   %/output


In> tree[2][2][1]

    %output,preserve="false"
      Result: d
.   %/output




PREVENTING EVALUATION OF A TREE

Sometimes instead of evaluating a tree, one wants
to prevent its evaluation and work with the tree
itself as data. The most common ways to achieve
this are to use the ' "hold" operator, the "Hold"
procedure, the '' "freeze" operator, or underscore 
constants.


THE ' "HOLD" OPERATOR AND THE "Hold" PROCEDURE

The ' operator prevents the expression to its
immediate right from being evaluated. For example,
when an unassigned variable is evaluated an
exception is thrown. However, an unassigned
variable to the immediate right of an ' operator
will be returned unevaluated.

In> Unassign(x);

    %output,preserve="false"
      Result: True
.   %/output



In> x

    %output,preserve="false"
      Result: The variable <x> does not have a value assigned to it.  Error near line 472 starting at index 4.
.   %/output



In> 'x

    %output,preserve="false"
      Result: x
.   %/output




The ' operator has a very high precedence, so its
operand is normally the smallest subexpression to
its immediate right. For example, in the following
expression, the operand of ' is only the variable
"x".

%mathpiper

Unassign(x);
Unassign(y);

'x + y;

%/mathpiper

    %error,mpversion=".230",preserve="false"
      Result: The variable <y> does not have a value assigned to it.  Error on or before line 500 starting at index 5.
.   %/error




The tree of the following expression shows that
the only part of the expression that ' considers
to be is operand is the variable "x". The "Hold"
procedure has the same behavior as the ' operator,
and it is used in this example to prevent the
expression from being evaluated so its tree can be
viewed.

%mathpiper

ViewTreeParts(Hold(x + y), Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




Parentheses need to be used to enable the '
operator to treat larger expressions as its
operand.

%mathpiper

ViewTreeParts('(x + y), Scale:1.5, FontSize:20);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output




THE '' "FREEZE" OPERATOR

The ' operator only holds during a single
evaluation and then it is removed from the
expression. This means that if an expression which
contains one or more ' operators (which are not
holding other ' operators) is evaluated, the
resulting expression will have these ' operators
removed from it.

If an expression needs to be held for an extended
period of time, the '' "freeze" operator can be
used. The '' operator consists of two single quote
characters placed next to each other with no space
between them. If a space is placed between these
characters, then the parser treats them as two
separate hold operators.

Other than holding evaluation of an expression
permanently, the main difference between the '
operator and the '' operator is that the ''
operator has a much lower precedence than the '
operator and so it can often be used to freeze
complex expressions without the need to use
parentheses.



UNDERSCORE CONSTANTS AND THE "ObjectoToMeta" AND
"MetaToObject" PROCEDURES

Any name in MathPiper code that contains an
underscore (_) character is treated as a constant
and is not evaluated.

%mathpiper

[
_x,
x_,
_hello,
hello_,
hello_there,
_
];

%/mathpiper

    %output,preserve="false"
      Result: [_x,x_,_hello,hello_,hello_there,_]
.   %/output





A convenient way to convert all of the names in an
expression into underscore constants is with the
"ObjectToMeta" procedure. Operators that end with
a $ are versions of their normal counterparts that
will not be evaluated.

In> code := ObjectToMeta('((a+b)==(c*(d - e / f))));

    %output,preserve="false"
      Result: _a +$ _b ==$ _c *$ (_d -$ _e /$ _f)
.   %/output




The "MetaToObject" procedure converts code that
contains underscore constants and $ characters
back into executable code.

In> MetaToObject(code)

    %output,preserve="false"
      Result: a + b == c*(d - e/f)
.   %/output




THE "MathParse" PROCEDURE

MathPiper syntax is not the same as traditional
mathematics syntax. For example, MathPiper's
standard parser does not understand that 2x means
2*x. The "MathParse" procedure accepts a string
which contains code that is in traditional
mathematics format, and it parses it into a
MathPiper tree data structure.

In> MathParse("2x")

    %output,preserve="false"
      Result: 2*_x
.   %/output




COMPUTER CODE IS ALSO PARSED INTO A TREE

The MathPiper parser also parses MathPiper source
code into a tree. In the following example the '
operator is used to hold the code that is in a
sequence expression so its tree form can be
viewed. In this tree all of the positions of the
nodes are shown along with the path from the root
node to the "++" operator.

%mathpiper

expression := 
''{
    result := [];
    
    index := 14;
    
    While(index <=? 33)
    {
        result := Append(result, index);
        
        index++;
    };
    
    result;
};

ViewTreeParts(expression, Scale:1.5, ShowPositions:True, IncludeExpression:False, Path:["", "1322"], PathNumbers:True, Code:True);

%/mathpiper

    %output,preserve="false"
      Result: 
.   %/output