Introduction: The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. For a program to be semantically valid, all variables, functions, classes, etc. must be properly defined, expressions and variables must be used in ways that respect the type system, access control must be respected, and so forth. Semantic analysis is the front end’s penultimate phase and the compiler’s last chance to weed out incorrect programs. We need to ensure the program is sound enough to carry on to code generation.
A large part of semantic analysis consists of tracking variable/function/type declarations and type checking. In many languages, identifiers have to be declared before they’re used. As the compiler encounters a new declaration, it records the type information assigned to that identifier. Then, as it continues examining the rest of the program, it verifies that the type of an identifier is respected in terms of the operations being performed. For example, the type of the right side expression of an assignment statement should match the type of the left side, and the left side needs to be a properly declared and assignable identifier. The parameters of a function should match the arguments of a function call in both number and type. The language may require that identifiers be unique, thereby forbidding two global declarations from sharing the same name. Arithmetic operands will need to be of numeric—perhaps even the exact same type (no automatic int-to-double conversion, for instance). These are examples of the things checked in the semantic analysis phase.
Some semantic analysis might be done right in the middle of parsing. As a particular construct is recognized, say an addition expression, the parser action could check the two operands and verify they are of numeric type and compatible for this operation. In fact, in a one-pass compiler, the code is generated right then and there as well. In a compiler that runs in more than one pass (such as the one we are building for Decaf), the first pass digests the syntax and builds a parse tree representation of the program. A second pass traverses the tree to verify that the program respects all semantic rules as well. The single-pass strategy is typically more efficient, but multiple passes allow for better modularity and flexibility (i.e., can often order things arbitrarily in the source program).
Types and Declarations
We begin with some basic definitions to set the stage for performing semantic analysis. A type is a set of values and a set of operations operating on those values. There are three categories of types in most programming languages:
- Base types int, float, double, char, bool, etc. These are the primitive types provided directly by the underlying hardware. There may be a facility for user-defined variants on the base types (such as C enums).
- Compound types arrays, pointers, records, structs, unions, classes, and so on. These types are constructed as aggregations of the base types and simple compound types.
- Complex types lists, stacks, queues, trees, heaps, tables, etc. You may recognize these as abstract data types. A language may or may not have support for these sort of higher-level abstractions.
In many languages, a programmer must first establish the name and type of any data object (e.g., variable, function, type, etc). In addition, the programmer usually defines the lifetime. A declaration is a statement in a program that communicates this information to the compiler. The basic declaration is just a name and type, but in many languages it may include modifiers that control visibility and lifetime (i.e., static in C, private in Java).
Function declarations or prototypes serve a similar purpose for functions that variable declarations do for variables. Function and method identifiers also have a type, and the compiler can use ensure that a program is calling a function/method correctly. The compiler uses the prototype to check the number and types of arguments in function calls. The location and qualifiers establish the visibility of the function (Is the function global? Local to the module? Nested in another procedure? Attached to a class?) Type declarations (e.g., C type def, C classes) have similar behaviors with respect to declaration and use of the new typename.