Introduction to Abstract Syntax Trees in Go

Cover image

Revealing how Abstract Syntax Trees might ease building tools around source code processing in Go

by Arkadiusz Ziobrowski · 8 min read backend golang ast

Let's assume that we have to do a task to find all distinct package names in the list of Go files. How would you tackle it?

We could list the directory names containing the Go files using some shell commands. However, we may list incorrect packages, for example, if someone declares a package differing from the directory name.

We could also look for the lines beginning with the package keyword in the files with .go extension. Yet, we can still find some files with incorrect Go syntax, where the first line will not even parse.

How can you be sure that a file is correct and extract the needed information? Abstract Syntax Trees to the rescue!

What is an Abstract Syntax Tree?permalink

An Abstract Syntax Tree, abbreviated as AST, is a tree representation of source code structure in the given programming language. It does not contain every element that the correctly structured source code consists of, because, well, it's abstract.

The property of abstraction means that the underlying tree does not contain syntax-specific elements, such as parentheses or braces explicitly. They are there implicitly though because the source code that is invalid syntactically will not be translated into the Abstract Syntax Tree.

How is the Abstract Syntax Tree generated?

Abstract Syntax Tree is built upon the list of tokens that come from the Lexical Analysis phase of source code transformation into the tree structure (abstract syntax tree or even more strict concrete syntax tree).

In general, a transformation of the source code into the tree structure consists of two steps:

  1. Lexical Analysis (done by a lexer, sometimes called a scanner)
  2. Syntax Analysis (done by a parser)

Source code parsing process
Syntax Tree generation overview

Lexical Analysis breaks down the source code given as an input into the list of lexical tokens. Lexical tokens are the basic building blocks of the program, such as identifiers, keywords, comments, or literals. In the phase of the lexical analysis, it is not known how they fit together, meaning whether they form a function or a variable declaration, or even whether they do it correctly.

During the Syntax Analysis phase done by the parser, the lexical tokens are turned into the syntax tree. The rules of how the programming language describes some abstractions syntactically are defined by the language grammar. The parser tries to put the lexical tokens together into the syntax tree (either abstract or concrete, sometimes called a parse tree), if their placement obeys those rules. Some errors in the placement of lexical tokens may occur, that is why the syntax errors are raised by the compiler!

Nodes in the Abstract Syntax Tree

Go programming language supports multiple node types that represent Abstract Syntax Tree.

Every node contains information about the lexical token positions in the source code (this respects the position of braces and parentheses, which introduces some less abstract details to the AST structure).

There are four general types of nodes, that are grouped by the following interfaces:

  • Decl for declaration nodes, e.g. FuncDecl for function declarations,
  • Expr for expression nodes, e.g. BinaryExpr for binary expressions (like addition),
  • Stmt for statement nodes, e.g. AssignStmt for assignments,
  • Spec for specification nodes, e.g. ValueSpec for constant or variable declaration specifications,
  • Node is a general interface that is implemented by every node in an AST.

Every node type supported by the Go standard library go/ast package can be found under the following link: https://golang.org/pkg/go/ast

Sample AST
Example Abstract Syntax Tree of the main function

Abstract Syntax Tree shown above is a visualization of the following source code snippet in Go programming language.

func main(){
print("Hello, world!")
}

Every node in the portrayed Abstract Syntax Tree is augmented with information about the details of the given node type, for example, the *ast.Ident node contains the information about the name of the declared function.

Thanks to the tree structure of Abstract Syntax Tree, it is straightforward to extract only the information about a certain node type. Most libraries utilize Visitor design pattern for AST traversal. It abstracts away (no pun intended) the complexities of traversing only the nodes of a certain class - for example only the function declarations.

What can you do with an AST?permalink

With Abstract Syntax Tree you can implement complex source code inspections and manipulations. Let's focus on some real-world examples.

Parsing the source code

To create an AST from the Go source code, you need to first parse a given file.

fset := token.NewFileSet()
filename := "example.go"

f, err := parser.ParseFile(fset, filename, nil, parser.AllErrors)
if err != nil {
return nil, err
}

You can also supply the source code for a file as a string and use it instead of the filename (in place of nil in the example snippet).

You can parse the whole directory containing files with parser.ParseDir or parse only a single expression with parser.ParseExpr. In this example, we used the parser.AllErrors parsing mode that will report all parsing errors. Go standard library parser supports the following parsing modes.

const (
PackageClauseOnly Mode = 1 << iota // stop parsing after package clause
ImportsOnly // stop parsing after import declarations
ParseComments // parse comments and add them to AST
Trace // print a trace of parsed productions
DeclarationErrors // report declaration errors
SpuriousErrors // same as AllErrors, for backward-compatibility
AllErrors = SpuriousErrors // report all errors (not just the first 10 on different lines)
)

Now you have an AST ready to be used!

Printing an AST

Let's inspect an AST built with the following snippet - you can see how the verbatim source code input works here as mentioned in the previous section.

code := `
package main
func main() {
print("Hello, world!")
}`


fset := token.NewFileSet()
f, err := parser.ParseFile(fset, "", code, parser.AllErrors)
if err != nil {
panic(err)
}

err = ast.Print(fset, f)
if err != nil {
panic(err)
}

Executing this snippet will print the tree to the standard output.

0  *ast.File {
1 . Package: 2:4
2 . Name: *ast.Ident {
3 . . NamePos: 2:12
4 . . Name: "main"
5 . }
6 . Decls: []ast.Decl (len = 1) {
7 . . 0: *ast.FuncDecl {
8 . . . Name: *ast.Ident {
9 . . . . NamePos: 3:9
10 . . . . Name: "main"
11 . . . . Obj: *ast.Object {
12 . . . . . Kind: func
13 . . . . . Name: "main"
14 . . . . . Decl: *(obj @ 7)
15 . . . . }
16 . . . }
17 . . . Type: *ast.FuncType {
18 . . . . Func: 3:4
19 . . . . Params: *ast.FieldList {
20 . . . . . Opening: 3:13
21 . . . . . Closing: 3:14
22 . . . . }
23 . . . }
24 . . . Body: *ast.BlockStmt {
25 . . . . Lbrace: 3:16
26 . . . . List: []ast.Stmt (len = 1) {
27 . . . . . 0: *ast.ExprStmt {
28 . . . . . . X: *ast.CallExpr {
29 . . . . . . . Fun: *ast.Ident {
30 . . . . . . . . NamePos: 4:5
31 . . . . . . . . Name: "print"
32 . . . . . . . }
33 . . . . . . . Lparen: 4:10
34 . . . . . . . Args: []ast.Expr (len = 1) {
35 . . . . . . . . 0: *ast.BasicLit {
36 . . . . . . . . . ValuePos: 4:11
37 . . . . . . . . . Kind: STRING
38 . . . . . . . . . Value: "\"Hello, world!\""
39 . . . . . . . . }
40 . . . . . . . }
41 . . . . . . . Ellipsis: -
42 . . . . . . . Rparen: 4:26
43 . . . . . . }
44 . . . . . }
45 . . . . }
46 . . . . Rbrace: 5:4
47 . . . }
48 . . }
49 . }
50 . Scope: *ast.Scope {
51 . . Objects: map[string]*ast.Object (len = 1) {
52 . . . "main": *(obj @ 11)
53 . . }
54 . }
55 . Unresolved: []*ast.Ident (len = 1) {
56 . . 0: *(obj @ 29)
57 . }
58 }

The additional information about the positions of the nodes in the source code and other details of nodes can be easily inspected.

Traversing an AST

You can traverse the given AST node types using the ast.Inspect function. It traverses each node of the tree, so you have to typecast the node to check whether the current node is of the type, that you want.

ast.Inspect(f, func(n ast.Node) bool {
if fd, ok := n.(*ast.FuncDecl); ok {
fmt.Println("Found function declaration: ", fd.Name)
}
return true
})

The snippet above finds all function declarations in the AST and prints the names of the obtained functions.

Manipulating an AST

We can use the AST to manipulate the underlying source code. For example, we can rearrange the order of the statements, add additional lines or change the names of the variables and functions. However, ASTs manipulation lets us do much more.

Let's create a simple source code manipulation that will rearrange the order of two variable declarations in a file called example.go.

package main

func main(){
var a int // foo
var b string // bar
}

We can achieve this by executing the following snippet.

fset := token.NewFileSet()
f, err := parser.ParseFile(fset, "example.go", nil, parser.AllErrors)
if err != nil {
panic(err)
}
list := f.Decls[0].(*ast.FuncDecl).Body.List
list[0], list[1] = list[1], list[0]

if err := format.Node(os.Stdout, fset, f); err != nil {
panic(err)
}

Properly rearranged (and still syntactically valid!) source code will get printed to the standard output.

package main

func main() {

var b string
var a int

}

However, the generated code is incorrect. Where did the comments go?

The comments were omitted in the process of AST building, because of the mode that we have used - parser.AllErrors . To preserve the comments we would need to use parser.ParseComments mode. Let's try this out! We will modify the parser.ParseFile expression.

f, err := parser.ParseFile(fset, "", code, parser.ParseComments)

Now it contains the comments, but they are misplaced:

package main

func main() {
// foo
var b string
var a int
// bar
}

This is caused by the way the information about the position of the given free-floating comment node in the source code is represented in the standard library AST. A relative offset-based encoding is used for that, which makes it not trivial (yet still doable) to manipulate [1].

There is even an open issue in the golang/go repository that is describing that behavior.

It can be found here: https://github.com/golang/go/issues/20744

Free-floating comments issue
Free-floating comments issue in go/ast repository

The standard library in Go lacks the capability to mutate the AST with retaining the placement of the free-floating comments in the node structure. Fortunately, there is a library being the remedy for this issue.

Decorating an AST

It might sound festive, but the process of decoration of the Abstract Syntax Tree lets one properly preserve the placement of the free-floating comments in the underlying source code after its manipulation.

We can use the github.com/dave/dst library to use the Decorated Syntax Tree in our solution.

package main

import (
"github.com/dave/dst"
"github.com/dave/dst/decorator"
)

func main() {
code := `package main

func main(){
var a int // foo
var b string // bar
}
`

f, err := decorator.Parse(code)
if err != nil {
panic(err)
}

list := f.Decls[0].(*dst.FuncDecl).Body.List
list[0], list[1] = list[1], list[0]

if err := decorator.Print(f); err != nil {
panic(err)
}
}

The dst library works as a drop-in replacement for the standard library go/ast package. It will give us the proper placement of the free-floating comments.

package main

func main() {
var b string // bar
var a int // foo
}

How long is this?permalink

Let's dive into the fully-fledged example of these applications. We will create a simple source code product metric called Lines of Code (LOC), that measures the length of the function in lines.

At Ingrid, we are having once-a-sprint hacking days, when you are free to expand your knowledge on your subject of choice or have fun with any tech you fancy to. It yields a lot of amazing tools and even more brown bags, that are held weekly. If you ever wanted to build something cool, you are highly encouraged to do so!

First, we will need to parse a source code and create an AST. For brevity, we will use the hardcoded snippet. Next, we will have to traverse the AST, visiting only the *ast.FuncDecl nodes, that are representing the function declarations in the source code.

code := `package main

func main(){
var a int // foo
var b string // bar
}
`

fset := token.NewFileSet()
f, err := parser.ParseFile(fset, "", code, parser.ParseComments)
if err != nil {
panic(err)
}

ast.Inspect(f, func(n ast.Node) bool {
if fd, ok := n.(*ast.FuncDecl); ok {
length, err := getFunctionBodyLength(fd, fset)
if err != nil{
panic(err)
}
fmt.Printf("%s -> LOC: %d", fd.Name, length)
}
return true
})

We can see that the LOC metric at the function level is computed by the getFunctionBodyLength function.

func getFunctionBodyLength(f *ast.FuncDecl, fs *token.FileSet) (int, error) {
if fs == nil {
return 0, errors.New("FileSet is nil")
}
if f.Body == nil {
return 0, nil
}
if !f.Body.Lbrace.IsValid() || !f.Body.Rbrace.IsValid() {
return 0, fmt.Errorf("function %s is not syntactically valid", f.Name.String())
}
length := fs.Position(f.Body.Rbrace).Line - fs.Position(f.Body.Lbrace).Line - 1
if length > 0 {
return length, nil
}
return 0, nil
}

It utilizes the retained information about the positions of lexical tokens in the source code. It bases on the offset encoding used by the Go standard library AST.

Executing the following snippet will print the result to the standard output.

main -> LOC: 2

Now we have a simple implementation of the LOC metric using the AST (and no third-party dependencies).

Wrapping uppermalink

Looking back at our task described at the beginning of this post, we can use the Abstract Syntax Tree to find all the Go packages used in the list of Go files.

In this blog post, we did not cover the part of working with Abstract Syntax Tree related to type resolution. Stay tuned!


  1. More on the offset based encoding can be found in the go/token package reference: https://golang.org/pkg/go/token/#Pos↩︎


Cover photo by @zachplank on Unsplash

Does Ingrid sound like an interesting place to work at? We are always looking for good people! Check out our open positions