The Course Syllabus

Lectures are split between traditional presentations, in which the instructor explains some topic of interest, and paper discussions, in which all the students (plus the instructor) debate some recent paper. The slides used in the traditional presentations are available on this page. The course is divided into two parts simply because that makes it easier to evaluate the students: we have two exams. The main course events are listed below. For the full calendar, check this link. Due to the pandemic, classes will be online. Most of the classes are asynchronous, but there will be seven online meetings.


First Part

Slides Exercises Description
Presentation Presentation Introduction: a bit of everything. Why study compilers, the history behind them, how compiler optimizations work, and so on. This class also gives a general overview of the entire course. In addition to the exercises, this time we have a few slides from the ACE group, a company that sells compiler technology! This class can be watched on YouTube: Part 1, Part 2, Part 3, and Part 4.
Presentation Presentation Control Flow Graphs: in this class we describe the most fundamental program representation used by the compiler: the control flow graph. We define basic blocks, and go over local optimizations: code optimization techniques that work exclusively within the boundaries of single basic blocks. In this context, we talk about the use of directed acyclic graphs to represent basic blocks, about peephole optimizations and about local register allocation. This class can be watched on YouTube: Part 1, Part 2, Part 3, and Part 4.
Presentation Presentation Introduction to Dataflow Analyses: in this class we review the basic principles that support dataflow analyses. We take a look at the different types of analyses, e.g., may/must and forward/backward. The class is mostly based on examples, which illustrate these principles and differences. Videos for this class: Part 1, Part 2, Part 3, Part 4, and Part 5.
Presentation Presentation Algorithms to Solve Dataflow Analyses: in this class we go over different strategies to solve dataflow analyses efficiently. We study the worklist algorithm, a baseline design that can deal with general dataflow analyses, and then refine this design, observing how a better ordering of the constraints to be solved can have a positive influence on the performance of the algorithm. A small sketch of the worklist algorithm appears right after this table. Watch the videos about this class: Part 1, Part 2, Part 3, Part 4, and Part 5.
Presentation Presentation Introduction to Lattice Theory: in this class we take a look at lattices. Lattices constitute the core of the theory supporting dataflow analyses. We discuss why the worklist algorithm seen previously terminates for many different types of dataflow analyses. We introduce and explain the notion of fixed points, and show why it is so fundamental to static program analyses. Videos are available on YouTube: Part 1, Part 2, Part 3, and Part 4.
Presentation Presentation Partial Redundancy Elimination: in this class we go over this classic compiler optimization, focusing on lazy code motion. Partial Redundancy Elimination (PRE) has the virtue of subsuming many different dataflow analyses, joining in a single algorithm the effects of different compiler optimizations. First video, and second video.
Presentation Presentation Constraint-Based Analyses: in this class we discuss a different type of static program analysis: the constraint-based approach. Analyses that fit into this category can be used to deal with problems that do not assume any ordering between the statements of a program. A typical example of such a problem is the inference of a control flow graph for a program coded in a language with dynamic dispatch. Videos: part 1, part 2, part 3, part 4, and part 5.
Presentation Presentation Pointer Analysis: this class presents one of the most important and well-known types of constraint-based analyses: pointer analysis. We illustrate the need for knowing information about pointers via examples, and show why it is so hard to solve pointer analysis using traditional dataflow approaches. We then move on to explain two different algorithms to solve this kind of problem, paying particular attention to the need to find cycles among constraints. Videos: part 1, part 2, and part 3.
Presentation Presentation Loop Optimizations: in this class we go over some techniques used to speed up loops, usually the most performance-critical part of any program. We show how to identify loops and nested loops, and introduce the notion of dominance between basic blocks; a sketch of the fixed-point computation of dominators appears right after this table. We then go over simple optimizations, such as strength reduction, loop unrolling and code hoisting. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Static Single Assignment Form: this class introduces the most important intermediate program representation used in compilers today: the SSA form. Virtually every relevant compiler implemented in the last 20 years uses this intermediate representation. It is so important because it considerably simplifies the design and implementation of many compiler analyses and transformations, such as dead-code elimination and constant propagation. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Sparse Abstract Interpretation: in this class we generalize the concept of Static Single Assignment to Static Single Information. Our goal is to show the student how to convert a program to an intermediate representation that facilitates backward and forward flow analyses. These representations guarantee that the information associated with a variable is invariant along that variable's entire live range. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Tainted Flow Analysis: this class introduces the students to the notion of tainted flow attacks. Tainted flow attacks include a vast range of program attacks, such as SQL injection, cross-site scripting and remote command execution. This class lets us show how compilers can be used to secure programs against malicious users. Additionally, it allows us to compare a dense and a sparse approach to static program analysis. Videos: part 1, part 2, part 3, and part 4.
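
The dataflow classes above mention the worklist algorithm; here is a minimal sketch of it, instantiated for liveness analysis. The encoding of the control flow graph (plain dictionaries mapping each basic block to its successors, uses and definitions) is an assumption of this example, not a convention taken from the course material.

```python
def liveness(blocks, succ, use, defs):
    """Worklist-based liveness analysis.
    blocks: iterable of block ids; succ[b]: successors of block b;
    use[b] / defs[b]: sets of variables used / defined in block b."""
    # Precompute predecessors: when IN[b] changes, only the
    # predecessors of b need to be revisited.
    pred = {b: [] for b in blocks}
    for b in blocks:
        for s in succ[b]:
            pred[s].append(b)
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    worklist = list(blocks)
    while worklist:
        b = worklist.pop()
        # Liveness is a backward analysis: OUT[b] is the union of the
        # IN sets of b's successors.
        live_out[b] = set().union(*(live_in[s] for s in succ[b]))
        new_in = use[b] | (live_out[b] - defs[b])
        if new_in != live_in[b]:
            live_in[b] = new_in
            worklist.extend(pred[b])
    return live_in, live_out

# A two-block example: b0 defines x, b1 uses x and loops on itself.
blocks = ['b0', 'b1']
succ = {'b0': ['b1'], 'b1': ['b1']}
use = {'b0': set(), 'b1': {'x'}}
defs = {'b0': {'x'}, 'b1': set()}
print(liveness(blocks, succ, use, defs)[0])  # {'b0': set(), 'b1': {'x'}}
```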
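
And, as a companion for the lattice and loop-optimization classes, here is a sketch of dominance computed as a fixed point: Dom(b) = {b} ∪ ⋂ Dom(p), over the predecessors p of b. The iteration terminates because the sets only shrink inside a finite lattice, which is exactly the termination argument discussed in the lattice class. Again, the CFG encoding is an assumption of the example.

```python
def dominators(blocks, pred, entry):
    """Iterate Dom(b) = {b} | intersection of Dom(p), p in pred[b],
    until no set changes (a fixed point)."""
    everything = set(blocks)
    dom = {b: set(everything) for b in blocks}  # start at the top
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry or not pred[b]:
                continue
            new = {b} | set.intersection(*(dom[p] for p in pred[b]))
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

# A diamond CFG: entry -> b1, entry -> b2, b1 -> b3, b2 -> b3.
pred = {'entry': [], 'b1': ['entry'], 'b2': ['entry'], 'b3': ['b1', 'b2']}
print(dominators(['entry', 'b1', 'b2', 'b3'], pred, 'entry'))
# b3 is dominated only by itself and entry, since b1 and b2 are
# alternative paths.
```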


Second Part

Slides Exercises Description
Presentation Presentation Range Analysis: This class describes range analysis, a technique to estimate the lower and upper limits that integer variables may assume during the execution of a program. Range analysis is useful in several ways: it helps to eliminate array bounds checks, to perform dead code elimination and to find security vulnerabilities in programs. We describe a complete algorithm, which uses state-of-the-art techniques to implement the analysis. Videos: part 1, part 2, part 3, part 4, and part 5.
Presentation Presentation Program Slicing: this time we will talk a bit about program slicing, a technique that supports software engineers in finding bugs and understanding programs. We shall see what a program slice is, and how to build one. We will then cover a few notions, such as data and control dependences, paying particular attention to a fast way to detect the latter. A great deal of time will be spent proving the correctness of a core algorithm. We will also discuss an SML implementation of this algorithm. Videos: part 1, part 2, part 3, part 4, and part 5.
Presentation Presentation Predictive Compilation: this class will go over some stochastic techniques used in static analysis and code optimization. We shall cover the concept of predictive compilation, and will show how machine learning can be used to estimate dynamic properties of programs, such as the outcome of branches, or the benefits of optimizations. The class contains examples of classification and regression problems that pertain to the domain of compiler design. Videos: part 1, part 2, part 3, part 4, part 5, and the conclusion.
Presentation Presentation Operational Semantics: in this class we introduce the student to the notion of operational semantics. Operational semantics is one of the key tools that the compiler writer has to reason about program behavior. We discuss the notation of inference rules, and show how simple proofs of properties can be mechanically constructed from formal specifications. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Type Systems: in this class we introduce a different kind of technology used in program analysis: type systems. We explain the concepts of progress and preservation, which, together, can be used to prove that a programming language is free of certain forms of runtime errors. We pay special attention to the basic techniques used to prove properties about type systems, and to how these techniques can be applied mechanically. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Proving Metatheorems with Twelf: this class introduces the students to Twelf, a framework that supports the development of logical systems. We will go over techniques to describe the syntax and semantics of a few simple deductive systems. We will also see some basic proof techniques that one can use in Twelf. In particular, we will mechanize the proof of preservation, seen in the previous class. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Type Inference: this class explains how to infer types for the simply typed lambda-calculus. We shall explain how to generate and solve constraints, and how to use the solution of this constraint system to rewrite the program so that the final code type-checks; a small unification sketch appears right after this table. We shall prove the correctness of our approach, and provide several examples in the ML programming language that illustrate how type inference works. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Just-In-Time Compilers: Here we introduce the student to the concept of just-in-time compilation. A JIT compiler generates code for a program while this program is being executed. In this class we go over speculation and specialization, two key techniques used to optimize code at runtime. We analyze two speculative techniques, class inference and parameter specialization. We close the class by discussing the notion of a trace compiler. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation Register Allocation: this is likely the most important classic compiler optimization. In this class we will see two different register allocation algorithms: linear scan and iterated register coalescing; a sketch of linear scan appears right after this table. We will see some NP-completeness results, and will also discuss coalescing, an optimization that is tightly coupled with register allocation. Videos: part 1, part 2, part 3, and part 4.
Presentation Presentation SSA-Based Register Allocation: in this class we go over one of the latest developments in compiler theory: the fact that the interference graphs of SSA-form programs are chordal. We will see why this property always holds, and will take a look at a few algorithms that can find the chromatic number of chordal graphs in polynomial time. We will also revisit spilling and coalescing in the context of chordality. Videos: part 1, part 2, part 3, part 4, and part 5.
Presentation Presentation Correctness: this class combines register allocation and type systems. Although seemingly unrelated, type systems can be used to verify the correctness of the output of the register allocator, in a process that is usually called translation validation. We will go over the definition of a correct register allocator, and will show how progress and preservation are key to prove the core property of semantics preservation.
Presentation Presentation Divergence Analysis: this class introduces the student to general purpose graphics processing units, and describes a static code analysis that we call divergence analysis. Divergence analysis proves that some variables always hold the same value for all the threads in a SIMD group of threads. The class then moves on to describe a suite of divergence-aware compiler optimizations, with emphasis on divergence-aware register allocation. Videos: part 1, part 2, part 3, and part 4.
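
The register-allocation class above mentions linear scan; here is a minimal sketch of it, following the classic outline: sort live intervals by start point, expire old intervals, and spill the interval that ends last when no register is free. The interval encoding and all names are assumptions of this example, not course code.

```python
def linear_scan(intervals, num_regs):
    """intervals: (name, start, end) triples, in any order. Returns
    (assignment, spills): a map from variable to register index, and
    the set of spilled variables."""
    intervals = sorted(intervals, key=lambda iv: iv[1])  # by start
    free = list(range(num_regs))   # registers not currently in use
    active = []                    # (end, name), kept sorted by end
    assignment, spills = {}, set()
    for name, start, end in intervals:
        # Expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if free:
            assignment[name] = free.pop()
            active.append((end, name))
        elif active[-1][0] > end:
            # Heuristic from the classic linear-scan paper: spill the
            # interval that ends last, and steal its register.
            _, last_name = active.pop()
            assignment[name] = assignment.pop(last_name)
            spills.add(last_name)
            active.append((end, name))
        else:
            spills.add(name)
            continue
        active.sort()
    return assignment, spills

# 'c' arrives when both registers are busy and ends last of the three
# overlapping intervals, so 'c' is spilled; 'a', 'b' and 'd' share the
# two registers.
ivs = [('a', 1, 10), ('b', 2, 4), ('c', 3, 12), ('d', 5, 6)]
print(linear_scan(ivs, 2))
```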
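
The type-inference class mentions generating and solving constraints; here is a small sketch of the solving side: first-order unification with an occurs check. The type encoding (plain strings for base types, quoted strings like "'a" for type variables, and ('->', a, b) for arrow types) is an assumption of this example, not the course's notation.

```python
def is_var(t):
    # Type variables are strings starting with a quote, e.g. "'a".
    return isinstance(t, str) and t.startswith("'")

def resolve(t, subst):
    # Follow the substitution until we reach a non-variable or an
    # unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v appear inside type t?
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(t1, t2, subst):
    # Return an extended substitution that makes t1 and t2 equal, or
    # raise TypeError if no such substitution exists.
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        if occurs(t1, t2, subst):
            raise TypeError("occurs check failed")
        return {**subst, t1: t2}
    if is_var(t2):
        return unify(t2, t1, subst)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} and {t2}")

# Typing \x. x + 1: the lambda initially gets the arrow type 'x -> 'r.
# Since (+) : int -> int -> int forces x and the result to be int, we
# unify 'x -> 'r with int -> int, which resolves both variables.
s = unify(('->', "'x", "'r"), ('->', 'int', 'int'), {})
print(resolve("'x", s), resolve("'r", s))  # prints: int int
```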


Extra Material

Presentation Introduction to Valgrind: Valgrind is an instrumentation framework that lets us build dynamic analysis tools. Such tools can be used for a variety of purposes, such as detecting memory leaks, concurrency bugs, performance inefficiencies, etc. Several papers use Valgrind to obtain their results, and this framework has also seen much use in industry. In this class, Andrei Rimsa will explain how to use Valgrind to implement dynamic analyses of programs.
Presentation Presentation Instruction Level Parallelism: the goal of this class is to introduce the student to algorithms used to schedule instructions in high-performance architectures that provide long pipelines and the ability to issue multiple instructions per cycle. We will go over basic block scheduling, superblock formation and static profiling; a sketch of list scheduling appears right after this list. We also analyze a few characteristics of high-performance machines, such as the VLIW design and predicated instructions.
Presentation Presentation Automatic Parallelization: in this class we go over a few techniques to transform sequential C-like loops into parallel CUDA loops. We analyze two main notions: data reuse and data dependences within loop iterations. We then move on to see how we can map points in the loop's iteration space to processor identifiers. Throughout the class we will see a number of actual CUDA kernels, derived from C programs.
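
The instruction-level-parallelism class above mentions basic block scheduling; here is a toy sketch of list scheduling under stated assumptions: a dependence DAG given as a dictionary, a fixed number of issue slots per cycle, and no priority function (real schedulers rank the ready list, e.g., by critical-path length).

```python
def list_schedule(instrs, deps, latency, width):
    """instrs: instruction ids; deps[i]: ids that i depends on;
    latency[i]: cycles until i's result is ready; width: number of
    issue slots per cycle. Returns a map: instruction -> issue cycle."""
    issue, done_at = {}, {}
    remaining = set(instrs)
    cycle = 0
    while remaining:
        # An instruction is ready when all its dependences have been
        # issued and their results are available at this cycle.
        ready = [i for i in instrs if i in remaining and
                 all(d in done_at and done_at[d] <= cycle
                     for d in deps[i])]
        for i in ready[:width]:   # fill up to `width` issue slots
            issue[i] = cycle
            done_at[i] = cycle + latency[i]
            remaining.discard(i)
        cycle += 1
    return issue

# i0 and i1 are independent; i2 needs both. With width = 2 and unit
# latencies, i0 and i1 issue at cycle 0 and i2 issues at cycle 1.
deps = {'i0': [], 'i1': [], 'i2': ['i0', 'i1']}
lat = {'i0': 1, 'i1': 1, 'i2': 1}
print(list_schedule(['i0', 'i1', 'i2'], deps, lat, 2))
# {'i0': 0, 'i1': 0, 'i2': 1}
```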


LLVM Course

Two of the assignments in the course use LLVM. We have an online course about LLVM here. You will need the first 11 videos in that playlist to better understand how to solve the homework assignments.



Last update: May 12th, 2021.