As per the official site:

LLVM is a collection of modular and reusable compiler and toolchain technologies. “LLVM” is not an acronym; it is the full name of the project.

Consider the diagram below:

Human readable to Machine readable

We write source code in the language of our choice. This code is human-readable, but the machine it runs on cannot understand it directly. That’s why we compile: the compiler translates the source code into something the machine understands.

Compilers Approach

There are various phases involved in compiling a source program (scanning, parsing, code generation, etc.). Traditional compilers implemented all of these steps as one monolithic structure.

Modern Compilers

Modern compilers instead delegate the compilation tasks to a set of separate components (shown in the above diagram).

What does this mean?

Different teams can work on the tasks they specialize in. For instance, a team good at parsing can focus only on it and make it better.

Enter LLVM

Chris Lattner

Chris Lattner is the main author of LLVM and related projects such as the Clang compiler and the Swift programming language. Resume here

LLVM is famous for its intermediate representation (LLVM IR) and its IR optimizer. During compilation, the source language is converted into this IR, the IR is optimized, and the result is finally lowered to a target architecture (x86, ARM, etc.).

Compilers Overview
  • Front-end: compiles source language to IR.
  • Middle-end: optimizes IR.
  • Back-end: compiles IR to machine code.

What’s LLVM IR?


  • A platform-independent assembly language.
  • Strongly typed.

(e.g., i32 is a 32-bit integer, i32** is a pointer to a pointer to a 32-bit integer)

  • An unlimited-register SSA (Static Single Assignment) instruction set.

SSA: every variable is assigned exactly once; local values are named with a % prefix.

  • A low-level RISC-like virtual instruction set.
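A small sketch of these properties (the value names here are mine, for illustration): every value is typed, and each is assigned exactly once.

```llvm
%sum = add i32 %a, %b        ; a 32-bit integer, assigned exactly once
%ptr = alloca i32            ; %ptr has type i32* (pointer to i32)
%val = load i32, i32* %ptr   ; loads a typed value through the pointer
```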

LLVM IR has three common representations:

  • Human-readable LLVM assembly (.ll files)
  • Bitcode binary representation (.bc files)
  • C++ classes

In particular, LLVM IR is both well-specified and the only interface to the optimizer. This property means that all you need to know to write a front end for LLVM is what LLVM IR is, how it works, and the invariants it expects.

Programming Example

Let’s look at a simple C program.

C Code

If we compile this code to LLVM IR using the command below,

clang -S -emit-llvm -O3 hello_world.c

or online (see links below), we get

define i32 @main() #0 {
  ret i32 7              ; Return the integer value 7
}

A sequence of instructions that execute in order is a basic block. Basic blocks must end with a terminator.

Note: You can’t jump into the middle of a basic block

There are several terminator instructions defined in LLVM, such as ret, br, switch, and unreachable.
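For example (a sketch; the block and value names are mine), ret ends a function’s execution and br transfers control to another basic block:

```llvm
entry:
  %cond = icmp eq i32 %x, 0
  br i1 %cond, label %IfZero, label %IfNonZero  ; terminator: conditional branch

IfZero:
  ret i32 0                                     ; terminator: return

IfNonZero:
  ret i32 1
```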

Global identifiers (functions, global variables) begin with the '@' character, while local identifiers begin with the '%' character.

Guide to LLVM Language

Links for programming online:

Godbolt, ELCC

Phi φ

Sometimes you want to select a value based on which basic block was executed previously.

<result> = phi <ty> [ <val0>, <label0> ], ...

Using phi:

Loop:       ; Infinite loop that counts from 0 on up...
  %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
  %nextindvar = add i32 %indvar, 1
  br label %Loop

This basically says: take the value 0 if we are coming from the LoopHeader block, or take the value of %nextindvar if we are coming from the Loop block itself.

At runtime, the ‘phi’ instruction logically takes on the value from the block that executed just prior to the current block.
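A C analogue of the loop above may make the phi clearer (a sketch under one assumption: the IR loop is infinite, so a limit is added here to make it terminate). The variable indvar plays the role of %indvar: it is 0 on entry and takes the incremented value on every back edge.

```c
/* Bounded C analogue of the IR counting loop above. */
int count_to(int limit) {
    int indvar = 0;            /* value from the entry (LoopHeader) edge */
    while (indvar < limit) {
        indvar = indvar + 1;   /* value from the Loop back edge (%nextindvar) */
    }
    return indvar;
}
```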

LLVM IR Optimization

Most optimizations follow a simple three-part structure:

  • Look for a pattern to be transformed.
  • Verify that the transformation is safe/correct.
  • Do the transformation, updating the code.

The most trivial optimization is pattern matching on arithmetic identities. For any integer X:

  • X-X is 0
  • X-0 is X
  • (X*2)-X is X
⋮ ⋮ ⋮
%example1 = sub i32 %a, %a     ; X - X
⋮ ⋮ ⋮
%example2 = sub i32 %b, 0      ; X - 0
⋮ ⋮ ⋮
%tmp = mul i32 %c, 2
%example3 = sub i32 %tmp, %c   ; (X*2) - X
⋮ ⋮ ⋮
// X - 0 -> X
if (match(Op1, m_Zero()))
return Op0;

// X - X -> 0
if (Op0 == Op1)
return Constant::getNullValue(Op0->getType());

// (X*2) - X -> X
if (match(Op0, m_Mul(m_Specific(Op1), m_ConstantInt<2>())))
return Op1;
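To see that these rewrites are safe, the three identities the matcher code exploits can be checked directly in C (a trivial sketch; the function names are mine, for illustration):

```c
/* The three arithmetic identities as plain C functions. */
int sub_x_x(int x)  { return x - x; }        /* X - X    ->  0 */
int sub_x_0(int x)  { return x - 0; }        /* X - 0    ->  X */
int mul2_sub(int x) { return (x * 2) - x; }  /* (X*2) - X -> X */
```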

LLVM Language Reference here.

LLVM in different languages

A programming language is incomplete without a compiler, and many languages build theirs on LLVM, since LLVM removes a lot of the work: the language only needs a front end that emits LLVM IR, and the optimizer and back ends come with LLVM.

  • The Emscripten project takes LLVM IR and compiles it to JavaScript, allowing any language with an LLVM front end to produce code that can run in the browser.
  • Nvidia built the Nvidia CUDA Compiler on LLVM, which lets languages add native support for CUDA.

Many languages and language runtimes have LLVM support, including C#/.NET/Mono, Rust, Haskell, OCaml, Node.js, Go, and Python.

Future of LLVM — MLIR

Not every language targets LLVM yet, but its growing adoption keeps attracting new languages to use it.

Introducing MLIR (Multi-Level Intermediate Representation)

MLIR is a flexible infrastructure for modern optimizing compilers. This means it consists of a specification for intermediate representations (IR) and a toolkit to perform transformations on that representation. 

MLIR is highly influenced by LLVM and reuses many great ideas from it. It also supports hardware-specific operations. 

Slides here.
