How Illumina Sequencing Works

Illumina sequencing is the most widely used DNA sequencing platform today. It powers everything from cancer genomics to microbiome research, including antimicrobial resistance (AMR) surveillance.

Overview of the Workflow

The entire Illumina sequencing process is divided into four major stages:

 

| Stage | Name | Purpose |
|---|---|---|
| 1 | Library Preparation | Fragment DNA and attach adapter sequences to each fragment end |
| 2 | Cluster Generation | Amplify each fragment into ~1,000 identical copies on the flow cell |
| 3 | Sequencing by Synthesis | Add fluorescent nucleotides one at a time and image each cycle |
| 4 | Data Analysis | Base-call reads and align to a reference genome or assemble de novo |

 

Stage 1: Library Preparation

We start with a biological sample like tissue, blood, a swab, or environmental material, and extract genomic DNA. This raw DNA is far too long to sequence directly, so the first step is fragmentation: breaking it into short pieces, typically 150–300 base pairs long.

Fragmentation is performed either by sonication (high-frequency sound waves) or enzymatic digestion. The ends of the resulting fragments are then repaired, and short synthetic sequences called adapters are ligated to both ends of each fragment.

These adapters serve three critical purposes:

  • Anchoring: attaching the fragment to the flow cell surface

  • Amplification primer site: allowing bridge amplification to occur

  • Sequencing primer site: providing the starting point for the sequencing reaction



The full collection of adapter-tagged fragments is called a sequencing library.

 

Figure 1. Library preparation: genomic DNA is fragmented, and adapter sequences (A) are ligated to both ends of each fragment, creating the sequencing library.
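As a rough illustration of fragmentation and adapter ligation, the sketch below cuts a random sequence into pieces of 150–300 bp and attaches an adapter to each end. The adapter strings, function names, and the random "genome" are hypothetical placeholders invented for this example; real adapter sequences and real fragmentation behaviour differ.

```python
import random

# Hypothetical placeholder adapters; real Illumina adapter sequences differ.
ADAPTER_5 = "AAAACCCC"
ADAPTER_3 = "GGGGTTTT"

def fragment(genome, min_len=150, max_len=300, seed=0):
    """Cut a sequence into consecutive pieces of random length.

    The final piece may be shorter than min_len, since it is whatever remains.
    """
    rng = random.Random(seed)
    pieces = []
    pos = 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        pieces.append(genome[pos:pos + size])
        pos += size
    return pieces

def build_library(genome, **kwargs):
    """Fragment the genome and attach an adapter to both ends of every piece."""
    return [ADAPTER_5 + piece + ADAPTER_3 for piece in fragment(genome, **kwargs)]

# Toy 5 kb "genome" made of random bases, purely for demonstration.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(5000))

library = build_library(genome)
print(f"{len(library)} adapter-tagged fragments in the library")
```

Every element of `library` has the same two adapter ends around a different insert, which is what lets a single set of flow cell oligos and primers work for all fragments.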

Stage 2: Cluster Generation (Bridge Amplification)

The flow cell is a small glass chip whose surface is densely coated with short oligonucleotide sequences complementary to the adapters. When the library is loaded onto the flow cell, each DNA fragment randomly hybridizes to one of these surface oligos via its adapter end.

Bridge amplification then begins: the bound fragment bends over so that its free adapter end hybridizes to a nearby surface oligo, forming a bridge. DNA polymerase copies the bridged fragment, and the double-stranded bridge is then denatured into two single strands, each tethered to the surface. Each strand bends over and is copied again, and the process repeats for approximately 35 cycles until roughly 1,000 copies occupy a single tight spot called a cluster. Not every strand is copied in every cycle, which is why 35 cycles yield about 1,000 copies rather than the billions that perfect doubling would produce.
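A quick back-of-the-envelope check shows what the two figures above imply when taken at face value. With perfect doubling, about 10 cycles would already reach 1,000 copies, so reaching that number only after 35 cycles corresponds to an average growth of roughly 1.2-fold per cycle. This is simple arithmetic on the numbers quoted in the text, not a model of the actual chemistry.

```python
import math

target_copies = 1000   # approximate cluster size quoted in the text
cycles = 35            # approximate number of amplification cycles quoted in the text

# If every cycle doubled every strand, how many cycles would be needed?
ideal_cycles = math.log2(target_copies)

# Average per-cycle growth factor that gives ~1,000 copies after 35 cycles.
growth_per_cycle = target_copies ** (1 / cycles)

print(f"Cycles needed with perfect doubling: {ideal_cycles:.1f}")
print(f"Average growth per cycle over {cycles} cycles: {growth_per_cycle:.3f}x")
```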

 

Why make ~1,000 copies?

A single DNA molecule emitting a fluorescent signal is far too dim for a camera to detect reliably. The ~1,000 copies in a cluster all glow the same color simultaneously, creating a bright enough signal. Think of it as 1,000 people all shouting the same letter at once: together, the message becomes audible. Crucially, the copies are NOT read individually; the entire cluster is read as one combined signal, producing one base call per cycle.

 

Figure 2. Bridge amplification: each bound fragment bends and is copied repeatedly until ~1,000 identical copies form a single cluster on the flow cell surface.

Stage 3: Sequencing by Synthesis (SBS)

This is the core innovation of Illumina technology. Instead of reading pre-existing DNA, you watch new DNA being built and record each nucleotide as it is incorporated.

The Modified Nucleotides

Illumina uses specially modified nucleotides. Each of the four bases (A, T, G, C) carries two key modifications:

  • A unique fluorescent dye: each base emits a different color when excited by a laser

  • A reversible terminator: a chemical blocker that prevents more than one nucleotide from being incorporated per cycle

One Sequencing Cycle (Three Steps)

  1. Incorporate: All four modified nucleotides are washed over the flow cell. DNA polymerase adds the correct complementary base to each copy in every cluster. The reversible terminator ensures exactly one base is added, then the polymerase stops.

  2. Image: A laser excites the fluorescent tags. The camera photographs the entire flow cell; each cluster glows a specific color corresponding to the incorporated base. This color is recorded as the base call for that cluster in that cycle.

  3. Cleave: A chemical wash removes the fluorescent dye and the terminator block from all clusters. DNA polymerase is now free to proceed. The next cycle begins.

 

Figure 3. One cycle of SBS: (1) polymerase incorporates one fluorescently labeled, terminator-blocked nucleotide per cluster; (2) laser imaging records each cluster's color = base identity; (3) chemical cleavage removes the dye and terminator, enabling the next cycle.
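The three-step cycle can be mimicked with a toy simulation of a single cluster. This is a minimal sketch under simplifying assumptions: the colour assigned to each base is illustrative only (actual dyes and detection channels vary by instrument), strand orientation is ignored, and the function name is made up for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Illustrative colour assignment only; real dye chemistry varies by instrument.
DYE_COLOR = {"A": "blue", "T": "red", "G": "green", "C": "yellow"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}

def sequence_cluster(template, cycles):
    """Simulate SBS on one cluster and return the read as a string.

    The template is processed left to right for simplicity; strand
    orientation (5'/3') is not modelled.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. Incorporate: exactly one complementary, terminator-blocked base.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Image: the cluster emits the colour of that base's dye,
        #    and the colour is translated back into a base call.
        color = DYE_COLOR[incorporated]
        read.append(COLOR_TO_BASE[color])
        # 3. Cleave: dye and terminator are removed, so the loop can continue.
    return "".join(read)

read = sequence_cluster("TACTG", cycles=5)
print(read)  # ATGAC
```

The key point the loop captures is that each pass adds exactly one base and records exactly one colour, so the read length equals the number of cycles.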

 

The Math: Cycles to Reads

Each cluster produces one base call per cycle. After 150 cycles, you have 150 bases from that fragment: one read. With millions to billions of clusters on the flow cell being read simultaneously, a single Illumina run generates hundreds of millions to billions of reads in parallel.

 

| Cycle | Color detected | Base recorded |
|---|---|---|
| 1 | Blue | A |
| 2 | Red | T |
| 3 | Green | G |
| 4 | Blue | A |
| 5 | Yellow | C |
| ... | ... | ... |
| 150 | --- | Read complete: 150 bp |
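The same arithmetic scales up to a whole run: total bases are roughly clusters × cycles, doubled for paired-end runs where each fragment is read from both ends. The cluster count below is a hypothetical round number chosen for illustration, and the sketch assumes every cluster yields a usable read, which real runs do not achieve.

```python
def run_yield_bases(clusters, cycles, paired_end=False):
    """Estimate total bases in a run, assuming every cluster yields a full read."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * cycles * reads_per_cluster

clusters = 400_000_000  # hypothetical cluster count, not a platform specification

single = run_yield_bases(clusters, cycles=150)
paired = run_yield_bases(clusters, cycles=150, paired_end=True)

print(f"Single-end 150 bp: {single / 1e9:.0f} Gb")   # 60 Gb
print(f"Paired-end 2 x 150 bp: {paired / 1e9:.0f} Gb")  # 120 Gb
```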

 

Stage 4: Data Analysis

The sequencer's output is millions of reads: strings of A, T, G, and C. To produce them, the instrument software performs base calling, converting fluorescent signal intensities into sequence data. Each base is assigned a Phred quality score (Q), which encodes the estimated probability that the call is wrong. Q30, for example, corresponds to an error probability of 1 in 1,000.
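The Phred score is defined as Q = -10 × log10(P), where P is the estimated error probability. The sketch below converts between the two and decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names and the example quality string are made up for illustration.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding: each character's ASCII code minus 33 is Q.
    """
    return [ord(ch) - offset for ch in quality_string]

scores = decode_fastq_quality("I?5+")
print(scores)  # [40, 30, 20, 10]

for q in scores:
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}, "
          f"accuracy {100 * (1 - phred_to_error_prob(q)):.2f}%")
```

Each step of 10 in Q is a tenfold drop in error probability, so Q20 means 99% accuracy and Q30 means 99.9%.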

These reads are then processed through bioinformatics pipelines for:

  • Reference alignment: Reads are mapped to a known reference genome using tools such as BWA or Bowtie2

  • De novo assembly: If no reference genome exists, reads are assembled into contigs using assemblers such as SPAdes

  • Downstream analysis: Variant calling, resistome profiling, taxonomic classification, and more, depending on the experimental goal
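To make the idea of reference alignment concrete, here is a deliberately naive sketch that slides a read along a reference and keeps the placement with the fewest mismatches. This is not how BWA or Bowtie2 work (they use indexed search and handle gaps); the function names, the toy reference, and the mismatch threshold are all assumptions made for this example.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, reference, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped placement, or None.

    Brute force: tries every offset, so it is only suitable for toy inputs.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mismatches = hamming(read, reference[pos:pos + len(read)])
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best

reference = "ACGTTAGCCGATAGGCTTACGATCGGATCCTA"  # toy reference sequence
hit = map_read("GATAGGCTAACG", reference)       # read with one sequencing error
print(hit)  # (9, 1)
```

The single mismatch reported here is the kind of difference that downstream variant calling has to classify as either a sequencing error or a real variant, which is where per-base quality scores come in.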

Key Specifications

| Parameter | Typical Value |
|---|---|
| Accuracy per base | ~99.9% (Q30) |
| Read length | 75–300 bp (paired-end) |
| Throughput | Up to ~6 Tb per run (NovaSeq 6000); up to ~16 Tb per run (NovaSeq X Plus) |
| Cost per Gb | ~$5–10 USD |
| Run time | 11 hours to ~2 days (platform dependent) |

 

 

