DNA as a general information readout platform
Target                    Converting to DNA
DNA                       --
RNA                       Reverse transcription
Protein                   Proximity ligation (PLA)
Ecology                   28S ribosomal RNA -> cDNA
Chromosomal aberration    Shotgun sequencing of maternal blood
Counterfeit products      Tracer DNA
Small molecule binding    Tag with DNA
Sten Linnarsson, MBB, Mol Neuro
Before DNA
15 March 2012
What is the physical basis of the gene?
The "aperiodic crystal" (Schrödinger 1944)
- Evolution occurs by natural selection operating on inherited but variable characters
- Charles Darwin (1858)
- Gregor Mendel (1865)
- Cells consist of lipids, sugars, proteins and nucleic acids
- Friedrich Miescher discovered nucleic acids in 1869
- Chromosomes carry the genetic information
- Weismann (germ plasm theory)
- Walter Sutton, Theodor Boveri 1902
Erwin Schrödinger
The transforming principle (Griffith 1928)
Genes have a physical location on the chromosome
Thomas Hunt Morgan (1916)
Frederick Griffith (with Bobby)
Thomas Hunt Morgan
The gene is made of DNA
(Avery, MacLeod, McCarty 1944)
Transformation in vitro
Isolate the "transforming principle" and determine its character
Oswald Avery (Christmas, with egg-nog)
The Waring Blender Experiment
(Hershey & Chase 1952)
Martha Chase & Alfred Hershey
The double helix
(Watson & Crick 1953)
Chargaff’s rules (1952)
Francis Crick and James Watson
Erwin Chargaff
DNA is a triplet code for proteins
(Khorana, Nirenberg, Holley; also Crick & Brenner)
RNA sequencing of bacteriophage MS2
Robert Holley, Marshall Nirenberg, Har Gobind Khorana
DNA Sequencing (early days)
- "Wandering spot analysis" (Maxam & Gilbert 1973)
- Plus/minus method (Sanger 1975)
Maxam-Gilbert Sequencing
Sanger (dideoxy) sequencing (1977)
Scaling it up (2nd generation)
- Hood/Smith 1986: fluorescent sequencing
- Kary Mullis 1986: PCR
- Human genome project proposed (1986)
- 1990: Lloyd Smith capillary sequencer
- Hideki Kambara sheath-flow cell
Leroy Hood
Strategies
How to sequence longer pieces of DNA
- Money
- Hierarchical shotgun
- Whole-genome shotgun
Colony picker
Liquid handler
Craig Venter
The mighty Perkin-Elmer 3700
The end of the beginning
Next-generation sequencing
- Sanger sequencing is based on physical separation of subfragments
- "Next-generation sequencing":
  - Clonal display of DNA fragments
  - Stepwise sequencing with optical detection
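As a toy illustration of stepwise sequencing with optical detection, the sketch below adds exactly one labelled base per cycle and records its colour. It is not a model of any real chemistry: the dye colours, the function names `sequence_by_synthesis` and `base_call`, and the convention that the template is written in the order the polymerase reads it are all assumptions made for this example.

```python
# Watson-Crick pairing used to pick the incorporated nucleotide.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colours, one per base (real instruments differ).
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(template, cycles):
    """Return the colour observed in each cycle for one clonal cluster.

    `template` is assumed to be written in the order the polymerase reads it.
    """
    colours = []
    for template_base in template[:cycles]:
        incorporated = COMPLEMENT[template_base]  # one terminated base is added
        colours.append(DYE[incorporated])         # the cluster is imaged
        # In a real run the dye and terminator would be removed here,
        # which is what allows the next cycle to extend by one more base.
    return colours


def base_call(colours):
    """Translate the observed colours back into the synthesised sequence."""
    return "".join(BASE_FOR_DYE[colour] for colour in colours)


colours = sequence_by_synthesis("ACGT", cycles=3)
print(colours, base_call(colours))
```

The read length is simply the number of cycles, which is why it is selectable on such instruments.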
HiSeq sample prep
HiSeq cluster generation
HiSeq sequencing
Reversible terminators (Illumina)
Sample input requirements
Concentration (nM)
Volume (µL)
Fragment length (bp)
HiSeq
FLX
2-10
1
SOLiD
1
2
1.2
5-10
100-600
200-600
150-200
2-3 billion 300 bp molecules
~1-10 nanograms
~700 diploid human genomes
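The three figures above are linked by simple molar arithmetic, which the following sketch makes explicit. It assumes an average mass of 650 g/mol per base pair of double-stranded DNA and a diploid human genome of about 6.4 billion bp; both are common approximations, not values from the slides. The function names and the 2 nM / 2 µL loading are hypothetical.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one dsDNA base pair
DIPLOID_HUMAN_BP = 6.4e9     # assumed size of a diploid human genome, in bp


def molecules_in_sample(conc_nM, volume_uL):
    """Number of molecules in a sample of the given molarity and volume."""
    moles = (conc_nM * 1e-9) * (volume_uL * 1e-6)
    return moles * AVOGADRO


def mass_ng(n_molecules, length_bp):
    """Mass in nanograms of n double-stranded fragments of the given length."""
    grams = n_molecules / AVOGADRO * length_bp * BP_MASS_G_PER_MOL
    return grams * 1e9


def diploid_genome_equivalents(mass_in_ng):
    """How many diploid human genomes the given mass of DNA corresponds to."""
    genome_ng = DIPLOID_HUMAN_BP * BP_MASS_G_PER_MOL / AVOGADRO * 1e9
    return mass_in_ng / genome_ng


n = molecules_in_sample(conc_nM=2.0, volume_uL=2.0)  # hypothetical loading
print(f"{n:.2e} molecules, {mass_ng(n, 300):.2f} ng")
print(f"5 ng is about {diploid_genome_equivalents(5.0):.0f} diploid genomes")
```

Under these assumptions, 2-3 billion fragments of 300 bp weigh on the order of a nanogram, and roughly 5 ng corresponds to about 700 diploid genome equivalents.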
454 Genome Sequencer (first of the 'next' generation)
Feature           Specification
Read length       400-600 bp
Reads/run         1 million
Paired-end        YES (long)
Run time          10 hours
Raw error rate    <1%
Reagent cost      35,000 SEK/run

Illumina HiSeq 2000 (current leader)
Feature                   Specification (per flowcell*)
Read length               50-100 bp (selectable)
Reads/run                 1.5 billion
Sequence yield per run    300 Gbp
Paired-end                YES (short and long)
Run time                  2 days (50 bp); 11 days (2x100 bp)
Raw error rate            <1%
Reagent cost              35,000 SEK/run (1x50 bp)
(*) Can run two flowcells in parallel
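The yield figure can be checked against the read count and read length with a short calculation. The sketch below assumes that "1.5 billion reads" counts clusters that are each read from both ends at 2x100 bp, and it uses a haploid human genome size of 3.2 Gbp for the coverage estimate; that genome size and the function names are assumptions for illustration, not values from the slides.

```python
def run_yield_bp(reads, read_length_bp, ends=1):
    """Total bases produced by a run: reads x read length x ends read."""
    return reads * read_length_bp * ends


def mean_coverage(total_bp, genome_bp):
    """Average number of times each genome position is sequenced."""
    return total_bp / genome_bp


# HiSeq 2000, one flowcell, paired-end 2x100 bp
hiseq_bp = run_yield_bp(1_500_000_000, 100, ends=2)
print(hiseq_bp / 1e9, "Gbp")  # 300.0 Gbp

# Assumed haploid human genome size of 3.2 Gbp
print(mean_coverage(hiseq_bp, 3_200_000_000), "x mean coverage")
```

The same two functions apply to the 454 column: one million reads of about 500 bp give roughly 0.5 Gbp per run.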
Impact of the 'next generation'
Current sequencing instruments
HiSeq
MiSeq
SOLiD
IonTorrent
454 GS
PacBio RS
Dec 2012: $1000 genome
RNA-Seq
Applications
- Isolate RNA
- Convert to cDNA
- Fragment and ligate adapters
- Sequence
RNA from single cells can be sequenced
RNA-Seq is already superior to microarrays
Cost ~2500 SEK/sample
ChIP-Seq
- ChIP-Seq reveals sites where proteins bind chromatin
- Not the same as transcriptional activation/repression!
Elaine Mardis
Mikkelsen et al.
Human gut microflora
Metagenomics
- Shotgun sequencing of microbial ecosystems
- Soil, water, air
- Human microflora (saliva, lung, intestine, skin, …)
- Direct discovery of microbes without preconceptions
Genome sequencing
Human skin microbiome
Glossary of terms
- Read
- Read length
- Mate-pair
- Paired-end read
- Assembly
- Contig
- Scaffold
- Quality score
- Error rate
  - Raw error
  - Consensus error
- Coverage
- N50 length
- Sequencing depth
- Metagenomics
- Colorspace
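Two of the glossary terms have compact numerical definitions, sketched below. The sketch assumes the usual conventions: N50 is the contig length at which contigs of that length or longer hold at least half of all assembled bases, and a quality score Q on the Phred scale means an error probability of 10^(-Q/10). The function names are chosen for this example only.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty assembly")


def phred_to_error_probability(q):
    """Convert a Phred-scaled quality score into a per-base error probability."""
    return 10 ** (-q / 10)


print(n50([2, 3, 4, 5, 6]))            # 5: the contigs 6 + 5 cover 11 of 20 bases
print(phred_to_error_probability(20))  # 1 error in 100 bases
print(phred_to_error_probability(30))  # 1 error in 1000 bases
```

On this scale a raw error rate below 1% corresponds to quality scores above 20.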