DNA as a general information readout platform
Sten Linnarsson, MBB, Mol Neuro
15 March 2012

  Target                    Converting to DNA
  DNA                       --
  RNA                       Reverse transcription
  Protein                   Proximity ligation (PLA)
  Ecology                   28S ribosomal RNA -> cDNA
  Chromosomal aberration    Shotgun sequencing of maternal blood
  Counterfeit products      Tracer DNA
  Small molecule binding    Tag with DNA

Before DNA
  Evolution occurs by natural selection operating on inherited but variable characters
  Charles Darwin (1858)
  Gregor Mendel (1865)
  Cells consist of lipids, sugars, proteins and nucleic acids
  Friedrich Miescher discovered nucleic acids in 1869

Chromosomes carry the genetic information
  Weismann (germ plasm theory)
  Walter Sutton, Theodor Boveri 1902

What is the physical basis of the gene?
  The "aperiodic crystal" (Schrödinger 1944)
  [Image: Erwin Schrödinger]

Genes have physical location on the chromosome
  Thomas Hunt Morgan (1916)
  [Image: Thomas Hunt Morgan]

The transforming principle (Griffith 1928)
  [Image: Frederick Griffith (with Bobby)]

The gene is made of DNA (Avery, MacLeod, McCarty 1944)
  Transformation in vitro
  Isolate the "transforming principle" and determine its character
  [Image: Oswald Avery (Christmas, with egg-nog)]

The Waring Blender Experiment (Hershey & Chase 1952)
  [Image: Martha Chase & Alfred Hershey]

Chargaff's rules (1952)
  [Image: Erwin Chargaff]

The double helix (Watson & Crick 1953)
  [Image: Francis Crick and James Watson]

DNA is a triplet code for proteins (Khorana, Nirenberg, Holley; also Crick & Brenner)
  [Image: Robert Holley, Marshall Nirenberg, Har Gobind Khorana]

RNA sequencing of bacteriophage MS2

Maxam-Gilbert Sequencing

DNA Sequencing (early days)
  "Wandering spot analysis" (Maxam & Gilbert 1973)
  Plus/minus method (Sanger 1975)

Sanger (dideoxy) sequencing (1977)

Scaling it up (2nd generation)
  Hood/Smith 1986: fluorescent sequencing
  Kary Mullis 1986: PCR
  Human genome project proposed (1986)
  1990: Lloyd Smith capillary sequencer
  Hideki Kambara sheath-flow cell
  [Image: Leroy Hood]

Strategies
  How to sequence longer pieces of DNA
  Money
  Hierarchical shotgun
  Whole-genome shotgun
  Colony picker
  Liquid handler

Craig Venter

The mighty Perkin-Elmer 3700

The end of the beginning

Next-generation sequencing
  Sanger sequencing is based on physical separation of subfragments
  'Next-generation sequencing':
    Clonal display of DNA fragments
    Stepwise sequencing with optical detection

HiSeq sample prep

HiSeq cluster generation

HiSeq sequencing

Reversible terminators (Illumina)

Sample input requirements

                          HiSeq      FLX        SOLiD
  Concentration (nM)      2-10       1          1
  Volume (µL)             2          1.2        5-10
  Fragment length (bp)    100-600    200-600    150-200

  2-3 billion 300 bp molecules
  ~1-10 nanograms
  ~700 diploid human genomes

454 Genome Sequencer (first of the 'next' generation)

  Feature           Specification
  Read length       400-600 bp
  Reads/run         1 million
  Paired-end        YES (long)
  Run time          10 hours
  Raw error rate    <1%
  Reagent cost      35,000 SEK/run

Illumina HiSeq 2000 (current leader)

  Feature                   Specification (per flowcell*)
  Read length               50-100 bp (selectable)
  Reads/run                 1.5 billion
  Sequence yield per run    300 Gbp
  Paired-end                YES (short and long)
  Run time                  2 days (50 bp), 11 days (2x100 bp)
  Raw error rate            <1%
  Reagent cost              35,000 SEK/run (1x50 bp)

  (*) Can run two flowcells in parallel

Impact of the 'next generation'

Current sequencing instruments
  HiSeq
  MiSeq
  SOLiD
  IonTorrent
  454 GS
  PacBio RS

Dec 2012: $1000 genome

Applications

RNA-Seq
  Isolate RNA
  Convert to cDNA
  Fragment and ligate adapters
  Sequence

  RNA from single cells can be sequenced
  RNA-Seq is already superior to microarrays
  Cost ~2500 SEK/sample

ChIP-Seq
  ChIP-Seq reveals sites where proteins bind chromatin
  Not same as transcriptional activation/repression!
  Elaine Mardis
  Mikkelsen et al.

Metagenomics
  Shotgun sequencing of microbial ecosystems
  Soil, water, air
  Human microflora (saliva, lung, intestine, skin, …)
  Direct discovery of microbes without preconceptions

Human gut microflora

Human skin microbiome

Genome sequencing

Glossary of terms
  Read
  Read length
  Mate-pair
  Paired-end read
  Assembly
  Contig
  Scaffold
  Quality score
  Error rate
  Coverage
  N50 length
  Sequencing depth
  Metagenomics
  Colorspace
  Raw error
  Consensus error
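
Two of the glossary terms, N50 length and coverage, are simple enough to show as arithmetic. The Python sketch below is only an illustration: the function names n50 and mean_coverage, the example contig lengths, and the 3.2 Gbp haploid human genome size are assumptions, not taken from the slides. The 300 Gbp figure is the HiSeq 2000 sequence yield per run quoted in the specification above.

```python
def n50(contig_lengths):
    """Return the N50 length of an assembly.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total
    assembly length.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def mean_coverage(yield_bp, genome_size_bp):
    """Average number of times each base of the genome is sequenced."""
    return yield_bp / genome_size_bp


# Hypothetical assembly with eight contigs (lengths in bp).
example_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50:", n50(example_contigs))

# 300 Gbp per run (from the HiSeq 2000 table); 3.2 Gbp genome size is assumed.
print("Mean coverage:", mean_coverage(300e9, 3.2e9))
```

Under these assumptions one 300 Gbp run corresponds to roughly 90-fold average coverage of a single human genome. The real figure depends on how many reads actually map and on how evenly they are distributed.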