1 Why advanced level structural biochemistry?
• Proteins are where the action is (A. Lesk).
• Understanding function at the molecular level
• Large number of detailed structures known; possible to extract structural principles
• Enormous number of sequences; complete genomes for hundreds of organisms
• Applied biochemistry
  • Structure-based drug design in the pharmaceutical industry
  • Protein engineering: using molecular biology to create modified proteins (catalysis, stability, ...)
• Determination of protein structure is complicated and done by specialists.
• Using structural information is not complicated; it is required knowledge for most practising biochemists.
• Skills are needed in using tools for visualization and analysis of structures (molecular graphics).

2 Structural biochemistry in this course
Lectures: Petsko & Ringe
• Principles of protein structure; structural classes
• Methods for structure determination (introductory)
• Protein stability and folding
• Structure and function in cell signalling
• Introduction to bioinformatics
• Introduction to proteomics
• Modelling of structures (starting from amino acid sequence)
Computer exercises
• Molecular graphics: using basic software (Swiss-PdbViewer [DeepView]) for analysis and visualization
• Assignment: explore and visualize protein-ligand interactions with Swiss-PdbViewer

3 Protein structures: growth of the Protein Data Bank (number of 3D protein structures)
[Figure: total number of structures versus year.]
April 2008: ca 50 000 structures, 42 000 proteins, 1000 DNA

4 Sequences
• DNA sequencing
  Contents of TrEMBL (protein sequences deduced from the EMBL nucleotide database): identification of coding regions, conceptual translation.
• Protein sequencing
  Laborious. Modern methods (Edman degradation, mass spectrometry) require very little protein; material from an SDS-PAGE gel is sufficient.
  Short sequences, useful for matching a protein against a gene database.
• Posttranslational modifications can only be detected at the protein level.

5 Annotation
Description of the content of a database entry
• Manual analysis; computerized analysis
• DNA sequence (EMBL, GenBank, DDBJ)
• Possible coding regions: ORF (open reading frame; exons/introns)
• Protein sequence (TrEMBL, Swiss-Prot)
• Similar sequences; known function?
• Is the gene expressed? (Experiment needed)

6 Genome sequencing and protein structure
• Identification of genes in complete genomes, many with unknown function
• Functional genomics, proteomics: identify all proteins in a given cell at a given time
• Structure determination for identification of function; large-scale projects for structure determination (structural genomics)
• Association of genes and function?

7 Example: new journal

8 Proteins: biological function
• Enzymes: catalysis of biochemical reactions
• Transport proteins: membrane transport, carrier proteins
• Signal transduction, between cells and within cells
• Structural proteins: cytoskeleton, extracellular structural proteins

9 Function in the E. coli genome

10 Small-scale example
• Minimal genome project (Venter et al.); Mycoplasma genitalium (one of the first genomes sequenced)
• 517 genes, coding for 480 proteins
• Of these, about 265-350 are necessary for growth in the laboratory (from large-scale mutagenesis)
• Of these, about 100 have unknown function
Map of the M. genitalium chromosome. White: hypothetical proteins, conserved. Gray: unclassified + unknown function. Black: hypothetical proteins.

11 Poly-L-α-amino acids
• Short polymers (< 40 amino acids): di-, tri-, ... oligopeptides
• Proteins: > 40 amino acids, well-defined 3D structure
(α)-Amino acids
• Asymmetric carbon; two stereoisomers. In proteins only L-amino acids
  [(S) in the CIP convention, except cysteine, which is (R), and achiral glycine]
• CORN rule: look along the H-Cα bond; the groups CO, R, N then appear in clockwise order
• N-terminal and C-terminal ends of the chain
• Peptide bonds
[Structural formulas: a polypeptide chain with side chains R1 ... Rn, its N-terminal and C-terminal ends, and formation of a peptide bond with release of H2O.]
• 20 different amino acids are used in protein synthesis. Others are found; these are modified after synthesis (posttranslational modification).

12 Levels of protein structure
• Primary structure: the order of amino acids, disulphides, covalent modifications
• Secondary structure: structure of the peptide backbone (not side chains); sometimes regular
• Supersecondary structure/motifs
• Domains/folds
• Tertiary structure: 3D structure, includes all atoms
• Quaternary structure: arrangement of subunits in proteins with several peptide chains
Cartoon (simplified representation) of tertiary structure: Petsko & Ringe fig. 1.2

13 Distribution of chain lengths (statistics from the Swiss-Prot database)

14 The 3D structure is specified by the primary structure
The central dogma
• Denatured (unfolded) proteins can refold if denaturing conditions are relieved.
• Some in vivo protein folding requires assistance from chaperones, proteins that facilitate folding.
• If there is modification after initial folding, denatured proteins may not refold.
[Diagram: DNA (replication) → transcription → RNA → translation → polypeptide chain → folding → protein]

15 Aliphatic I: alanine and valine
• Alanine (Ala; A)
• Valine (Val; V)
[Structural formulas. Bar chart, repeated on slides 15-24: frequency (%) of the amino acids, axis 0-10 %, in the order L A G S V E T K I D P R Q N F Y H M C W.]

16 Aliphatic II: leucine and isoleucine
• Leucine (Leu; L)
• Isoleucine (Ile; I)

17 Hydroxyl group: serine and threonine
• Serine (Ser; S)
• Threonine (Thr; T)

18 Containing sulphur: cysteine (polar) and methionine (nonpolar)
• Cysteine (Cys; C)
• Methionine (Met; M)

19 Polar and neutral: the amides asparagine and glutamine
• Asparagine (Asn; N)
• Glutamine (Gln; Q)

20 Acidic: aspartic acid and glutamic acid
• Aspartic acid (aspartate) (Asp; D)
• Glutamic acid (glutamate) (Glu; E)

21 Basic: lysine and arginine
• Lysine (Lys; K)
• Arginine (Arg; R)

22 Aromatic: phenylalanine and tyrosine
• Phenylalanine (Phe; F)
• Tyrosine (Tyr; Y)

23 Aromatic/heterocyclic: histidine and tryptophan
• Histidine (His; H)
• Tryptophan (Trp; W)

24 Steric properties: glycine and proline
• Glycine (Gly; G)
• Proline (Pro; P)
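The frequency charts on slides 15-24 summarize how often each amino acid occurs in database sequences. The same kind of composition can be computed for a single sequence. The following is only a minimal sketch: the function name `composition` and the example peptide are made up for illustration and do not come from the course material.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids,
# in the order used on the slides' frequency charts.
AMINO_ACIDS = "LAGSVETKIDPRQNFYHMCW"


def composition(sequence):
    """Return {one-letter code: percentage} for the 20 standard amino acids.

    Characters that are not standard one-letter codes (gaps, X, ...) are
    ignored; this is a design choice of the sketch, not a convention.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Made-up peptide, used only to show the call.
example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
for aa, percent in composition(example).items():
    if percent > 0:
        print(f"{aa}: {percent:.1f} %")
```

A short peptide will deviate strongly from database averages; the comparison only becomes meaningful for long sequences or whole proteomes.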
25 Abundance
Statistics from the Swiss-Prot database

26 Posttranslational modification, examples
• Phosphorylation (Ser, Thr, Tyr): phosphoserine, -threonine, -tyrosine
• γ-Carboxyglutamate
• Hydroxylation: hydroxyproline
• Discovered recently: tryptophan tryptophylquinone, TTQ (from 2 Trp); topaquinone, TPQ (from Tyr)
More examples in Creighton

27 Summary of important properties
• Hydrophobicity
  • The hydrophobic effect contributes to the stability of the 3D structure. Hydrophobic side chains are found mainly in the interior of proteins and hydrophilic side chains mainly on the outside.
• Size of the side chain
  • Packing density in protein interiors is comparable to that of organic crystals.
• Charge and polarity
  • Electrostatic interactions
  • Hydrogen bonding
  • Proton transfer (e.g. acid-base catalysis)
• Steric properties
  • Glycine: minimum of steric hindrance
  • Proline: less conformational freedom
  • Also less conformational freedom in branched aliphatics (Val, Ile)
• Cysteine: disulphide bonds

28 Size (molecular volume)
[Bar chart: volume (Å^3, axis 0-250) of the amino acids in the order G A S C D P N T E V Q H M I L K R F Y W.]

29 Hydrophobicity
The Goldman-Engelman-Steitz (GES) scale for hydrophobicity (hydropathy) is based on the partitioning of the amino acid between aqueous and organic solvent.
[Bar chart: GES hydropathy (axis from -14 to 4; positive = hydrophobic, negative = hydrophilic) of the amino acids in the order R D K E N Q H Y P S G T A W C V L I M F.]
Data from Brändén and Tooze, p. 210

30 Volume and hydrophobicity
[Figure: hydrophobicity plotted against volume.] Lesk: Introduction to Protein Science

31 Protonation of side chains
Side chain    pKa
Asp, Glu      4.5; 4.6
His           6.2
Cys           9.1-9.5
Tyr           9.7
Lys           10.4
Arg           ca 12
The pKa values of side chains in proteins depend on the local surroundings and vary.

32 Hydrogen bonding
• Ser, Thr: sp3-O
• Asp, Glu: sp2-O
• Asn, Gln: sp2-O; sp2-N
• His, Trp, Arg: sp2-N
• Lys: sp3-N

33 Cysteine: disulphide bonds
[Reaction scheme: two cysteine thiols (-SH) are oxidized to a disulphide (-S-S-, cystine); reduction reverses the reaction.]
Cystine is stable only in an oxidizing environment. It occurs in extracellular proteins.

34 Classification of amino acids
By structurally important properties
[Venn diagram grouping the 20 amino acids, with Cys shown both free and as disulphide Cys(SS), into overlapping sets: small, very small, hydrophobic, aliphatic, aromatic, polar, charged, positively charged.]
After Maniatis: Molecular Cloning; see also Attwood p. 42, p. 150; Petsko & Ringe fig. 1.3

35 Observed substitutions
From comparison of proteins known to have evolved from a common ancestor
• Most changes are conservative, preserving polar or hydrophobic character
Petsko & Ringe fig. 1.6

36 Point mutations at the DNA level
Single nucleotide polymorphism, SNP
• Point mutations in the third position often have no effect on the amino acid.
• Point mutations in the first or second positions tend to preserve the hydrophobic or polar character of the amino acid (blue = hydrophobic; pink = hydrophilic; striped = amphipathic).
• Note that transitions (C <-> U, A <-> G) are more common than transversions (pyrimidine <-> purine).
Petsko & Ringe fig. 1.4
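The statement on slide 36 that third-position point mutations often leave the amino acid unchanged can be checked directly against the standard genetic code. The sketch below is only an illustration under stated assumptions: the function name `synonymous_counts` is hypothetical, only sense codons are used as starting points, and every possible single-base substitution is counted once, so the higher frequency of transitions is deliberately ignored.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated with bases in the order
# U, C, A, G at each position; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_counts(position):
    """Return (synonymous, total) single-base substitutions at a codon
    position (0, 1 or 2), starting from sense codons only.

    Assumption of this sketch: all substitutions are weighted equally.
    """
    synonymous = 0
    total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for base in BASES:
            if base == codon[position]:
                continue  # not a mutation
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODON_TABLE[mutant] == aa:
                synonymous += 1
    return synonymous, total


for position in range(3):
    syn, total = synonymous_counts(position)
    print(f"codon position {position + 1}: "
          f"{syn}/{total} substitutions leave the amino acid unchanged")
```

With this equal-weight counting, most third-position substitutions are silent, only a handful of first-position ones are, and none of the second-position ones are.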