Structural biochemistry in this course

1
Why advanced level structural biochemistry?
•  Proteins are where the action is (A. Lesk).
•  Understanding molecular function at the molecular level
•  Large number of detailed structures known; possible to extract structural principles
•  Enormous number of sequences. Complete genomes for hundreds of organisms.
•  Applied biochemistry
•  Structure-based drug design in the pharmaceutical industry
•  Protein engineering: using molecular biology to create modified proteins (catalysis, stability…)
•  Determination of protein structure is complicated and done by specialists.
•  Using structural information is not complicated. Required knowledge for most practising biochemists
•  Need for skills in using tools for visualization and analysis of structures: molecular graphics
2
Structural biochemistry in this course
Lectures: Petsko & Ringe
•  Principles of protein structure; structural classes
•  Methods for structure determination (introductory)
•  Protein stability and folding
•  Structure and function in cell signalling
•  Introduction to bioinformatics
•  Introduction to proteomics
•  Modelling of structures (starting from amino acid sequence)
Computer exercises
•  Molecular graphics: using basic software (Swiss-PdbViewer [DeepView]) for analysis and visualization.
•  Assignment: explore and visualize protein-ligand interactions with Swiss-PdbViewer
Protein structures
3
[Figure: Growth of the Protein Data Bank: total number of 3D structures by year. April 2008: ca 50 000 structures (42 000 proteins, 1000 DNA).]
4
Sequences
•  DNA sequencing
Identification of coding regions, conceptual translation.
[Figure: Contents of TrEMBL (protein sequences deduced from the EMBL nucleotide database)]
•  Protein sequencing
Laborious. Modern methods (Edman degradation, mass spectrometry) require very little protein; the amount in an SDS-PAGE band is sufficient. They give short sequences, useful for matching the protein against a gene database.
•  Posttranslational modifications can only be detected at the protein level
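The two steps named under DNA sequencing (identifying coding regions and conceptual translation) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build TrEMBL: it assumes the standard genetic code, an input containing only A, C, G and T, and it scans only the three forward reading frames. The function names `translate` and `find_orfs` and the parameter `min_codons` are hypothetical.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Conceptually translate a coding strand, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def find_orfs(dna, min_codons=3):
    """Return (start, end, protein) for ATG...stop stretches in the 3 forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] == "ATG":
                protein = translate(dna[pos:])
                end = pos + 3 * (len(protein) + 1)  # include the stop codon
                # Keep only ORFs that are long enough and end in a stop codon.
                if len(protein) >= min_codons and end <= len(dna):
                    orfs.append((pos, end, protein))
                    pos = end  # continue scanning after this ORF
                    continue
            pos += 3
    return orfs


print(find_orfs("CCATGGCTAAATGAGG"))
```

A real annotation pipeline would also scan the reverse strand and, in eukaryotes, handle exons and introns, which this sketch ignores.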
5
Annotation
Description of the content of a database entry
[Diagram: annotation workflow, combining manual and computerized analysis. DNA sequence (EMBL, GenBank, DDBJ) → possible coding regions: ORF (open reading frame; exons/introns) → protein sequence (TrEMBL, Swiss-Prot) → similar sequences; known function? → is the gene expressed? (experiment needed)]
6
Genome sequencing and protein structure
•  Identification of genes in complete genomes, many with unknown function
•  Functional genomics, proteomics: identify all proteins in a given cell at a given time.
•  Structure determination for identification of function; large-scale projects for structure determination: structural genomics
•  Association of genes and function?
7
Example: new journal
8
Proteins: biological function
•  Enzymes
Catalysis of biochemical reactions
•  Transport proteins
Membrane transport
Carrier proteins
•  Signal transduction, between cells and within cells
•  Structural proteins
Cytoskeleton
Extracellular structure proteins
9
Function in the E. coli genome!
10
Small scale example!
•  Minimal genome project (Venter et al.); Mycoplasma genitalium (one of the first genomes sequenced)
•  517 genes, coding for 480 proteins
•  Of these, about 265-350 are necessary for growth (in the lab), as found by large-scale mutagenesis
•  Of these, about 100 have unknown function
[Figure: Map of the M. genitalium chromosome. White: conserved hypothetical proteins. Gray: unclassified + unknown function. Black: hypothetical proteins.]
11
poly-L-α-amino acids
(α)-Amino acids
•  Asymmetric carbon; two stereoisomers. In proteins only L-amino acids [(S) in the CIP convention]
•  Look along the H-Cα bond (H towards the viewer): reading clockwise gives CO-R-N: "CORN"
•  Peptide bonds: formed between the carboxyl group of one amino acid and the amino group of the next, with release of H2O
•  20 different amino acids in protein synthesis. Others are found; modified after synthesis (posttranslational modification)
Short polymers (< 40 amino acids): di-, tri-, ... oligopeptides.
Proteins, > 40 amino acids, well-defined 3D structure
[Structures: an L-amino acid viewed along the H-Cα bond; peptide bond formation with release of H2O; a polypeptide chain from the N-terminal (H3N+) to the C-terminal (COO-) end]
12
Levels of protein structure
•  Primary structure
The order of amino acids, disulphides, covalent
modifications
•  Secondary structure
Structure of the peptide backbone (not side chains).
Sometimes regular
•  Supersecondary structure/motifs
•  Domains/folds
•  Tertiary structure
3D structure, includes all atoms
•  Quaternary structure
Arrangement of subunits in proteins with several peptide chains
[Figure: Cartoon (simplified representation) of tertiary structure. Petsko & Ringe fig. 1.2]
13
Distribution of chain lengths
(statistics from the Swiss-Prot database)
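A chain-length distribution of this kind can be computed from any set of sequences in FASTA format. The sketch below is only an illustration under stated assumptions: the helper names `read_fasta` and `length_histogram` are hypothetical, the two example records are made up, and a real Swiss-Prot release would be read from a file rather than from a string.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def length_histogram(records, bin_width=100):
    """Count sequences per length bin; key 0 covers lengths 0..bin_width-1, etc."""
    return Counter((len(seq) // bin_width) * bin_width for seq in records.values())


# Made-up example records, for illustration only.
example = ">p1 hypothetical\nMKV\nLA\n>p2\nMK\n"
print(length_histogram(read_fasta(example), bin_width=5))
```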
14
The 3D structure is specified by the primary structure
•  Denatured (unfolded) proteins can refold if denaturing conditions are relieved.
•  Some in vivo protein folding requires catalysis by chaperones, enzymes that facilitate folding
•  If there is modification after initial folding, denatured proteins may not refold
The central dogma
[Diagram: DNA (replication) → transcription → RNA → translation → polypeptide chain → folding → protein]
15
Aliphatic I: alanine and valine
Alanine (Ala; A): side chain -CH3
Valine (Val; V): side chain -CH(CH3)2
[Figure: structures of alanine and valine; bar chart of amino acid frequency (%, axis 0-10) for L A G S V E T K I D P R Q N F Y H M C W]
16
Aliphatic II: leucine and isoleucine
Leucine (Leu; L): side chain -CH2-CH(CH3)2
Isoleucine (Ile; I): side chain -CH(CH3)-CH2-CH3
[Figure: structures of leucine and isoleucine; bar chart of amino acid frequency (%)]
17
Hydroxyl group: serine, threonine
Serine (Ser; S): side chain -CH2-OH
Threonine (Thr; T): side chain -CH(OH)-CH3
[Figure: structures of serine and threonine; bar chart of amino acid frequency (%)]
18
Containing sulphur: cysteine (polar) and methionine (nonpolar)
Cysteine (Cys; C): side chain -CH2-SH
Methionine (Met; M): side chain -CH2-CH2-S-CH3
[Figure: structures of cysteine and methionine; bar chart of amino acid frequency (%)]
19
Polar and neutral: the amides asparagine and glutamine
Asparagine (Asn; N): side chain -CH2-CO-NH2
Glutamine (Gln; Q): side chain -CH2-CH2-CO-NH2
[Figure: structures of asparagine and glutamine; bar chart of amino acid frequency (%)]
20
Acidic: aspartic acid and glutamic acid
Aspartic acid (aspartate) (Asp; D): side chain -CH2-COO-
Glutamic acid (glutamate) (Glu; E): side chain -CH2-CH2-COO-
[Figure: structures of aspartate and glutamate; bar chart of amino acid frequency (%)]
21
Basic: lysine and arginine
Lysine (Lys; K): side chain -(CH2)4-NH3+
Arginine (Arg; R): side chain -(CH2)3-NH-C(NH2)2+
[Figure: structures of lysine and arginine; bar chart of amino acid frequency (%)]
22
Aromatic: phenylalanine and tyrosine
Phenylalanine (Phe; F): side chain -CH2-C6H5
Tyrosine (Tyr; Y): side chain -CH2-C6H4-OH
[Figure: structures of phenylalanine and tyrosine; bar chart of amino acid frequency (%)]
23
Aromatic/heterocyclic: histidine and tryptophan
Histidine (His; H): side chain -CH2-imidazole
Tryptophan (Trp; W): side chain -CH2-indole
[Figure: structures of histidine and tryptophan; bar chart of amino acid frequency (%)]
24
Steric properties: glycine and proline
Glycine (Gly; G): side chain -H
Proline (Pro; P): cyclic side chain bonded back to the backbone nitrogen
[Figure: structures of glycine and proline; bar chart of amino acid frequency (%)]
25
Abundance
Statistics from the Swiss-Prot database
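The frequency (%) shown in the bar charts is the share of each amino acid among all residues counted. A minimal sketch of that calculation for a single one-letter sequence follows; the function name `composition` and the example sequence are hypothetical, and database-wide statistics would pool the counts over all entries.

```python
from collections import Counter


def composition(sequence):
    """Return {amino acid: percentage of residues}, most frequent first."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Made-up example sequence, for illustration only.
print(composition("AAGL"))
```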
26
Posttranslational modification, examples
•  Phosphorylation (Ser, Thr, Tyr): phosphoserine, -threonine, -tyrosine
•  γ-Carboxyglutamate
•  Hydroxylation: hydroxyproline
•  Discovered recently: tryptophan tryptophylquinone (TTQ), formed from 2 Trp; topaquinone (TPQ), formed from Tyr
[Structures of the modified side chains]
More examples in Creighton
27
Summary of important properties
•  Hydrophobicity
    •  The hydrophobic effect contributes to the stability of the 3D structure. Hydrophobic side chains are found mainly in the interior of proteins and hydrophilic side chains mainly on the outside
•  Size of the side chain
    •  Packing density in protein interiors is comparable to organic crystals
•  Charge and polarity
    •  Electrostatic interactions
    •  Hydrogen bonding
    •  Proton transfer (e.g. acid-base catalysis)
•  Steric properties
    •  Glycine: minimum of steric hindrance
    •  Proline: less conformational freedom
    •  Also less conformational freedom in branched aliphatics (Val, Ile).
•  Cysteine: disulphide bonds
28
Size (molecular volume)
[Figure: bar chart of side-chain volume (Å^3, axis 0-250), amino acids along the x-axis in the order G A S C D P N T E V Q H M I L K R F Y W]
29
Hydrophobicity
The Goldman-Engelman-Steitz (GES) scale for hydrophobicity (hydropathy) is based on the partitioning of the amino acid between aqueous and organic solvent
[Figure: bar chart of GES hydropathy (axis from -14, hydrophilic, to 4, hydrophobic), amino acids in the order R D K E N Q H Y P S G T A W C V L I M F]
Data from Brändén and Tooze, p. 210
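A common use of such a scale is to average it over a sliding window along a sequence, so that hydrophobic stretches stand out. The sketch below shows that idea only. The numerical values are an assumption: they are the GES values as commonly quoted, with the sign chosen so that hydrophobic residues are positive, and should be checked against the cited table before use. The function name `hydropathy_profile` and the window length are hypothetical choices.

```python
# GES hydropathy values, hydrophobic residues positive.
# ASSUMPTION: values as commonly quoted; verify against Brändén and Tooze, p. 210.
GES = {
    "F": 3.7, "M": 3.4, "I": 3.1, "L": 2.8, "V": 2.6, "C": 2.0, "W": 1.9,
    "A": 1.6, "T": 1.2, "G": 1.0, "S": 0.6, "P": -0.2, "Y": -0.7, "H": -3.0,
    "Q": -4.1, "N": -4.8, "E": -8.2, "K": -8.8, "D": -9.2, "R": -12.3,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy for each window position along a one-letter sequence."""
    values = [GES[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Sorting by value reproduces the x-axis order of the figure.
print("".join(sorted(GES, key=GES.get)))
```

Sorting the residues by these values gives the same order as the x-axis of the figure, which is a quick consistency check on the table.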
30
Volume and hydrophobicity
[Figure: amino acid volume and hydrophobicity; axis label: Volume. Source: Lesk, Introduction to Protein Science]
31
Protonation of side chains
Side chain   pKa
Asp, Glu     4.5; 4.6
His          6.2
Cys          9.1-9.5
Tyr          9.7
Lys          10.4
Arg          ca 12
pKa values of side chains in proteins depend on the local surroundings and vary.
[Structures: protonated and deprotonated form of each side chain]
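The pKa values translate into protonation states through the Henderson-Hasselbalch relation: the protonated fraction of a group is 1 / (1 + 10^(pH - pKa)). The sketch below applies it to the listed values. It is an approximation under stated assumptions: Cys is taken as 9.3 (the midpoint of 9.1-9.5) and Arg as 12, each group is treated as independent, and the function names are hypothetical. As noted above, real values shift with the local surroundings.

```python
# Side-chain pKa values from the list above.
# ASSUMPTION: Cys set to the midpoint of 9.1-9.5, Arg set to 12.
SIDE_CHAIN_PKA = {"Asp": 4.5, "Glu": 4.6, "His": 6.2, "Cys": 9.3,
                  "Tyr": 9.7, "Lys": 10.4, "Arg": 12.0}

# These groups are neutral when protonated and carry -1 when deprotonated;
# His, Lys and Arg carry +1 when protonated and are neutral otherwise.
ACIDIC = {"Asp", "Glu", "Cys", "Tyr"}


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def mean_charge(residue, ph):
    """Average side-chain charge at the given pH."""
    f = fraction_protonated(SIDE_CHAIN_PKA[residue], ph)
    return f - 1.0 if residue in ACIDIC else f


for name in SIDE_CHAIN_PKA:
    print(name, round(mean_charge(name, 7.0), 3))
```

At pH 7 this gives Asp and Glu almost fully deprotonated, Lys and Arg almost fully protonated, and His only partly protonated, which is why His is often involved in proton transfer near neutral pH.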
32
Hydrogen bonding
Ser, Thr: sp3-O
Asp, Glu: sp2-O
Asn, Gln: sp2-O; sp2-N
His, Trp, Arg: sp2-N
Lys: sp3-N
[Structures: hydrogen-bond donor and acceptor atoms of each side chain]
33
Cysteine: disulphide bonds
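2 Cys-SH ⇌ Cys-S-S-Cys (cystine): oxidation forms the disulphide bond, reduction breaks it
[Structures: two cysteine residues (-SH) and the disulphide-linked cystine (-S-S-)]
Cystine is stable only in an oxidizing environment. Occurs in extracellular proteins.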
34
Classification of amino acids
By structurally important properties
[Figure: Venn diagram of the amino acids grouped by overlapping properties: small, very small, hydrophobic, aliphatic, aromatic, polar, charged, positively charged. Cysteine appears both as Cys and as disulphide-bonded Cys(SS).]
After Maniatis: Molecular Cloning; see also Attwood p. 42, p. 150, Petsko & Ringe fig. 1.3
35
Observed substitutions
From comparison of proteins known to have evolved from a common ancestor
•  Most changes are conservative, preserving polar or hydrophobic character
[Figure: Petsko & Ringe fig. 1.6]
36
Point mutations at the DNA level
Single nucleotide polymorphism, SNP
•  Point mutations in the third position often have no effect on the amino acid
•  Point mutations in the first or second positions tend to preserve the hydrophobic or polar character of the amino acid (blue = hydrophobic; pink = hydrophilic; striped = amphipathic)
•  Note that transitions (C <-> U, A <-> G) are more common than transversions (pyrimidine <-> purine)
[Figure: Petsko & Ringe fig. 1.4]
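The statement about the third codon position can be checked by enumerating every single-base change in the standard genetic code and counting how many leave the amino acid unchanged. The sketch below does this and also classifies a base change as a transition or a transversion. It is an illustration only: the function names are hypothetical, stop codons are excluded as starting points, and DNA letters are used (T in place of U).

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return a != b and (a in PURINES) == (b in PURINES)


def synonymous_counts():
    """Per codon position, count single-base changes that keep the amino acid."""
    counts = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    counts[pos] += 1
    return counts


print(synonymous_counts())  # → [8, 0, 126]
```

Each position allows 61 × 3 = 183 changes, so about 69% of third-position changes are silent, against about 4% at the first position and none at the second.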