Professional Documents
Culture Documents
PMP
Part One:
Understanding MS data
PMP
Data Acquisition
tr yp m yo 0 1 # 2 9 4 R T : 9 .8 9 A V : 1
T : + c F u l l m s [ 3 0 0 .0 0 - 1 6 0 0 .0 0 ]
N L : 1 .1 2 E 7
6 6 1 .6
100
3. Isolate ion
95
90
2. Select an ion
85
80
75
70
Relative Abundance
65
60
55
50
45
40
35
7 0 4 .2
4 9 6 .4
30
25
20
15
5 2 8 .3
9 9 1 .7
6 1 8 .7
3 4 2 .7
7 0 5 .1
10
4 6 4 .4
9 9 2 .6
5 8 0 .4
7 9 9 .9
9 2 7 .1
9 5 2 .3
1 1 2 8 .2
1 2 8 9 .8
1 3 8 7 .1
0
400
600
800
1000
1200
m /z
PMP
Source:
Peter James
1400
1 4 8 5 .0
1 5 4 1 .3
1600
4. Fragment ion
MS/MS scan
Source:
Markus Mller
PMP
What is a baseline?
It is an offset of the intensities of masses, which happens
mainly at low masses, and varies between different spectra
It results mainly from molecules of the matrix
The baseline has to be subtracted from the spectrum before
any further processing takes place
What is noise?
Rapidly varying part of spectrum (< 1
Dalton)
Mass spectra are affected by two types
of noise:
electrical, a result of the instrument used
chemical, a result of contaminants in the
sample or matrix molecules
PMP
Oxygen-
16O
(99.8%),
17O
Sulphur-
32S
(95.0%),
33S
Bromine-
79Br
(50.5%),
(0.04%),
(0.8),
81Br
34S
18O
(0.2%)
(4.2%)
(49.5%)
*1 part per trillion
amu=atomic mass units
What is Mass?
Mass is given as m/z which is the mass of the molecule
divided by its charge
Monoisotopic mass is the mass of a molecule for a given
empirical formula calculated using the exact mass of the most
abundant isotope of each element (C=12.00000, H=1.007825
etc)
Average mass is the mass of a molecule for a given empirical
formula calculated using the weighted average mass for each
element weighted by the isotopic abundances (C=12.01115,
H=1.00797 etc)
PMP
PMP
PMP
Mass Accuracy
Accurate: consistent with the theory
Mass accuracy is measured in parts per
million (ppm)
545.4200
Measured mass
545.4200
Theoretical (calculated) mass
545.3234
Mass accuracy = (545.4200545.3234)/545 =0.00011724 Da
= 0.00011724 x 106 = 117 ppm
A monoisotopic mass can be measured as accurately as
the instrument allows, as long as the monoisotopic peak
has been correctly identified. If the wrong peak is
selected, the mass value will be false by one or two
Daltons.
PMP
Adapted
from: Peter James
true mass
m/z
PMP
Source:
Peter James
524.3
100
95
90
85
80
75
70
Relative Abundance
65
60
55
50
45
= 1.0 Da
40
35
30
525.3
25
= 1.0 Da
20
15
10
526.3
5
0
520
PMP
Source:
?
521
522
523
524
525
m/z
526
527
528
529
262.6
100
95
90
85
80
75
70
Relative Abundance
65
60
55
50
45
= 0.5 Da
40
35
30
25
263.1
20
= 0.5 Da
15
10
263.6
5
0
258
PMP
Source:
?
259
260
261
262
263
m/z
264
265
266
267
PMP
Source:
Peter James
M5+
M3+
4+
(X+4)/4
(X+5)/5
(X+3)/3
M2+
M+
(X+2)/2
Mass/Charge
(X+1)/1
Part Two:
Bioinformatics tools
PMP
PMP
PMP
PMP
MS/MS
840.6950 13.75
1676.9606 26.1
1498.8283 128.9
1045.564 845.2
2171.9670 2.56
861.1073 371.2
842.51458 53.7
1456.7274 12.9
863.268365 3.1
1163.7008 2
86.1105 220.1429
86.1738 13.7619
102.0752 4.3810
147.1329 57.3333
185.1851 649.0953
185.3589 5.3810
186.1876 81.4286
213.0791 1.4286
fragment intensities
MS
PMP
Example
1603.9204 2
101.0909 15.4762
202.1079 21.4762
203.1045 5.3333
244.1280 14.3810
254.1056 3.4286
255.1910 7.2381
270.2388 2.1905
962.6160 2
70.0560 2.1224
86.0947 5.1565
115.0842 8.4263
PMP
BEGIN IONS
TITLE=A1.1013.1013.2
CHARGE=2+
PEPMASS=715.940915
218.251 1.6
259.403 1.7
271.122 1.2
284.268 1.4
287.317 2.3
297.139 1.2
326.877 1.9
END IONS
BEGIN IONS
TITLE=A1.1013.1013.2
PMP
Source:
Matrix Science site
PMP
Source:
http://www.psidev.info/
http://www.expasy.org/tools/
PMP
PMP
MS-MS data
Peptides
Enzymatic
digestion
Mass spectra
Peaklist
840.695086
1676.96063
1498.8283
1045.564
2171.967066
861.107346
842.51458
1456.727405
863.268365
Match
MAIILAGGHSVRFGPKAF
AEVNGETFYSRVITLESTNM
FNEIIISTNAQLATQFKYPN
VVIDDENHNDKGPLAGIYTI
MKQHPEEELFFVVSVDTPM
ITGKAVSTLYQFLV
Sequence
database entry
PMP
In-silico
digestion
- MAIILAGGHSVR
- FGPK
- AFAEVNGETFYSR
- VITLESTNMFNEIIISTNAQLATQFK
- YPNVVIDDENHNDK
Theoretical
proteolytic peptides
861.107346
838.695086
1676.96063
1498.8283
1045.564
2171.967066
842.51458
1457.827405
863.268453
Theoretical
peaklist
Result:
ranked list
of protein
candidates
Source:
Introduction to proteomics: tools for the new biology. Daniel C. Liebler. Human Press. 2002
PMP
PMP
Tool
Source website
Aldente
www.expasy.org/cgi-bin/aldente
Mascot
www.matrixscience.com/
MS-Fit
prospector.ucsf.edu/
ProFound
prowl.rockefeller.edu/profound_bin/WebProFound.exe
PepMAPPER
wolf.bms.umist.ac.uk/mapper/
PeptideSearch
www.mann.emblheidelberg.de/GroupPages/PageLink/peptidesearchpage.html
PepFrag
prowl.rockefeller.edu/prowl/pepfragch.html
Scoring systems
Essential for the identification! Gives a confidence
value to each matched protein
Three types of scores
Shared peaks count (SPC): simply counts the number
of matched mass values (peaks)
Probabilistic scores: confidence value depends on
probabilistic models or statistic knowledge used during the
match (obtained from the databases)
Statistic-learning: knowledge extraction from the
influence of different properties used to match the proteins
(obtained from the databases)
PMP
Mascot
http://www.matrixscience.com/
PMP
Mascot - principles
Probability-based scoring
Computes the probability P that a match is
random
Significance threshold p< 0.05 (accepting that
the probability of the observed event occurring
by chance is less than 5%)
The significance of that result depends on the
size of the database being searched.
Mascot shades in green the insignificant hits
Score: -10Log10(P)
PMP
Mascot
Input
PMP
Decoy
Output
PMP
Output
Sequence coverage
Peptides matched
Error
function
PMP
Aldente
SwissProt/TrEMBL db, indexed masses (trypsine and many
others).
Considers chemical modifications and user specified
modifications.
Considers biological modifications (annotations SWISS-PROT).
0 or 1 missed-cleavages.
Use of robust alignment method (Hough transform):
Determines deviation function of spectrometer
Resolves ambiguities
Less sensitive to noise
PMP
Aldente summary
Spectrometer
calibration error
Spectrometer
internal error
PMP
Aldente - Input
PMP
Output
Hints about
the
significance of
the score
PMP
PMP
BioGraph
PMP
PMP
If you have time, try many tools and compare the results
FindMod
GlycoMod
FindPept
PMP
Prediction tools
PTMs and AA substitutions
Oligosaccharide structures
Unspecific cleavages
PMP
PMP
600
600
1759.8
1759.81923.4
1923.4
994.5
994.5
600
600
2100.6
2100.6
1070.1
1326.7
1326.7
1501.9
1501.9
1056.1
769.8
769.8
893.4
893.4
Tryptic
masses of
a modified
protein
2100.6
1326.7
1501.9
1056.1
893.4
769.8
893.4
994.5
994.5
1056.1
624.3
769.8
2100.6
1326.7
1501.9
1759.8
1759.8
1923.4
1923.4
624.3
Unmodified
tryptic
masses
624.3
624.3
Detection of PTMs in MS
2200
2200
2200
2200
m/z
m/z
=>
=> PTM
PTM
FindMod
http://www.expasy.org/tools/findmod/
AA
nt r
e
B
D
tal
n
e
m
ri
expe s
se
mas
PMP
on
i
t
a
c
if i
mod
tal
n
e
rim
e
p
x
e
ns
o
i
t
op
FindMod Output
}
}
unmodified peptides,
modified peptides
known in SWISS-PROT
and chemically modified
peptides
putatively modified
peptides predicted
by mass differences
+ putative AA substitutions
PMP
exceptions
farnesylation
Cys
palmitoylation
O-GlcNAc
Cys
Ser, Thr
amidation
pyrrolidone carboxylic acid
Gln (N-term)
Ser, Thr
Asn
-
Peptide fragmentation
fingerprinting = PFF = ion search
MS/MS database matching
Protein(s)
Peptides
Enzymatic
digestion
Ions peaklists
MS/MS spectra
of peptides
340.695086
676.96063
498.8283
545.564
1171.967066
261.107346
342.51458
456.727405
363.268365
Match
MAIILAGGHSVRFGPKAF
AEVNGETFYSRVITLESTNM
FNEIIISTNAQLATQFKYPN
VVIDDENHNDKGPLAGIYTI
MKQHPEEELFFVVSVDTPM
ITGKAVSTLYQFLV
In-silico
digestion
Sequence
database entry
PMP
- MAIILAGGHSVR
- FGPK
- AFAEVNGETFYSR
- VITLESTNMFNEIIIK
- YPNVVIDDENNDK
In-silico
fragmentation
Theoretical
proteolytic peptides
-MAIILAG
-MAIILA
-MAIIL
-MAII
-MAI
-M
-M
-AIILAG
Theoretical
fragmented
peptides
361.107346
338.695086
676.96063
498.8283
1045.564
1171.967066
342.51458
457.827405
263.268453
Theoretical
peaklist
Result:
ranked list
of peptide
and
protein
candidates
Ion-types
offset
-28
-45
-46
0
-17
-18
+17
+28
+2
-15
-16
P' nterm
Usually a, b and y
-15
[N] is the mass of the N-term group
[M] is the mass of the sum of the neutral amino acid residue
masses
PMP
P' cterm
...
MAPNCSCK
b7 MAPNCSC K y1
b6 MAPNCS CK y2
b5 MAPNC SCK y3
b4 MAPN CSCK y4
...
b7
b6
b5
b4
PMP
y1
b5
b2
y4 b4
y2
b7
y6
[M+2H]2+
y1
b4
y4
b3
y8
b6
b5
b2
y2
y7
y5
b3
y3
b1
[M+2H]2+ b6
y3
b1
y7
y5
y6
b7
y8
There is a shift around the Serine (neutral loss H3PO4), + other ions: a, c, x,
H2O and NH3 neutral losses, immoniums, internal fragments, missing ions
Instruments
Q-TOF
easier to interpret
since the spectra
are less crowded
(cleaner spectra)
higher accuracy
and resolution which compensates
for low frequency
of a and b ions
PMP
3D Ion traps
a, b and y ionsless easier to
interpret but more
significant since
more ions to
confirm the
sequence (noisy
spectra)
PMP
Software
Source website
InsPecT
peptide.ucsd.edu/inspect.py
Mascot
www.matrixscience.com/search_form_select.html
prospector.ucsf.edu
PepFrag
prowl.rockefeller.edu/prowl/pepfragch.html
Phenyx
phenyx.vital-it.ch
Popitam
www.expasy.org/tools/popitam
ProID (download)
sashimi.sourceforge.net/software_mi.html
Sequest*
fields.scripps.edu/sequest/index.html
Sonar
65.219.84.5/service/prowl/sonar.html
SpectrumMill*
www.home.agilent.com
VEMS
www.bio.aau.dk/en/biotechnology/vems.htm
X!Tandem (download)
www.thegpm.org/TANDEM
*Commercialized
PMP
Sequest/Turbosequest output
PMP
PMP
Submission
Results
views
Desktop
Results comparison
Management console
List of identified
proteins
Protein group
description
Corresponding list of
identified peptides
PMP Binz
P.A.
PMP
PMP
Better when high intensity peaks are matched and ion series are
extended, without too many and too big holes
PMP
List of
peptides
Description and
scores for each job
Action menu
List of Proteins
Action menu
Peptide scores
for each job
PMP
X!Tandem
PMP
www.thegpm.org
X!Tandem - output
1
2
PMP
2rnd,
Add variable mods
205 valid,
84% cov.
2rnd,
With all mods
And half cleaved
348 valid,
90% cov.
PMP
PMP
De novo sequencing
Sequencing = read the full peptide
sequence out of the spectrum (from
scratch)
Then, eventually search database for
sequence (not necessarily)
PMP
PMP
Summary of de novo
sequencing tools
Software
Source website
PEAKS*
www.bioinformaticssolutions.com
SeqMS (download)
www.protein.osaka-u.ac.jp/rcsfp/profiling/SeqMS.html
Sherenga
(included
SpectrumMill)
Luthefisk (download)
N/A
www.hairyfatguy.com/Lutefisk
DeNovoX*
www.thermo.com
PepNovo (download)
peptide.ucsd.edu/pepnovo.py
SpectrumMill*
www.home.agilent.com
*Commercialized
PMP
in
PepNovo
scoring method uses a
probabilistic network whose
structure reflects the chemical
and physical rules that govern
the peptide fragmentation
specific for Ion Trap data
PMP
Decoy database
Is used to repeat the search, using identical search
parameters, against a database in which the sequences
have been reversed or randomised.
Do not expect to get any real matches from the "decoy"
database
Helps estimate the number of false positives that are
present in the results from the real database.
It is a good validation method for MS/MS searches of large
data sets, it is not as useful for a search of a small number
of spectra, because the number of matches is too small to
give an accurate estimate.
PMP
PMP
Help validating in
Mascot, Phenyx and X!Tandem
In Mascot - decoy shuffled with the
same average amino acid
composition (Decoy parameter).
Otherwise, you may create your
own.
X!Tandem reversed sequences
Phenyx reversed sequences
available, otherwise you choose to
create your own
PMP
E-values
For a given score S, it indicates the number of
matches that are expected to occur by chance
in a database with a score at least equal to S.
The e-value takes into account the size of the
database that was searched. As a consequence
it has a maximum of the number of sequences
in the database.
The lower the e-value, the more significant the
score is.
PMP
p-value
A p-value describes the probability, which
assesses the chance of validly rejecting the null
hypothesis. If the p-value is 10-5 then the
rejection of the null hypothesis is due to chance
with a probability of 10-5.
A p-value ranges between 0 and 1.0
The larger the search space, the higher the pvalue since the chance of a peptide being a
random match increases.
The lower the p-value, the more significant is the
match.
Source:
Lisacek, Practical Proteomics, 2006 Sep;6 Suppl 2:22-32
PMP
Z-score
Z-score is a dimensionless quantity
derived by subtracting the population
mean from an individual (raw) score and
then dividing the difference by the
population standard deviation. z score = x
Source:
wikipedia
PMP
So what?
For small (significant) p-values, p and e are
approximately equal, so the choice of one or the
other is often equivalent. It is therefore
reasonable to assimilate low p-values in Phenyx
to e-values. X!Tandem simply switches e-values
to log values to remove the powers of 10
For a single search (or set of sampled peptides),
you can compare z-scores. However, when two or
more searches are performed on different size
spaces, you first need to look at the p-values
before comparing z-scores.
Source:
Lisacek, Practical Proteomics, 2006 Sep;6 Suppl 2:22-32
PMP
PMP