
Anal. Chem.

1997, 69, 767-776

Direct Analysis and Identification of Proteins in Mixtures by LC/MS/MS and Database Searching at the Low-Femtomole Level
Ashley L. McCormack, David M. Schieltz, Bruce Goode, Shirley Yang, Georjana Barnes, David Drubin, and John R. Yates, III*

Department of Molecular Biotechnology, Box 357730, University of Washington, Seattle, Washington 98195-7730, and Department of Molecular and Cell Biology, University of California, Berkeley, California 94720

A method to identify proteins in mixtures directly by microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry (LC/MS/MS) is described. In this method, the protein mixture is digested with a proteolytic enzyme to produce a large collection of peptides. The complex peptide mixture is then separated on-line with a tandem mass spectrometer, and large numbers of tandem mass spectra are acquired. These spectra are then used to search a protein database to identify the proteins present. Results from standard protein mixtures show that proteins in simple mixtures can be readily identified across a 30-fold difference in molar quantity, that the identifications are reproducible, and that proteins within the mixture can be identified at low-femtomole levels. Based on these studies, methodology has been developed for direct LC/MS/MS analysis of proteins enriched by immunoaffinity precipitation, specific interaction with a protein-protein fusion product, and specific interaction with a macromolecular complex. The approach described in this article provides a rapid method for the direct identification of proteins in mixtures.
* Address correspondence and reprint requests to this author. TEL: (206) 685-7388. FAX: (206) 685-7301. E-mail: jyates@u.washington.edu.
University of Washington. University of California.
(1) Tandem Mass Spectrometry; McLafferty, F. W., Ed.; Wiley: New York, 1983.
(2) Busch, K. L.; Glish, G. L.; McLuckey, S. A. Mass Spectrometry/Mass Spectrometry: Techniques and Applications of Tandem Mass Spectrometry; VCH Publishers, Inc.: New York, NY, 1988.
(3) Hunt, D. F.; Yates, J. R., III; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-6237.
(4) Biemann, K. Biomed. Environ. Mass Spectrom. 1988, 16, 99-11.
(5) Medzihradsky, K. F.; Burlingame, A. L. Methods: A Companion to Methods in Enzymology; Academic Press: San Diego, CA, 1994; Vol. 6, 284-303.
(6) McCormack, A. L.; Eng, J.; Yates, J. R., III. Methods: A Companion to Methods in Enzymology; Academic Press: San Diego, CA, 1994; Vol. 6, 274-283.
S0003-2700(96)00799-8 CCC: $14.00 © 1997 American Chemical Society

Tandem mass spectrometry has developed into a powerful technique for complex mixture analysis with numerous applications in the analytical sciences.1,2 An important application of tandem mass spectrometry in the biochemical sciences is the sequence analysis of peptides and proteins.3-6 Sequence analysis is performed by separating a peptide ion in the first mass analyzer and transmitting the ion to a collision cell or other device for activation. Methods such as gas phase collision-induced dissociation (CID), surface-induced dissociation (SID), and photon-induced dissociation (PID) have been employed for the activation of peptide ions.3-11 At present, the most widely used technique for peptide ion dissociation employs gas phase collisions at low energy. The fragment ions produced are analyzed in the second mass analyzer. Fortuitously, peptides fragment primarily at their amide bonds, creating a ladder of ions representing the amino acid sequence. Tandem mass spectrometry experiments are not confined to spatially separated mass analyzers but can also be performed with instruments that separate ions in time, such as quadrupole ion trap and Fourier transform ion cyclotron resonance mass spectrometers.12,13 The capability for mixture analysis inherent in tandem mass spectrometry has been greatly improved through the combination of electrospray ionization and tandem mass spectrometry.14-17 By combining liquid chromatography with tandem mass spectrometry, on-line analysis of molecules present in complicated mixtures is possible.17 Molecules are thus separated by their chemical properties in the chromatography step, then separated by m/z value and structurally characterized in an MS/MS mode. Computer methods that allow data-dependent acquisition in real time have permitted the development of efficient methods for acquiring tandem mass spectra.18,19 Over the course of a liquid chromatography separation, it is possible to acquire large numbers of spectra, in some instances as frequently as one every 5-10 s.18,20
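Data-dependent acquisition reduces to a simple decision made after each survey scan. The sketch below follows the rules given for the ICL program under Automated Tandem Mass Spectrometry (select the base peak of the last scan if its abundance exceeds a threshold, assume the precursor is doubly charged, set the collision energy as a linear function of m/z within roughly 10-30 eV). The threshold, the low-m/z start of the product-ion scan (`scan_start`), and the specific linear mapping (m/z 500 to 10 eV, m/z 1200 to 30 eV) are hypothetical values chosen for illustration, not the authors' settings, and `plan_product_scan` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

PROTON = 1.00728  # proton mass, Da


@dataclass
class ProductScan:
    precursor_mz: float         # base-peak m/z chosen for MS/MS
    scan_lo: float              # lower limit of the product-ion scan (m/z)
    scan_hi: float              # upper limit: MH+ of an assumed 2+ precursor
    collision_energy_ev: float  # hypothetical linear function of m/z


def plan_product_scan(
    spectrum: Sequence[Tuple[float, float]],  # (m/z, abundance) pairs from the last MS scan
    threshold: float,                         # hypothetical abundance trigger
    mz_lo: float = 500.0,
    mz_hi: float = 1200.0,
    ce_lo: float = 10.0,
    ce_hi: float = 30.0,
    scan_start: float = 100.0,                # hypothetical low-m/z limit of the MS/MS scan
) -> Optional[ProductScan]:
    """Return MS/MS settings for the base peak, or None if nothing triggers."""
    if not spectrum:
        return None
    mz, abundance = max(spectrum, key=lambda peak: peak[1])  # base peak
    if abundance < threshold:
        return None
    # Assume [M+2H]2+: singly charged fragments cannot exceed MH+ = 2*mz - H+.
    mh_plus = 2.0 * mz - PROTON
    # Collision energy varies linearly with m/z, clamped to the 10-30 eV range.
    frac = min(max((mz - mz_lo) / (mz_hi - mz_lo), 0.0), 1.0)
    energy = ce_lo + frac * (ce_hi - ce_lo)
    return ProductScan(mz, scan_start, mh_plus, energy)


if __name__ == "__main__":
    last_scan = [(587.3, 1.0e5), (650.0, 2.0e5), (720.1, 5.0e4)]
    print(plan_product_scan(last_scan, threshold=1.0e4))
```

Anchoring the product-ion scan to the MH+ of a doubly charged precursor mirrors the stated 2+ assumption; a real acquisition program would also need rules for charge states other than 2+, which this sketch does not attempt.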
(7) Cooks, R. G.; Amy, J. W.; Bier, M. E.; Schwartz, J. C.; Schey, K. L.; Chidsey, C. D. Adv. Mass Spectrom. 1988, 11, 33-50.
(8) Wysocki, V. H.; Jones, J. L.; Ding, J. M. J. Am. Chem. Soc. 1991, 113, 8969-8970.
(9) McCormack, A. L.; Somogyi, A.; Dongre, A. R.; Wysocki, V. H. Anal. Chem. 1993, 65, 2859-2872.
(10) Hunt, D. F.; Shabanowitz, J.; Yates, J. R., III. J. Chem. Soc., Chem. Commun. 1987, 548-550.
(11) Martin, S. A.; Hill, J. A.; Kittrell, C.; Biemann, K. J. Am. Soc. Mass Spectrom. 1990, 1, 107-109.
(12) Senko, M. W.; Beu, S. C.; McLafferty, F. W. Anal. Chem. 1994, 66, 415-417.
(13) Cox, K. A.; Williams, J. D.; Cooks, R. G.; Kaiser, R. E., Jr. Biol. Mass Spectrom. 1992, 21, 226-241.
(14) Hail, M.; Lewis, S.; Jardine, I.; Liu, J.; Novotny, M. J. Microcolumn Sep. 1990, 2, 285-290.
(15) Huang, E. C.; Henion, J. D. J. Am. Soc. Mass Spectrom. 1990, 1, 158-165.
(16) Griffin, P. R.; Coffman, J. A.; Hood, L. E.; Yates, J. R., III. Int. J. Mass Spectrom. Ion Processes 1991, 111, 131-149.
(17) Hunt, D. F.; Henderson, R. A.; Shabanowitz, J.; Sakaguchi, K.; Michel, H.; Sevilir, N.; Cox, A. L.; Apella, E.; Engelhard, V. N. Science (Washington, D.C.) 1992, 255, 1261-1263.
(18) Yates, J. R., III; Eng, J.; McCormack, A. L.; Schieltz, D. Anal. Chem. 1995, 67, 1426-1436.
(19) Stahl, D. C.; Swiderek, K. M.; Davis, M. T.; Lee, T. D. J. Am. Soc. Mass Spectrom. 1996, 7, 532-540.

Analytical Chemistry, Vol. 69, No. 4, February 15, 1997 767

Table 1. Description of the Standard Protein Mixture Derived from the SDS Gel Electrophoresis Low Molecular Weight Markers

protein       molecular mass (Da)   relative concn (fmol)ᵃ   tryptic peptidesᵇ
ALBU-Bovin    66 000                116                       52
OVAL-Chick    45 000                169                       23
G3P1-Rabit    36 000                212                       23
CAH2-Bovin    29 000                262                       14
TRYP-Bovin    24 000                317                       14
ITRA-Soybn    20 000                380                       15
LCA-Bovin     14 000                544                       8

ᵃ Relative concentration of proteins in approximately 2 pmol of total protein.
ᵇ The number of tryptic peptides with m/z values falling in the scan range m/z 500-1200, assuming 1+, 2+, and 3+ charge states (500-3600 Da).
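Two columns of Table 1 follow from simple arithmetic. Because the marker mixture contains equal weights of each protein, each protein's molar amount is inversely proportional to its molecular mass; and the 500-3600 Da range in footnote b is what an m/z 500-1200 window covers for 1+, 2+, and 3+ ions. The sketch below reproduces both, using the nominal masses listed in the table (it matches the relative-concentration column to within about 1 fmol). The function names are hypothetical, and the proton-mass correction in the mass window is a detail the footnote rounds away.

```python
from typing import Dict, Iterable, Tuple

PROTON = 1.00728  # proton mass, Da

# Nominal molecular masses (Da) of the seven markers in Table 1.
MARKER_MASSES: Dict[str, float] = {
    "ALBU-Bovin": 66_000.0,
    "OVAL-Chick": 45_000.0,
    "G3P1-Rabit": 36_000.0,
    "CAH2-Bovin": 29_000.0,
    "TRYP-Bovin": 24_000.0,
    "ITRA-Soybn": 20_000.0,
    "LCA-Bovin": 14_000.0,
}


def molar_amounts(total_fmol: float,
                  masses: Dict[str, float] = MARKER_MASSES) -> Dict[str, float]:
    """Split a total molar amount among proteins present in equal weights.

    With a common weight w per protein, n_i = w / MW_i, and w is fixed by
    requiring sum(n_i) == total_fmol.
    """
    inverse_sum = sum(1.0 / mw for mw in masses.values())
    common_weight = total_fmol / inverse_sum  # fmol * Da, identical for every protein
    return {name: common_weight / mw for name, mw in masses.items()}


def neutral_mass_window(mz_lo: float, mz_hi: float,
                        charges: Iterable[int] = (1, 2, 3)) -> Tuple[float, float]:
    """Neutral peptide masses observable in an m/z window: M = z * (m/z - H+)."""
    masses = [z * (mz - PROTON) for z in charges for mz in (mz_lo, mz_hi)]
    return min(masses), max(masses)


if __name__ == "__main__":
    for name, fmol in molar_amounts(2000.0).items():  # 2 pmol of total protein
        print(f"{name:<11} {fmol:6.1f} fmol")
    print(neutral_mass_window(500.0, 1200.0))  # roughly (499, 3597) Da
```

Scaling the total to 4000 fmol gives the 4 pmol amounts used later in the casein experiment, where 40 fmol of casein is roughly one-sixth of the ALBU amount.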

Recent developments to match uninterpreted tandem mass spectra to sequences in protein or nucleotide databases provide the potential for automated, high-throughput data analysis.18,21,22 An additional benefit of using tandem mass spectra to search databases is the highly specific and independent nature of each spectrum: the fragmentation pattern in each tandem mass spectrum reflects the amino acid sequence of a peptide and thus forms a unique signature for the protein from which it was derived. In the present research, we have developed a strategy for the identification of proteins contained in mixtures by using microcolumn liquid chromatography electrospray ionization tandem quadrupole mass spectrometry (LC/MS/MS) in conjunction with the database searching algorithm SEQUEST. This method has been applied to the direct LC/MS/MS analysis of proteins enriched by immunoaffinity precipitation, specific interaction with a protein-protein fusion product, and specific interaction with a macromolecular complex.

MATERIALS AND METHODS

Standard Protein Mixture. The low molecular weight standard protein marker mixture (SDS-7 Dalton Mark VIIL) was purchased from Sigma Chemical Co. (St. Louis, MO). It contains equal weights of α-lactalbumin (LCA) (bovine), trypsin inhibitor (ITRA) (soybean), PMSF-treated trypsinogen (TRYP) (bovine), carbonic anhydrase (CAH2) (bovine), glyceraldehyde-3-phosphate dehydrogenase (G3P1) (rabbit), albumin (OVAL) (egg), and albumin (ALBU) (bovine), as listed in Table 1. Phosphorylated α-casein (CAS1) (bovine) (Catalog No. L0406) was also purchased from Sigma Chemical Co. (St. Louis, MO).

S-Carboxyamidation and Proteolytic Digestion. The standard protein mixture was diluted with 100 mM Tris-HCl, pH 8.5, to a final concentration of 0.20 nmol/μL protein in 80 mM Tris-HCl, 1.6 M guanidine-HCl. The disulfide bonds were reduced by the addition of dithiothreitol (300-fold molar excess over total protein) to a solution of the protein mixture in 80 mM Tris-HCl, pH 8.5.
The reaction was allowed to proceed for 2.5 h in the dark at 45 °C. The free cysteines were S-carboxyamidated by the addition of a 630-fold molar excess of iodoacetamide over the total protein in the mixture. The pH of the solution was adjusted with 1 M Tris-HCl, giving a final concentration of 100 mM Tris-HCl, and the reaction was allowed to proceed for 1 h in the dark at 45 °C. The reaction was stopped by the addition of dithiothreitol (300-fold molar excess over total protein), and the mixture was allowed to stand for 1.5 h at 45 °C. A 2 nmol aliquot of the derivatized protein mixture was diluted to a final concentration of 40 pmol/μL in 100 mM Tris-HCl and 355 mM guanidine-HCl and digested with trypsin (1 μg) overnight at 37 °C. Aliquots of the peptide mixture were diluted with 0.5% acetic acid and analyzed directly. α-Casein was digested with trypsin as described above. Aliquots corresponding to 12 and 40 fmol of the tryptic digest were added to two separate samples containing 2 pmol of the total standard protein mixture.

(20) McCormack, A. L.; Eng, J. K.; DeRoos, P. C.; Rudensky, A. Y.; Yates, J. R., III. Microcolumn liquid chromatography-electrospray ionization tandem mass spectrometry: Analysis of immunological samples. In Biochemical and Biotechnological Applications of Electrospray Ionization Mass Spectrometry; Snyder, A. P., Ed.; American Chemical Society: Washington, DC, 1995; pp 207-225.
(21) Eng, J.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(22) Yates, J. R., III; Eng, J.; McCormack, A. L. Anal. Chem. 1995, 67, 3202-3210.

Immunoaffinity Purification of RAS. Wild-type Saccharomyces cerevisiae cells (strain AB1380) were grown to stationary phase (10⁸ cells/mL) in 500 mL of rich medium. The cells were harvested by centrifugation and washed with H2O. A total of 600 mg of cells (wet weight) in 1 mL of lysis buffer (14.2 mM sodium phosphate, 30 mM sodium fluoride, 0.075 mM sodium metavanadate, 5 mM Tris-HCl, pH 8, 100 mM NaCl, 2.5 mM EDTA, 3.0 mg/mL pepstatin, 1.5 mg/mL leupeptin, and 2 mM phenylmethylsulfonyl fluoride) was lysed using a Mini Beadbeater apparatus (Biospec Products, Bartlesville, OK), following the protocol of Kolodziej and Young.23 Protein content was determined to be 19 mg/mL by the Bradford assay24 (Bio-Rad, Hercules, CA). Cell wall debris was removed from the extract by centrifugation at 120000g for 30 min, and the extract was then passed over a preclearing column consisting of a 1 mL bed volume of mouse IgG-agarose (Pierce, Rockford, IL). The protein content at this point was determined to be 10 mg/mL.
A rat monoclonal antibody reactive against human, rat, and yeast RAS25,26 was used to probe the lysate. The antibody recognizes the conserved GTPase-activating protein (GAP) binding site of the RAS protein. The v-H-RAS antibody (Santa Cruz Biotechnology, Santa Cruz, CA) was supplied conjugated to agarose beads. The protein extract (1.5 mL) was incubated with 100 μL of the beads for 2 h at 4 °C in a 0.64 cm × 5 cm Bio-Spin disposable column (Bio-Rad). The supernatant was removed, and the agarose beads were washed with 10 mL of lysis buffer, followed by 1.5 mL of 0.2 M NaCl and 2 mL of distilled water. Elution was carried out by washing the beads with 1 mL of 1.0 M acetic acid. The eluate was concentrated to 50 μL by lyophilization, and a 10 μL aliquot was saved for analysis by SDS-PAGE. The pH of the remaining solution was adjusted to 8.5 with 1 M ammonium bicarbonate, followed by addition of 200 μg of trypsin for digestion. The solution was incubated at 37 °C for 4 h. A 2 μL aliquot diluted with 0.5% acetic acid was loaded onto the microcolumn for analysis by tandem mass spectrometry.

Isolation of GST-Sla2 Binding Proteins. S. cerevisiae cells were grown to log phase and harvested. Cells were washed in distilled and deionized water (ddH2O), resuspended in 2× lysis buffer (1× lysis buffer = 50 mM HEPES, pH 7.5, 50 mM KCl, 1 mM EGTA, 10% glycerol) and protease inhibitors (antipain, leupeptin, pepstatin A, chymostatin, aprotinin, 0.1 M phenylmethylsulfonyl fluoride, 1 mM benzamide, and 0.1 mg/mL phenanthroline), frozen in liquid nitrogen, and then ground in a Waring blender for 3-5 min. After the lysate thawed, NP40 was added to a concentration of 0.2%. The extract was spun for 30 min at 10000g. Ten milliliters of the supernatant (15 mg/mL) was passed through a 10 mL CL-4B Sepharose (Pharmacia, Piscataway, NJ) blank column equilibrated in 1× lysis buffer containing 0.2% NP40 and protease inhibitors. The flow-through was passed through a 0.4 mL column of glutathione-S-transferase (GST) bound to glutathione covalently attached to 4% agarose beads (Sigma Chemical Co., St. Louis, MO) and then onto a 0.4 mL Sla2-GST column. The proteins (i.e., either GST or Sla2-GST) were present at 1 mg/mL on the columns. The flow rate was 3 mL/h. The column was then washed with 10-15 mL of lysis buffer without NP40. Proteins were eluted with 0.6 M KCl in lysis buffer, and 0.2 mL fractions were collected. Gel electrophoresis was used to locate the fractions containing protein. Finally, the eluate was dialyzed against lysis buffer (minus the glycerol) for at least 12 h, and the samples were quickly frozen in liquid nitrogen. The proteins were proteolytically digested as described above, and 15 μL of the 150 μL digest was used for analysis.

(23) Kolodziej, P.; Young, R. In Guide to Yeast Genetics and Molecular Biology; Guthrie, C., Fink, G., Eds.; Academic Press: San Diego, CA, 1991; p 508.
(24) Bradford, M. Anal. Biochem. 1976, 72, 248-254.
(25) Ellis, R. W.; DeFeo, D.; Shih, T. Y.; Gonda, M. A.; Young, H. A.; Tsuchida, N.; Lowy, D. R.; Scolnick, E. M. Nature 1981, 292, 506-511.
(26) Shih, T. Y.; Papageorge, A. G.; Stokes, P. E.; Weeks, M. O.; Scolnick, E. M. Nature 1980, 287, 686-691.

Isolation of Microtubule-Associated Proteins. Tubulin Purification.
Tubulin from bovine brain was purified by two cycles of temperature-induced polymerization/depolymerization according to the methods of Mitchison and Kirschner.27 Tubulin was further purified by phosphocellulose and DEAE chromatography in PME buffer (80 mM Pipes-KOH, pH 6.8, 1 mM MgCl2, 1 mM EGTA) plus 1 mM GTP to remove all detectable microtubule-associated proteins (MAPs) and then desalted and concentrated using a Centriprep-10 device (Amicon, Beverly, MA). Tubulin (8 mg/mL) was drop-frozen in 40 μL aliquots in liquid nitrogen and stored at -80 °C.

Assembly of Paclitaxel-Stabilized Microtubules. Approximately 1.6 mg of tubulin (200 μL) was thawed and cleared by centrifugation for 5 min at 50 000 rpm and 4 °C in a TLA100 rotor in a TLA100 tabletop ultracentrifuge (Beckman, Fullerton, CA). GTP (100 mM stock in H2O) was then added to a final concentration of 1 mM, and the mixture was incubated at 37 °C for 10 min to initiate microtubule assembly. Paclitaxel (2 mM stock in DMSO) was then added to a final concentration of 20 μM, and the reaction was incubated for an additional 30 min at 37 °C to complete assembly and stabilization of the microtubules. These microtubules are stable at room temperature for hours.

Microtubule Cosedimentation Assay. Yeast cell lysates were prepared as follows. Fresh 1 lb bricks of S. cerevisiae (Red Star Yeast Co., Emeryville, CA) were washed twice with 1 vol of ice-cold nanopure H2O (1 L), resuspended in 0.2 vol of nanopure H2O, drop-frozen in liquid nitrogen, and stored at -80 °C. For a cosedimentation assay, 100 g of frozen yeast was lysed in a 1 L Waring blender by four rounds of high-speed blending in 20-30 s bursts, with more liquid nitrogen added between rounds to cover the cells. After the final blending, the powdered cell lysate was transferred to a beaker at room temperature and thawed with 100 mL of PMEX buffer (PME buffer supplemented with 1 mM DTT and protease inhibitors).
The protease inhibitor solution consisted of antipain, leupeptin, pepstatin A, chymostatin, aprotinin, 0.1 M phenylmethylsulfonyl fluoride, 1 mM benzamide, and 0.1 mg/mL phenanthroline.28 The thawed lysate was then cleared by low-speed centrifugation for 30 min at 12 000 rpm and 4 °C in a GSA rotor (Sorvall), followed by high-speed centrifugation for 90 min at 44 000 rpm and 4 °C in a 45 Ti rotor (Beckman, Fullerton, CA). The high-speed supernatant (HSS) was carefully removed by pipetting (avoiding the upper lipid phase and lower cell debris) and filtered through six layers of cheesecloth. Paclitaxel was added to 30 mL of HSS to a final concentration of 10 μM, followed by 1.5 mg of paclitaxel-stabilized microtubules (final tubulin concentration, 0.5 μM), and the reaction was incubated for 30 min at 4 °C with occasional gentle mixing. Microtubules were then pelleted by centrifugation in a 45 Ti rotor for 20 min at 44 000 rpm and 4 °C and resuspended in 2 mL of PMEX buffer supplemented with 10 μM paclitaxel. Microtubules were pelleted again, this time in a TLS-55 rotor (Beckman, Fullerton, CA) for 10 min at 50 000 rpm and 4 °C in a TLA100 tabletop ultracentrifuge, and resuspended in 1 mL of PMEX buffer supplemented with 10 μM paclitaxel plus 0.1 M KCl. Microtubules were then pelleted (as above in the TLS-55 rotor), and the supernatant (low-salt supernatant) was saved by immediate drop-freezing in liquid nitrogen. The microtubules were resuspended in PMEX buffer supplemented with 10 μM paclitaxel plus 0.5 M KCl. Microtubules were then pelleted as before, and the supernatant (high-salt supernatant) was saved by immediate drop-freezing. The low-salt and high-salt supernatants were subsequently desalted and concentrated to 200 μL volumes in Centricon-10 devices (Amicon, Beverly, MA). Samples from before and after desalting/concentrating were saved and analyzed by SDS-PAGE on 8% gels. The proteins were proteolytically digested as described above, and 8 μL of the 200 μL was used for LC/MS/MS.

(27) Mitchison, T.; Kirschner, M. Nature 1984, 312, 232-237.

Microcolumn High-Performance Liquid Chromatography.
HPLC-grade solvents (methanol, acetonitrile, and acetic acid) were purchased from Fisher Scientific (Tustin, CA). Microcolumns were made by the method of Kennedy and Jorgenson,29 using 75 μm i.d. fused-silica capillary tubing obtained from Polymicro Technologies (Phoenix, AZ). The columns were packed to a length of 10-15 cm with POROS 10 R2 (PerSeptive Biosystems, Framingham, MA), a 10 μm reversed-phase packing material. Samples were injected onto the column as previously described.30 Micro-HPLC was performed using Applied Biosystems (Foster City, CA) 140B dual-syringe pumps. The flow rate from the pumps was 100-150 μL/min, and the solvent stream was split precolumn to produce a final flow rate of 1-2 μL/min. The mobile phase used for gradient elution consisted of (A) 0.5% acetic acid and (B) acetonitrile/water (80:20 v/v) containing 0.5% acetic acid. The gradient was linear from 0 to 40% B in 50 min, followed by 40-80% B in 10 min, or from 0 to 60% B in 30 min.

Electrospray Ionization Mass Spectrometry. Mass spectra were recorded on a TSQ700 (Finnigan MAT, San Jose, CA) equipped with an electrospray ionization source.16 Electrospray was performed by setting the needle voltage at 4.6 kV. The sheath gas was set at 22 units and the auxiliary gas at 10 units; both used prepure nitrogen (99.999%). The sheath liquid, a mixture of 70% methanol, 30% water, and 0.1% acetic acid, flowed around the end of the column at a rate of 1-2 μL/min. The capillary temperature was held at 150 °C, with a potential of 24 V.

(28) Drubin, D. G.; Miller, K. G.; Botstein, D. J. Cell Biol. 1988, 107, 2551-2561.
(29) Kennedy, R. T.; Jorgenson, J. W. Anal. Chem. 1989, 61, 1128-1135.
(30) Yates, J. R., III; McCormack, A. L.; Hayden, J. B.; Davey, M. P. In Cell Biology: A Laboratory Handbook; Celis, J. E., Ed.; Academic Press: San Diego, CA, 1994; p 380.

Automated Tandem Mass Spectrometry. Tandem mass spectra were acquired using the instrument control language (ICL) as described.18 The ICL program acquires two mass spectra, scanning Q3 over m/z 500-1100 or 550-1200 in 1 s. If an ion is present in the scan and its calculated abundance is above a specified value, product ion spectra are acquired using the m/z value of the base peak in the last mass spectrum. From this m/z value, and assuming that the parent ion is doubly charged, the program calculates the scan range and the collision offset. Product ion spectra were generated using argon as the collision gas (3-4 mTorr) and collision energies (laboratory frame) on the order of 10-30 eV. The collision offset varies linearly with m/z value.18 The spectra were acquired by scanning Q3 over the specified range in 2 s.

Database Searching. Amino acid sequence databases were searched directly with tandem mass spectra using the computer algorithm SEQUEST, described previously.18,21,22 The OWL databases, versions 26.2 and 27.1 (121 842 entries), were obtained as ASCII text files in FASTA format from the National Center for Biotechnology Information (NCBI) by anonymous ftp. The protein standard database was created by extracting the relevant protein sequences from the OWL database. The S. cerevisiae sequence database (7499 entries) was obtained from the Stanford yeast sequencing project; it is a nonredundant database containing sequences from Swissprot, PIR, and GenBank that is updated every week (http://genome-stanford.edu/). The algorithm allows for proteolytic specificity but does not require it.
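In outline, SEQUEST first selects database peptides whose calculated mass agrees with the measured precursor mass and only then scores them against the tandem mass spectrum;21 whether an enzyme rule is applied determines which peptides enter that candidate pool. The sketch below illustrates only this candidate-selection step, not SEQUEST's scoring. Its assumptions are stated plainly: monoisotopic residue masses, carbamidomethylated cysteine, trypsin cleaving after K or R except before P with no missed cleavages, a ±1 Da tolerance, and a hypothetical toy protein sequence rather than a real database entry. All function names are hypothetical.

```python
from typing import Dict, List

WATER = 18.01056  # Da
# Monoisotopic residue masses (Da); C carries carbamidomethylation (+57.02146),
# matching the S-carboxyamidation step in the Methods.
RESIDUE: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 160.03065, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def tryptic_peptides(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def all_subsequences(protein: str, min_len: int = 4) -> List[str]:
    """Every contiguous stretch of at least min_len residues (no enzyme rule)."""
    n = len(protein)
    return [protein[i:j] for i in range(n) for j in range(i + min_len, n + 1)]


def candidates(protein: str, precursor_mass: float,
               tol: float = 1.0, tryptic: bool = True) -> List[str]:
    """Peptides whose calculated mass lies within tol of the precursor mass."""
    pool = tryptic_peptides(protein) if tryptic else all_subsequences(protein)
    return [p for p in pool if abs(peptide_mass(p) - precursor_mass) <= tol]


if __name__ == "__main__":
    toy = "MKHQGLPQEVLNENLLRFFVAPFPEVFGK"  # hypothetical toy entry
    # Illustrative (M + 3H)3+ precursor at m/z 587.32 converted to neutral mass.
    precursor = 3 * (587.32 - 1.00728)
    print(candidates(toy, precursor, tryptic=True))
    print(len(candidates(toy, precursor, tryptic=False)))
```

Dropping the enzyme rule enlarges the candidate pool considerably, which is the trade-off behind making proteolytic specificity optional: more candidates must be scored, but peptides from nonspecific cleavage are not missed.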
Sequences for potential contaminants, such as human keratin and bovine trypsin, are added to the database. A commercial version of SEQUEST is available from Finnigan MAT (San Jose, CA).

RESULTS AND DISCUSSION

Direct analysis and identification of proteins in a heterogeneous mixture would allow rapid analysis of multienzyme protein complexes, surveys of proteins localized to subcellular compartments, or screens for interacting proteins, with minimal need to purify individual components. We are developing methodology to quickly identify peptides, and thereby their protein origins, using automated LC/MS/MS and database searching. The focus is to survey the proteins present in the mixture by acquiring tandem mass spectra for as many of the peptides present as possible. These spectra are then used to search a protein or nucleotide database directly with the algorithm SEQUEST. Tandem mass spectra that match protein sequences indicate that those proteins are present. Using a standard protein mixture, we examine the dynamic range and reproducibility of the method as a means to rapidly identify proteins. In addition, we developed techniques to apply this approach to protein mixtures obtained by enrichment with three different biochemical procedures. A schematic illustrating the procedure is shown in Figure 1.

Figure 1. General diagram of the procedure employed to identify proteins contained in mixtures.

Analysis of a Standard Protein Mixture. We sought a well-characterized protein mixture with features consistent with mixtures obtained in biochemical experiments, that is, proteins of widely different molecular weights present at varying concentrations. A readily available mixture meeting these criteria is the set of SDS-PAGE protein markers listed in Table 1. The molar amount of each protein in the sample is inversely proportional to its molecular mass, such that 2 pmol of total protein contains 120 and 580 fmol of bovine serum albumin (ALBU, 66 kDa) and α-lactalbumin (LCA, 14 kDa), respectively. The proteins in the mixture were S-carboxyamidated and then digested with trypsin. The scan range of the instrument was either m/z 500-1100 or 550-1200, so, considering singly, doubly, and triply charged peptides, the maximum molecular mass sampling range is 500-3600 Da. Proteolysis of bovine serum albumin should produce 52 peptides in this sampling mass range.

Dynamic Range. We have defined dynamic range in this experiment as the concentration range over which proteins present in a mixture can be identified. In the synthetic protein mixture, proteolysis of the seven proteins will produce roughly 150 peptides within the mass range sampled by the mass spectrometer. The more completely the peptides are resolved chromatographically, the greater the chance of acquiring spectra of peptides from every protein present. Theoretically, a maximum of 210 tandem mass spectra can be obtained during a 30 min linear gradient (one MS/MS every 7 s), in contrast to the 420 that can be acquired during a 60 min gradient. In addition, a 30 min linear gradient is less likely to resolve a complicated peptide mixture sufficiently for tandem mass spectra to be obtained from all the proteins present. A longer gradient should resolve more of the peptides and allow identification of the proteins present in the standard mixture, so the 60 min gradient was used for all subsequent experiments. The efficacy of protein identification was then compared at decreasing levels of protein: three standard protein samples were analyzed, containing 16, 4, and 2 pmol of total protein.
The least abundant protein in the mixture was present at 928, 232, and 116 fmol in the 16, 4, and 2 pmol samples, respectively, and α-lactalbumin, the most abundant, was present at 4352, 1088, and 544 fmol in the same samples. Acquisition of tandem mass spectra during chromatographic analysis of the 2 pmol sample identified an average of seven peptides per protein in the mixture. As shown in Table 2, the average number of peptides identified does not increase at the higher total protein levels of 4 and 16 pmol. At all three sample amounts, the proteins present in the mixture were identified.

To further investigate the lowest amount of a protein that can still be identified, we added a 40 fmol aliquot of phosphorylated α-casein (23 000 Da) to a 4 pmol total protein mixture of the standard proteins. At this level, α-casein was present at about one-sixth the molar amount of BSA and 30-fold less than the most abundant protein, α-lactalbumin. The LC/MS/MS trace is shown in Figure 2A, and a tandem mass spectrum for the α-casein peptide HQGLPQEVLNENLLR is shown in Figure 2B. At the 40 fmol level, the signal-to-noise ratio and the sequence information present in the tandem mass spectrum are very good. Tandem mass spectra for two other α-casein peptides, one of which is a phosphopeptide, were obtained during the LC/MS/MS analysis. A second dilution, 12 fmol of α-casein added to 4 pmol of the total protein mixture, was also analyzed (data not shown). No tandem mass spectra of α-casein were acquired at this level, although two α-casein peptides could be tentatively identified from their m/z values. These two ions coeluted with other, more abundant ions and consequently did not trigger acquisition of tandem mass spectra; had the peptide ions been fully resolved in the separation, tandem mass spectra could have been obtained.

Table 2. Effect of Protein Concentration on the Number of Peptides Identified from Each Protein in the Mixtureᵃ

              total protein
protein       2 pmol   4 pmol   4 pmol (40 fmol of casein)   16 pmol
ALBU-Bovin    10       9        4                             13
OVAL-Chick    10       9        1                             7
G3P1-Rabit    3        3        3                             3
CAH2-Bovin    6        4        0                             6
TRYP-Bovin    6        6        6                             6
ITRA-Soybn    6        7        4                             6
LCA-Bovin     3        3        3                             3
CAS1-Bovin    na       na       2                             na

ᵃ Version 26.2 of the OWL database (121 842 entries) was used for the search. na, not applicable.

Reproducibility. Three analyses using 4 pmol of total protein were performed to determine the reproducibility of the method.
A single LC column was used, and a fresh aliquot of material was used for each analysis (Table 3). In two of the analyses, all seven proteins were identified; in the third, no tandem mass spectrum of a peptide from carbonic anhydrase was obtained. Identification of carbonic anhydrase was the least predictable in all analyses, for unknown reasons. A plausible explanation is that carbonic anhydrase may require more stringent denaturing conditions than were used here to promote unfolding prior to proteolysis.

Table 3. Study of the Number of Tandem Mass Spectra Matched to Each Protein in the Mixture Using the Same Separation Conditions and Instrument Control Program To Acquire the Tandem Mass Spectraᵃ

protein       Ia   Ib   Ic
ALBU-Bovin    4    6    9
OVAL-Chick    4    5    2
G3P1-Rabit    3    4    4
CAH2-Bovin    1    1    0
TRYP-Bovin    6    6    7
ITRA-Soybn    5    4    4
LCA-Bovin     4    3    3

ᵃ Version 27.1 of the OWL database (142 737 entries) was used for the search.

Figure 2. (A) LC/MS/MS chromatogram from trypsin digestion of 4 pmol of the standard protein mixture containing the proteins shown in Table 1 plus 40 fmol of α-casein. Three α-casein peptides were identified: one is shown in the main body of the figure, and two others are shown in the inset. (B) Collision-induced dissociation mass spectrum recorded on the (M + 3H)3+ ions at m/z 587 of a peptide from the protein α-casein. Fragment ions of type b and y, having the general formulas H(NHCHRCO)n+ and H2(NHCHRCO)nOH+, respectively, are shown above and below the amino acid sequence at the top of the figure.

Peptide Sampling. The large number of peptides present in the digested protein mixture creates a demanding separation problem. Many of the peptides coelute during the separation, preventing acquisition of tandem mass spectra of all of the peptides. This is also related to the sampling rate possible with the tandem mass spectrometer. If the number of MS/MS spectra that can be obtained per unit time can be increased without decreasing the quality of the information in each spectrum, then spectra from more peptides could be obtained in a single chromatographic analysis. At present, we improve signal quality by averaging 4-5 scans. If that number is reduced, does signal quality diminish to a point where information and specificity in a tandem mass spectrum are lost? To determine whether the number of scans acquired for each m/z value can be reduced, tandem mass spectra were acquired using ICL programs that acquire one, two, and four scans per tandem mass spectrum. A tandem mass spectrum of the same peptide from each of these analyses is shown in Figure 3A-C, respectively. Each spectrum was used to search the database, and all three searches were successful, with roughly equivalent scores. Thus, it is possible to reduce the number of scans and maintain the integrity of the database search. A potential difficulty for protein identification arises when only one tandem mass spectrum is obtained for a protein and a poor signal-to-noise ratio or an incomplete set of fragment ions prevents unambiguous validation of the search result. Increasing the number of spectra obtained over the course of an analysis should decrease the number of instances in which only one tandem mass spectrum is obtained for a given protein.

Summary. Several points can be drawn from the above experiments. First, the ability to survey complex mixtures of proteins is improved by employing chromatography with long gradient times to maximize molecular separation. This increases the number of tandem mass spectra acquired during the analysis, increasing the probability that one or more spectra will be obtained from every protein present. Second, low-abundance proteins present at up to a 30-fold lower molar amount than the most abundant protein can be detected, although this limit will also depend on the complexity of the protein mixture and the resolving power of the HPLC separation. Third, the presence of a protein in the mixture can be reliably determined on the basis of a single tandem mass spectrum if two basic criteria are met.
The criteria are that (1) the tandem mass spectrum contains fragment ions for a majority of the sequence ions predicted from the peptide sequence and (2) the spectrum matches only one protein from a specific organism, not a sequence shared by members of a multigene family. SEQUEST records the number of times a specific sequence is matched and can display all of the matched protein sequences, so it can be quickly determined whether a tandem mass spectrum has matched redundant entries of the same sequence or two separate sequences of a multigene family. A match to a sequence conserved in many proteins leaves the presence of any one of those proteins ambiguous unless additional information, such as more tandem mass spectra from the same protein, is available. Eng et al. illustrated this with peptides derived from highly conserved regions of class II MHC molecules.21

Application to Protein Mixtures Derived from Biological Procedures. The studies on the standard protein mixture led to applications of this method for the direct analysis of proteins in mixtures. In particular, the applications are directed at procedures that quickly expand the set of proteins known to be involved in specific processes or functions. Variations on these methods have been used by McCormack et al. to identify peptide sequences in the class II major histocompatibility antigen-processing pathway20,31 and by Link et al. to survey the proteins of the Escherichia coli periplasmic space.32

Immunoprecipitation Reactions. A conjugated antibody with specific affinity for the RAS protein from S. cerevisiae was used to probe a cell lysate made from $10^8$ cells. Figure 4 shows a lane from a 12.5% gel loaded with 10 µL (1/5) of the 50 µL total volume of eluted material. Two predominant bands are observed at approximately 38 kDa, consistent with the masses of RAS1 (36 kDa) and RAS2 (40 kDa); because the antibody reacts with both proteins, both RAS1 and RAS2 would be expected to be affinity-purified. Several other bands are present in the lane but stain less intensely than the two bands at 38 kDa. Silver staining can detect proteins at the low-nanogram level. To identify the proteins present in this mixture, a 2 µL aliquot (1/25) of the digested material was added to 8 µL of 0.5% acetic acid, and the entire 10 µL solution was analyzed by LC/MS/MS. The tandem mass spectra recorded were used to search the yeast protein database (7499 entries). Table 4 shows the peptide sequences and identities of the six proteins found in the eluted material.
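The two acceptance criteria stated above for single-spectrum identifications can be made concrete with a short script. The sketch below is a minimal illustration under stated assumptions, not SEQUEST's scoring: it computes singly charged b and y ions from the general formulas given in the Figure 3 caption using monoisotopic residue masses (an assumption; the original data may have been interpreted with average masses), counts the fraction matched by observed peaks, and groups database entries that contain the peptide by sequence. The function names, the 0.5 m/z tolerance, the "more than 50%" threshold for "a majority", and the toy database entries are all hypothetical.

```python
# Minimal sketch of the two single-spectrum validation criteria.
# Function names, tolerance, threshold, and TOY_DB are hypothetical.

# Monoisotopic residue masses (Da) of the 20 standard amino acids (assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of H+
WATER = 18.010565  # mass of H2O


def fragment_ions(peptide):
    """Singly charged b ions, H(NHCHRCO)n+, and y ions, H2(NHCHRCO)nOH+."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for n in range(1, len(masses)):
        ions.append((f"b{n}", sum(masses[:n]) + PROTON))
        ions.append((f"y{n}", sum(masses[-n:]) + WATER + PROTON))
    return ions


def fragment_coverage(peptide, peaks, tol=0.5):
    """Criterion 1: fraction of predicted b/y ions matched by a peak within tol."""
    ions = fragment_ions(peptide)
    matched = sum(1 for _, mz in ions if any(abs(mz - p) <= tol for p in peaks))
    return matched / len(ions)


def distinct_matches(peptide, database):
    """Criterion 2: group entries containing the peptide by their sequence,
    so redundant entries of one sequence collapse into a single hit."""
    hits = {}
    for name, sequence in database.items():
        if peptide in sequence:
            hits.setdefault(sequence, []).append(name)
    return hits


def passes_criteria(peptide, peaks, database, tol=0.5):
    """True if most predicted ions are observed and one distinct sequence matches."""
    return (fragment_coverage(peptide, peaks, tol) > 0.5
            and len(distinct_matches(peptide, database)) == 1)


# Hypothetical toy database: a redundant pair and a two-member "gene family".
TOY_DB = {
    "PROT_A": "MSKQLNAPFLETSAKGR",
    "PROT_A_COPY": "MSKQLNAPFLETSAKGR",
    "FAMILY_1": "MTTVLSPEDKAAR",
    "FAMILY_2": "MGGVLSPEDKSSR",
}

if __name__ == "__main__":
    peaks = [mz for _, mz in fragment_ions("QLNAPFLETSAK")]  # idealized spectrum
    print(passes_criteria("QLNAPFLETSAK", peaks, TOY_DB))
    print(distinct_matches("VLSPEDK", TOY_DB))
```

Grouping hits by sequence rather than by entry name is what lets redundant database entries count as one match while a peptide shared by two family members (here `VLSPEDK`) still fails the second criterion.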
Four peptides derived from RAS2 and two derived from RAS1 were identified, which is consistent with the two bands observed at 36 and 38 kDa on the gel. Tandem mass spectra were also acquired from peptides of four other proteins: human keratin (65 kDa), a yeast heat shock protein (81 kDa), yeast alcohol dehydrogenase (37 kDa), and yeast glyceraldehyde-3-phosphate dehydrogenase (36 kDa). The molecular masses calculated for alcohol dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase, RAS1, and RAS2 all fall in the range of 36-38 kDa, yet only two prominent bands appear on the gel in this range; presumably, these proteins comigrate. Bands corresponding to proteins with molecular masses of 65 kDa (human keratin) and 81 kDa (heat shock protein) also appear on the gel. The tandem mass spectra covered approximately 8.2% (2.8 kDa) of the RAS1 sequence and 20.8% (7.2 kDa) of the RAS2 sequence. In other experiments, approximately 37.6% (13 kDa) of the RAS2 amino acid sequence has been identified. Additional sequence information can be obtained by matching m/z values observed in the analysis to other predicted peptides from the identified proteins. By tentatively assigning only those m/z values that match a predicted peptide from one of the identified proteins, we were able to tentatively identify an additional 1.4 kDa of sequence from RAS1 and 5.5 kDa from RAS2. The presence of human keratin in this mixture can be attributed to contamination during the isolation procedures. The sequence for human keratin is added to the
(31) Yates, J. R., III; McCormack, A. L.; Link, A. J.; Schieltz, D.; Eng, J.; Hays, L. Analyst 1996, 121, 65R-76R.
(32) Link, A.; Carmack, E.; Yates, J. R. Int. J. Mass Spectrom. Ion Processes, in press.

Analytical Chemistry, Vol. 69, No. 4, February 15, 1997

Figure 3. (A) LC/MS/MS trace from 4 pmol of the trypsin-digested standard protein mixture containing the proteins listed in Table 1. One MS/MS scan was acquired for each ion above the threshold. (B) Collision-induced dissociation mass spectrum recorded from one scan of the $(M + 2H)^{2+}$ ion at m/z 583. Fragment ions of types b and y, having the general formulas $\mathrm{H(NHCHRCO)}_n^{+}$ and $\mathrm{H_2(NHCHRCO)}_n\mathrm{OH}^{+}$, respectively, are shown above and below the amino acid sequence at the top of the figure. This tandem mass spectrum was recorded from 4 pmol of the standard protein mixture. (C) Collision-induced dissociation mass spectrum recorded from two scans of the $(M + 2H)^{2+}$ ion at m/z 583. (D) Collision-induced dissociation mass spectrum recorded from four scans of the $(M + 2H)^{2+}$ ion at m/z 583.

Figure 4. Image of a silver-stained gel containing the proteins eluted from the anti-v-h-RAS antibody column.

database because this contaminant is frequently observed in experiments where people handle the sample.

Affinity Interaction Chromatography. Proteins involved in enzymatic or biochemical processes frequently form multiprotein complexes in which the component proteins interact with a certain specificity and binding energy.33 The specificity of these interactions has been exploited in procedures such as the yeast two-hybrid system to identify pairwise interactions among proteins.34,35 Coprecipitation experiments are also used to identify interacting proteins but require antibodies specific for each protein. Interactions among proteins that are specific and stable can be used to enrich other participants in a process. To create a probe, or bait, for enriching interacting proteins, a protein known to be involved in a process can be fused with glutathione S-transferase (GST). GST has a strong affinity for glutathione and thus can be readily bound to a solid support derivatized with glutathione. A GST gene fusion was created with the S. cerevisiae gene SLA2. Sla2 is involved in regulating the actin cytoskeleton, and mutants lacking Sla2 display a phenotype similar to that of actin mutants.36 An S. cerevisiae whole-cell lysate is passed over a preclearing column containing only GST bound to the solid support, to reduce the quantity of proteins that may specifically interact with GST. The material passing through the

(33) Phizicky, E. M.; Fields, S. Microbiol. Rev. 1995, 59, 94-123.
(34) Fields, S.; Sternglanz, R. Trends Genet. 1994, 10, 286-292.
(35) Chien, C. T.; Bartel, P. L.; Sternglanz, R.; Fields, S. Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 9578-9582.
(36) Holtzman, D. A.; Yang, S.; Drubin, D. G. J. Cell Biol. 1993, 122, 635-644.

Table 4. Summary of the Peptides and Proteins Identified in the Immunoaffinity Enrichment Using the Anti-v-h-RAS Antibody-Agarose Column

protein identification                           mass of protein (Da)   peptide sequences
RAS1                                             34 306                 QLNAPFLETSAK, TGEGFLLVYSVTSR
RAS2                                             34 554                 QVSYQDGLNMAK, QMNAPFLETSAK, SALTIQLTQSHFVDEYDPTIEDSYR, QAINVEEAFYTLAR
K2C1, human keratin                              65 886                 YEELQITAGR
ADH1, alcohol dehydrogenase                      36 692                 SISIVGSYVGNR, CCSDVFNQVVK, LPLVGGHEGAGVVVGMGENVK, ATNGGAHGVINVSVSEAAIEASTR, VVGLSTLPEIYEK, SIGGEVFIDFTK
HS82, heat shock protein                         81 406                 SVDELTSLTDYVTR
G3P1, glyceraldehyde-3-phosphate dehydrogenase   35 618                 AVGKVLPELQGKLTGMAFRVP

Table 5. Summary of Peptides and Proteins Identified That Bound to the GST-Sla2 Protein Fusion Column and Eluted from the Column in High-Salt Conditions (0.6 M KCl)

protein                                        MW (Da)
NHPA, non-histone chromosomal protein 6A       10 784
PRE6, proteasome component                     28 421
G4P2 protein                                   29 905
NFS1, nitrogen fixation-like protein           54 475
DED1, putative ATP-dependent RNA helicase      65 534
SSB1, heat shock protein                       66 452

sequence: SENPDITFGQVGK, LTLEDPVTVEYLTR, SLLEVVQTGAK, EAQADAAAEIAEDAAEAEDAGKPK, EIIFTSGATESNNMVLK, EGFEVTFLNVDDQGLIDLK, GLLSAEHTTLNGSPDHR, LEPLLSGGGQER, DVPEPITEFTSPPLDGLLLENIK, AGNTGLATAFFNSENSNIVK, SGAATLLVATAVAAR, MADQLTDFLIMQNFR, TGGFLFPVLSESFK, AGGASAGGWGSSR, TGPSPQPESQGSFYQR, AYPTAVIMAPTR, ELATQIFDEAK, VLYVENQDK, MLDMGFEPQIR, DLMACAQTGSGK, LESYVASIEQTVTDPVLSSK, SQIEDVVLVGGSTR, SSNITISNAVGR, TFSPQEISAMVLTK, ENTLLGEFDLK

preclearing column is then applied to the GST-Sla2 column and extensively washed to disrupt any nonspecific binding to the column. Specifically bound proteins are then eluted from the column in high salt (0.6 M KCl), dialyzed, and proteolytically digested with trypsin. The digestion products are analyzed by LC/MS/MS, and the tandem mass spectra are used to search the database. Shown in Table 5 are the identities of the proteins eluted from the column under the high-salt conditions. By combining the GST fusion method with LC/MS/MS, rapid identification of multiple participants in a process can be achieved, thereby providing new targets for molecular genetic studies of the process or pathway. The interactions were found to be consistent, as the same set of proteins appeared in repeated analyses. Of particular note is the presence of the NHP6A protein. This protein has sequence similarity to a mammalian non-histone chromosomal protein, HMG-1.37 Disruption of the gene for this
protein causes a cytoskeletal defect, so it may play some role in cytoskeletal organization in conjunction with Sla2.

Microtubule-Associated Proteins. A third method uses a large macromolecular complex to enrich groups of proteins with specific affinity for that complex. Bovine tubulin is induced to form microtubules by the addition of paclitaxel, and this macromolecular complex is used to enrich S. cerevisiae proteins with a specific affinity for it. By varying the incubation conditions (e.g., buffers and cofactors), different sets of proteins can be enriched.38 A yeast whole-cell lysate is then added to the macromolecular complex, which is washed under different salt conditions. The proteins that elute under each condition are collected, desalted, and concentrated, and their identities are determined by LC/MS/MS and database searching.

(37) Kolodrubetz, D.; Burgum, A. J. Biol. Chem. 1990, 265, 3234-3239.
(38) Duncan, K.; Edwards, R. M.; Coggins, J. R. Biochem. J. 1987, 246, 375-386.

Table 6. Summary of Proteins Identified Bound to Microtubules

protein                                          MW (Da)
s22, 40S ribosomal protein                       14 608
SSMla protein                                    24 467
G4P2 protein, suppressor protein                 29 905
L8300.8, hypothetical protein                    33 699
YM9718, hypothetical protein                     34 787
heat shock protein                               37 572
SSA2, heat shock protein                         69 452
SSA1, heat shock protein                         69 749
L9576.2 protein                                  72 535
YAT1, carnitine O-acetyltransferase              77 263
eEF-2, translation elongation factor             93 271
ARO1, pentafunctional arom polypeptide          174 736

sequence: HGYIGEFEYIDDHR, NFLETVELQVGLK, EAQADAAAEIAEDAAEAEDAGKPK, GNVGFVFTNEPLTEIK, GFLSDLPDFEK, SILDITDEELVSHF, SLDPQYLVDDLRPEFAGY, EIYDQYGLEAAR, PVSLEDLFVGK, FKEISEAFEIL, VNLPVSLEDLFVGK, NFNDPEVQGDMK, NTISEAGDKLEQADKDTVTK, ELQDIANPIMSK, NQAAMNPSNTVFDAK, LIDVDGKPQIQ, SQVDEIVLVGGSTR, LIDVDGKPQIQVEFK, SLFEGIDF, VNDAVVTVPAYFNDSQR, IGIWDIPENYK, TGNGSSGFLAEHSK, WQNGDVPIAEK, LPGKPEDNQDTNIF, LPGKPEDNQDTNIFYS, IEAIKDESLPVEIIK, QQDLPSLPVPELK, FIEAIKDESLPVEIIK, SNDDQIPPLFKDPLFNYS, EGPIFGEEMR, YLPVNESFGFTGELR, IVLHLPSPVTAQ, SIEEIKEHVGVA, SGPPVGTLKPLK, QGGNFPDEEFK, TTELPDGIQVH, QYMDELTDAAK, VGTAVNFEDNLR, NALINNGVPEYVGHTA, IEYLNNEGSLPIK, VDLDELFEQQHNNQSVK, LVDLDELFEQQHNNQSVK, GTTVTVPNIGFESLQGDAR, ELTLPTDIQYEVINK, ILTDETLVYPFK, IIGSHHDFQGLYSWDDAEWENR, FGINVETSTTEPYTYYIPK, VTNQLTNEIDEISNTDIEAM, FILTDETLVYPFK, KPLIESLPSEFNIIGIE, WPGWWDVLHS, PGWWDVLHSELGAK, TWPGWWDVLHSELGAK, LKPLIESLPSEFNIIGIE, PILGNDIIHVGYN, VPILGNDIIHVGY

Table 6 lists the proteins identified in a fraction that bound to the microtubules but was released by washing with 0.5 M KCl solution. This mixture of proteins was analyzed by gel electrophoresis with Coomassie staining and then proteolytically digested with trypsin. The proteins identified by searching the S. cerevisiae sequence database range in molecular mass from 14 to 174 kDa. A large number of tandem mass spectra were obtained for the 174 kDa arom
multifunctional enzyme.38 The ARO protein has appeared in other biochemical analyses of yeast microtubule-binding proteins.39 Despite the presence of this very large protein, much smaller proteins were still identified. Also of biochemical interest was the presence of G4P2, a suppressor protein, in both the GST-Sla2 and the microtubule eluates.

CONCLUSIONS
The methodology described in this paper has been shown to provide reliable identifications of proteins present in heterogeneous mixtures. Direct analysis of the proteolytic products should offer several advantages. Solution digestion of proteins is more exhaustive and is readily combined with reduction and alkylation in the presence of chaotropic agents, which further assist digestion. Sample handling is simplified. Automated MS/MS data acquisition significantly improves the sampling rate and thereby the speed and throughput of data acquisition. Proteins in more complex mixtures could be identified more effectively by coupling high-resolution or multidimensional chromatography to rapid-scanning tandem mass spectrometers. In this manner, tandem mass spectra could be obtained for many of the peptides present, allowing more complete characterization of the proteins in the mixture.
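Both the solution digestion discussed here and the tentative assignment of additional RAS1 and RAS2 sequence described earlier rest on predicting the peptides a protein yields and their m/z values. The Python sketch below (Python 3.7 or later, which supports zero-width `re.split`) shows one way to do this under stated assumptions: trypsin is modeled by the simplified rule of cleavage C-terminal to K or R except before P, monoisotopic residue masses are used, and observed m/z values are compared with predicted $(M + 2H)^{2+}$ values within an arbitrary tolerance. The function names and the toy sequence are hypothetical; this is not the authors' software.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids (assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of H+
WATER = 18.010565  # mass of H2O


def tryptic_digest(protein, missed_cleavages=0):
    """Cleave C-terminal to K or R unless the next residue is P (simplified rule)."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Optionally join up to `missed_cleavages` following pieces.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mz(peptide, charge=2):
    """m/z of the (M + zH)z+ ion; charge=2 gives the (M + 2H)2+ ion."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def tentative_matches(observed_mz, identified, charge=2, tol=1.0,
                      missed_cleavages=0):
    """Assign observed m/z values only to predicted peptides of proteins
    that were already identified from tandem mass spectra."""
    predicted = [
        (name, pep, peptide_mz(pep, charge))
        for name, seq in identified.items()
        for pep in tryptic_digest(seq, missed_cleavages)
    ]
    return [
        (mz, name, pep)
        for mz in observed_mz
        for name, pep, pred in predicted
        if abs(mz - pred) <= tol
    ]


if __name__ == "__main__":
    toy = {"TOY_PROTEIN": "MAKPLRGGKSSR"}  # hypothetical sequence
    print(tryptic_digest(toy["TOY_PROTEIN"]))
    print(tentative_matches([peptide_mz("SSR")], toy, tol=0.01))
```

Restricting candidate peptides to proteins already identified from tandem mass spectra keeps the number of chance matches low; a wider tolerance or more missed cleavages increases the number of tentative assignments and with it the risk of false ones.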
(39) Barnes, G.; Louie, K. A.; Botstein, D. Mol. Biol. Cell 1992, 3, 29-47.

In this article, we also demonstrated the use of direct LC/MS/MS to identify proteins enriched through immunoaffinity interactions, through interaction with a GST protein fusion, and through interaction with a macromolecular complex. When a known component of a process is used as bait, other components that interact with it specifically, some of which may not yet be known to be involved in that process, will be enriched. LC/MS/MS and database searching can then rapidly establish the identities of these components. In organisms such as S. cerevisiae, for which a set of molecular genetic tools exists, a protein's involvement in a process can be verified by deleting its gene and determining the resulting phenotypic or functional changes. Now that complete genome sequences are available for some organisms, determining the interactions among proteins involved in physiological processes will be of great importance. The approach described in this paper provides a rapid way to expand knowledge of the participants in physiological processes.

ACKNOWLEDGMENT
Support for this work was obtained from the National Institutes of Health (GM52095) and the National Science Foundation, Science and Technology Center (BIR 9214821).

Received for review August 6, 1996. Accepted November 26, 1996.X
AC960799Q
X Abstract published in Advance ACS Abstracts, January 1, 1997.
