Reanalysis of Global Proteomic and Phosphoproteomic Data Identified a Large Number of Glycopeptides.
Abstract
Protein glycosylation plays fundamental roles in many cellular processes, and previous reports have shown its dysregulation to be associated with several human diseases, including diabetes, cancer, and neurodegenerative disorders. Despite the vital role of glycosylation in proper protein function, the analysis of glycoproteins has lagged behind the analysis of other protein modifications. In this study, we describe the reanalysis of global proteomic data from breast cancer xenograft tissues using the recently developed software package GPQuest 2.0, which revealed a large number of previously unidentified N-linked glycopeptides. More importantly, we found that immobilized metal affinity chromatography (IMAC), used for the enrichment of phosphopeptides, coenriched a substantial number of sialoglycopeptides, allowing for a large-scale analysis of sialoglycopeptides in conjunction with the analysis of phosphopeptides. Collectively, combined tandem mass spectrometry (MS/MS) analyses of the global proteomic and phosphoproteomic data sets resulted in the identification of 6 724 N-linked glycopeptides from 617 glycoproteins derived from two breast cancer xenograft tissues. Next, we used GPQuest 2.0 to reanalyze global proteomic and phosphoproteomic data generated from 108 human breast cancer tissues that were previously analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Reanalysis of the CPTAC data set resulted in the identification of 2 683 glycopeptides from the global proteomic data set and 4 554 glycopeptides from the phosphoproteomic data set. Together, 11 292 N-linked glycopeptides corresponding to 1 731 N-linked glycosites from 883 human glycoproteins were identified from the two data sets. This analysis revealed a large number of glycopeptides hidden in the global proteomic data and enriched in the IMAC-based phosphopeptide-enriched data, information that would otherwise have remained unknown from the original study.
The reanalysis described herein can be readily applied to identify glycopeptides from existing data sets, providing insight into many important facets of protein glycosylation in different biological, physiological, and pathological processes.