Welcome to pyproteome’s documentation!
pyproteome is a Python package for interacting with proteomics data.
It includes modules for loading, processing, and analyzing proteomics data collected by mass spectrometry. This functionality allows users to automatically filter, normalize, and merge data from proteome search files. Its analysis toolkit includes the ability to cluster peptides that show correlated changes and to perform motif and pathway enrichment analysis to study cell signaling events.
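As a rough illustration of grouping peptides that show correlated changes, this sketch links any two peptides whose quantification profiles (one value per sample) have a Pearson correlation of at least min_r and reports the connected groups (single-linkage grouping via union-find). The helper names and the min_r cutoff are hypothetical; pyproteome's cluster package may use a different clustering method.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlated_groups(profiles, min_r=0.9):
    """Group peptides whose profiles correlate at r >= min_r (hypothetical cutoff)."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group's root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every sufficiently correlated pair into the same group.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

For example, correlated_groups({'pep1': [1, 2, 3, 4], 'pep2': [2, 4, 6, 8], 'pep3': [4, 3, 2, 1]}) places pep1 and pep2 together and leaves the anti-correlated pep3 on its own.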
Currently, pyproteome only supports analyzing ProteomeDiscoverer .msf search files.
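ProteomeDiscoverer .msf files are SQLite databases, so they can be inspected with Python's built-in sqlite3 module before loading them into pyproteome. This sketch uses a hypothetical helper, list_msf_tables, that opens a file read-only and lists its tables; it makes no assumption about which tables a particular ProteomeDiscoverer version writes.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_msf_tables(path):
    """Return the sorted table names in an SQLite file such as a .msf search file."""
    # Open through a read-only URI so a mistyped path is not silently
    # created as a new, empty database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_msf_tables('Searched/Run1_pY.msf') would list the tables stored in that search file.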
This package is designed to be used within an interactive computational environment, such as Jupyter Notebook, alongside other data analysis packages. This allows scientists to create their analysis workflow within a reproducible code environment.
Getting Started
To start, you will need a Python environment. Python version >= 3.6 is recommended. Windows users may wish to use Anaconda to manage their Python environment and provide pyproteome’s dependencies.
Install pyproteome:
```shell
$ pip install pyproteome
```
Open Python and load your data sets using pyproteome.data_sets.load_all_data():
```python
from pyproteome import *
# if using IPython: %import_all

# Define sample:TMT channel mapping for each TMT-10plex analysis
run1_chans = {
    'KO 1': '126',
    'KO 2': '127N',
    'KO 3': '127C',
    'KO 4': '128N',
    'KO 5': '128C',
    'Ctrl 1': '129N',
    'Ctrl 2': '129C',
    'Ctrl 3': '130N',
    'Ctrl 4': '130C',
    'Pooled Samples': '131',  # the remaining TMT-10plex channel
}
run2_chans = {
    'KO 6': '126',
    'KO 7': '127N',
    'KO 8': '127C',
    'KO 9': '128N',
    'KO 10': '128C',
    'Ctrl 5': '129N',
    'Ctrl 6': '129C',
    'Ctrl 7': '130N',
    'Ctrl 8': '130C',
    'Pooled Samples': '131',
}

# This example demonstrates loading and processing 6 different proteomics analyses
# with 2 different TMT-10plex labeled samples, 3 different enrichment methods, and
# 1 common pooled sample.
#
# ProteomeDiscoverer .msf search files are first stored in 'Searched/' as:
#   Run1_pY.msf
#   Run1_pSQTQ.msf
#   Run1_Sup.msf
#   Run2_pY.msf
#   Run2_pSQTQ.msf
#   Run2_Sup.msf
datas = data_sets.load_all_data(
    # Assign run1 channels to all search files beginning with 'Run1_'
    # and run2 channels to all search files beginning with 'Run2_'
    chan_mapping={
        'Run1_': run1_chans,
        'Run2_': run2_chans,
    },
    # Apply CONSTANd normalization
    norm_mapping='constand',
    # Filter out peptides that do not pass the quality-control cutoffs
    filter_bad={
        'ion_score': 15,
        'isolation': 30,
        'median_quant': 1.5e3,
        'q': 1e-2,
    },
    # Merge pY, pSQ/pTQ, and global supernatant runs together,
    # then merge Run1 and Run2 together, normalized against their
    # common channel
    merge_mapping={
        'Run1': ['Run1_pY', 'Run1_pSQTQ', 'Run1_Sup'],
        'Run2': ['Run2_pY', 'Run2_pSQTQ', 'Run2_Sup'],
        'Merged': ['Run1', 'Run2'],
    },
    # Create a list for each comparison group
    groups={
        'KO': [
            'KO 1', 'KO 2', 'KO 3', 'KO 4', 'KO 5',
            'KO 6', 'KO 7', 'KO 8', 'KO 9', 'KO 10',
        ],
        'Control': [
            'Ctrl 1', 'Ctrl 2', 'Ctrl 3', 'Ctrl 4',
            'Ctrl 5', 'Ctrl 6', 'Ctrl 7', 'Ctrl 8',
        ],
        'Pooled': [
            'Pooled Samples',
        ],
    },
)
```
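The norm_mapping='constand' option selects CONSTANd normalization. A minimal sketch of the idea, assuming CONSTANd's constraint that every row and every column of a peptide-by-channel intensity matrix has mean 1/n (n = number of channels): alternately rescale rows and columns (iterative proportional fitting, also called matrix raking) until both constraints hold. The constand function is hypothetical and is not pyproteome's implementation; it assumes strictly positive intensities.

```python
def constand(matrix, max_iter=1000, tol=1e-10):
    """Rake a positive peptides x channels matrix until every row sums to 1
    (row mean 1/n) and every column has mean 1/n, where n = number of channels.

    Hypothetical sketch of CONSTANd-style raking, not pyproteome's code.
    """
    m, n = len(matrix), len(matrix[0])
    k = [[float(value) for value in row] for row in matrix]
    for _ in range(max_iter):
        # Row step: rescale each peptide so its channel intensities sum to 1.
        for row in k:
            total = sum(row)
            for j in range(n):
                row[j] /= total
        # Column step: rescale each channel so its mean over peptides is 1/n.
        for j in range(n):
            col_mean = sum(k[i][j] for i in range(m)) / m
            factor = (1.0 / n) / col_mean
            for i in range(m):
                k[i][j] *= factor
        # Stop once the column step no longer disturbs the row sums.
        if max(abs(sum(row) - 1.0) for row in k) < tol:
            break
    return k
```

Because both constraints are fixed targets, the output no longer depends on per-channel loading differences, which is what makes channels comparable before merging.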
Analyze your data using changes_table, make_logo, plot_volcano, and psea:
```python
# display() is built into Jupyter/IPython; import it explicitly elsewhere
from IPython.display import display

# Show a table listing significantly changing peptides
display(
    tables.changes_table(
        # fold -> Average Group Fold Change (FC): FC > 1.25 or FC < 1/1.25
        # p -> 2-sample t-test p-value between groups: p < 1e-2
        datas['Merged'].filter(fold=1.25, p=1e-2),
        # Sort by fold change (the default is to sort by p-value)
        sort='Fold Change',
    )
)

# Show phosphorylation motifs in the set of upregulated peptides
logo.make_logo(
    datas['Merged'],
    {'asym_fold': 1.5, 'p': 1e-2},
)

# Show a volcano plot of the merged data set
volcano.plot_volcano(
    datas['Merged'],
    fold=1.5,
    p=1e-3,
)

# Perform Phospho Set Enrichment Analysis (PSEA)
pathways.psea(
    datas['Merged'],
    min_hits=15,
    pval=True,
    metric='zscore',
    p=.75,
    p_iter=500,
    max_pval=1e-2,
    max_qval=.25,
    n_cpus=4,
)

# Export the data set with fold changes as a .csv file
tables.write_csv(
    datas['Merged'],
    out_name='Merged.csv',
)

# Export all quantification data from each data set to an Excel table
tables.write_full_tables(
    datas,
    out_name='All Data.xlsx',
)
```
Contents
- pyproteome
  - pyproteome package
    - Module contents
    - Subpackages
      - pyproteome.analysis package
      - pyproteome.camv package
      - pyproteome.cluster package
      - pyproteome.data_sets package
      - pyproteome.discoverer package
      - pyproteome.motifs package
      - pyproteome.pathways package
        - pyproteome.pathways.enrichments module
        - pyproteome.pathways.go module
        - pyproteome.pathways.gskb module
        - pyproteome.pathways.msigdb module
        - pyproteome.pathways.pathwayscommon module
        - pyproteome.pathways.photon_ptm module
        - pyproteome.pathways.plot module
        - pyproteome.pathways.plsr module
        - pyproteome.pathways.psp module
        - pyproteome.pathways.ptmsigdb module
        - pyproteome.pathways.wikipathways module
      - pyproteome.pride package
      - pyproteome.pypuniprot package
    - Submodules
      - pyproteome.levels module
      - pyproteome.loading module
      - pyproteome.paths module
      - pyproteome.species module
      - pyproteome.utils module
      - pyproteome.version module