.. pyproteome documentation master file, created by
   sphinx-quickstart on Fri Jan 22 11:16:19 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to pyproteome's documentation!
======================================

pyproteome is a Python package for interacting with proteomics data.

It includes modules for loading, processing, and analyzing proteomics data
collected by mass spectrometry. This functionality allows users to
automatically filter, normalize, and merge data from proteome search files.
Its analysis toolkit can cluster peptides that show correlated changes and
perform motif and pathway enrichment analysis to study cell signaling events.
Currently it only supports analyzing ProteomeDiscoverer ``.msf`` search files.

This package is designed to be used within an interactive computational
environment, such as Jupyter Notebook, alongside other data analysis packages.
This allows scientists to build their analysis workflow within a reproducible
code environment.

Getting Started
===============

To start, you will need a Python environment. Python version >= 3.6 is
recommended. Windows users may wish to use Anaconda to manage their Python
environment and provide pyproteome's dependencies.

1. Install pyproteome::

    $ pip install pyproteome

2. Open Python and load your data sets, using
   :func:`pyproteome.data_sets.load_all_data`::

    from pyproteome import *
    # if using IPython:
    # %import_all

    # Define sample:TMT channel mapping for each TMT-10plex analysis
    run1_chans = {
        'KO 1': '126',
        'KO 2': '127N',
        'KO 3': '127C',
        'KO 4': '128N',
        'KO 5': '128C',
        'Ctrl 1': '129N',
        'Ctrl 2': '129C',
        'Ctrl 3': '130N',
        'Ctrl 4': '130C',
        'Pooled Samples': '131',
    }
    run2_chans = {
        'KO 6': '126',
        'KO 7': '127N',
        'KO 8': '127C',
        'KO 9': '128N',
        'KO 10': '128C',
        'Ctrl 5': '129N',
        'Ctrl 6': '129C',
        'Ctrl 7': '130N',
        'Ctrl 8': '130C',
        'Pooled Samples': '131',
    }

    # This example demonstrates loading and processing 6 different proteomics
    # analyses with 2 different TMT-10plex labeled samples, 3 different
    # enrichment methods, and 1 common pooled sample.
    #
    # ProteomeDiscoverer .msf search files are first stored in 'Searched/' as:
    #   Run1_pY.msf
    #   Run1_pSQTQ.msf
    #   Run1_Sup.msf
    #   Run2_pY.msf
    #   Run2_pSQTQ.msf
    #   Run2_Sup.msf
    datas = data_sets.load_all_data(
        # Assign run1 channels to all search files beginning with 'Run1_'
        # and run2 channels to all search files beginning with 'Run2_'
        chan_mapping={
            'Run1_': run1_chans,
            'Run2_': run2_chans,
        },
        # Apply CONSTANd normalization
        norm_mapping='constand',
        # Filter out peptides that do not pass the quality-control cutoffs
        filter_bad={
            'ion_score': 15,
            'isolation': 30,
            'median_quant': 1.5e3,
            'q': 1e-2,
        },
        # Merge pY, pSQ/pTQ, and global supernatant runs together,
        # then merge Run1 and Run2 together, normalized against their
        # common channel
        merge_mapping={
            'Run1': ['Run1_pY', 'Run1_pSQTQ', 'Run1_Sup'],
            'Run2': ['Run2_pY', 'Run2_pSQTQ', 'Run2_Sup'],
            'Merged': ['Run1', 'Run2'],
        },
        # Create a list for each comparison group
        groups={
            'KO': [
                'KO 1', 'KO 2', 'KO 3', 'KO 4', 'KO 5',
                'KO 6', 'KO 7', 'KO 8', 'KO 9', 'KO 10',
            ],
            'Control': [
                'Ctrl 1', 'Ctrl 2', 'Ctrl 3', 'Ctrl 4',
                'Ctrl 5', 'Ctrl 6', 'Ctrl 7', 'Ctrl 8',
            ],
            'Pooled': [
                'Pooled Samples',
            ],
        },
    )

3. Analyze your data, using :func:`changes_table`, :func:`make_logo`,
   :func:`plot_volcano`, and :func:`psea`::

    # display() is built in to Jupyter; import it when running elsewhere
    from IPython.display import display

    # Show a table listing significantly changing peptides
    display(
        tables.changes_table(
            # fold -> Average Group Fold Change (FC): FC > 1.25 or FC < 1/1.25
            # p -> 2-sample t-test p-value between groups: p < 1e-2
            datas['Merged'].filter(fold=1.25, p=1e-2),
            # Sort by fold change, otherwise sort by p-value by default
            sort='Fold Change',
        )
    )

    # Show phosphorylation motifs in upregulated set of peptides
    logo.make_logo(
        datas['Merged'], {'asym_fold': 1.5, 'p': 1e-2},
    )

    # Show a volcano plot of the merged data set
    volcano.plot_volcano(
        datas['Merged'],
        fold=1.5,
        p=1e-3,
    )

    # Perform Phospho Set Enrichment Analysis (PSEA)
    pathways.psea(
        datas['Merged'],
        min_hits=15,
        pval=True,
        metric='zscore',
        p=.75,
        p_iter=500,
        max_pval=1e-2,
        max_qval=.25,
        n_cpus=4,
    )

    # Export the data set with fold changes as a .csv file
    tables.write_csv(
        datas['Merged'],
        out_name='Merged.csv',
    )

    # Export all quantification data from each data set to an Excel table
    tables.write_full_tables(
        datas,
        out_name='All Data.xlsx',
    )

Contents
========

.. toctree::
   :maxdepth: 5

   modules

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`