Welcome to pyproteome’s documentation!

pyproteome is a Python package for interacting with proteomics data.

It includes modules for loading, processing, and analyzing proteomics data collected by mass spectrometry. This functionality allows users to automatically filter, normalize, and merge data from proteome search files. Its analysis toolkit can cluster peptides that show correlated changes and perform motif and pathway enrichment analysis to study cell signaling events.

Currently it only supports analyzing ProteomeDiscoverer .msf search files.

This package is designed to be used within an interactive computational environment, such as Jupyter Notebook, alongside other data analysis packages. This allows scientists to build their analysis workflow within a reproducible code environment.

Getting Started

To start, you will need a Python environment. Python version >= 3.6 is recommended. Windows users may wish to use Anaconda to manage their Python environment and provide pyproteome’s dependencies.
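
As a quick check before installing, the interpreter version can be compared against the recommended minimum. The following is a minimal sketch using only the standard library; `python_is_supported` and `MIN_VERSION` are hypothetical names introduced for illustration and are not part of pyproteome.

```python
import sys

# Minimum version recommended in this guide (assumption: major.minor only).
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given version meets the recommended minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python >= 3.6 is recommended; found", sys.version.split()[0])
```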

Install pyproteome:
```
$ pip install pyproteome
```

Open Python and load your data sets, using pyproteome.data_sets.load_all_data():

from pyproteome import *
# if using IPython:
%import_all

# Define sample:TMT channel mapping for each TMT-10plex analysis
run1_chans = {
    'KO 1': '126',
    'KO 2': '127N',
    'KO 3': '127C',
    'KO 4': '128N',
    'KO 5': '128C',
    'Ctrl 1': '129N',
    'Ctrl 2': '129C',
    'Ctrl 3': '130N',
    'Ctrl 4': '130C',
    'Pooled Samples': '131',
}
run2_chans = {
    'KO 6': '126',
    'KO 7': '127N',
    'KO 8': '127C',
    'KO 9': '128N',
    'KO 10': '128C',
    'Ctrl 5': '129N',
    'Ctrl 6': '129C',
    'Ctrl 7': '130N',
    'Ctrl 8': '130C',
    'Pooled Samples': '131',
}

# This example demonstrates loading and processing 6 different proteomics analyses
# with 2 different TMT-10plex labeled samples, 3 different enrichment methods, and
# 1 common pooled sample.
#
# ProteomeDiscoverer .msf search files are first stored in 'Searched/' as:
#     Run1_pY.msf
#     Run1_pSQTQ.msf
#     Run1_Sup.msf
#     Run2_pY.msf
#     Run2_pSQTQ.msf
#     Run2_Sup.msf
datas = data_sets.load_all_data(
    # Assign run1 channels to all search files beginning with 'Run1_'
    # and run2 channels to all search files beginning with 'Run2_'
    chan_mapping={
        'Run1_': run1_chans,
        'Run2_': run2_chans,
    },

    # Apply CONSTANd normalization
    norm_mapping='constand',

    # Filter out peptides that do not pass the quality-control cutoffs
    filter_bad={
        'ion_score': 15,
        'isolation': 30,
        'median_quant': 1.5e3,
        'q': 1e-2,
    },

    # Merge pY, pSQ/pTQ, and global supernatant runs together,
    # then merge Run1 and Run2 together, normalized against their
    # common channel
    merge_mapping={
        'Run1': ['Run1_pY', 'Run1_pSQTQ', 'Run1_Sup'],
        'Run2': ['Run2_pY', 'Run2_pSQTQ', 'Run2_Sup'],
        'Merged': ['Run1', 'Run2'],
    },

    # Create a list for each comparison group
    groups={
        'KO': [
            'KO 1', 'KO 2', 'KO 3', 'KO 4', 'KO 5',
            'KO 6', 'KO 7', 'KO 8', 'KO 9', 'KO 10',
        ],
        'Control': [
            'Ctrl 1', 'Ctrl 2', 'Ctrl 3', 'Ctrl 4',
            'Ctrl 5', 'Ctrl 6', 'Ctrl 7', 'Ctrl 8',
        ],
        'Pooled': [
            'Pooled Samples',
        ],
    },
)
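
Because the channel mappings and comparison groups are written by hand, it can help to sanity-check them before loading. The sketch below is plain Python and does not call pyproteome; `find_duplicate_channels` and `find_unmapped_samples` are hypothetical helpers, and the example dictionaries are illustrative data rather than the mappings above.

```python
from collections import defaultdict


def find_duplicate_channels(chan_mapping):
    """Return {channel: [samples]} for channels assigned to several samples."""
    samples_by_channel = defaultdict(list)
    for sample, channel in chan_mapping.items():
        samples_by_channel[channel].append(sample)
    return {
        channel: samples
        for channel, samples in samples_by_channel.items()
        if len(samples) > 1
    }


def find_unmapped_samples(groups, *chan_mappings):
    """Return sample names used in groups but missing from every mapping."""
    known = set()
    for mapping in chan_mappings:
        known.update(mapping)  # adds the mapping's keys (sample names)
    return sorted(
        sample
        for samples in groups.values()
        for sample in samples
        if sample not in known
    )


# Illustrative data only: '127N' is deliberately assigned twice,
# and 'Ctrl 2' is deliberately missing from the mapping.
example_chans = {'KO 1': '126', 'KO 2': '127N', 'Ctrl 1': '127N'}
example_groups = {'KO': ['KO 1', 'KO 2'], 'Control': ['Ctrl 1', 'Ctrl 2']}

print(find_duplicate_channels(example_chans))
print(find_unmapped_samples(example_groups, example_chans))
```

An empty dictionary from the first helper and an empty list from the second suggest that each channel is used once per run and that every grouped sample has a channel.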

Analyze your data, using changes_table, make_logo, plot_volcano, and psea:

# Show a table listing significantly changing peptides
display(
    tables.changes_table(
        # fold -> Average Group Fold Change (FC): FC > 1.25 or FC < 1/1.25
        # p -> 2-sample t-test p-value between groups: p < 1e-2
        datas['Merged'].filter(fold=1.25, p=1e-2),

        # Sort by fold change, otherwise sort by p-value by default
        sort='Fold Change',
    )
)

# Show phosphorylation motifs in upregulated set of peptides
logo.make_logo(
    datas['Merged'],
    {'asym_fold': 1.5, 'p': 1e-2},
)

# Show a volcano plot of fold change versus p-value for all peptides
volcano.plot_volcano(
    datas['Merged'],
    fold=1.5,
    p=1e-3,
)

# Perform Phospho Set Enrichment Analysis (PSEA)
pathways.psea(
    datas['Merged'],
    min_hits=15,
    pval=True,
    metric='zscore',
    p=.75,
    p_iter=500,
    max_pval=1e-2,
    max_qval=.25,
    n_cpus=4,
)

# Export the data set with fold changes as a .csv file
tables.write_csv(
    datas['Merged'],
    out_name='Merged.csv',
)

# Export all quantification data from each data set to an Excel table
tables.write_full_tables(
    datas,
    out_name='All Data.xlsx',
)
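
The comments in the example above describe the `fold` cutoff as symmetric: a peptide passes if its fold change is above the cutoff or below its reciprocal. The following is a minimal sketch of that rule in plain Python. It assumes fold change is the ratio of group means, which may differ from how pyproteome computes it; `fold_change` and `passes_fold_cutoff` are hypothetical names and the intensities are made up.

```python
from statistics import mean


def fold_change(group_a, group_b):
    """Ratio of mean intensities (assumed definition, for illustration)."""
    return mean(group_a) / mean(group_b)


def passes_fold_cutoff(fc, cutoff):
    """True if fc is above cutoff (up) or below 1/cutoff (down)."""
    return fc > cutoff or fc < 1 / cutoff


# Made-up intensities for one peptide in two groups
ko = [4.0, 3.0, 5.0]
ctrl = [2.0, 1.5, 2.5]

up = fold_change(ko, ctrl)    # KO relative to control
down = fold_change(ctrl, ko)  # control relative to KO

print(up, passes_fold_cutoff(up, 1.5))
print(down, passes_fold_cutoff(down, 1.5))
print(passes_fold_cutoff(1.2, 1.5))
```

Treating the cutoff symmetrically means that a two-fold increase and a two-fold decrease are handled the same way, which is equivalent to thresholding the absolute value of the log fold change.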

Contents

pyproteome
- pyproteome package
