pyproteome.pathways package

This module provides functionality for signaling pathway analysis.

It includes functions for Gene Set Enrichment Analysis (GSEA) as well as Phospho Set Enrichment Analysis (PSEA).

pyproteome.pathways.psea(*args, **kwargs)[source]

Perform Phospho Set Enrichment Analysis (PSEA) on a data set.

See pyproteome.pathways.gsea() for documentation and a full list of arguments.

Returns:
df : pandas.DataFrame, optional
pyproteome.pathways.gsea(psms=None, ds=None, gene_sets=None, metric=None, phenotype=None, species=None, min_hits=10, p_sites=False, remap=True, name=None, show_plots=True, **kwargs)[source]

Perform Gene Set Enrichment Analysis (GSEA) on a data set.

Parameters not listed below will be passed on to the underlying enrichments module. See pyproteome.pathways.enrichments.plot_gsea() for a full list of arguments.

Parameters:
psms : pandas.DataFrame, optional
ds : pyproteome.data_sets.DataSet, optional

The data set to perform enrichment analysis on.

phenotype : pandas.Series, optional

A series object with index values equal to the quantification columns in the data set. This object is used when calculating correlation statistics for each peptide.

gene_sets : pandas.DataFrame, optional

A dataframe with two columns: “name” and “set”.

Each element of set should be a Python set() object containing all the gene IDs for each gene set.

Gene IDs should be strings of Entrez Gene IDs for protein sets and strings of “<Entrez>,<letter><pos>-p” (e.g. “8778,Y544-p”) for phospho sets.

metric : str, optional

Correlation metric to use. One of [“zscore”, “fold”, “spearman”, “pearson”, “kendall”].

species : str, optional

The species used to generate gene sets.

Value should be in binomial nomenclature (e.g. “Homo sapiens”, “Mus musculus”).

If different from that of the input data set, IDs will be mapped to the target species using the PhosphoSitePlus database.

min_hits : int, optional
p_sites : bool, optional

Perform Phospho Set Enrichment Analysis (PSEA) on data set.

remap : bool, optional

Remap database of phosphosites using information from all species.

name : str, optional

The name of this analysis. Defaults to ds.name.

show_plots : bool, optional
Returns:
vals : pandas.DataFrame
gene_changes : pandas.DataFrame

See also

pyproteome.pathways.enrichments.enrichment_scores()
pyproteome.pathways.enrichments.plot_gsea()
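
As a rough illustration of the gene_sets layout described for gsea(), the sketch below builds rows with a “name” and a “set” entry and formats phospho set IDs as “<Entrez>,<letter><pos>-p”. The helpers phospho_id and parse_phospho_id and the set names are hypothetical and not part of pyproteome; only the ID “8778,Y544-p” comes from the description above.

```python
import re


# Hypothetical helpers, not part of pyproteome: they only illustrate the
# "<Entrez>,<letter><pos>-p" ID format described for phospho sets.
def phospho_id(entrez_id, residue, position):
    """Format a phosphosite ID such as "8778,Y544-p"."""
    return "{},{}{}-p".format(entrez_id, residue, position)


_PHOSPHO_RE = re.compile(r"^(\d+),([A-Z])(\d+)-p$")


def parse_phospho_id(site_id):
    """Split a phosphosite ID into (entrez_id, residue, position)."""
    match = _PHOSPHO_RE.match(site_id)
    if match is None:
        raise ValueError("not a phosphosite ID: {!r}".format(site_id))
    entrez_id, residue, position = match.groups()
    return entrez_id, residue, int(position)


# One row per gene set; each "set" is a Python set of string IDs.
# The set names are placeholders chosen for illustration.
protein_rows = [
    {"name": "example protein set", "set": {"8778"}},
]
phospho_rows = [
    {"name": "example phospho set", "set": {phospho_id(8778, "Y", 544)}},
]

# With pandas installed, pandas.DataFrame(protein_rows) would give a
# two-column ("name", "set") frame of the shape described for gene_sets.
```

Keeping IDs as strings in both cases means protein sets and phospho sets can be matched against a data set with the same set operations.
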
pyproteome.pathways.ssgsea(ds=None, thres_na=None, *args, **kwargs)[source]
pyproteome.pathways.sspsea(*args, **kwargs)[source]
pyproteome.pathways.filter_fn(vals, ds=None, species=None)[source]
pyproteome.pathways.get_pathways(species, p_sites=False, remap=False)[source]

Download all default gene sets and phospho sets.

Parameters:
species : str

Target species to use to generate gene / phospho sets.

p_sites : bool, optional

Build phospho sets if True; otherwise build gene sets.

remap : bool, optional

Remap proteins / phosphosites from all species to the target species.

Returns:
df : pandas.DataFrame

pyproteome.pathways.enrichments module

This module does most of the heavy lifting for the pathways module.

It includes functions for calculating enrichment scores and generating plots for GSEA / PSEA.

pyproteome.pathways.enrichments.CORRELATION_METRICS = ['spearman', 'pearson', 'kendall', 'fold', 'log2', 'zscore']

Correlation metrics used for enrichment analysis. ‘spearman’, ‘pearson’, and ‘kendall’ are all calculated using pandas.Series.corr().

‘fold’ takes ranking values directly from the ‘Fold Change’ column.

‘log2’ takes ranking values from a log2 ‘Fold Change’ column.

‘zscore’ takes ranking values from a log2 z-scored ‘Fold Change’ column.
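
The fold-change-based metrics can be pictured with the sketch below. It is one plausible reading of the descriptions above, not pyproteome's code: the function names are hypothetical, the input is assumed to be raw (untransformed) fold changes, and the use of the population standard deviation for the z-score is an assumption.

```python
import math
import statistics


def log2_values(fold_changes):
    """Log2-transform raw fold changes (assumed to be positive)."""
    return [math.log2(fc) for fc in fold_changes]


def zscore_values(fold_changes):
    """Z-score the log2 fold changes: (value - mean) / standard deviation."""
    logs = log2_values(fold_changes)
    mean = statistics.mean(logs)
    std = statistics.pstdev(logs)  # population std; an assumption
    if std == 0:
        raise ValueError("all fold changes are identical")
    return [(value - mean) / std for value in logs]


fold = [0.5, 1.0, 2.0, 4.0]       # 'fold': values used as they are
log2_fold = log2_values(fold)      # 'log2': symmetric around zero
zscored = zscore_values(fold)      # 'zscore': zero mean, unit spread
```
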

pyproteome.pathways.enrichments.DEFAULT_CORR_CPUS = 4

Default number of CPUs to use when scrambling columns of a data set.

pyproteome.pathways.enrichments.DEFAULT_RANK_CPUS = 6

Default number of CPUs to use when scrambling rows of a data set.

pyproteome.pathways.enrichments.MIN_PERIODS = 5

Minimum number of samples with peptide quantification and phenotypic measurements needed to generate a correlation metric score.

class pyproteome.pathways.enrichments.PrPDF(data)[source]

Bases: object

An exact probability distribution estimator.

cdf(x)[source]

Cumulative distribution function.

Parameters:
x : float
Returns:
float
pdf(x)[source]

Probability density function.

Parameters:
x : float
Returns:
float
sf(x)[source]

Survival function.

Parameters:
x : float
Returns:
float
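
One way to picture a distribution estimated directly from observed values is an empirical distribution, sketched below. The class name EmpiricalDistribution is hypothetical and this is not the PrPDF implementation; pdf() is left out because the source does not say how a density is derived from the data.

```python
import bisect


class EmpiricalDistribution:
    """Empirical distribution over a finite sample (illustrative only)."""

    def __init__(self, data):
        self.data = sorted(data)
        if not self.data:
            raise ValueError("data must not be empty")

    def cdf(self, x):
        # Fraction of observations less than or equal to x.
        return bisect.bisect_right(self.data, x) / len(self.data)

    def sf(self, x):
        # Survival function: fraction of observations greater than x.
        return 1.0 - self.cdf(x)


dist = EmpiricalDistribution([1, 2, 2, 3])
```
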
pyproteome.pathways.enrichments.calculate_es_s(gene_changes, gene_set, p=None, n_h=None, ess_method=None)[source]

Calculate the enrichment score for an individual gene set.

Parameters:
gene_changes : pandas.DataFrame
gene_set : set of str
p : float, optional
n_h : int, optional
ess_method : str, optional

One of {‘integral’, ‘max_abs’, ‘max_min’}.

Returns:
dict
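
For orientation, the classic GSEA running-sum statistic can be sketched as follows. This is the textbook weighted Kolmogorov-Smirnov-like score, not necessarily what calculate_es_s computes; the function name enrichment_score is hypothetical, and treating the result as the maximum absolute deviation (loosely comparable to a ‘max_abs’ style summary) is an assumption.

```python
def enrichment_score(values, gene_set, p=1.0):
    """Return (ES, running-sum curve) for one gene set.

    values: dict mapping gene ID -> correlation / ranking value.
    gene_set: set of gene IDs.
    p: exponent weighting each hit by |correlation| ** p.
    """
    # Rank genes from most positive to most negative value.
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    weights = [abs(r) ** p if g in gene_set else 0.0 for g, r in ranked]
    n_hits = sum(1 for g, _ in ranked if g in gene_set)
    n_miss = len(ranked) - n_hits
    hit_total = sum(weights)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("need hits, misses, and non-zero hit weights")

    running = 0.0
    curve = []
    for (gene, _), weight in zip(ranked, weights):
        if gene in gene_set:
            running += weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss         # step down on a miss
        curve.append(running)

    # ES is the point of largest deviation from zero, keeping its sign.
    return max(curve, key=abs), curve


values = {"a": 2.0, "b": 1.0, "c": -1.0, "d": -2.0}
es_up, curve_up = enrichment_score(values, {"a", "b"})
es_down, _ = enrichment_score(values, {"c", "d"})
```

A set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one; the curve always returns to zero at the end of the list.
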
pyproteome.pathways.enrichments.calculate_es_s_ud(gene_changes, up_set, down_set, **kwargs)[source]

Calculate the enrichment score for an individual gene set.

Parameters:
gene_changes : pandas.DataFrame
up_set : set of str
down_set : set of str
kwargs : dict, optional

Extra arguments are passed on to calculate_es_s().

Returns:
dict
pyproteome.pathways.enrichments.correlate_phenotype(psms, phenotype=None, metric='spearman')[source]

Calculate the correlation values for each gene / phosphosite in a data set.

Parameters:
psms : pandas.DataFrame
phenotype : pandas.Series, optional
metric : str, optional

The correlation function to use. See CORRELATION_METRICS for a full list of choices.

Returns:
psms : pandas.DataFrame
pyproteome.pathways.enrichments.enrichment_scores(psms, gene_sets, pval=True, recorrelate=False, metric=None, phenotype=None, **kwargs)[source]

Calculate enrichment scores for each gene set.

p-values and q-values are calculated by scrambling the phenotypes assigned to each sample or scrambling peptides’ fold changes, depending on the correlation metric used.

Parameters:
psms : pandas.DataFrame
gene_sets : pandas.DataFrame
pval : bool, optional
recorrelate : bool, optional
metric : str, optional
phenotype : pandas.Series, optional
kwargs : dict, optional

Extra arguments are passed on to calculate_es_s() and simulate_es_s_pi().

Returns:
df : pandas.DataFrame
pyproteome.pathways.enrichments.estimate_pq(vals)[source]

Estimate p- and q-values for an enrichment analysis using the ES(S, pi) values generated by simulate_es_s_pi.

Parameters:
vals : pandas.DataFrame
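
A nominal p-value can be estimated from permuted scores roughly as below. This follows the common GSEA convention of comparing the observed score only against null scores of the same sign; whether estimate_pq does exactly this is an assumption, the function name empirical_pvalue is hypothetical, and q-value estimation is not covered.

```python
def empirical_pvalue(observed, null_scores):
    """Fraction of same-sign null scores at least as extreme as observed.

    Returns None when the null distribution has no scores of that sign.
    """
    if observed >= 0:
        same_sign = [s for s in null_scores if s >= 0]
        extreme = [s for s in same_sign if s >= observed]
    else:
        same_sign = [s for s in null_scores if s < 0]
        extreme = [s for s in same_sign if s <= observed]
    if not same_sign:
        return None
    return len(extreme) / len(same_sign)


null = [0.1, 0.5, 0.9, -0.3, -0.7]   # placeholder permuted scores
p_pos = empirical_pvalue(0.8, null)
p_neg = empirical_pvalue(-0.5, null)
```
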
pyproteome.pathways.enrichments.filter_gene_sets(gene_sets, psms, min_hits=10)[source]

Filter gene sets to include only those with at least a given number of hits in a data set.

Parameters:
gene_sets : pandas.DataFrame
psms : pandas.DataFrame
min_hits : int, optional
Returns:
df : pandas.DataFrame
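
The filtering idea can be sketched with plain Python sets. The function below is a hypothetical stand-in that works on a dict of name -> set rather than on the DataFrames filter_gene_sets takes; a “hit” is assumed to mean a set member that also appears among the measured IDs.

```python
def filter_sets_by_hits(gene_sets, measured_ids, min_hits=10):
    """Keep gene sets with at least min_hits members in measured_ids."""
    measured = set(measured_ids)
    return {
        name: members
        for name, members in gene_sets.items()
        if len(members & measured) >= min_hits
    }


# Placeholder IDs for illustration only.
sets = {
    "set A": {"1", "2", "3"},
    "set B": {"3", "4"},
}
kept = filter_sets_by_hits(sets, ["1", "2", "3"], min_hits=2)
```
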
pyproteome.pathways.enrichments.filter_vals(vals, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1)[source]

Filter gene set enrichment scores using the given ES(S) / p-value / q-value cutoffs.

Parameters:
vals : pandas.DataFrame
min_hits : int, optional
min_abs_score : float, optional
max_pval : float, optional
max_qval : float, optional
Returns:
df : pandas.DataFrame
pyproteome.pathways.enrichments.get_gene_changes(psms)[source]

Extract the gene IDs and correlation values for each gene / phosphosite in a data set. Merge together duplicate IDs by calculating their mean correlation.

Parameters:
psms : pandas.DataFrame
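
Merging duplicate IDs by their mean correlation amounts to a group-and-average step, sketched here with the standard library. The function name merge_duplicate_ids and the input shape (a list of (ID, correlation) pairs) are assumptions made for illustration; get_gene_changes itself takes a DataFrame.

```python
from collections import defaultdict
from statistics import mean


def merge_duplicate_ids(pairs):
    """Average the correlation values of entries sharing the same ID."""
    grouped = defaultdict(list)
    for gene_id, correlation in pairs:
        grouped[gene_id].append(correlation)
    return {gene_id: mean(vals) for gene_id, vals in grouped.items()}


# Placeholder values; two peptides map to the same ID.
merged = merge_duplicate_ids([
    ("8778", 0.5),
    ("8778", 1.0),
    ("1", -0.25),
])
```
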
pyproteome.pathways.enrichments.simulate_es_s_pi(vals, psms, gene_sets, phenotype=None, metric='spearman', p_iter=1000, n_cpus=None, **kwargs)[source]

Simulate ES(S, pi) by scrambling the phenotype / correlation values for a data set and recalculating gene set enrichment scores.

Parameters:
vals : pandas.DataFrame
psms : pandas.DataFrame
gene_sets : pandas.DataFrame
phenotype : pandas.Series, optional
metric : str, optional
p_iter : int, optional
n_cpus : int, optional
Returns:
df : pandas.DataFrame
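
The scrambling step for the correlation-value case can be sketched as a permutation loop. Everything here is illustrative: simulate_null_scores and mean_in_set are hypothetical names, score_fn stands in for whatever enrichment score is being simulated, and the phenotype-scrambling variant mentioned above is not shown.

```python
import random


def simulate_null_scores(values, gene_set, score_fn, n_iter=1000, seed=0):
    """Shuffle values across gene IDs and rescore the set each time."""
    rng = random.Random(seed)          # seeded for reproducibility
    gene_ids = list(values)
    correlations = [values[g] for g in gene_ids]
    null_scores = []
    for _ in range(n_iter):
        shuffled = correlations[:]
        rng.shuffle(shuffled)
        permuted = dict(zip(gene_ids, shuffled))
        null_scores.append(score_fn(permuted, gene_set))
    return null_scores


def mean_in_set(values, gene_set):
    """Toy score: mean value of the set members (not an ES)."""
    members = [v for g, v in values.items() if g in gene_set]
    return sum(members) / len(members)


values = {"a": 2.0, "b": 1.0, "c": -1.0, "d": -2.0}
null = simulate_null_scores(values, {"a", "b"}, mean_in_set, n_iter=50)
```

The resulting list of null scores is the kind of input a p-value estimate would compare the observed score against.
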

pyproteome.pathways.go module

pyproteome.pathways.go.get_go_ids(go_ids, species='Homo sapiens', add_children=True)[source]

Fetch all gene symbols associated with a list of gene ontology term IDs.

Parameters:
go_ids : str or list of str
species : str, optional
add_children : bool, optional

Include all child terms of input GO IDs.

Returns:
list of str

pyproteome.pathways.gskb module

pyproteome.pathways.gskb.get_gskb_pathways(species)[source]

Download gene sets from GSKB.

Parameters:
species : str
Returns:
df : pandas.DataFrame, optional

pyproteome.pathways.msigdb module

pyproteome.pathways.msigdb.get_msigdb_pathways(species, remap=None)[source]

Download gene sets from MSigDB. Currently downloads v7.0 of the gene signature repositories.

Parameters:
species : str
Returns:
df : pandas.DataFrame, optional

pyproteome.pathways.pathwayscommon module

pyproteome.pathways.pathwayscommon.get_pathway_common(species)[source]

Download gene sets from Pathway Commons.

Parameters:
species : str
Returns:
df : pandas.DataFrame, optional

pyproteome.pathways.photon_ptm module

pyproteome.pathways.photon_ptm.photon(ds, folder_name=None, write_output=False, log2=True)[source]

Run PHOTON algorithm on a data set to find functional phosphorylation sites using protein-protein interaction networks.

Parameters:
ds : pyproteome.data_sets.DataSet
folder_name : str, optional
log2 : bool, optional
Returns:
out_path : str

Path to results directory.

pyproteome.pathways.plot module

pyproteome.pathways.plot.plot_correlations(gene_changes, ax=None)[source]

Plot the ranked list of correlations.

Parameters:
gene_changes : pandas.DataFrame

Genes and their correlation values as calculated by get_gene_changes().

figsize : tuple of (int, int), optional
Returns:
f : matplotlib.figure.Figure
ax : matplotlib.axes.Axes
pyproteome.pathways.plot.plot_enrichment(vals, cols=5)[source]

Plot enrichment score curves for each gene set.

Parameters:
vals : pandas.DataFrame

The gene sets and scores calculated by enrichment_scores().

cols : int, optional
pyproteome.pathways.plot.plot_gsea(vals, gene_changes, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1, name='', **kwargs)[source]

Run set enrichment analysis on a data set and generate all figures associated with that analysis.

Parameters:
vals : pandas.DataFrame
gene_changes : pandas.DataFrame
Returns:
figs : list of matplotlib.figure.Figure
pyproteome.pathways.plot.plot_nes(vals, min_hits=0, min_abs_score=0, max_pval=0.05, max_qval=0.25, title=None, col=None, ax=None)[source]

Plot the ranked normalized enrichment score values.

Annotates significant gene sets with their name on the figure.

Parameters:
vals : pandas.DataFrame

The gene sets and scores calculated by enrichment_scores().

min_hits : int, optional
min_abs_score : float, optional
max_pval : float, optional
max_qval : float, optional
figsize : tuple of (int, int), optional
Returns:
f : matplotlib.figure.Figure
ax : matplotlib.axes.Axes
pyproteome.pathways.plot.plot_nes_dist(nes_vals, nes_pi_vals)[source]

Generate a histogram plot showing the distribution of NES(S) values alongside the distribution of NES(S, pi) values.

Parameters:
nes_vals : numpy.array
nes_pi_vals : numpy.array
Returns:
f : matplotlib.figure.Figure
ax : matplotlib.axes.Axes

pyproteome.pathways.plsr module

pyproteome.pathways.plsr.vip(model)[source]

Calculate VIP scores for a PLSR model.

Parameters:
model
Returns:
numpy.array
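
For reference, VIP (variable importance in projection) scores are commonly defined as below for a single-response PLS model; whether vip() uses exactly this form is an assumption. Here p is the number of predictors, A the number of components, and w_a, t_a, and q_a are the weight vector, score vector, and y-loading of component a.

```latex
\mathrm{VIP}_j = \sqrt{\, p \;
  \frac{\sum_{a=1}^{A} q_a^{2} \, t_a^{\top} t_a \,
        \left( w_{ja} / \lVert w_a \rVert \right)^{2}}
       {\sum_{a=1}^{A} q_a^{2} \, t_a^{\top} t_a} }
```

Under this definition the squared VIP scores average to 1 across predictors, which is why values above 1 are often read as above-average importance.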

pyproteome.pathways.psp module

pyproteome.pathways.psp.get_phosphomap_data()[source]

Fetch mapping between phosphorylation sites of different species.

Returns:
df : pandas.DataFrame
pyproteome.pathways.psp.get_phosphoreg_data()[source]

Fetch PhosphoSitePlus regulation data.

Returns:
df : pandas.DataFrame
pyproteome.pathways.psp.get_phosphosite(species, remap=False)[source]

Download phospho sets from PhosphoSitePlus.

Parameters:
species : str
remap : bool, optional
Returns:
df : pandas.DataFrame, optional
pyproteome.pathways.psp.get_phosphosite_regulation(species, remap=False)[source]

Download phospho sets from PhosphoSitePlus.

Parameters:
species : str
remap : bool, optional
Returns:
df : pandas.DataFrame, optional

pyproteome.pathways.ptmsigdb module

pyproteome.pathways.ptmsigdb.get_ptmsigdb(species)[source]

Download phospho sets from PTMSigDB.

Parameters:
species : str
Returns:
df : pandas.DataFrame

pyproteome.pathways.wikipathways module

pyproteome.pathways.wikipathways.get_wikipathways(species)[source]

Download gene sets from WikiPathways.

Parameters:
species : str
Returns:
df : pandas.DataFrame, optional
pyproteome.pathways.wikipathways.get_wikipathways_psites(species)[source]

Download phospho sets from WikiPathways.

Parameters:
species : str
Returns:
df : pandas.DataFrame, optional