pyproteome.pathways package¶

This module provides functionality for signal pathway analysis.

It includes functions for Gene Set Enrichment Analysis (GSEA) as well as Phospho Set Enrichment Analysis (PSEA).

pyproteome.pathways.psea(*args, **kwargs)[source]¶

Perform Phospho Set Enrichment Analysis (PSEA) on a data set.

See pyproteome.pathways.gsea() for documentation and a full list of arguments.

Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.gsea(psms=None, ds=None, gene_sets=None, metric=None, phenotype=None, species=None, min_hits=10, p_sites=False, remap=True, name=None, show_plots=True, **kwargs)[source]¶

Perform Gene Set Enrichment Analysis (GSEA) on a data set.

Parameters not listed below are passed on to the underlying enrichment functions. See pyproteome.pathways.plot.plot_gsea() for a full list of arguments.

Parameters:

psms : pandas.DataFrame, optional

ds : pyproteome.data_sets.DataSet, optional

The data set to perform enrichment analysis on.

phenotype : pandas.Series, optional

A series object with index values equal to the quantification columns in the data set. This object is used when calculating correlation statistics for each peptide.

gene_sets : pandas.DataFrame, optional

A dataframe with two columns: “name” and “set”.

Each element of set should be a Python set() object containing all the gene IDs for each gene set.

Gene IDs should be strings of Entrez Gene IDs for protein sets and strings of “<Entrez>,<letter><pos>-p” (e.g. “8778,Y544-p”) for phospho sets.

metric : str, optional

Correlation metric to use. One of [“zscore”, “fold”, “spearman”, “pearson”, “kendall”].

species : str, optional

The species used to generate gene sets.

Value should be in binomial nomenclature (e.g. “Homo sapiens”, “Mus musculus”).

If different from that of the input data set, IDs will be mapped to the target species using the PhosphoSitePlus database.

min_hits : int, optional

p_sites : bool, optional

Perform Phospho Set Enrichment Analysis (PSEA) on data set.

remap : bool, optional

Remap database of phosphosites using information from all species.

name : str, optional

The name of this analysis. Defaults to ds.name.

show_plots : bool, optional

Returns:

vals : pandas.DataFrame
gene_changes : pandas.DataFrame
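
The gene_sets layout described for gsea() can be sketched without pyproteome. The snippet below builds records with the two documented fields, “name” and “set”, and checks ID strings against the documented formats. The set name, the helper is_valid_id(), and the regular expressions are illustrative assumptions; only the ID “8778,Y544-p” comes from the parameter description. Passing such records to pandas.DataFrame() should give a two-column frame of the documented shape.

```python
import re

# Assumed patterns for the two documented ID formats:
#   protein sets: "<Entrez>"                  e.g. "8778"
#   phospho sets: "<Entrez>,<letter><pos>-p"  e.g. "8778,Y544-p"
PROTEIN_ID = re.compile(r"\d+")
PHOSPHO_ID = re.compile(r"\d+,[A-Z]\d+-p")


def is_valid_id(gene_id, p_sites=False):
    """Return True if gene_id matches the assumed format for the set type."""
    pattern = PHOSPHO_ID if p_sites else PROTEIN_ID
    return pattern.fullmatch(gene_id) is not None


# Hypothetical gene set records: one dict per row, with "name" and "set" keys.
phospho_sets = [
    {"name": "Example phospho set", "set": {"8778,Y544-p"}},
]

# Every ID in every set should match the phospho format.
all_valid = all(
    is_valid_id(gene_id, p_sites=True)
    for row in phospho_sets
    for gene_id in row["set"]
)
```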

pyproteome.pathways.enrichments module¶

This module does most of the heavy lifting for the pathways module.

It includes functions for calculating enrichment scores and generating plots for GSEA / PSEA.

pyproteome.pathways.enrichments.CORRELATION_METRICS = ['spearman', 'pearson', 'kendall', 'fold', 'log2', 'zscore']¶

Correlation metrics used for enrichment analysis. ‘spearman’, ‘pearson’, and ‘kendall’ are all calculated using pandas.Series.corr().

‘fold’ takes ranking values directly from the ‘Fold Change’ column.

‘log2’ takes ranking values from a log2 ‘Fold Change’ column.

‘zscore’ takes ranking values from a log2 z-scored ‘Fold Change’ column.
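
As a rough illustration of how the three fold-change-based metrics relate, the sketch below derives log2 and z-scored values from a few made-up fold changes. It assumes that ‘zscore’ standardizes log2 fold changes to zero mean and unit variance; the use of the population standard deviation (pstdev) and the gene labels are assumptions, not taken from pyproteome.

```python
import math
import statistics

# Hypothetical fold changes keyed by gene label.
fold_changes = {"A": 4.0, "B": 1.0, "C": 0.25}

# 'log2'-style ranking values: log2 of each fold change.
log2_vals = {k: math.log2(v) for k, v in fold_changes.items()}

# 'zscore'-style ranking values: standardized log2 fold changes.
values = list(log2_vals.values())
mean = statistics.mean(values)
std = statistics.pstdev(values)  # assumption: population standard deviation
z_vals = {k: (v - mean) / std for k, v in log2_vals.items()}

# All three metrics are monotonic in the fold change, so the ranking is shared.
ranked = sorted(z_vals, key=z_vals.get, reverse=True)
```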

pyproteome.pathways.enrichments.DEFAULT_CORR_CPUS = 4¶: Default number of CPUs to use when scrambling columns of a data set.

pyproteome.pathways.enrichments.DEFAULT_RANK_CPUS = 6¶: Default number of CPUs to use when scrambling rows of a data set.

pyproteome.pathways.enrichments.MIN_PERIODS = 5¶: Minimum number of samples with peptide quantification and phenotypic measurements needed to generate a correlation metric score.
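
The effect of a minimum-sample threshold can be sketched in plain Python. The function below is a hypothetical helper, not pyproteome's code: it drops sample pairs with missing values and returns NaN when fewer than min_periods pairs remain, which is similar in spirit to the min_periods argument of pandas.Series.corr(). Pearson correlation is used here only to keep the sketch short.

```python
import math

MIN_PERIODS = 5  # mirrors the documented constant


def pearson_with_min_periods(xs, ys, min_periods=MIN_PERIODS):
    """Pearson correlation over paired, non-missing samples, or NaN if too few."""
    pairs = [
        (x, y)
        for x, y in zip(xs, ys)
        if x is not None and y is not None
        and not math.isnan(x) and not math.isnan(y)
    ]
    if len(pairs) < min_periods:
        return math.nan
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


# Six complete pairs: enough samples, so a score is returned.
full = pearson_with_min_periods([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                                [2.0, 4.0, 6.0, 8.0, 10.0, 12.0])

# Only four complete pairs: below the threshold, so the result is NaN.
sparse = pearson_with_min_periods([1.0, math.nan, 3.0, math.nan, 5.0, 6.0],
                                  [2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
```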

class pyproteome.pathways.enrichments.PrPDF(data)[source]¶

Bases: object

An exact probability distribution estimator.

cdf(x)[source]¶

Cumulative distribution function.

Parameters:	x : float
Returns:	float

pdf(x)[source]¶

Probability density function.

Parameters:	x : float
Returns:	float

sf(x)[source]¶

Survival function.

Parameters:	x : float
Returns:	float
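
To show how cdf() and sf() relate, here is a minimal empirical distribution built directly from observed data. The class name EmpiricalDistribution and its implementation are assumptions for illustration; they are not PrPDF's actual code, and pdf() is omitted because a density over discrete samples needs a further modeling choice.

```python
from bisect import bisect_right


class EmpiricalDistribution:
    """Hypothetical stand-in showing cdf/sf over a finite sample."""

    def __init__(self, data):
        self.data = sorted(data)

    def cdf(self, x):
        """Fraction of samples less than or equal to x."""
        return bisect_right(self.data, x) / len(self.data)

    def sf(self, x):
        """Survival function: fraction of samples greater than x."""
        return 1.0 - self.cdf(x)


dist = EmpiricalDistribution([3, 1, 2, 4])
```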

pyproteome.pathways.enrichments.calculate_es_s(gene_changes, gene_set, p=None, n_h=None, ess_method=None)[source]¶

Calculate the enrichment score for an individual gene set.

Parameters:

gene_changes : `pandas.DataFrame`
gene_set : set of str
p : float, optional
n_h : int, optional
ess_method : str, optional
    One of {‘integral’, ‘max_abs’, ‘max_min’}.
Returns:	dict
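
For orientation, the sketch below implements the commonly described GSEA running-sum statistic: walking down the ranked list, the score increases at each set member in proportion to |correlation|^p and decreases by a fixed step at each non-member. It reports the largest absolute deviation from zero, which roughly corresponds to a ‘max_abs’ style of summary. This is an assumption about the general technique, not a copy of calculate_es_s(); the function name and the toy data are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score.

    ranked: list of (gene_id, correlation), sorted by descending correlation.
    Returns (es, curve), where curve is the running sum at each position.
    """
    weights = [abs(corr) ** p if gene in gene_set else 0.0
               for gene, corr in ranked]
    n_r = sum(weights)  # total weight of hits
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if n_r == 0 or n_miss == 0:
        return 0.0, []

    running, curve = 0.0, []
    for (gene, _), weight in zip(ranked, weights):
        if gene in gene_set:
            running += weight / n_r    # step up at a hit
        else:
            running -= 1.0 / n_miss    # step down at a miss
        curve.append(running)

    # Largest deviation from zero, keeping its sign.
    es = max(curve, key=abs)
    return es, curve


# Toy ranked list: the two set members sit at the top.
ranked = [("g1", 2.0), ("g2", 1.0), ("g3", 0.5), ("g4", -1.0), ("g5", -2.0)]
es, curve = enrichment_score(ranked, {"g1", "g2"})
```

Because the hit steps sum to 1 and the miss steps sum to -1, the curve always returns to approximately zero at the end of the list.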

pyproteome.pathways.enrichments.calculate_es_s_ud(gene_changes, up_set, down_set, **kwargs)[source]¶

Calculate the enrichment score for an individual gene set that is split into up- and down-regulated members.

Parameters:

gene_changes : `pandas.DataFrame`
up_set : set of str
down_set : set of str
kwargs : dict, optional
    See extra arguments passed to calculate_es_s.
Returns:	dict

pyproteome.pathways.enrichments.correlate_phenotype(psms, phenotype=None, metric='spearman')[source]¶

Calculate the correlation values for each gene / phosphosite in a data set.

Parameters:

psms : `pandas.DataFrame`
phenotype : `pandas.Series`, optional
metric : str, optional
    The correlation function to use. See CORRELATION_METRICS for a full list of choices.
Returns:	psms : `pandas.DataFrame`

pyproteome.pathways.enrichments.enrichment_scores(psms, gene_sets, pval=True, recorrelate=False, metric=None, phenotype=None, **kwargs)[source]¶

Calculate enrichment scores for each gene set.

p-values and q-values are calculated by scrambling the phenotypes assigned to each sample or scrambling peptides’ fold changes, depending on the correlation metric used.

Parameters:

psms : `pandas.DataFrame`
gene_sets : `pandas.DataFrame`
pval : bool, optional
recorrelate : bool, optional
metric : str, optional
phenotype : `pandas.Series`, optional
kwargs : dict, optional
    See extra arguments passed to calculate_es_s and simulate_es_s_pi.
Returns:	df : `pandas.DataFrame`

pyproteome.pathways.enrichments.estimate_pq(vals)[source]¶

Estimate p- and q-values for an enrichment analysis using the ES(S, pi) values generated by simulate_es_s_pi.

Parameters:	vals : `pandas.DataFrame`
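
One common way to turn simulated scores into a p-value is to compare the observed score against the null scores that share its sign. The helper below is a hypothetical sketch of that convention, not the implementation of estimate_pq(); q-value estimation is left out because it depends on normalization details not described here.

```python
def empirical_p_value(es, null_scores):
    """Fraction of same-sign null scores at least as extreme as es."""
    if es >= 0:
        same_sign = [s for s in null_scores if s >= 0]
        extreme = [s for s in same_sign if s >= es]
    else:
        same_sign = [s for s in null_scores if s < 0]
        extreme = [s for s in same_sign if s <= es]
    if not same_sign:
        return 1.0  # no null evidence on this side; be conservative
    return len(extreme) / len(same_sign)


# Made-up null scores standing in for simulated ES(S, pi) values.
null_scores = [0.1, 0.2, 0.3, 0.4, -0.1, -0.5]
p_up = empirical_p_value(0.35, null_scores)
p_down = empirical_p_value(-0.3, null_scores)
```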

pyproteome.pathways.enrichments.filter_gene_sets(gene_sets, psms, min_hits=10)[source]¶

Filter gene sets to include only those with at least a given number of hits in a data set.

Parameters:	gene_sets : `pandas.DataFrame` psms : `pandas.DataFrame` min_hits : int, optional
Returns:	df : `pandas.DataFrame`
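
The filtering rule can be illustrated with plain Python sets: a gene set is kept when its overlap with the measured IDs reaches min_hits. The function name, the record layout (dicts with “name” and “set” keys), and the IDs below are assumptions for illustration.

```python
def filter_sets_by_hits(gene_sets, measured_ids, min_hits=10):
    """Keep gene sets with at least min_hits members among measured_ids."""
    measured = set(measured_ids)
    return [row for row in gene_sets if len(row["set"] & measured) >= min_hits]


# Hypothetical gene sets and measured IDs.
gene_sets = [
    {"name": "set_a", "set": {"1", "2", "3"}},
    {"name": "set_b", "set": {"3", "4"}},
]
kept = filter_sets_by_hits(gene_sets, ["1", "2", "4"], min_hits=2)
kept_names = [row["name"] for row in kept]
```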

pyproteome.pathways.enrichments.filter_vals(vals, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1)[source]¶

Filter gene set enrichment scores using the given ES(S) / p-value / q-value cutoffs.

Parameters:

vals : `pandas.DataFrame`
min_hits : int, optional
min_abs_score : float, optional
max_pval : float, optional
max_qval : float, optional
Returns:	df : `pandas.DataFrame`
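
The four cutoffs combine as a simple conjunction. The sketch below applies them to a list of dicts; the keys “n_hits”, “score”, “pval”, and “qval” are hypothetical stand-ins, since the actual column names of vals are not listed here.

```python
def filter_rows(rows, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1):
    """Keep rows that pass every cutoff (default values keep everything)."""
    return [
        row for row in rows
        if row["n_hits"] >= min_hits
        and abs(row["score"]) >= min_abs_score
        and row["pval"] <= max_pval
        and row["qval"] <= max_qval
    ]


# Made-up results for three gene sets.
rows = [
    {"name": "set_a", "n_hits": 25, "score": 1.9, "pval": 0.01, "qval": 0.10},
    {"name": "set_b", "n_hits": 12, "score": -0.4, "pval": 0.60, "qval": 0.80},
    {"name": "set_c", "n_hits": 40, "score": -2.1, "pval": 0.02, "qval": 0.30},
]
significant = filter_rows(rows, max_pval=0.05, max_qval=0.25)
strong = filter_rows(rows, min_abs_score=1.5)
```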

pyproteome.pathways.enrichments.get_gene_changes(psms)[source]¶

Extract the gene IDs and correlation values for each gene / phosphosite in a data set. Merge together duplicate IDs by calculating their mean correlation.

Parameters:	psms : `pandas.DataFrame`
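
Merging duplicate IDs by their mean correlation can be sketched with a dictionary of lists. The function name and the example rows are hypothetical; get_gene_changes() itself operates on a `pandas.DataFrame`.

```python
from collections import defaultdict


def mean_by_id(rows):
    """Average the correlation values of rows that share a gene ID."""
    grouped = defaultdict(list)
    for gene_id, corr in rows:
        grouped[gene_id].append(corr)
    return {gene_id: sum(vals) / len(vals) for gene_id, vals in grouped.items()}


# Two peptides map to the same ID and are merged into one value.
rows = [("8778", 0.5), ("8778", 0.25), ("1234", -0.5)]
merged = mean_by_id(rows)
```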

pyproteome.pathways.enrichments.simulate_es_s_pi(vals, psms, gene_sets, phenotype=None, metric='spearman', p_iter=1000, n_cpus=None, **kwargs)[source]¶

Simulate ES(S, pi) by scrambling the phenotype / correlation values for a data set and recalculating gene set enrichment scores.

Parameters:

vals : `pandas.DataFrame`
psms : `pandas.DataFrame`
gene_sets : `pandas.DataFrame`
phenotype : `pandas.Series`, optional
metric : str, optional
p_iter : int, optional
n_cpus : int, optional
Returns:	df : `pandas.DataFrame`
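
A permutation null distribution of this kind can be sketched as follows. The example covers only the case where correlation values are scrambled across genes; scrambling phenotype labels would additionally require recomputing each correlation. Function names, the seed handling, and the toy data are assumptions, and the scoring function is a generic running-sum statistic rather than pyproteome's own.

```python
import random


def enrichment_score(ranked, gene_set):
    """Signed maximum deviation of a weighted running sum (generic GSEA-style)."""
    weights = [abs(c) if g in gene_set else 0.0 for g, c in ranked]
    n_r = sum(weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if n_r == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for (g, _), w in zip(ranked, weights):
        running += (w / n_r) if g in gene_set else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running
    return best


def simulate_null(correlations, gene_set, p_iter=1000, seed=0):
    """Shuffle correlation values across genes and rescore p_iter times."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    genes = list(correlations)
    values = list(correlations.values())
    null = []
    for _ in range(p_iter):
        shuffled = values[:]
        rng.shuffle(shuffled)
        ranked = sorted(zip(genes, shuffled), key=lambda gv: gv[1], reverse=True)
        null.append(enrichment_score(ranked, gene_set))
    return null


# Toy correlations and a two-member gene set.
correlations = {"g1": 2.0, "g2": 1.0, "g3": 0.5, "g4": -1.0, "g5": -2.0, "g6": 0.1}
null_scores = simulate_null(correlations, {"g1", "g2"}, p_iter=200)
```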

pyproteome.pathways.go module¶

pyproteome.pathways.go.get_go_ids(go_ids, species='Homo sapiens', add_children=True)[source]¶

Fetch all gene symbols associated with a list of gene ontology term IDs.

Parameters:

go_ids : str or list of str
species : str, optional
add_children : bool, optional
    Include all child terms of input GO IDs.
Returns:	list of str

pyproteome.pathways.gskb module¶

pyproteome.pathways.gskb.get_gskb_pathways(species)[source]¶

Download gene sets from GSKB.

Parameters:	species : str
Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.msigdb module¶

pyproteome.pathways.msigdb.get_msigdb_pathways(species, remap=None)[source]¶

Download gene sets from MSigDB. Currently downloads v7.0 of the gene signature repositories.

Parameters:	species : str
Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.pathwayscommon module¶

pyproteome.pathways.pathwayscommon.get_pathway_common(species)[source]¶

Download gene sets from Pathway Commons.

Parameters:	species : str
Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.photon_ptm module¶

pyproteome.pathways.photon_ptm.photon(ds, folder_name=None, write_output=False, log2=True)[source]¶

Run PHOTON algorithm on a data set to find functional phosphorylation sites using protein-protein interaction networks.

Parameters:

ds : `pyproteome.data_sets.DataSet`
folder_name : str, optional
write_output : bool, optional
log2 : bool, optional

Returns:

out_path : str
    Path to results directory.

pyproteome.pathways.plot module¶

pyproteome.pathways.plot.plot_correlations(gene_changes, ax=None)[source]¶

Plot the ranked list of correlations.

Parameters:

gene_changes : `pandas.DataFrame`
    Genes and their correlation values as calculated by get_gene_changes().
ax : `matplotlib.axes.Axes`, optional
Returns:	f : `matplotlib.figure.Figure` ax : `matplotlib.axes.Axes`

pyproteome.pathways.plot.plot_enrichment(vals, cols=5)[source]¶

Plot enrichment score curves for each gene set.

Parameters:

vals : `pandas.DataFrame`
    The gene sets and scores calculated by enrichment_scores().
cols : int, optional

pyproteome.pathways.plot.plot_gsea(vals, gene_changes, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1, name='', **kwargs)[source]¶

Run set enrichment analysis on a data set and generate all figures associated with that analysis.

Parameters:	vals : `pandas.DataFrame` gene_changes : `pandas.DataFrame`
Returns:	figs : list of `matplotlib.figure.Figure`

pyproteome.pathways.plot.plot_nes(vals, min_hits=0, min_abs_score=0, max_pval=0.05, max_qval=0.25, title=None, col=None, ax=None)[source]¶

Plot the ranked normalized enrichment score values.

Annotates significant gene sets with their name on the figure.

Parameters:

vals : `pandas.DataFrame`
    The gene sets and scores calculated by enrichment_scores().
min_hits : int, optional
min_abs_score : float, optional
max_pval : float, optional
max_qval : float, optional
ax : `matplotlib.axes.Axes`, optional
Returns:	f : `matplotlib.figure.Figure` ax : `matplotlib.axes.Axes`

pyproteome.pathways.plot.plot_nes_dist(nes_vals, nes_pi_vals)[source]¶

Generate a histogram plot showing the distribution of NES(S) values alongside the distribution of NES(S, pi) values.

Parameters:	nes_vals : `numpy.array` nes_pi_vals : `numpy.array`
Returns:	f : `matplotlib.figure.Figure` ax : `matplotlib.axes.Axes`

pyproteome.pathways.plsr module¶

pyproteome.pathways.plsr.vip(model)[source]¶

Calculate VIP scores for a PLSR model.

Parameters:	model
Returns:	`numpy.array`
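
For reference, the usual definition of Variable Importance in Projection for a single-response PLS model can be written in a few lines. The sketch below takes the score matrix t, the weight matrix w, and the y-loadings q as plain lists; these argument names and the toy numbers are assumptions, and the code does not reproduce how vip() extracts them from the model object.

```python
import math


def vip_scores(t, w, q):
    """VIP scores from PLS components.

    t: n_samples x n_components score matrix (list of rows)
    w: n_features x n_components weight matrix (list of rows)
    q: y-loadings, one per component
    """
    n_features = len(w)
    n_comp = len(q)
    # Variance in y explained by each component: q_a^2 * t_a' t_a.
    ss = [q[a] ** 2 * sum(row[a] ** 2 for row in t) for a in range(n_comp)]
    # Euclidean norm of each weight vector, used to normalize the weights.
    norms = [math.sqrt(sum(row[a] ** 2 for row in w)) for a in range(n_comp)]
    total = sum(ss)
    return [
        math.sqrt(
            n_features
            * sum(ss[a] * (w[j][a] / norms[a]) ** 2 for a in range(n_comp))
            / total
        )
        for j in range(n_features)
    ]


# Toy two-component model with three features.
t = [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]]
w = [[0.8, 0.1], [0.6, 0.2], [0.0, 0.97]]
q = [2.0, 0.5]
scores = vip_scores(t, w, q)
```

With this definition the squared VIP scores average to 1 across features, which is why a threshold near 1 is often used to flag influential variables.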

pyproteome.pathways.psp module¶

pyproteome.pathways.psp.get_phosphomap_data()[source]¶

Fetch mapping between phosphorylation sites of different species.

Returns:	df : `pandas.DataFrame`

pyproteome.pathways.psp.get_phosphoreg_data()[source]¶

Fetch PhosphoSitePlus regulation data.

Returns:	df : `pandas.DataFrame`

pyproteome.pathways.psp.get_phosphosite(species, remap=False)[source]¶

Download phospho sets from PhosphoSitePlus.

Parameters:	species : str remap : bool, optional
Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.psp.get_phosphosite_regulation(species, remap=False)[source]¶

Download phospho sets from PhosphoSitePlus.

Parameters:	species : str remap : bool, optional
Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.ptmsigdb module¶

pyproteome.pathways.ptmsigdb.get_ptmsigdb(species)[source]¶

Download phospho sets from PTMSigDB.

Parameters:	species : str
Returns:	df : `pandas.DataFrame`

pyproteome.pathways.wikipathways module¶

pyproteome.pathways.wikipathways.get_wikipathways(species)[source]¶

Download gene sets from WikiPathways.

Parameters:	species : str
Returns:	df : `pandas.DataFrame`, optional

pyproteome.pathways.wikipathways.get_wikipathways_psites(species)[source]¶

Download phospho sets from WikiPathways.

Parameters:	species : str
Returns:	df : `pandas.DataFrame`, optional