pyproteome.pathways package
This module provides functionality for signal pathway analysis.
It includes functions for Gene Set Enrichment Analysis (GSEA) as well as Phospho Set Enrichment Analysis (PSEA).
-
pyproteome.pathways.psea(*args, **kwargs)
Perform Phospho Set Enrichment Analysis (PSEA) on a data set.
See pyproteome.pathways.gsea() for documentation and a full list of arguments.
Returns:
- df : pandas.DataFrame, optional
-
pyproteome.pathways.gsea(psms=None, ds=None, gene_sets=None, metric=None, phenotype=None, species=None, min_hits=10, p_sites=False, remap=True, name=None, show_plots=True, **kwargs)
Perform Gene Set Enrichment Analysis (GSEA) on a data set.
Parameters not listed below will be passed on to the underlying enrichments module. See pyproteome.pathways.enrichments.plot_gsea() for a full list of arguments.
Parameters:
- psms : pandas.DataFrame, optional
- ds : pyproteome.data_sets.DataSet, optional
The data set to perform enrichment analysis on.
- phenotype : pandas.Series, optional
A series object with index values equal to the quantification columns in the data set. This object is used when calculating correlation statistics for each peptide.
- gene_sets : pandas.DataFrame, optional
A dataframe with two columns: “name” and “set”.
Each element of “set” should be a Python set() object containing all the gene IDs for that gene set.
Gene IDs should be strings of Entrez Gene IDs for protein sets and strings of the form “<Entrez>,<letter><pos>-p” (e.g. “8778,Y544-p”) for phospho sets.
- metric : str, optional
Correlation metric to use. One of [“zscore”, “fold”, “spearman”, “pearson”, “kendall”].
- species : str, optional
The species used to generate gene sets.
Value should be in binomial nomenclature (e.g. “Homo sapiens”, “Mus musculus”).
If different from that of the input data set, IDs will be mapped to the target species using the PhosphoSitePlus database.
- min_hits : int, optional
- p_sites : bool, optional
Perform Phospho Set Enrichment Analysis (PSEA) on the data set.
- remap : bool, optional
Remap database of phosphosites using information from all species.
- name : str, optional
The name of this analysis. Defaults to ds.name.
- show_plots : bool, optional
Returns:
- vals : pandas.DataFrame
- gene_changes : pandas.DataFrame
See also
pyproteome.pathways.enrichments.enrichment_scores()
pyproteome.pathways.enrichments.plot_gsea()
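The gene_sets layout described for gsea() can be built by hand. The following is a minimal sketch using only pandas and the standard library; the set names, the IDs other than “8778,Y544-p”, and the is_phospho_id helper are hypothetical illustrations and are not part of pyproteome.

```python
import re

import pandas as pd

# Protein sets: each "set" entry holds Entrez Gene IDs as strings.
# The name and IDs below are made up for illustration.
protein_sets = pd.DataFrame({
    "name": ["Hypothetical protein set"],
    "set": [{"1956", "2064", "8778"}],
})

# Phospho sets: each ID has the form "<Entrez>,<letter><pos>-p".
phospho_sets = pd.DataFrame({
    "name": ["Hypothetical phospho set"],
    "set": [{"8778,Y544-p", "1956,Y1092-p"}],
})

# Hypothetical helper (not a pyproteome function) that checks the ID format.
PHOSPHO_ID = re.compile(r"^\d+,[A-Z]\d+-p$")


def is_phospho_id(gene_id):
    """Return True if gene_id looks like '<Entrez>,<letter><pos>-p'."""
    return bool(PHOSPHO_ID.match(gene_id))
```

A frame like protein_sets would be passed as gene_sets when p_sites is False, and one like phospho_sets when p_sites is True.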
-
pyproteome.pathways.get_pathways(species, p_sites=False, remap=False)
Download all default gene sets and phospho sets.
Parameters:
- species : str
Target species to use to generate gene / phospho sets.
- p_sites : bool, optional
Build phospho sets if true, else build gene sets.
- remap : bool, optional
Remap proteins / phosphosites from all species to the target species.
Returns:
- df : pandas.DataFrame
pyproteome.pathways.enrichments module
This module does most of the heavy lifting for the pathways module.
It includes functions for calculating enrichment scores and generating plots for GSEA / PSEA.
-
pyproteome.pathways.enrichments.CORRELATION_METRICS = ['spearman', 'pearson', 'kendall', 'fold', 'log2', 'zscore']
Correlation metrics used for enrichment analysis.
‘spearman’, ‘pearson’, and ‘kendall’ are all calculated using pandas.Series.corr().
‘fold’ takes ranking values directly from the ‘Fold Change’ column.
‘log2’ takes ranking values from a log2 ‘Fold Change’ column.
‘zscore’ takes ranking values from a log2 z-scored ‘Fold Change’ column.
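As a rough illustration of the pandas.Series.corr() call that the correlation-based metrics rely on, the sketch below correlates one peptide's quantification values with a phenotype series. The sample names and values are invented, and passing min_periods=5 to mirror MIN_PERIODS is an assumption about how the library uses it. Only method="pearson" is executed here; "spearman" and "kendall" are selected through the same method argument, although pandas may need SciPy installed for those two.

```python
import pandas as pd

# Hypothetical sample names and values, for illustration only.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
phenotype = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=samples)

# One peptide quantified in five of the six samples.
peptide = pd.Series([0.9, 2.1, 2.9, 4.2, float("nan"), 6.3], index=samples)

# Another peptide quantified in only four samples.
sparse_peptide = pd.Series(
    [0.9, float("nan"), 2.9, float("nan"), 5.1, 6.3], index=samples
)

# Pairs with a missing value are dropped before correlating; fewer than
# min_periods remaining pairs yields NaN instead of a score.
r = peptide.corr(phenotype, method="pearson", min_periods=5)
sparse_r = sparse_peptide.corr(phenotype, method="pearson", min_periods=5)
```

Here r is close to 1 because the peptide tracks the phenotype, while sparse_r is NaN because only four samples overlap.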
-
pyproteome.pathways.enrichments.DEFAULT_CORR_CPUS = 4
Default number of CPUs to use when scrambling columns of a data set.
-
pyproteome.pathways.enrichments.DEFAULT_RANK_CPUS = 6
Default number of CPUs to use when scrambling rows of a data set.
-
pyproteome.pathways.enrichments.MIN_PERIODS = 5
Minimum number of samples with peptide quantification and phenotypic measurements needed to generate a correlation metric score.
-
class pyproteome.pathways.enrichments.PrPDF(data)
Bases: object
An exact probability distribution estimator.
-
pyproteome.pathways.enrichments.calculate_es_s(gene_changes, gene_set, p=None, n_h=None, ess_method=None)
Calculate the enrichment score for an individual gene set.
Parameters:
- gene_changes : pandas.DataFrame
- gene_set : set of str
- p : float, optional
- n_h : int, optional
- ess_method : str, optional
One of {‘integral’, ‘max_abs’, ‘max_min’}.
Returns:
- dict
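For orientation, the sketch below implements the widely used weighted running-sum enrichment score from the GSEA literature in its maximum-deviation form. It is not taken from pyproteome: the function name and input layout are hypothetical, and how the library's p, n_h, and ess_method arguments map onto this calculation is an assumption that this reference does not confirm.

```python
def running_sum_es(ranked, gene_set, p=1.0):
    """Maximum-deviation enrichment score for one gene set.

    ranked: list of (gene_id, correlation) pairs, sorted by correlation
    from highest to lowest. p weights each hit by |correlation| ** p.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = hits.count(False)
    # Normalizer so that the hit increments sum to 1.
    norm = sum(abs(corr) ** p for (_, corr), hit in zip(ranked, hits) if hit)
    if norm == 0 or n_miss == 0:
        return 0.0

    running = 0.0
    best = 0.0
    for (_, corr), hit in zip(ranked, hits):
        if hit:
            running += abs(corr) ** p / norm  # step up on a set member
        else:
            running -= 1.0 / n_miss           # step down otherwise
        if abs(running) > abs(best):
            best = running                    # keep the largest deviation
    return best


# Hypothetical ranked list, for illustration only.
ranked = [("a", 2.0), ("b", 1.5), ("c", 0.5),
          ("d", -0.5), ("e", -1.0), ("f", -2.0)]
es_top = running_sum_es(ranked, {"a", "b"})
es_bottom = running_sum_es(ranked, {"e", "f"})
```

A set concentrated at the top of the ranking gives a score near +1, and a set concentrated at the bottom gives a score near -1.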
-
pyproteome.pathways.enrichments.calculate_es_s_ud(gene_changes, up_set, down_set, **kwargs)
Calculate the enrichment score for an individual gene set that is split into an up set and a down set.
Parameters:
- gene_changes : pandas.DataFrame
- up_set : set of str
- down_set : set of str
- kwargs : dict, optional
Extra arguments passed on to calculate_es_s; see that function for details.
Returns:
- dict
-
pyproteome.pathways.enrichments.correlate_phenotype(psms, phenotype=None, metric='spearman')
Calculate the correlation values for each gene / phosphosite in a data set.
Parameters:
- psms : pandas.DataFrame
- phenotype : pandas.Series, optional
- metric : str, optional
The correlation function to use. See CORRELATION_METRICS for a full list of choices.
Returns:
- psms : pandas.DataFrame
-
pyproteome.pathways.enrichments.enrichment_scores(psms, gene_sets, pval=True, recorrelate=False, metric=None, phenotype=None, **kwargs)
Calculate enrichment scores for each gene set.
p-values and q-values are calculated by scrambling the phenotypes assigned to each sample or scrambling peptides’ fold changes, depending on the correlation metric used.
Parameters:
- psms : pandas.DataFrame
- gene_sets : pandas.DataFrame
- pval : bool, optional
- recorrelate : bool, optional
- metric : str, optional
- phenotype : pandas.Series, optional
- kwargs : dict, optional
Extra arguments passed on to calculate_es_s and simulate_es_s_pi; see those functions for details.
Returns:
- df : pandas.DataFrame
-
pyproteome.pathways.enrichments.estimate_pq(vals)
Estimate p- and q-values for an enrichment analysis using the ES(S, pi) values generated by simulate_es_s_pi.
Parameters:
- vals : pandas.DataFrame
-
pyproteome.pathways.enrichments.filter_gene_sets(gene_sets, psms, min_hits=10)
Filter gene sets to include only those with at least a given number of hits in a data set.
Parameters:
- gene_sets : pandas.DataFrame
- psms : pandas.DataFrame
- min_hits : int, optional
Returns:
- df : pandas.DataFrame
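The idea behind a minimum-hits filter can be sketched with plain pandas. The helper name filter_by_hits, the example IDs, and the choice to pass the observed IDs as a simple collection are all assumptions made for illustration; filter_gene_sets itself takes the psms frame and may count hits differently.

```python
import pandas as pd


def filter_by_hits(gene_sets, observed_ids, min_hits=10):
    """Keep gene sets sharing at least min_hits IDs with observed_ids.

    Hypothetical helper, not part of pyproteome. gene_sets needs a
    "set" column whose entries are Python sets of ID strings.
    """
    observed = set(observed_ids)
    n_hits = gene_sets["set"].apply(lambda ids: len(ids & observed))
    return gene_sets[n_hits >= min_hits].reset_index(drop=True)


# Illustrative data: only the first set has two or more observed members.
gene_sets = pd.DataFrame({
    "name": ["set A", "set B"],
    "set": [{"1956", "2064", "8778"}, {"5594", "5595"}],
})
kept = filter_by_hits(gene_sets, ["1956", "2064", "5594"], min_hits=2)
```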
-
pyproteome.pathways.enrichments.filter_vals(vals, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1)
Filter gene set enrichment scores using the given ES(S) / p-value / q-value cutoffs.
Parameters:
- vals : pandas.DataFrame
- min_hits : int, optional
- min_abs_score : float, optional
- max_pval : float, optional
- max_qval : float, optional
Returns:
- df : pandas.DataFrame
-
pyproteome.pathways.enrichments.get_gene_changes(psms)
Extract the gene IDs and correlation values for each gene / phosphosite in a data set. Merge together duplicate IDs by calculating their mean correlation.
Parameters:
- psms : pandas.DataFrame
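Merging duplicate IDs by their mean correlation can be illustrated with a pandas groupby. The column names “ID” and “Correlation” and the values below are assumptions chosen for this sketch; the columns that get_gene_changes actually reads and returns are not listed in this reference.

```python
import pandas as pd

# Hypothetical per-peptide rows; ID "1956" appears twice.
psms = pd.DataFrame({
    "ID": ["1956", "1956", "2064", "8778"],
    "Correlation": [0.8, 0.4, -0.2, 0.5],
})

# Average duplicate IDs, then rank from highest to lowest correlation.
gene_changes = (
    psms.groupby("ID", as_index=False)["Correlation"]
    .mean()
    .sort_values("Correlation", ascending=False)
    .reset_index(drop=True)
)
```

The two rows for “1956” collapse into a single row holding their mean, and the result is a ranked list with one row per ID.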
-
pyproteome.pathways.enrichments.simulate_es_s_pi(vals, psms, gene_sets, phenotype=None, metric='spearman', p_iter=1000, n_cpus=None, **kwargs)
Simulate ES(S, pi) by scrambling the phenotype / correlation values for a data set and recalculating gene set enrichment scores.
Parameters:
- vals : pandas.DataFrame
- psms : pandas.DataFrame
- gene_sets : pandas.DataFrame
- phenotype : pandas.Series, optional
- metric : str, optional
- p_iter : int, optional
- n_cpus : int, optional
Returns:
- df : pandas.DataFrame
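The scrambling approach can be sketched in pure Python: shuffle which gene carries which correlation value, recompute the enrichment score each time, and compare the observed score with that null distribution. Everything below is a simplified illustration under stated assumptions. The function names are hypothetical, the score is the generic maximum-deviation running sum, and the one-sided add-one p-value is a common convention that may differ from what estimate_pq computes. No q-value (multiple-testing) step is shown.

```python
import random


def enrichment_score(ranked_ids, ranked_vals, gene_set, p=1.0):
    """Maximum-deviation running-sum score over a ranked list."""
    hits = [g in gene_set for g in ranked_ids]
    n_miss = hits.count(False)
    norm = sum(abs(v) ** p for v, h in zip(ranked_vals, hits) if h)
    if norm == 0 or n_miss == 0:
        return 0.0
    running = best = 0.0
    for v, h in zip(ranked_vals, hits):
        running += abs(v) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p(ranked_ids, ranked_vals, gene_set, n_iter=1000, seed=0):
    """Empirical p-value from shuffling gene labels over fixed values."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_ids, ranked_vals, gene_set)
    shuffled = list(ranked_ids)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null.append(enrichment_score(shuffled, ranked_vals, gene_set))
    # Count null scores at least as extreme in the observed direction.
    if observed >= 0:
        extreme = sum(es >= observed for es in null)
    else:
        extreme = sum(es <= observed for es in null)
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_iter + 1), null


# Hypothetical ranked data: 20 genes with values from 2.0 down to -1.8.
ids = ["g%d" % i for i in range(20)]
vals = [2.0 - 0.2 * i for i in range(20)]
observed, p_value, null = permutation_p(ids, vals, {"g0", "g1", "g2"}, n_iter=200)
```

Increasing n_iter gives a finer-grained estimate at the cost of run time, which is presumably why the library exposes p_iter and n_cpus.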
pyproteome.pathways.go module
-
pyproteome.pathways.go.get_go_ids(go_ids, species='Homo sapiens', add_children=True)
Fetch all gene symbols associated with a list of gene ontology term IDs.
Parameters:
- go_ids : str or list of str
- species : str, optional
- add_children : bool, optional
Include all child terms of input GO IDs.
Returns:
- list of str
pyproteome.pathways.gskb module
-
pyproteome.pathways.gskb.get_gskb_pathways(species)
Download gene sets from GSKB.
Parameters:
- species : str
Returns:
- df : pandas.DataFrame, optional
pyproteome.pathways.msigdb module
-
pyproteome.pathways.msigdb.get_msigdb_pathways(species, remap=None)
Download gene sets from MSigDB. Currently downloads v7.0 of the gene signature repositories.
Parameters:
- species : str
Returns:
- df : pandas.DataFrame, optional
pyproteome.pathways.pathwayscommon module
-
pyproteome.pathways.pathwayscommon.get_pathway_common(species)
Download gene sets from Pathway Commons.
Parameters:
- species : str
Returns:
- df : pandas.DataFrame, optional
pyproteome.pathways.photon_ptm module
-
pyproteome.pathways.photon_ptm.photon(ds, folder_name=None, write_output=False, log2=True)
Run the PHOTON algorithm on a data set to find functional phosphorylation sites using protein-protein interaction networks.
Parameters:
- ds : pyproteome.data_sets.DataSet
- folder_name : str, optional
- write_output : bool, optional
- log2 : bool, optional
Returns:
- out_path : str
Path to results directory.
pyproteome.pathways.plot module
-
pyproteome.pathways.plot.plot_correlations(gene_changes, ax=None)
Plot the ranked list of correlations.
Parameters:
- gene_changes : pandas.DataFrame
Genes and their correlation values as calculated by get_gene_changes().
- figsize : tuple of (int, int), optional
Returns:
-
pyproteome.pathways.plot.plot_enrichment(vals, cols=5)
Plot enrichment score curves for each gene set.
Parameters:
- vals : pandas.DataFrame
The gene sets and scores calculated by enrichment_scores().
- cols : int, optional
-
pyproteome.pathways.plot.plot_gsea(vals, gene_changes, min_hits=0, min_abs_score=0, max_pval=1, max_qval=1, name='', **kwargs)
Run set enrichment analysis on a data set and generate all figures associated with that analysis.
Parameters:
- vals : pandas.DataFrame
- gene_changes : pandas.DataFrame
Returns:
- figs : list of matplotlib.figure.Figure
-
pyproteome.pathways.plot.plot_nes(vals, min_hits=0, min_abs_score=0, max_pval=0.05, max_qval=0.25, title=None, col=None, ax=None)
Plot the ranked normalized enrichment score values.
Annotates significant gene sets with their name on the figure.
Parameters:
- vals : pandas.DataFrame
The gene sets and scores calculated by enrichment_scores().
- min_hits : int, optional
- min_abs_score : float, optional
- max_pval : float, optional
- max_qval : float, optional
- figsize : tuple of (int, int), optional
Returns:
pyproteome.pathways.plsr module
pyproteome.pathways.psp module
-
pyproteome.pathways.psp.get_phosphomap_data()
Fetch mapping between phosphorylation sites of different species.
Returns:
- df : pandas.DataFrame
-
pyproteome.pathways.psp.get_phosphoreg_data()
Fetch PhosphoSitePlus regulation data.
Returns:
- df : pandas.DataFrame
-
pyproteome.pathways.psp.get_phosphosite(species, remap=False)
Download phospho sets from PhosphoSitePlus.
Parameters:
- species : str
- remap : bool, optional
Returns:
- df : pandas.DataFrame, optional
-
pyproteome.pathways.psp.get_phosphosite_regulation(species, remap=False)
Download phospho sets from PhosphoSitePlus.
Parameters:
- species : str
- remap : bool, optional
Returns:
- df : pandas.DataFrame, optional
pyproteome.pathways.ptmsigdb module
-
pyproteome.pathways.ptmsigdb.get_ptmsigdb(species)
Download phospho sets from PTMSigDB.
Parameters:
- species : str
Returns:
- df : pandas.DataFrame
pyproteome.pathways.wikipathways module
-
pyproteome.pathways.wikipathways.get_wikipathways(species)
Download gene sets from WikiPathways.
Parameters:
- species : str
Returns:
- df : pandas.DataFrame, optional
-
pyproteome.pathways.wikipathways.get_wikipathways_psites(species)
Download phospho sets from WikiPathways.
Parameters:
- species : str
Returns:
- df : pandas.DataFrame, optional