pyproteome.motifs package
This module contains code for phosphorylation motif analysis.
It includes functions for discrete motif enrichment as well as generation of
motif logos. These logos can be generated locally (logo.make_logo()) or
via automated hooks into online tools (plogo.make_logo(),
weblogo.make_logo(), icelogo.make_logo()).
-
pyproteome.motifs.generate_n_mers(sequences, n=15, all_matches=True, fill_left='A', fill_right='A', mods=None, use_ptms=True, use_nterms=False, use_cterms=False)
Generate n-mers around all sites of modification in sequences.
Parameters:
- sequences : list of pyproteome.data_sets.sequence.Sequence
- n : int, optional
- all_matches : bool, optional
Generate n-mers for all protein matches; otherwise, use only the first match.
- fill_left : str, optional
- fill_right : str, optional
- mods : list of tuple of str, str, optional
- use_ptms : bool, optional
- use_nterms : bool, optional
- use_cterms : bool, optional
Returns:
- set of str
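As an illustration of what an n-mer window is, the following is a minimal sketch and not pyproteome's implementation. It assumes a plain protein string and 0-based site indices (the library takes Sequence objects instead), an odd n, and a hypothetical helper name n_mers_around_sites. Lowercasing the central residue is also an assumption. Padding uses the fill_left and fill_right defaults from the signature above.

```python
def n_mers_around_sites(protein, sites, n=15, fill_left="A", fill_right="A"):
    """Return the set of length-n windows centered on each 0-based site.

    Hypothetical helper for illustration only; assumes n is odd.
    """
    half = n // 2
    windows = set()
    for site in sites:
        # Residues available on each side of the site, clipped to the protein.
        left = protein[max(site - half, 0):site]
        right = protein[site + 1:site + half + 1]
        window = (
            fill_left * (half - len(left))      # pad when near the N-terminus
            + left
            + protein[site].lower()             # mark the modified residue
            + right
            + fill_right * (half - len(right))  # pad when near the C-terminus
        )
        windows.add(window)
    return windows


# Toy protein string, not real data.
print(n_mers_around_sites("MKTAYIAKQR", [1, 4, 8], n=7))
```

Windows near either terminus are padded so that every returned string has the same length, which is what downstream logo and enrichment code generally needs.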
pyproteome.motifs.icelogo module
-
pyproteome.motifs.icelogo.icelogo(foreground, background, title='', width=800, height=600, pvalue=0.05, scoring='foldChange')
Wraps calls to iceLogo [1], returning an image showing the enrichment of a sequence in a foreground set compared to a background set.
Parameters:
- foreground : list of str
- background : list of str
- title : str, optional
- width : int, optional
- height : int, optional
- pvalue : float, optional
- scoring : string, optional
Returns:
- fig : IPython.display.Image
Notes
[1] Colaert, N., Helsens, K., Martens, L., Vandekerckhove, J., & Gevaert, K. (2009). Improved visualization of protein consensus sequences by iceLogo. Nature Methods, 6(11), 786–787. http://doi.org/10.1038/nmeth1109-786
pyproteome.motifs.logo module
-
pyproteome.motifs.logo.logo(fore, back, ax=None, title='', width=12, height=8, p=0.05, fade_power=1, low_res_cutoff=0, prob_fn=None, show_title=True, show_ylabel=True, show_n=True, minmaxy=None)
Generate a sequence logo locally using pLogo’s enrichment score.
Parameters:
- fore : list of str
- back : list of str
- title : str, optional
- p : float, optional
p-value to use for residue significance cutoff. This value is corrected for multiple-hypothesis testing before being used.
- fade_power : float, optional
Set transparency of residues with scores below p to: (score / p) ** fade_power.
- low_res_cutoff : float, optional
Hide residues with scores below p * low_res_cutoff.
- prob_fn : str, optional
Probability function to use for calculating enrichment. Either ‘hypergeom’ or ‘binom’. The default, hypergeom, is more accurate but more computationally expensive.
Returns:
- fig : matplotlib.figure.Figure
- axes : matplotlib.axes.Axes
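The fade_power and low_res_cutoff rules described for logo() can be read as a small piecewise function. The sketch below takes those two descriptions literally. The helper name residue_alpha is hypothetical, and how the library actually scales score relative to p is not stated in this page, so treat this as an assumption rather than the library's code.

```python
def residue_alpha(score, p=0.05, fade_power=1, low_res_cutoff=0):
    """Opacity for one residue, following the parameter descriptions literally.

    Hypothetical helper: 0.0 means hidden, 1.0 means fully opaque.
    """
    if score < p * low_res_cutoff:
        return 0.0                            # hidden entirely
    if score < p:
        return (score / p) ** fade_power      # faded
    return 1.0                                # at or above the cutoff


for s in (0.01, 0.025, 0.2):
    print(s, residue_alpha(s, fade_power=2, low_res_cutoff=0.5))
```

With the default low_res_cutoff=0 nothing is hidden, and a larger fade_power makes sub-threshold residues fade out faster.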
-
pyproteome.motifs.logo.make_logo(data, f, **kwargs)
Create a logo from a pyproteome data set using a given filter to define the foreground set.
Parameters:
- data : pyproteome.data_sets.DataSet
- f : dict
Filter passed to pyproteome.data_sets.DataSet.filter() to define the foreground set.
- kwargs
Arguments passed on to logo().
Returns:
- fig, axes
pyproteome.motifs.motif module
This module provides functionality for finding motifs in sequences.
Functionality includes n-mer generation.
-
class pyproteome.motifs.motif.Motif(motif)
Bases: object
Contains a motif that may match one or more protein sequences.
Matches include the regular single-letter amino acid names as well as phosphosites for serine, threonine, and tyrosine, non-polar amino acids, and positively and negatively charged amino acids.
Examples
>>> import pyproteome
>>> motif = pyproteome.Motif('O..x.-+')
>>> 'IEFyFER' in motif
True
>>> 'IEFyFED' in motif
False
>>> 'FFFFFFR' in motif
False
Attributes:
- motif : str
-
char_mapping = {'+': 'RK', '-': 'DE', '.': 'ACDEFGHIKLMNPQRSTVWYystO-+x', 'O': 'MILV', 'x': 'st'}
-
motif
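To make the role of char_mapping concrete, here is a minimal sketch of position-by-position matching. It is a literal reading of the table above and not the Motif class itself: the function name matches is hypothetical, it requires the motif and sequence to have equal length, and it may not reproduce every result of the class.

```python
# Copied from the char_mapping attribute documented above.
CHAR_MAPPING = {
    "+": "RK",
    "-": "DE",
    ".": "ACDEFGHIKLMNPQRSTVWYystO-+x",
    "O": "MILV",
    "x": "st",
}


def matches(motif, sequence):
    """Return True if every residue is allowed by the motif symbol at its position.

    Hypothetical helper; symbols missing from CHAR_MAPPING match only themselves.
    """
    if len(motif) != len(sequence):
        return False
    return all(
        residue in CHAR_MAPPING.get(symbol, symbol)
        for symbol, residue in zip(motif, sequence)
    )


# Toy sequences, not real data.
for seq in ("IEFsFER", "IEFsFED", "FFFsFFR"):
    print(seq, matches("O..x.-+", seq))
```

Under this reading, 'O' accepts the non-polar residues M, I, L, and V; 'x' accepts phosphoserine or phosphothreonine; and '-' and '+' accept acidic and basic residues, respectively.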
-
pyproteome.motifs.motif.generate_n_mers(sequences, n=15, all_matches=True, fill_left='A', fill_right='A', mods=None, use_ptms=True, use_nterms=False, use_cterms=False)
Generate n-mers around all sites of modification in sequences.
Parameters:
- sequences : list of pyproteome.data_sets.sequence.Sequence
- n : int, optional
- all_matches : bool, optional
Generate n-mers for all protein matches; otherwise, use only the first match.
- fill_left : str, optional
- fill_right : str, optional
- mods : list of tuple of str, str, optional
- use_ptms : bool, optional
- use_nterms : bool, optional
- use_cterms : bool, optional
Returns:
- set of str
-
pyproteome.motifs.motif.get_nmer_args(kwargs)
Extract all arguments from kwargs that are used by generate_n_mers().
Parameters:
- kwargs : dict
Returns:
- dict
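The idea of pulling the n-mer options out of a larger kwargs dict can be sketched as follows. This is an illustration only. The helper name split_nmer_args is hypothetical, the key names are taken from the generate_n_mers() signature on this page, and whether the library removes the keys from the original dict or copies them is not stated here (this sketch copies).

```python
# Parameter names taken from the generate_n_mers() signature.
NMER_ARG_NAMES = (
    "n", "all_matches", "fill_left", "fill_right",
    "mods", "use_ptms", "use_nterms", "use_cterms",
)


def split_nmer_args(kwargs):
    """Split kwargs into (n-mer arguments, everything else) without mutating it.

    Hypothetical helper for illustration.
    """
    nmer_args = {k: v for k, v in kwargs.items() if k in NMER_ARG_NAMES}
    remaining = {k: v for k, v in kwargs.items() if k not in NMER_ARG_NAMES}
    return nmer_args, remaining


nmer_args, remaining = split_nmer_args({"n": 7, "title": "x", "use_nterms": True})
print(nmer_args, remaining)
```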
-
pyproteome.motifs.motif.motif_enrichment(foreground, background, sig_cutoff=0.01, min_fore_hits=0, start_letters=None, pp_value=False, pp_iterations=100, cpu_count=None, force=False)
Calculate motifs significantly enriched in a set of sequences. Uses a depth-first search algorithm to find discrete motifs that are enriched in a foreground set compared to a given background [1].
Parameters:
- foreground : list of str
- background : list of str
- sig_cutoff : float, optional
- min_fore_hits : int, optional
- start_letters : list of str, optional
- pp_value : bool, optional
- pp_iterations : int, optional
- cpu_count : int, optional
Number of CPUs to use when calculating pp-values. Does not apply to a single motif-enrichment process.
Returns:
- df : pandas.DataFrame
- p_dist : list of float
- pp_dist : list of float
Notes
[1] Joughin, Brian A., et al. ‘An Integrated Comparative Phosphoproteomic and Bioinformatic Approach Reveals a Novel Class of MPM-2 Motifs Upregulated in EGFRvIII-Expressing Glioblastoma Cells.’ Molecular BioSystems 5.1 (2009): 59-67.
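A depth-first search for enriched discrete motifs can be sketched in a few lines. This is a deliberately simplified toy and not the library's algorithm or the published method: it assumes equal-length sequences, a foreground that is a subset of the background, exact-letter positions only (no 'x', 'O', '+', or '-' classes), and a hypergeometric upper-tail test, which this page does not confirm. All function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are hits."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def count_hits(motif, sequences):
    """Count sequences matching a motif where '.' is a wildcard."""
    return sum(
        all(m == "." or m == c for m, c in zip(motif, seq))
        for seq in sequences
    )


def enriched_motifs(foreground, background, sig_cutoff=0.01, min_fore_hits=0):
    """Toy depth-first search: extend a motif only while it stays significant."""
    length = len(foreground[0])
    alphabet = sorted(set("".join(foreground)))
    found = {}

    def search(motif, start):
        for pos in range(start, length):
            for letter in alphabet:
                candidate = motif[:pos] + letter + motif[pos + 1:]
                k = count_hits(candidate, foreground)
                if k == 0 or k < min_fore_hits:
                    continue
                K = count_hits(candidate, background)
                p = hypergeom_sf(k, len(background), K, len(foreground))
                if p < sig_cutoff:
                    found[candidate] = p
                    search(candidate, pos + 1)  # go deeper before trying siblings

    search("." * length, 0)
    return found


# Toy sequences, not real data; the foreground is contained in the background.
fg = ["AAsPK", "GAsPR", "LTsPK", "QEsPR"]
bg = fg + ["AAsGK", "GAsGR", "LTsGK", "QEsGR"] * 4
print(enriched_motifs(fg, bg))
```

Pruning on significance is what keeps the search tractable: a branch is explored further only when the motif found so far already passes sig_cutoff.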
-
pyproteome.motifs.motif.run_motif_enrichment(data, f, **kwargs)
Wraps motif_enrichment(), generating the list of foreground and background peptide sequences from a data set.
Parameters:
- data : pyproteome.data_sets.data_set.DataSet
- f : dict or list of dict
Argument passed to pyproteome.data_sets.data_set.DataSet.filter().
- kwargs : dict
Arguments passed to motif_enrichment().
Returns:
- df : pandas.DataFrame
- p_dist : list of float
- pp_dist : list of float
pyproteome.motifs.neighborhood module
-
pyproteome.motifs.neighborhood.enriched_neighborhood(data, f, residues, nmer_length=7, count_cutoff=2, mods=None)
Calculates the hypergeometric enrichment value for the number of adjacent residues within a given window around all modification sites in a data set.
Parameters:
- data : pyproteome.data_sets.data_set.DataSet
- f : dict or list of dict
- residues : list of str
- nmer_length : int, optional
- count_cutoff : int, optional
- mods : str or list of str, optional
Returns:
- f : matplotlib.figure.Figure
- ax : matplotlib.axes.Axes
- pval : float
P-value, calculated with scipy.stats.hypergeom.
- K : int
Number of sequences with # residues > count_cutoff in background list.
- N : int
Size of the background list of sequences.
- k : int
Number of sequences with # residues > count_cutoff in foreground list.
- n : int
Size of the foreground list of sequences.
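The four counts K, N, k, and n listed above are the inputs to a hypergeometric test. The sketch below derives them from toy sequence windows and computes an upper-tail probability with the standard library only. The page says the library uses scipy.stats.hypergeom; which tail it evaluates, and whether the central residue is counted, are assumptions here. Both helper names are hypothetical.

```python
from math import comb


def neighborhood_counts(foreground, background, residues, count_cutoff=2):
    """Return (K, N, k, n) as described in the Returns list above."""
    def n_hits(seqs):
        # A sequence is a hit when it holds more than count_cutoff of the residues.
        return sum(
            sum(seq.count(r) for r in residues) > count_cutoff
            for seq in seqs
        )
    return n_hits(background), len(background), n_hits(foreground), len(foreground)


def hypergeom_upper_tail(K, N, k, n):
    """P(X >= k): chance of at least k hits in n draws from N items with K hits."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Toy windows, not real data; the foreground is contained in the background.
fg = ["EEDsDEE", "ADEsEDA", "AAAsAAA"]
bg = fg + ["AAAsAAA"] * 7
K, N, k, n = neighborhood_counts(fg, bg, residues=["D", "E"])
print(K, N, k, n, hypergeom_upper_tail(K, N, k, n))
```

A small p-value indicates that windows rich in the chosen residues are over-represented in the foreground relative to the background.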
pyproteome.motifs.phosphosite module
This file includes functions for downloading kinase-substrate associations from PhosphoSitePlus (https://www.phosphosite.org/).
-
pyproteome.motifs.phosphosite.generate_logos(species, kinases=None, min_foreground=10)
Generate logos for all kinases documented on PhosphoSitePlus.
Parameters:
- species : str
Species name (e.g. ‘Human’ or ‘Homo sapiens’)
- kinases : list of str, optional
- min_foreground : int, optional
Minimum number of substrates needed for logo generation.
Returns:
- list of matplotlib.figure.Figure
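The selection step implied by kinases and min_foreground can be sketched as grouping substrate windows by kinase and dropping kinases with too few of them. This is an illustration under stated assumptions: the row keys "kinase", "organism", and "site_window" are hypothetical (the real dataset's column names are not given on this page), the helper name is hypothetical, and treating min_foreground as an inclusive lower bound is a guess.

```python
from collections import defaultdict


def substrates_by_kinase(rows, species, kinases=None, min_foreground=10):
    """Group unique substrate windows by kinase, keeping well-covered kinases.

    Hypothetical helper; `rows` is an iterable of dicts with made-up keys.
    """
    grouped = defaultdict(set)
    for row in rows:
        if row["organism"].lower() != species.lower():
            continue
        if kinases is not None and row["kinase"] not in kinases:
            continue
        grouped[row["kinase"]].add(row["site_window"])
    return {
        kinase: sorted(windows)
        for kinase, windows in grouped.items()
        if len(windows) >= min_foreground
    }


# Made-up rows for illustration only.
rows = [
    {"kinase": "KinA", "organism": "human", "site_window": "GGGsGGG"},
    {"kinase": "KinA", "organism": "human", "site_window": "AAAsAAA"},
    {"kinase": "KinB", "organism": "human", "site_window": "LLLtLLL"},
    {"kinase": "KinA", "organism": "mouse", "site_window": "PPPsPPP"},
]
print(substrates_by_kinase(rows, "Human", min_foreground=2))
```

Each surviving kinase's windows would then serve as the foreground for one logo.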
-
pyproteome.motifs.phosphosite.get_data()
Download the Kinase-Substrate Dataset from PhosphoSitePlus.
Returns:
- df : pandas.DataFrame
pyproteome.motifs.plogo module
-
pyproteome.motifs.plogo.format_title(f=None, data=None)
Generates a title automatically from a given data set and list of filters.
Parameters:
- f : dict or list of dict
- data : pyproteome.data_sets.data_set.DataSet
Returns:
- str
-
pyproteome.motifs.plogo.make_logo(data, f, **kwargs)
Wraps plogo(), generating the list of foreground and background peptide sequences from a data set.
Parameters:
- data : pyproteome.data_sets.data_set.DataSet
- f : dict or list of dict
Argument passed to pyproteome.data_sets.data_set.DataSet.filter().
- kwargs : dict
Arguments passed to plogo().
Returns:
- str or IPython.display.Image
-
pyproteome.motifs.plogo.plogo(foreground, background, fix_letter_pos=None, title='', width=800, height=600, ymax=None)
Wraps calls to the pLogo web server [1], returning an image showing the enrichment of a sequence in a foreground set compared to a background set.
Parameters:
- foreground : list of str
- background : list of str
- fix_letter_pos : list of tuple of (str, int), optional
- title : str, optional
- width : int, optional
- height : int, optional
- ymax : float, optional
Returns:
- str or IPython.display.Image
Notes
[1] O’Shea, Joseph P et al. “pLogo: A Probabilistic Approach to Visualizing Sequence Motifs.” Nature Methods 10.12 (2013): 1211–1212. Web.
pyproteome.motifs.weblogo module
-
pyproteome.motifs.weblogo.make_logo(data, **kwargs)
Create a sequence logo figure.
Logos are created based on the frequencies of peptides in a data set.
Parameters:
- data : pyproteome.data_sets.DataSet
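A frequency-based logo starts from per-position residue frequencies. The sketch below computes them for a list of equal-length strings. It is a generic illustration and not the weblogo module's code: the helper name position_frequencies is hypothetical, and reading "frequencies of peptides" as per-position residue frequencies is an assumption.

```python
from collections import Counter


def position_frequencies(sequences):
    """Return one {residue: frequency} dict per alignment column.

    Hypothetical helper; assumes all sequences have the same length.
    """
    total = len(sequences)
    return [
        {residue: count / total for residue, count in Counter(column).items()}
        for column in zip(*sequences)
    ]


# Toy sequences, not real data.
for column in position_frequencies(["AsP", "GsP", "AsK", "AsP"]):
    print(column)
```

In a rendered logo, the height of each letter at a position is typically derived from these frequencies.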