pyproteome.motifs package

This module contains code for phosphorylation motif analysis.

It includes functions for discrete motif enrichment as well as generation of motif logos. These logos can be generated locally (logo.make_logo()) or via automated hooks into online tools (plogo.make_logo(), weblogo.make_logo(), icelogo.make_logo()).

pyproteome.motifs.generate_n_mers(sequences, n=15, all_matches=True, fill_left='A', fill_right='A', mods=None, use_ptms=True, use_nterms=False, use_cterms=False)

Generate n-mers around all sites of modification in sequences.

Parameters:

sequences : list of `pyproteome.data_sets.sequence.Sequence`
n : int, optional
all_matches : bool, optional: Generate n-mers for all protein matches; otherwise, use only the first match.
fill_left : str, optional
fill_right : str, optional
mods : list of tuple of (str, str), optional
use_ptms : bool, optional
use_nterms : bool, optional
use_cterms : bool, optional

Returns:

set of str
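
As a rough illustration of what an n-mer window looks like, the sketch below builds fixed-width windows around given site indices in a plain protein string. It does not use pyproteome's `Sequence` objects: the helper name `n_mers_around_sites`, the 0-based `sites` argument, and the lowercasing of the central residue are all assumptions made for this example.

```python
def n_mers_around_sites(protein, sites, n=15, fill_left="A", fill_right="A"):
    """Return the set of n-mers centered on each 0-based index in `sites`.

    Hypothetical helper for illustration only; assumes `n` is odd.
    """
    half = n // 2
    mers = set()
    for site in sites:
        left = protein[max(0, site - half):site]
        right = protein[site + 1:site + 1 + half]
        # Pad windows that run past either end of the protein.
        left = fill_left * (half - len(left)) + left
        right = right + fill_right * (half - len(right))
        # Assumption: mark the modified residue in lowercase.
        mers.add(left + protein[site].lower() + right)
    return mers


print(sorted(n_mers_around_sites("MKTSPEYLLR", [0, 3, 9], n=7)))
```

Sites near a terminus are padded with the fill characters so that every window has the same width, which is what downstream logo and motif code generally needs.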

pyproteome.motifs.icelogo module

pyproteome.motifs.icelogo.icelogo(foreground, background, title='', width=800, height=600, pvalue=0.05, scoring='foldChange')

Wraps calls to iceLogo [1], returning an image showing the enrichment of a sequence in a foreground set compared to a background set.

Parameters:

foreground : list of str
background : list of str
title : str, optional
width : int, optional
height : int, optional
pvalue : float, optional
scoring : str, optional

Returns:

fig : `IPython.display.Image`

Notes

[1]	Colaert, N., Helsens, K., Martens, L., Vandekerckhove, J., & Gevaert, K. (2009). Improved visualization of protein consensus sequences by iceLogo. Nature Methods, 6(11), 786–787. http://doi.org/10.1038/nmeth1109-786

pyproteome.motifs.icelogo.make_logo(data, f, m=None, letter_mod_types=None, **kwargs)

pyproteome.motifs.logo module

pyproteome.motifs.logo.logo(fore, back, ax=None, title='', width=12, height=8, p=0.05, fade_power=1, low_res_cutoff=0, prob_fn=None, show_title=True, show_ylabel=True, show_n=True, minmaxy=None)

Generate a sequence logo locally using pLogo’s enrichment score.

Parameters:

fore : list of str
back : list of str
title : str, optional
p : float, optional: p-value to use for residue significance cutoff. This value is corrected for multiple-hypothesis testing before being used.
fade_power : float, optional: Set transparency of residues with scores below p to: (score / p) ** fade_power.
low_res_cutoff : float, optional: Hide residues with scores below p * low_res_cutoff.
prob_fn : str, optional: Probability function to use for calculating enrichment. Either ‘hypergeom’ or ‘binom’. The default, hypergeom, is more accurate but more computationally expensive.

Returns:

fig : matplotlib.figure.Figure
axes : matplotlib.axes.Axes
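
The two `prob_fn` options correspond to two ways of asking how surprising a residue count is at one position. The sketch below computes both upper-tail probabilities using only the standard library. The function names and the example counts are assumptions for illustration, and the library's own calculation (including its multiple-testing correction) may differ.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): sampling with replacement."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n of N sequences, K of which carry the residue."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Made-up counts: 100 of 1000 background sequences have the residue at
# this position, and 15 of 50 foreground sequences do.
N, K, n, k = 1000, 100, 50, 15
print("binom:", binom_sf(k, n, K / N))
print("hypergeom:", hypergeom_sf(k, N, K, n))
```

The hypergeometric form treats the foreground as a draw without replacement from a finite background, which is why it is described as more accurate; the binomial form approximates this by fixing the per-sequence probability at `K / N`.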

pyproteome.motifs.logo.make_logo(data, f, **kwargs)

Create a logo from a pyproteome data set using a given filter to define the foreground set.

Parameters:

data : `pyproteome.data_sets.DataSet`
f : dict: Filter passed to `pyproteome.data_sets.DataSet.filter()` to define the foreground set.
kwargs: Arguments passed on to `logo()`.

Returns:

fig, axes

pyproteome.motifs.motif module

This module provides functionality for finding motifs in sequences.

Functionality includes n-mer generation.

class pyproteome.motifs.motif.Motif(motif)

Bases: object

Contains a motif that may match to one or more protein sequences.

Matches include the regular single-letter amino acid names as well as phosphosites for serine, threonine, and tyrosine, non-polar amino acids, and positively and negatively charged amino acids.

Examples

>>> import pyproteome
>>> motif = pyproteome.Motif('O..x.-+')
>>> 'IEFyFER' in motif
True
>>> 'IEFyFED' in motif
False
>>> 'FFFFFFR' in motif
False

Attributes:

motif : str

char_mapping = {'+': 'RK', '-': 'DE', '.': 'ACDEFGHIKLMNPQRSTVWYystO-+x', 'O': 'MILV', 'x': 'st'}

children()

match(other)

Match a given sequence to this motif.

Parameters:

other : str

Returns:

bool
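
To make the wildcard classes concrete, here is a minimal sketch that translates a motif string into a regular expression by reading `char_mapping` literally. It is not the library's implementation: `motif_to_regex` and `matches` are hypothetical names, `.` is treated as "any single character", and `Motif.match()` may handle some cases differently.

```python
import re

# Character classes copied from Motif.char_mapping ('.' handled separately).
CHAR_MAPPING = {"+": "RK", "-": "DE", "O": "MILV", "x": "st"}


def motif_to_regex(motif):
    """Compile a motif string into a regex (hypothetical helper)."""
    parts = []
    for char in motif:
        if char == ".":
            parts.append(".")  # wildcard: any single residue
        elif char in CHAR_MAPPING:
            parts.append("[" + CHAR_MAPPING[char] + "]")
        else:
            parts.append(re.escape(char))  # literal residue
    return re.compile("".join(parts))


def matches(motif, sequence):
    """True if the whole sequence matches the motif."""
    return motif_to_regex(motif).fullmatch(sequence) is not None


print(matches("O..x.-+", "IEFsFER"))
print(matches("O..x.-+", "FFFFFFR"))
```

In this reading, `O..x.-+` requires a non-polar residue at the first position, a phosphoserine or phosphothreonine at the center, an acidic residue at position six, and a basic residue at position seven.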

motif

pairwise_children(hit_list)

pyproteome.motifs.motif.generate_n_mers(sequences, n=15, all_matches=True, fill_left='A', fill_right='A', mods=None, use_ptms=True, use_nterms=False, use_cterms=False)

Generate n-mers around all sites of modification in sequences.

Parameters:

sequences : list of `pyproteome.data_sets.sequence.Sequence`
n : int, optional
all_matches : bool, optional: Generate n-mers for all protein matches; otherwise, use only the first match.
fill_left : str, optional
fill_right : str, optional
mods : list of tuple of (str, str), optional
use_ptms : bool, optional
use_nterms : bool, optional
use_cterms : bool, optional

Returns:

set of str

pyproteome.motifs.motif.get_nmer_args(kwargs)

Extract all arguments from kwargs that are used by generate_n_mers().

Parameters:

kwargs : dict

Returns:

dict

pyproteome.motifs.motif.motif_enrichment(foreground, background, sig_cutoff=0.01, min_fore_hits=0, start_letters=None, pp_value=False, pp_iterations=100, cpu_count=None, force=False)

Calculate motifs significantly enriched in a set of sequences. Uses a depth-first search algorithm to find discrete motifs that are enriched in a foreground set compared to a given background [1].

Parameters:

foreground : list of str
background : list of str
sig_cutoff : float, optional
min_fore_hits : int, optional
start_letters : list of str, optional
pp_value : bool, optional
pp_iterations : int, optional
cpu_count : int, optional: Number of CPUs to use when calculating pp-values. This does not apply to a single motif-enrichment process.

Returns:

df : `pandas.DataFrame`
p_dist : list of float
pp_dist : list of float

Notes

[1]	Joughin, Brian A., et al. "An Integrated Comparative Phosphoproteomic and Bioinformatic Approach Reveals a Novel Class of MPM-2 Motifs Upregulated in EGFRvIII-Expressing Glioblastoma Cells." Molecular BioSystems 5.1 (2009): 59-67.
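
The general idea of a depth-first motif search can be sketched as follows. This is a simplified illustration, not the algorithm from [1] or the library's code: it only tries single amino acid letters (no `O`, `+`, `-`, or `x` classes), assumes odd-length sequences with the foreground contained in the background, scores each candidate with a plain hypergeometric tail, and applies no multiple-testing correction. All function names and the toy data are assumptions.

```python
import re
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n of N sequences, K of which match."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def count_hits(motif, sequences):
    pattern = re.compile(motif)
    return sum(1 for seq in sequences if pattern.fullmatch(seq))


def enriched_motifs(foreground, background, sig_cutoff=0.01, min_fore_hits=2):
    """Depth-first search; assumes foreground is a subset of background."""
    results, seen = {}, set()

    def visit(motif):
        for pos, char in enumerate(motif):
            if char != ".":
                continue  # position already fixed
            for letter in AMINO_ACIDS:
                child = motif[:pos] + letter + motif[pos + 1:]
                if child in seen:
                    continue
                seen.add(child)
                k = count_hits(child, foreground)
                if k < min_fore_hits:
                    continue
                K = count_hits(child, background)
                p = hypergeom_sf(k, len(background), K, len(foreground))
                if p < sig_cutoff:
                    results[child] = p
                    visit(child)  # descend before trying sibling motifs

    # Start from a motif that fixes only the central residue.
    center = len(foreground[0]) // 2
    visit("." * center + foreground[0][center] + "." * center)
    return results


# Toy data: every foreground peptide has proline at +1; no other
# background peptide does.
foreground = ["AGsPK", "LDsPE", "KTsPR", "QEsPL", "VNsPG", "HMsPD"]
others = "ACDEFGHIKLMNQRSTVWY"  # amino acids without proline
background = foreground + [
    others[i % 19] + others[(3 * i + 1) % 19] + "s"
    + others[(5 * i + 2) % 19] + others[(7 * i + 3) % 19]
    for i in range(30)
]
print(enriched_motifs(foreground, background))
```

Only significant motifs are expanded further, so the search prunes most of the motif space early; `min_fore_hits` plays the same pruning role for motifs that are too rare in the foreground to be interesting.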

pyproteome.motifs.motif.run_motif_enrichment(data, f, **kwargs)

Wraps motif_enrichment(), generating the list of foreground and background peptide sequences from a data set.

Parameters:

data : `pyproteome.data_sets.data_set.DataSet`
f : dict or list of dict: Argument passed to `pyproteome.data_sets.data_set.DataSet.filter()`.
kwargs : dict: Arguments passed to `motif_enrichment()`.

Returns:

df : `pandas.DataFrame`
p_dist : list of float
pp_dist : list of float

pyproteome.motifs.neighborhood module

pyproteome.motifs.neighborhood.enriched_neighborhood(data, f, residues, nmer_length=7, count_cutoff=2, mods=None)

Calculates the hypergeometric enrichment value for the number of adjacent residues within a given window around all modification sites in a data set.

Parameters:

data : `pyproteome.data_sets.data_set.DataSet`
f : dict or list of dict
residues : list of str
nmer_length : int, optional
count_cutoff : int, optional
mods : str or list of str

Returns:

f : `matplotlib.figure.Figure`
ax : `matplotlib.axes.Axes`
pval : float: P-value, calculated with `scipy.stats.hypergeom`.
K : int: Number of sequences in the background list with more than count_cutoff matching residues.
N : int: Size of the background list of sequences.
k : int: Number of sequences in the foreground list with more than count_cutoff matching residues.
n : int: Size of the foreground list of sequences.
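
The four counts in the return value can be illustrated with a short sketch that works on plain n-mer strings. The helper name `neighborhood_counts` and the toy sequences are assumptions; the sketch counts every character of each window (including the central one) and leaves out the plotting and the p-value calculation.

```python
def neighborhood_counts(foreground, background, residues, count_cutoff=2):
    """Return (K, N, k, n) for a neighborhood enrichment test.

    Hypothetical helper: a sequence counts as a hit when it contains
    strictly more than `count_cutoff` of the given residues.
    """
    residues = set(residues)

    def n_over_cutoff(seqs):
        return sum(
            1 for seq in seqs
            if sum(char in residues for char in seq) > count_cutoff
        )

    return (
        n_over_cutoff(background),
        len(background),
        n_over_cutoff(foreground),
        len(foreground),
    )


# Toy 7-mers: look for acidic residues (D/E) around a phosphoserine.
background = ["AAAsDEE", "DEDsAAA", "AAAsAAA", "ADAsAEA", "EEEsEEE"]
foreground = background[:2]
K, N, k, n = neighborhood_counts(foreground, background, ["D", "E"])
print(K, N, k, n)
```

These four numbers are the inputs to a hypergeometric test: the chance of seeing at least `k` hits among `n` foreground sequences when `K` of the `N` background sequences are hits.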

pyproteome.motifs.phosphosite module

This module includes functions for downloading kinase-substrate associations from PhosphoSitePlus (https://www.phosphosite.org/).

pyproteome.motifs.phosphosite.generate_logos(species, kinases=None, min_foreground=10)

Generate logos for all kinases documented on PhosphoSitePlus.

Parameters:

species : str: Species name (e.g. ‘Human’ or ‘Homo sapiens’).
kinases : list of str, optional
min_foreground : int, optional: Minimum number of substrates needed for logo generation.

Returns:

list of `matplotlib.figure.Figure`

pyproteome.motifs.phosphosite.get_data()

Download the Kinase-Substrate Dataset from PhosphoSitePlus.

Returns:

df : `pandas.DataFrame`

pyproteome.motifs.plogo module

pyproteome.motifs.plogo.format_title(f=None, data=None)

Generates a title automatically from a given data set and list of filters.

Parameters:

f : dict or list of dict
data : `pyproteome.data_sets.data_set.DataSet`

Returns:

str

pyproteome.motifs.plogo.make_logo(data, f, **kwargs)

Wraps plogo(), generating the list of foreground and background peptide sequences from a data set.

Parameters:

data : `pyproteome.data_sets.data_set.DataSet`
f : dict or list of dict: Argument passed to `pyproteome.data_sets.data_set.DataSet.filter()`.
kwargs : dict: Arguments passed to `plogo()`.

Returns:

str or `IPython.display.Image`

pyproteome.motifs.plogo.plogo(foreground, background, fix_letter_pos=None, title='', width=800, height=600, ymax=None)

Wraps calls to the pLogo web server [1], returning an image showing the enrichment of a sequence in a foreground set compared to a background set.

Parameters:

foreground : list of str
background : list of str
fix_letter_pos : list of tuple of (str, int), optional
title : str, optional
width : int, optional
height : int, optional
ymax : float, optional

Returns:

str or `IPython.display.Image`

Notes

[1]	O’Shea, Joseph P et al. “pLogo: A Probabilistic Approach to Visualizing Sequence Motifs.” Nature Methods 10.12 (2013): 1211–1212. Web.

pyproteome.motifs.weblogo module

pyproteome.motifs.weblogo.make_logo(data, **kwargs)

Create a sequence logo figure.

Logos are created based on the frequencies of peptides in a data set.

Parameters:

data : `pyproteome.data_sets.DataSet`
