pyproteome.motifs package

This module contains code for phosphorylation motif analysis.

It includes functions for discrete motif enrichment as well as generation of motif logos. These logos can be generated locally (logo.make_logo()) or via automated hooks into online tools (plogo.make_logo(), weblogo.make_logo(), icelogo.make_logo()).

pyproteome.motifs.generate_n_mers(sequences, n=15, all_matches=True, fill_left='A', fill_right='A', mods=None, use_ptms=True, use_nterms=False, use_cterms=False)[source]

Generate n-mers around all sites of modification in sequences.

Parameters:
sequences : list of pyproteome.data_sets.sequence.Sequence
n : int, optional
all_matches : bool, optional

If True, generate n-mers for all protein matches; otherwise, only for the first match.

fill_left : str, optional
fill_right : str, optional
mods : list of tuple of str, str, optional
use_ptms : bool, optional
use_nterms : bool, optional
use_cterms : bool, optional
Returns:
set of str
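
To illustrate the windowing idea, the sketch below cuts a fixed-width window centered on one modified position and pads with the fill characters wherever the window runs past either end of the protein. The helper window_around() and its 0-based position argument are hypothetical and are not part of pyproteome; the real function operates on Sequence objects and also handles the mods and terminus options.

```python
def window_around(protein, position, n=15, fill_left="A", fill_right="A"):
    """Return the n-residue window centered on protein[position].

    Hypothetical helper for illustration; not pyproteome's implementation.
    Assumes n is odd and position is a 0-based index into protein.
    """
    half = n // 2
    start = position - half
    end = position + half + 1

    # Pad where the window extends past the start or end of the protein.
    left_pad = fill_left * max(0, -start)
    right_pad = fill_right * max(0, end - len(protein))

    return left_pad + protein[max(0, start):min(len(protein), end)] + right_pad


# Collect windows in a set, mirroring the documented "set of str" return type.
protein = "MKTAYIAKQR"
sites = [1, 4]  # assumed 0-based modification positions
n_mers = {window_around(protein, pos, n=7) for pos in sites}
```

Returning a set means that identical windows arising from different sites or proteins collapse into a single entry.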

pyproteome.motifs.motif module

This module provides functionality for finding motifs in sequences, including n-mer generation.

class pyproteome.motifs.motif.Motif(motif)[source]

Bases: object

Contains a motif that may match to one or more protein sequences.

Matches include the regular single-letter amino acid names as well as phosphosites for serine, threonine, and tyrosine, non-polar amino acids, and positively and negatively charged amino acids.

Examples

>>> import pyproteome
>>> motif = pyproteome.Motif('O..x.-+')
>>> 'IEFyFER' in motif
True
>>> 'IEFyFED' in motif
False
>>> 'FFFFFFR' in motif
False
Attributes:
motif : str
char_mapping = {'+': 'RK', '-': 'DE', '.': 'ACDEFGHIKLMNPQRSTVWYystO-+x', 'O': 'MILV', 'x': 'st'}
children()[source]
match(other)[source]

Match a given sequence to this motif.

Parameters:
other : str
Returns:
bool
motif
pairwise_children(hit_list)[source]
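
The char_mapping table can be read as one character class per motif position. The sketch below applies that reading directly. The helper matches() is hypothetical and written only for illustration; it assumes that characters without a mapping entry must match literally, and the class's own match() may differ in detail.

```python
# Copied from the char_mapping attribute documented above.
CHAR_MAPPING = {
    "+": "RK",
    "-": "DE",
    ".": "ACDEFGHIKLMNPQRSTVWYystO-+x",
    "O": "MILV",
    "x": "st",
}


def matches(motif, sequence):
    """Check a sequence against a motif, one position at a time.

    Hypothetical helper; not the library's implementation.
    """
    if len(motif) != len(sequence):
        return False
    # Each motif symbol expands to its allowed residues, or to itself.
    return all(
        residue in CHAR_MAPPING.get(symbol, symbol)
        for symbol, residue in zip(motif, sequence)
    )
```

Under this reading, matches('O..x.-+', 'IEFsFER') is True, while 'IEFsFED' fails at the final '+' position and 'FFFFFFR' fails at the leading 'O' position.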
pyproteome.motifs.motif.generate_n_mers(sequences, n=15, all_matches=True, fill_left='A', fill_right='A', mods=None, use_ptms=True, use_nterms=False, use_cterms=False)[source]

Generate n-mers around all sites of modification in sequences.

Parameters:
sequences : list of pyproteome.data_sets.sequence.Sequence
n : int, optional
all_matches : bool, optional

If True, generate n-mers for all protein matches; otherwise, only for the first match.

fill_left : str, optional
fill_right : str, optional
mods : list of tuple of str, str, optional
use_ptms : bool, optional
use_nterms : bool, optional
use_cterms : bool, optional
Returns:
set of str
pyproteome.motifs.motif.get_nmer_args(kwargs)[source]

Extract all arguments from kwargs that are used by generate_n_mers().

Parameters:
kwargs : dict
Returns:
dict
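
A minimal sketch of this kind of argument filtering, using the parameter names from the generate_n_mers() signature. The helper split_nmer_args() is hypothetical; in particular, it leaves kwargs unmodified, which may not be how the library behaves.

```python
# Parameter names taken from the generate_n_mers() signature above.
NMER_ARG_NAMES = (
    "n", "all_matches", "fill_left", "fill_right",
    "mods", "use_ptms", "use_nterms", "use_cterms",
)


def split_nmer_args(kwargs):
    """Return only the entries of kwargs that generate_n_mers() accepts.

    Hypothetical helper; not pyproteome's implementation.
    """
    return {key: kwargs[key] for key in NMER_ARG_NAMES if key in kwargs}
```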
pyproteome.motifs.motif.motif_enrichment(foreground, background, sig_cutoff=0.01, min_fore_hits=0, start_letters=None, pp_value=False, pp_iterations=100, cpu_count=None, force=False)[source]

Calculate motifs significantly enriched in a set of sequences. Uses a depth-first search algorithm to find discrete motifs that are enriched in a foreground set compared to a given background [1].

Parameters:
foreground : list of str
background : list of str
sig_cutoff : float, optional
min_fore_hits : int, optional
start_letters : list of str, optional
pp_value : bool, optional
pp_iterations : int, optional
cpu_count : int, optional

Number of CPUs to use when calculating pp-values. Does not apply to a single motif-enrichment process.

Returns:
df : pandas.DataFrame
p_dist : list of float
pp_dist : list of float

Notes

[1] Joughin, Brian A., et al. ‘An Integrated Comparative Phosphoproteomic and Bioinformatic Approach Reveals a Novel Class of MPM-2 Motifs Upregulated in EGFRvIII-Expressing Glioblastoma Cells.’ Molecular BioSystems 5.1 (2009): 59-67.
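
The depth-first search can be sketched as follows. This is an illustration only, not the library's code. It assumes equal-length sequences, motifs made of fixed residues and '.' wildcards, a one-sided hypergeometric test (this page does not state which test motif_enrichment() applies), and a foreground drawn from the background. All function names here are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items from N, of which K are hits."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)


def hits(motif, sequences):
    """Sequences matching a motif in which '.' accepts any residue."""
    return [
        seq for seq in sequences
        if all(m in (".", s) for m, s in zip(motif, seq))
    ]


def enriched_motifs(foreground, background, sig_cutoff=0.01, min_fore_hits=2):
    """Depth-first search over motifs built by fixing one position at a time."""
    width = len(foreground[0])
    results = {}

    def descend(motif, fore, first_free):
        # Fix positions left to right so each motif is visited only once.
        for pos in range(first_free, width):
            for residue in sorted({seq[pos] for seq in fore}):
                child = motif[:pos] + residue + motif[pos + 1:]
                child_fore = hits(child, fore)
                if len(child_fore) < min_fore_hits:
                    continue  # prune: too few foreground matches
                child_back = hits(child, background)
                p_value = hypergeom_tail(
                    len(child_fore), len(foreground),
                    len(child_back), len(background),
                )
                if p_value <= sig_cutoff:
                    results[child] = p_value
                # Go deeper before trying the next sibling (depth-first).
                descend(child, child_fore, pos + 1)

    descend("." * width, list(foreground), 0)
    return results
```

Pruning on foreground hits keeps the search tractable, because a child motif can never match more foreground sequences than its parent.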
pyproteome.motifs.motif.run_motif_enrichment(data, f, **kwargs)[source]

Wraps motif_enrichment(), generating the list of foreground and background peptide sequences from a data set.

Parameters:
data : pyproteome.data_sets.data_set.DataSet
f : dict or list of dict

Argument passed to pyproteome.data_sets.data_set.DataSet.filter().

kwargs : dict

Arguments passed to motif_enrichment().

Returns:
df : pandas.DataFrame
p_dist : list of float
pp_dist : list of float

pyproteome.motifs.neighborhood module

pyproteome.motifs.neighborhood.enriched_neighborhood(data, f, residues, nmer_length=7, count_cutoff=2, mods=None)[source]

Calculate the hypergeometric enrichment of sequences that contain more than count_cutoff of the given residues within a window of nmer_length around each modification site in a data set.

Parameters:
data : pyproteome.data_sets.data_set.DataSet
f : dict or list of dict
residues : list of str
nmer_length : int, optional
count_cutoff : int, optional
mods : str or list of str, optional
Returns:
f : matplotlib.figure.Figure
ax : matplotlib.axes.Axes
pval : float

P-value, calculated with scipy.stats.hypergeom.

K : int

Number of sequences in the background list that contain more than count_cutoff of the given residues.

N : int

Size of the background list of sequences.

k : int

Number of sequences in the foreground list that contain more than count_cutoff of the given residues.

n : int

Size of the foreground list of sequences.
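
The four counts described above can be sketched with plain string operations. The helper neighborhood_counts() is hypothetical; it takes lists of n-mer strings directly rather than a DataSet, and it assumes that every position in the window is counted, including the central one.

```python
def neighborhood_counts(foreground, background, residues, count_cutoff=2):
    """Return (k, n, K, N) for lists of n-mer strings.

    Hypothetical helper for illustration; not pyproteome's implementation.
    """
    def n_hits(sequences):
        # Count sequences holding more than count_cutoff of the residues.
        return sum(
            sum(seq.count(res) for res in residues) > count_cutoff
            for seq in sequences
        )

    return (
        n_hits(foreground), len(foreground),
        n_hits(background), len(background),
    )
```

With SciPy available, a one-sided tail probability for these counts can be obtained from scipy.stats.hypergeom.sf(k - 1, N, K, n). Whether the library reports exactly this tail is an assumption.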

pyproteome.motifs.phosphosite module

This module includes functions for downloading kinase-substrate associations from PhosphoSitePlus (https://www.phosphosite.org/).

pyproteome.motifs.phosphosite.generate_logos(species, kinases=None, min_foreground=10)[source]

Generate logos for all kinases documented on PhosphoSitePlus.

Parameters:
species : str

Species name (e.g. ‘Human’ or ‘Homo sapiens’)

kinases : list of str, optional
min_foreground : int, optional

Minimum number of substrates needed for logo generation.

Returns:
list of matplotlib.figure.Figure
pyproteome.motifs.phosphosite.get_data()[source]

Download the Kinase-Substrate Dataset from PhosphoSitePlus.

Returns:
df : pandas.DataFrame