pyproteome package

Module contents

pyproteome.import_all(line=None)[source]

Initialize and import many packages using an IPython Notebook magic command.

Imports the numpy, pandas, seaborn, sklearn, and pyproteome packages. Sets visual display options for matplotlib and adds a logging handler. Also applies auto-reload to pyproteome for developers.

Examples

>>> from pyproteome import *
>>> %import_all

Submodules

pyproteome.levels module

This module provides functionality for normalizing protein data.

Levels can be extracted from supernatant or phosphotyrosine runs using median or mean peptide levels across multiple channels.

pyproteome.levels.get_channel_levels(data, norm_channels=None, method='median', cols=2)[source]

Calculate channel normalization levels. Each level is calculated by selecting the peak of a Gaussian KDE distribution fitted to that channel's ratio values.

Parameters:
data : pyproteome.data_sets.DataSet
norm_channels : list of str, optional

Sample names of channels to use for normalization.

method : str, optional

Normalize to the ‘mean’ or ‘median’ of each row.

cols : int, optional

Number of columns used when displaying KDE distributions.

Returns:
fig : matplotlib.figure.Figure
channel_levels : dict of str, float

pyproteome.levels.kde_max(points)[source]

Estimate the center of a quantification channel by fitting a gaussian KDE function and finding its maximum.

Parameters:
points : list of float
Returns:
float
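
The peak-finding step described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the name kde_max_sketch, the grid_size parameter, and the use of Scott's rule for the bandwidth are assumptions made for illustration.

```python
import math


def kde_max_sketch(points, grid_size=512):
    """Return the location of the peak of a Gaussian KDE fitted to points.

    Hypothetical helper: the grid search and bandwidth rule are assumptions.
    """
    points = [float(p) for p in points]
    n = len(points)
    if n == 0:
        raise ValueError("points must not be empty")

    mean = sum(points) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in points) / max(n - 1, 1))
    if std == 0:
        # All values are identical, so the density peaks at that value.
        return points[0]

    # Scott's rule of thumb for a one-dimensional kernel bandwidth.
    bandwidth = std * n ** (-1 / 5)

    def density(x):
        # Unnormalized sum of Gaussian kernels centered on each point.
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    # Evaluate the density on an even grid spanning the data and keep the
    # grid position with the highest density.
    lo, hi = min(points), max(points)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    return max(grid, key=density)
```

Unlike a plain mean, the KDE peak is barely moved by a single outlying ratio, which is presumably why a density maximum is used to estimate the channel center.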

pyproteome.loading module

This module provides functionality for loading data sets.

Functionality includes loading CAMV and Proteome Discoverer data sets.

pyproteome.loading.load_psms(basename, pick_best_psm=True)[source]

Load a list of peptide-spectrum matches (PSMs) from a .msf file produced by Proteome Discoverer.

Parameters:
basename : str

Base name of the data set (e.g. ‘CK-H1-pY’ for ‘CK-H1-pY.msf’).

pick_best_psm : bool, optional

If True, select only the best-scoring PSM for each scan; otherwise load all PSMs.

Returns:
psms : pandas.DataFrame

pyproteome.paths module

This module tracks the path to user data files. Developers can override paths here when using a custom data hierarchy.

pyproteome.paths.BASE_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs'

Location of the base directory containing proteomics data. By default this is set to the current or parent directory, whichever contains any folders matching the expected directory structure.

pyproteome.paths.CAMV_NAME = 'CAMV Output'

Name of the directory containing validated CAMV data.

pyproteome.paths.CAMV_OUT_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/CAMV Output'

Location of the directory containing validated CAMV data. By default it is set to CAMV_NAME in the current or parent directory.

pyproteome.paths.FIGURES_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/Figures'

Location of the directory for saving output figures. By default it is set to FIGURES_NAME in the current or parent directory.

pyproteome.paths.FIGURES_NAME = 'Figures'

Name of the directory for saving output figures.

pyproteome.paths.MS_RAW_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/MS RAW'

Location of the directory containing raw mass spectrometry files. By default it is set to MS_RAW_NAME in the current or parent directory.

pyproteome.paths.MS_RAW_NAME = 'MS RAW'

Name of the directory containing raw mass spectrometry files.

pyproteome.paths.MS_SEARCHED_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/Searched'

Location of the directory containing Proteome Discoverer .msf search files. By default it is set to MS_SEARCHED_NAME in the current or parent directory.

pyproteome.paths.MS_SEARCHED_NAME = 'Searched'

Name of the directory containing Proteome Discoverer .msf search files.

pyproteome.paths.find_base_dir()[source]

Finds the base directory containing the search / raw / scripts / figures folders. May be the current working directory or a parent of it.

Returns:
path : str
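
The lookup described above (check the current directory, then its parent, for the expected folders) might look roughly like the following. This is a sketch under assumptions: find_base_dir_sketch and its start parameter are hypothetical, and only the folder names documented in this module are checked.

```python
import os

# Folder names taken from the *_NAME constants documented in this module.
EXPECTED_DIRS = ("Searched", "MS RAW", "CAMV Output", "Figures")


def find_base_dir_sketch(start=None):
    """Return start or its parent, whichever contains an expected folder."""
    start = os.path.abspath(start or os.getcwd())
    for candidate in (start, os.path.dirname(start)):
        if any(
            os.path.isdir(os.path.join(candidate, name))
            for name in EXPECTED_DIRS
        ):
            return candidate
    # Assumption: fall back to the starting directory when nothing matches.
    return start
```

With this layout, running a notebook from a scripts folder that sits next to ‘Searched’ resolves to the shared parent directory.
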
pyproteome.paths.set_base_dir(path)[source]

Set the base directory containing the search / raw / figures folders.

Parameters:
path : str

pyproteome.species module

This module includes functions for mapping species names.

pyproteome.species.INV_ORGANISM_MAPPING = {'cow': 'Bos taurus', 'dog': 'Canis familiaris', 'ferret': 'Mustela putorius', 'fruit fly': 'Drosophila melanogaster', 'horse': 'Equus caballus', 'human': 'Homo sapiens', 'mouse': 'Mus musculus', 'rat': 'Rattus norvegicus'}

Mapping from each species’ colloquial name to its scientific name (e.g. ‘human’ > ‘Homo sapiens’).

pyproteome.species.ORGANISM_MAPPING = {'Bos taurus': 'cow', 'Canis familiaris': 'dog', 'Drosophila melanogaster': 'fruit fly', 'Equus caballus': 'horse', 'Homo sapiens': 'human', 'Mus musculus': 'mouse', 'Mustela putorius': 'ferret', 'Rattus norvegicus': 'rat'}

Mapping from each species’ scientific name to its colloquial name (e.g. ‘Homo sapiens’ > ‘human’).
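
The two dictionaries are inverses of one another. As a sketch (using a subset of the entries listed above, and assuming the inverse is derived rather than written by hand), one can be built from the other with a dict comprehension:

```python
# Subset of the documented mapping, copied from the values shown above.
ORGANISM_MAPPING = {
    "Homo sapiens": "human",
    "Mus musculus": "mouse",
    "Rattus norvegicus": "rat",
}

# Swap keys and values to look up a scientific name by its common name.
INV_ORGANISM_MAPPING = {
    common: scientific for scientific, common in ORGANISM_MAPPING.items()
}

print(INV_ORGANISM_MAPPING["mouse"])
```

The inversion is only safe because every colloquial name is unique; duplicate values would silently overwrite one another.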

pyproteome.utils module

Utility functions used in other modules.

pyproteome.utils.DEFAULT_DPI = 300

The DPI to use when generating all image figures.

class pyproteome.utils.DefaultOrderedDict(default_factory=None, *a, **kw)[source]

Bases: collections.OrderedDict

copy() → a shallow copy of od[source]
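
The class carries no description here, but its name and signature suggest an OrderedDict that also supplies default values in the manner of collections.defaultdict. A minimal sketch under that assumption (DefaultOrderedDictSketch is a hypothetical stand-in, not the library's code):

```python
from collections import OrderedDict


class DefaultOrderedDictSketch(OrderedDict):
    """OrderedDict that creates missing values with default_factory."""

    def __init__(self, default_factory=None, *a, **kw):
        if default_factory is not None and not callable(default_factory):
            raise TypeError("first argument must be callable or None")
        super().__init__(*a, **kw)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict lookup when key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def copy(self):
        # Shallow copy: same factory, same value objects, new container.
        return type(self)(self.default_factory, self)
```

For example, `d = DefaultOrderedDictSketch(list)` followed by `d['pY'].append(1)` creates the key on first access while keeping insertion order.
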
pyproteome.utils.PICKLE_DIR = '.pyproteome'

Default directory to use for saving / loading pickle files.

pyproteome.utils.adjust_text(*args, **kwargs)[source]

Wraps importing and calling adjustText.adjust_text().

pyproteome.utils.flatten_list(lst)[source]

Flattens an Iterable with arbitrary nesting into a single list.

Parameters:
lst : Iterable
Returns:
flattened : list

Examples

>>> utils.flatten_list([0, [1, 2], [[3]], 'string'])
[0, 1, 2, 3, 'string']
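
One way to obtain the behavior in the example above, where nested containers are expanded but strings are kept whole, is a short recursive function. This is an illustrative sketch (flatten_list_sketch is a hypothetical name), not the library's source:

```python
from collections.abc import Iterable


def flatten_list_sketch(lst):
    """Flatten arbitrarily nested iterables, treating strings as atoms."""
    flattened = []
    for item in lst:
        # Strings and bytes are iterable, but should not be split apart.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            flattened.extend(flatten_list_sketch(item))
        else:
            flattened.append(item)
    return flattened
```

Wrapping the result in set() gives the corresponding set-valued variant.
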
pyproteome.utils.flatten_set(lst)[source]

Flattens an Iterable with arbitrary nesting into a single set.

Parameters:
lst : Iterable
Returns:
flattened : set

Examples

>>> utils.flatten_set([0, [1, 2], [[3]], 'string'])
set([0, 1, 2, 3, 'string'])

pyproteome.utils.flatten_set(lst)[source]

Find the longest matching subsequence of needle within haystack.

Returns the corresponding index from the beginning of needle.

Parameters:
needle : str
haystack : str
Returns:
index : int

pyproteome.utils.get_name(proteins)[source]

Generates a shortened version of a protein name. For peptides that map to multiple proteins, this function finds the longest common prefix (excluding digits) that matches all proteins.

Parameters:
proteins : data_sets.protein.Proteins
Returns:
str

Examples

>>> pyp.utils.get_name(
...     protein.Proteins([
...         protein.Protein(gene='Dpysl2'),
...         protein.Protein(gene='Dpysl3'),
...     ])
... )
'Dpysl2/3'
>>> pyp.utils.get_name(
...     protein.Proteins([
...         protein.Protein(gene='Src'),
...         protein.Protein(gene='Fgr'),
...         protein.Protein(gene='Fyn'),
...     ])
... )
'Src / Fgr / Fyn'
>>> pyp.utils.get_name(
...     protein.Proteins([
...         protein.Protein(gene='Tuba1a'),
...         protein.Protein(gene='Tuba1b'),
...         protein.Protein(gene='Tuba1c'),
...         protein.Protein(gene='Tuba4a'),
...         protein.Protein(gene='Tuba8'),
...     ])
... )
'Tuba1a/1b/1c/4a/8'
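
The prefix logic can be sketched on plain gene-name strings. The following is an approximation under stated assumptions, not the library's code: get_name_sketch is hypothetical, it takes strings rather than a Proteins object, and the min_prefix cutoff that decides when names are simply joined with ' / ' is a guess.

```python
import os
import string


def get_name_sketch(genes, min_prefix=3):
    """Shorten a group of gene names around their common non-digit prefix."""
    # Drop duplicates while keeping the original order.
    genes = list(dict.fromkeys(genes))
    # Longest shared prefix, with any trailing digits stripped off.
    prefix = os.path.commonprefix(genes).rstrip(string.digits)
    if len(genes) < 2 or len(prefix) < min_prefix:
        # No useful shared prefix: list the full names.
        return " / ".join(genes)
    # Shared prefix once, followed by each name's distinguishing suffix.
    return prefix + "/".join(gene[len(prefix):] for gene in genes)
```

Here `get_name_sketch(['Dpysl2', 'Dpysl3'])` gives 'Dpysl2/3', while `get_name_sketch(['Src', 'Fgr', 'Fyn'])` falls back to 'Src / Fgr / Fyn'.
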
pyproteome.utils.load(name, default=None)[source]

Load a variable using the pickle module.

Parameters:
name : str

The name to use for data storage.

default : object, optional
Returns:
val : object

pyproteome.utils.make_folder(data=None, folder_name=None, sub='Output')[source]

pyproteome.utils.makedirs(folder_name=None)[source]

Creates a folder if it does not exist.

Parameters:
folder_name : str, optional
Returns:
folder_name : str

pyproteome.utils.memoize(func)[source]

Memoize a function, saving its returned value for a given set of parameters in an in-memory cache.

Parameters:
func : func
Returns:
memoized : func

Examples

>>> from pyproteome import utils
>>> @utils.memoize
... def download_data(species):
...    ...  # Fetch / calculate the return value once
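
For readers unfamiliar with the pattern, an in-memory memoizing decorator can be sketched in a few lines. memoize_sketch is a hypothetical illustration and assumes all arguments are hashable; the library's own cache key may differ.

```python
import functools


def memoize_sketch(func):
    """Cache func's return value for each distinct set of arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper
```

The cache lives for the lifetime of the process, so this suits expensive, deterministic calls such as downloads or database lookups.
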
pyproteome.utils.norm(channels)[source]

Converts a list of channels to their normalized names.

Parameters:
channels : list of str or dict of (str, str) or None
Returns:
new_channels : list of str or dict of str, str

pyproteome.utils.save(name, val=None)[source]

Save a variable using the pickle module.

Parameters:
name : str

The name to use for data storage.

val : object, optional
Returns:
val : object
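
A save / load round trip with the pickle module might look like the sketch below. The file naming scheme (name plus a ‘.pkl’ suffix inside PICKLE_DIR), the directory parameter, and the choice to return default on a missing or unreadable file are all assumptions; only the function signatures and PICKLE_DIR come from the documentation above.

```python
import os
import pickle

PICKLE_DIR = ".pyproteome"  # Value documented above.


def _pickle_path(name, directory):
    # Hypothetical naming scheme: <directory>/<name>.pkl
    return os.path.join(directory, "{}.pkl".format(name))


def save_sketch(name, val=None, directory=PICKLE_DIR):
    """Pickle val under name and return it unchanged."""
    os.makedirs(directory, exist_ok=True)
    with open(_pickle_path(name, directory), "wb") as f:
        pickle.dump(val, f)
    return val


def load_sketch(name, default=None, directory=PICKLE_DIR):
    """Unpickle the value stored under name, or return default."""
    try:
        with open(_pickle_path(name, directory), "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        return default
```

Returning val from the save function lets a computation and its caching sit on one line, for example `levels = save_sketch('levels', compute())`, where compute is a placeholder for any expensive call.
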
pyproteome.utils.stars(p, ns='ns')[source]

Return the star string that indicates the significance level of a p-value.

**** : p < 1e-4

*** : p < 1e-3

** : p < 1e-2

* : p < 5e-2

ns : not significant

Parameters:
p : float
ns : str, optional
Returns:
str
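
The thresholds listed above translate directly into a chain of comparisons. A minimal sketch (stars_sketch is a hypothetical name; the cutoffs are those given in the table above):

```python
def stars_sketch(p, ns="ns"):
    """Map a p-value to a significance marker, strictest cutoff first."""
    if p < 1e-4:
        return "****"
    if p < 1e-3:
        return "***"
    if p < 1e-2:
        return "**"
    if p < 5e-2:
        return "*"
    # Not significant: return the caller-supplied label.
    return ns
```

Passing `ns=''` is a convenient way to leave non-significant comparisons unlabeled on a figure.
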
pyproteome.utils.which(program)[source]

Checks if a program exists in PATH’s list of directories.

Parameters:
program : str
Returns:
path : str or None
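
The PATH search can be sketched as follows. This is an illustration rather than the library's code: which_sketch and its path parameter are hypothetical, and the sketch assumes a POSIX-style system where executables need no file extension. Python's standard library offers shutil.which for the same purpose.

```python
import os


def which_sketch(program, path=None):
    """Return the full path of program if found on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # Skip empty PATH entries.
        candidate = os.path.join(directory, program)
        # Require a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```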

pyproteome.version module

pyproteome.version.version = '0.12.0'

The version of pyproteome that is installed.