pyproteome package¶
Module contents¶
-
pyproteome.
import_all
(line=None)[source]¶ Initialize and import many packages using an IPython Notebook magic command.
Imports the numpy, pandas, seaborn, sklearn, and pyproteome packages. Sets visual display options for matplotlib and adds a logging handler. Also applies auto-reload to pyproteome for developers.
Examples
>>> from pyproteome import *
>>> %import_all
Subpackages¶
- pyproteome.analysis package
- pyproteome.camv package
- pyproteome.cluster package
- pyproteome.data_sets package
- pyproteome.discoverer package
- pyproteome.motifs package
- pyproteome.pathways package
- pyproteome.pathways.enrichments module
- pyproteome.pathways.go module
- pyproteome.pathways.gskb module
- pyproteome.pathways.msigdb module
- pyproteome.pathways.pathwayscommon module
- pyproteome.pathways.photon_ptm module
- pyproteome.pathways.plot module
- pyproteome.pathways.plsr module
- pyproteome.pathways.psp module
- pyproteome.pathways.ptmsigdb module
- pyproteome.pathways.wikipathways module
- pyproteome.pride package
- pyproteome.pypuniprot package
Submodules¶
pyproteome.levels module¶
This module provides functionality for normalizing protein data.
Levels can be extracted from supernatant or phosphotyrosine runs using median or mean peptide levels across multiple channels.
-
pyproteome.levels.
get_channel_levels
(data, norm_channels=None, method='median', cols=2)[source]¶ Calculate channel normalization levels. Each level is calculated by selecting the peak of a Gaussian KDE distribution fitted to that channel's ratio values.
Parameters: - data :
pyproteome.data_sets.DataSet
- norm_channels : list of str, optional
Sample names of channels to use for normalization.
- method : str, optional
Normalize to the ‘mean’ or ‘median’ of each row.
- cols : int, optional
Number of columns used when displaying KDE distributions.
Returns: - fig :
matplotlib.figure.Figure
- channel_levels : dict of str, float
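The idea of taking the peak of a Gaussian KDE can be sketched in plain Python. This is only an illustration, not the code used by get_channel_levels: the function name kde_peak, the Silverman bandwidth rule, the grid search, and the sample ratios are all assumptions made for this example.

```python
import math
import statistics


def kde_peak(values, grid_size=512):
    """Return the grid point where a Gaussian KDE fitted to values is highest."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)
    if sd == 0:
        return values[0]
    # Silverman's rule of thumb for the bandwidth (an assumption of this sketch)
    bandwidth = 1.06 * sd * n ** (-1 / 5)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]

    def density(x):
        # Unnormalized sum of Gaussian kernels centered on each observation
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)


# Hypothetical channel ratios clustered near 0.3, plus one outlier
ratios = [0.25, 0.28, 0.30, 0.31, 0.33, 0.35, 2.0]
peak = kde_peak(ratios)
```

Unlike the mean, the KDE peak is barely moved by the single outlier at 2.0, which is one reason a density peak can be a robust estimate of a channel's level.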
pyproteome.loading module¶
This module provides functionality for loading data sets.
Functionality includes loading CAMV and Proteome Discoverer data sets.
-
pyproteome.loading.
load_psms
(basename, pick_best_psm=True)[source]¶ Load a list of peptide-spectrum matches (PSMs) from a .msf file produced by Proteome Discoverer.
Parameters: - basename : str
Base name of the data set (e.g. ‘CK-H1-pY’ for ‘CK-H1-pY.msf’).
- pick_best_psm : bool, optional
Select the best scoring PSM for a given scan, otherwise load all PSMs.
Returns: - psms :
pandas.DataFrame
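What pick_best_psm means can be illustrated with a small stand-alone sketch that keeps the highest-scoring match per scan. It does not read .msf files and is not pyproteome's implementation: the record fields (scan, sequence, score) and the helper name pick_best_psms are hypothetical.

```python
def pick_best_psms(psms):
    """Keep only the highest-scoring PSM for each scan number."""
    best = {}
    for psm in psms:
        current = best.get(psm["scan"])
        if current is None or psm["score"] > current["score"]:
            best[psm["scan"]] = psm
    return list(best.values())


# Hypothetical PSM records; two candidate matches exist for scan 101
psms = [
    {"scan": 101, "sequence": "AAYK", "score": 35.2},
    {"scan": 101, "sequence": "AYAK", "score": 12.9},
    {"scan": 102, "sequence": "GELK", "score": 48.0},
]
best = pick_best_psms(psms)
```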
pyproteome.paths module¶
This module tracks the path to user data files. Developers can override paths here when using a custom data hierarchy.
-
pyproteome.paths.
BASE_DIR
= '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs'¶ Location of the base directory containing proteomics data. By default this is set to the current or parent directory, whichever contains any folders matching the expected directory structure.
-
pyproteome.paths.
CAMV_NAME
= 'CAMV Output'¶ Name of the directory containing validated CAMV data.
-
pyproteome.paths.
CAMV_OUT_DIR
= '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/CAMV Output'¶ Location of the directory containing validated CAMV data. By default it is set to
CAMV_NAME
in the current or parent directory.
-
pyproteome.paths.
FIGURES_DIR
= '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/Figures'¶ Location of the directory for saving output figures. By default it is set to
FIGURES_NAME
in the current or parent directory.
-
pyproteome.paths.
FIGURES_NAME
= 'Figures'¶ Name of the directory for saving output figures.
-
pyproteome.paths.
MS_RAW_DIR
= '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/MS RAW'¶ Location of the directory containing raw mass spectrometry files. By default it is set to
MS_RAW_NAME
in the current or parent directory.
-
pyproteome.paths.
MS_RAW_NAME
= 'MS RAW'¶ Name of the directory containing raw mass spectrometry files.
-
pyproteome.paths.
MS_SEARCHED_DIR
= '/home/docs/checkouts/readthedocs.org/user_builds/pyproteome/checkouts/latest/docs/Searched'¶ Location of the directory containing Proteome Discoverer .msf search files. By default it is set to
MS_SEARCHED_NAME
in the current or parent directory.
-
pyproteome.paths.
MS_SEARCHED_NAME
= 'Searched'¶ Name of the directory containing Proteome Discoverer .msf search files.
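The "current or parent directory, whichever contains the expected folders" rule described for BASE_DIR can be sketched as follows. The helper find_base_dir and the fallback behavior are assumptions for illustration; only the four folder names come from the constants listed in this module.

```python
import os

# Folder names taken from the *_NAME constants documented for this module
EXPECTED_DIRS = ("CAMV Output", "Figures", "MS RAW", "Searched")


def find_base_dir(start):
    """Return start or its parent, whichever contains an expected folder."""
    start = os.path.abspath(start)
    for candidate in (start, os.path.dirname(start)):
        if any(
            os.path.isdir(os.path.join(candidate, name))
            for name in EXPECTED_DIRS
        ):
            return candidate
    # Nothing matched: fall back to the starting directory (sketch assumption)
    return start
```

Calling find_base_dir(os.getcwd()) from a notebook folder that sits next to a Searched directory would then resolve to the shared parent.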
pyproteome.species module¶
This module includes functions for mapping species names.
-
pyproteome.species.
INV_ORGANISM_MAPPING
= {'cow': 'Bos taurus', 'dog': 'Canis familiaris', 'ferret': 'Mustela putorius', 'fruit fly': 'Drosophila melanogaster', 'horse': 'Equus caballus', 'human': 'Homo sapiens', 'mouse': 'Mus musculus', 'rat': 'Rattus norvegicus'}¶ Mapping between species’ colloquial name and its specific name.
-
pyproteome.species.
ORGANISM_MAPPING
= {'Bos taurus': 'cow', 'Canis familiaris': 'dog', 'Drosophila melanogaster': 'fruit fly', 'Equus caballus': 'horse', 'Homo sapiens': 'human', 'Mus musculus': 'mouse', 'Mustela putorius': 'ferret', 'Rattus norvegicus': 'rat'}¶ Mapping between species’ specific name and its colloquial name (e.g. ‘Homo sapiens’ -> ‘human’).
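The two mappings are inverses of each other. One way to derive the inverse from the forward mapping is a dict comprehension, shown here with the values listed above; whether pyproteome builds INV_ORGANISM_MAPPING this way is an assumption.

```python
ORGANISM_MAPPING = {
    "Bos taurus": "cow",
    "Canis familiaris": "dog",
    "Drosophila melanogaster": "fruit fly",
    "Equus caballus": "horse",
    "Homo sapiens": "human",
    "Mus musculus": "mouse",
    "Mustela putorius": "ferret",
    "Rattus norvegicus": "rat",
}

# Swap keys and values; safe because every colloquial name is unique
INV_ORGANISM_MAPPING = {
    common: specific for specific, common in ORGANISM_MAPPING.items()
}
```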
pyproteome.utils module¶
Utility functions used in other modules.
-
pyproteome.utils.
DEFAULT_DPI
= 300¶ The DPI to use when generating all image figures.
-
class
pyproteome.utils.
DefaultOrderedDict
(default_factory=None, *a, **kw)[source]¶ Bases:
collections.OrderedDict
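The entry above gives only the signature and base class. A common recipe for a dictionary that combines insertion order with a default factory looks like the sketch below; the class name OrderedDefaultSketch is hypothetical, and the actual pyproteome class may differ in detail.

```python
from collections import OrderedDict


class OrderedDefaultSketch(OrderedDict):
    """OrderedDict that creates missing values with default_factory."""

    def __init__(self, default_factory=None, *a, **kw):
        if default_factory is not None and not callable(default_factory):
            raise TypeError("default_factory must be callable or None")
        super().__init__(*a, **kw)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value
        return value


groups = OrderedDefaultSketch(list)
groups["b"].append(1)
groups["a"].append(2)
```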
-
pyproteome.utils.
PICKLE_DIR
= '.pyproteome'¶ Default directory to use for saving / loading pickle files.
-
pyproteome.utils.
adjust_text
(*args, **kwargs)[source]¶ Wraps importing and calling
adjustText.adjust_text()
.
-
pyproteome.utils.
flatten_list
(lst)[source]¶ Flattens an Iterable with arbitrary nesting into a single list.
Parameters: - lst : Iterable
Returns: - flattened : list
Examples
>>> utils.flatten_list([0, [1, 2], [[3]], 'string'])
[0, 1, 2, 3, 'string']
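The behavior in the example (nested iterables are expanded, strings are kept whole) can be reproduced with a short recursive sketch. The function name flatten is hypothetical and this is not necessarily how flatten_list is written.

```python
from collections.abc import Iterable


def flatten(items):
    """Flatten arbitrarily nested iterables, treating strings as atoms."""
    flat = []
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


result = flatten([0, [1, 2], [[3]], "string"])
```

Wrapping the result in set() gives the set-valued variant.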
-
pyproteome.utils.
flatten_set
(lst)[source]¶ Flattens an Iterable with arbitrary nesting into a single set.
Parameters: - lst : Iterable
Returns: - flattened : set
Examples
>>> utils.flatten_set([0, [1, 2], [[3]], 'string'])
set([0, 1, 2, 3, 'string'])
-
pyproteome.utils.
fuzzy_find
(needle, haystack)[source]¶ Find the longest matching subsequence of needle within haystack.
Returns the corresponding index from the beginning of needle.
Parameters: - needle : str
- haystack : str
Returns: - index : int
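One possible reading of this description can be sketched with the standard library's difflib. The sketch treats the "longest matching subsequence" as the longest contiguous block shared by both strings and returns the position in haystack where the start of needle would align. That interpretation, the name fuzzy_find_sketch, and the -1 sentinel are assumptions, so the result may differ from fuzzy_find itself.

```python
from difflib import SequenceMatcher


def fuzzy_find_sketch(needle, haystack):
    """Offset in haystack at which needle's first character would align."""
    matcher = SequenceMatcher(None, needle, haystack, autojunk=False)
    match = matcher.find_longest_match(0, len(needle), 0, len(haystack))
    if match.size == 0:
        return -1  # no shared characters (sentinel chosen for this sketch)
    # match.a is the block's start in needle, match.b its start in haystack
    return match.b - match.a


index = fuzzy_find_sketch("ELVIS", "AAAELVISLIVES")
```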
-
pyproteome.utils.
get_name
(proteins)[source]¶ Generates a shortened version of a protein name. For peptides that map to multiple proteins, this function finds the longest common prefix (excluding digits) that matches all proteins.
Parameters: - proteins :
data_sets.protein.Proteins
Returns: - str
Examples
>>> pyp.utils.get_name(
...     protein.Proteins([
...         protein.Protein(gene='Dpysl2'),
...         protein.Protein(gene='Dpysl3'),
...     ])
... )
'Dpysl2/3'
>>> pyp.utils.get_name(
...     protein.Proteins([
...         protein.Protein(gene='Src'),
...         protein.Protein(gene='Fgr'),
...         protein.Protein(gene='Fyn'),
...     ])
... )
'Src / Fgr / Fyn'
>>> pyp.utils.get_name(
...     protein.Proteins([
...         protein.Protein(gene='Tuba1a'),
...         protein.Protein(gene='Tuba1b'),
...         protein.Protein(gene='Tuba1c'),
...         protein.Protein(gene='Tuba4a'),
...         protein.Protein(gene='Tuba8'),
...     ])
... )
'Tuba1a/1b/1c/3a/4a/8'
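The prefix-merging rule can be sketched on plain gene-name strings. This simplified version does not use the Proteins class, and the helper name short_name and the min_prefix threshold are assumptions made for illustration.

```python
import os


def short_name(genes, min_prefix=3):
    """Merge gene names that share a common non-digit prefix."""
    genes = list(genes)
    if len(genes) == 1:
        return genes[0]
    # Longest common prefix, with trailing digits excluded
    prefix = os.path.commonprefix(genes).rstrip("0123456789")
    if len(prefix) < min_prefix:
        # No meaningful shared prefix: list every name in full
        return " / ".join(genes)
    suffixes = [gene[len(prefix):] for gene in genes[1:]]
    return genes[0] + "/" + "/".join(suffixes)


merged = short_name(["Dpysl2", "Dpysl3"])
unrelated = short_name(["Src", "Fgr", "Fyn"])
```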
-
pyproteome.utils.
load
(name, default=None)[source]¶ Load a variable using the pickle module.
Parameters: - name : str
The name to use for data storage.
- default : object, optional
Returns: - val : object
-
pyproteome.utils.
makedirs
(folder_name=None)[source]¶ Creates a folder if it does not exist.
Parameters: - folder_name : str, optional
Returns: - folder_name : str
-
pyproteome.utils.
memoize
(func)[source]¶ Memoize a function, saving its returned value for a given set of parameters in an in-memory cache.
Parameters: - func : func
Returns: - memoized : func
Examples
>>> from pyproteome import utils
>>> @utils.memoize
... def download_data(species):
...     ... # Fetch / calculate the return value once
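A generic in-memory memoization decorator can be written in a few lines. The sketch below shows the technique only; the name memoize_sketch and the way the cache key is built are assumptions and may not match pyproteome's memoize.

```python
import functools


def memoize_sketch(func):
    """Cache func's return value for each distinct set of arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Arguments must be hashable to serve as a dictionary key
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []


@memoize_sketch
def square(x):
    calls.append(x)  # records each real evaluation
    return x * x


first = square(4)
second = square(4)  # served from the cache; square's body does not run again
```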
-
pyproteome.utils.
norm
(channels)[source]¶ Converts a list of channels to their normalized names.
Parameters: - channels : list of str or dict of (str, str) or None
Returns: - new_channels : list of str or dict of str, str
-
pyproteome.utils.
save
(name, val=None)[source]¶ Save a variable using the pickle module.
Parameters: - name : str
The name to use for data storage.
- val : object, optional
Returns: - val : object
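The save and load pair can be illustrated with a minimal pickle round trip. The file layout here (one name.pkl file per variable inside a folder), the folder parameter, and the function names save_pickle and load_pickle are assumptions for this sketch, not details confirmed by this page.

```python
import os
import pickle

PICKLE_DIR = ".pyproteome"  # default directory named in this module


def save_pickle(name, val=None, folder=PICKLE_DIR):
    """Pickle val to <folder>/<name>.pkl and return it."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, name + ".pkl"), "wb") as handle:
        pickle.dump(val, handle)
    return val


def load_pickle(name, default=None, folder=PICKLE_DIR):
    """Load <folder>/<name>.pkl, or return default if the file is missing."""
    try:
        with open(os.path.join(folder, name + ".pkl"), "rb") as handle:
            return pickle.load(handle)
    except FileNotFoundError:
        return default
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.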