Metatranscriptomics bipype
Module functions
-
config_from_file(_file)
Reads parameters from the configuration file _file. Prepares target.txt and templates for SARTools.
Parameters: _file – configuration file for the metatranscriptomic pipeline
Returns: - ref_cond: reference condition defined by the user
- all_conds: set of conditions (groups) from target.txt
- fastqs: list of fastq files on which the analysis will be done
Return type: (ref_cond, all_conds, fastqs)
-
connect_db(db)
Connects to a database.
Parameters: db – path to the SQL database
Returns: Cursor object to the database
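A minimal sketch of what such a helper could look like, assuming the database is an SQLite3 file (as the other functions on this page suggest) and using Python's standard sqlite3 module. The name connect_db_sketch is hypothetical and is not bipype's own code.

```python
import sqlite3


def connect_db_sketch(db_path):
    """Open the SQLite3 database at db_path and return a cursor to it.

    Assumption: the database is an SQLite3 file; bipype's real
    connect_db may differ.
    """
    connection = sqlite3.connect(db_path)
    return connection.cursor()


# ":memory:" opens a throwaway in-memory database, handy for a quick check.
cursor = connect_db_sketch(":memory:")
cursor.execute("SELECT 41 + 1")
print(cursor.fetchone())  # (42,)
```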
-
dicto_reduce(present, oversized)
Removes from both dictionaries every element whose key is not present in both of them.
Parameters: - present – dict
- oversized – dict
Returns: tuple of dicts
Return type: (oversized, present)
Warning
The order of the parameters is the reverse of the order of the results.
Example
>>> dict_1={'a':1,'c':3,'d':4}
>>> dict_2={'a':3,'b':4,'c':4}
>>> dicto_reduce(dict_1, dict_2)
({'a': 3, 'c': 4}, {'a': 1, 'c': 3})
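The behaviour described above can be sketched as follows. This is an illustrative reimplementation, not bipype's code: the name dicto_reduce_sketch is hypothetical, and modifying the dictionaries in place is an assumption based on the word "Removes".

```python
def dicto_reduce_sketch(present, oversized):
    """Keep only the keys shared by both dicts; return (oversized, present).

    Assumption: both dictionaries are modified in place.
    """
    common = present.keys() & oversized.keys()
    for dictionary in (present, oversized):
        for key in list(dictionary):  # iterate over a copy, we delete while looping
            if key not in common:
                del dictionary[key]
    # Note the reversed order with respect to the parameters.
    return oversized, present


dict_1 = {'a': 1, 'c': 3, 'd': 4}
dict_2 = {'a': 3, 'b': 4, 'c': 4}
print(dicto_reduce_sketch(dict_1, dict_2))
# ({'a': 3, 'c': 4}, {'a': 1, 'c': 3})
```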
-
fastq_to_fasta(fastq)
Runs fastq_to_fasta on fastq.
- GLOBAL:
- path to fastq_to_fasta program: PATH_FQ2FA
-
get_kegg_name(ko)
Returns the name assigned to the given KO identifier (from kegg.jp).
Parameters: ko – KO identifier (string)
Returns: name assigned to ko (string)
-
get_ko_fc(ko_dict, ref_cond, filepath, deseq=False)
Adds the fold changes found in the given table file (SARTools output) to ko_dict.
Parameters: - ko_dict – dict in the format {KO_id:{cond1:value1, cond2:value2...}...}
- ref_cond – reference condition (string)
- filepath – path to the output table file from edgeR or DESeq2
- deseq – True if filepath points to a DESeq2 table file, False if it points to an edgeR table file
Returns: ko_dict with the fold changes from the table file added
-
get_kopathways(database)
Makes two dictionaries from the kopathways table of the SQLite3 database.
Parameters: database – Cursor object to the SQLite3 database.
Returns: - KO identifier -> pathways mapping: {KO identifier: set[KEGG_Pathway_ids]}
For example:
{ 'K01194': set(['ko00500','ko00600',...]), 'K04501': set(['ko04390',...]) }
- pathway -> KO identifiers mapping: {KEGG_Pathway_id: set[KO identifiers]}
For example:
{ko12345: set([K12345, K12346,...]),...}
Return type: Two dictionaries
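A minimal sketch of how the two mappings could be built, using an in-memory SQLite3 database for illustration. The column names and the row layout (KO identifier first, pathway identifier second) are assumptions, and get_kopathways_sketch is a hypothetical name.

```python
import sqlite3
from collections import defaultdict


def get_kopathways_sketch(cursor):
    """Build KO -> pathways and pathway -> KOs mappings from 'kopathways'.

    Assumption: each row is (KO identifier, KEGG pathway identifier).
    """
    ko_to_pathways = defaultdict(set)
    pathway_to_kos = defaultdict(set)
    cursor.execute("SELECT * FROM kopathways")
    for ko_id, pathway_id in cursor.fetchall():
        ko_to_pathways[ko_id].add(pathway_id)
        pathway_to_kos[pathway_id].add(ko_id)
    return dict(ko_to_pathways), dict(pathway_to_kos)


# Throwaway in-memory database with a hypothetical two-column schema.
cursor = sqlite3.connect(":memory:").cursor()
cursor.execute("CREATE TABLE kopathways (ko TEXT, pathway TEXT)")
cursor.executemany(
    "INSERT INTO kopathways VALUES (?, ?)",
    [("K01194", "ko00500"), ("K01194", "ko00600"), ("K04501", "ko04390")],
)
kopath_keys, kopath_values = get_kopathways_sketch(cursor)
print(kopath_keys["K01194"] == {"ko00500", "ko00600"})  # True
print(kopath_values["ko04390"])  # {'K04501'}
```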
-
get_pathways(database)
Makes a dictionary from the pathways table of the SQLite3 database.
Parameters: database – Cursor object to the SQLite3 database.
Returns: dictionary in the following format: {KEGG_Pathway_id:Name}
For example:
{ 'ko04060': 'Cytokine-cytokine receptor interaction', 'ko00910': 'Nitrogen metabolism' }
Return type: dict
-
get_tables(database)
Prints all tables included in the SQLite3 database.
Parameters: database – Cursor object to SQLite3 database.
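One common way to list the tables of an SQLite3 database is to query its built-in sqlite_master table. The sketch below shows that technique; whether bipype does it this way is an assumption, list_tables is a hypothetical name, and the column definitions are invented for the demonstration.

```python
import sqlite3


def list_tables(cursor):
    """Return the names of all tables, read from SQLite's sqlite_master."""
    cursor.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cursor.fetchall()]


# Throwaway in-memory database; the columns are hypothetical.
cursor = sqlite3.connect(":memory:").cursor()
cursor.execute("CREATE TABLE pathways (id TEXT, name TEXT)")
cursor.execute("CREATE TABLE kopathways (ko TEXT, pathway TEXT)")
for table_name in list_tables(cursor):
    print(table_name)
# kopathways
# pathways
```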
-
low_change(ko_dict, all_conds)
For every KO, adds a condition: 0 entry for each condition that is missing.
Parameters: - ko_dict – dict in the format {KO_id:{cond1:value1, cond2:value2...}...}
- all_conds – list of conditions (list of strings)
Returns: supplemented ko_dict
For example:
low_change( { 'K12345': {'pH5': 1.41, 'pH6': 1.73}, 'K23456': {'pH6': 2.0, 'pH8': 2.24} }, ['pH5', 'pH6', 'pH8'] )
gives:
{ 'K12345': {'pH5': 1.41, 'pH6': 1.73, 'pH8': 0.0}, 'K23456': {'pH5': 0.0, 'pH6': 2.0, 'pH8': 2.24} }
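The filling step shown in this example can be sketched in a few lines. This is an illustration rather than bipype's code: low_change_sketch is a hypothetical name, and updating ko_dict in place is an assumption.

```python
def low_change_sketch(ko_dict, all_conds):
    """Give every KO a 0.0 entry for each condition it is missing.

    Assumption: ko_dict is updated in place and also returned.
    """
    for cond_values in ko_dict.values():
        for cond in all_conds:
            # setdefault leaves existing fold changes untouched.
            cond_values.setdefault(cond, 0.0)
    return ko_dict


result = low_change_sketch(
    {
        'K12345': {'pH5': 1.41, 'pH6': 1.73},
        'K23456': {'pH6': 2.0, 'pH8': 2.24},
    },
    ['pH5', 'pH6', 'pH8'],
)
print(result['K12345']['pH8'])  # 0.0
print(result['K23456']['pH5'])  # 0.0
```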
-
m8_to_ko(file_, multi_id)
For every KO from multi_id, assigns and counts the KEGG GENES identifiers found in a BLAST Tabular (flag: -m 8) output file.
After mapping, writes the data to an output file.
Parameters: - file_ – path to a file in BLAST Tabular (flag: -m 8) format
- multi_id – dict in the format {KEGG GENES identifier : set[KO identifiers]}
The output file (outname) has the following name:
outname = file_.replace('txt.m8', 'count')
and the following format:
K00161 2
K00627 0
K00382 11
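The counting step can be sketched as below. Several points are assumptions: the KEGG GENES identifier is taken from the second (subject) column of the tabular format, lines starting with "#" are treated as comments, and every KO in multi_id starts at zero so that KOs without hits still appear in the output. The name count_kos and the gene identifiers are hypothetical.

```python
def count_kos(m8_lines, multi_id):
    """Count, for every KO in multi_id, the hits whose subject maps to it.

    Assumption: the subject (KEGG GENES) identifier is the second
    tab-separated column of each line.
    """
    counts = {ko: 0 for kos in multi_id.values() for ko in kos}
    for line in m8_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        subject_id = line.rstrip("\n").split("\t")[1]
        for ko in multi_id.get(subject_id, ()):
            counts[ko] += 1
    return counts


# Hypothetical identifiers, for illustration only.
multi_id = {
    "gene:1": {"K00161"},
    "gene:2": {"K00627"},
    "gene:3": {"K00161", "K00382"},
}
m8_lines = [
    "# comment line",
    "read1\tgene:1\t98.5\t50",
    "read2\tgene:3\t91.0\t48",
    "read3\tgene:9\t80.0\t40",  # subject not present in multi_id
]
for ko, count in sorted(count_kos(m8_lines, multi_id).items()):
    print(ko, count)
# K00161 2
# K00382 1
# K00627 0

# The documented output name, derived from the input path.
print("reads.txt.m8".replace("txt.m8", "count"))  # reads.count
```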
-
mapper(ko_dict, ko_set)
Assigns every KO_id from ko_dict to its KEGG_Pathway_id from ko_set.
Parameters: - ko_dict – dict in the format {KO_id:{cond1:value1, cond2:value2...}...}
- ko_set – dict in the format {KEGG_Pathway_id:set[KO identifiers]}
Returns: dict with the structure:
{KEGG_Pathway_id:{KO_id:{cond1:value1, cond2:value2...}...}...}
Return type: dict
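A minimal sketch of this regrouping, under stated assumptions: pathways with no KO present in ko_dict are left out of the result (the documentation does not say what happens to them), and mapper_sketch is a hypothetical name.

```python
def mapper_sketch(ko_dict, ko_set):
    """Group the entries of ko_dict by the pathways listed in ko_set."""
    ko_path_dict = {}
    for pathway_id, ko_ids in ko_set.items():
        found = {ko_id: ko_dict[ko_id] for ko_id in ko_ids if ko_id in ko_dict}
        if found:  # assumption: pathways without any known KO are skipped
            ko_path_dict[pathway_id] = found
    return ko_path_dict


ko_dict = {'K12345': {'pH5': 1.41}, 'K23456': {'pH5': 2.0}}
ko_set = {'ko00910': {'K12345', 'K99999'}, 'ko04060': {'K99999'}}
print(mapper_sketch(ko_dict, ko_set))
# {'ko00910': {'K12345': {'pH5': 1.41}}}
```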
-
mapper_write(ko_path_dict, all_conds, out_dir)
Writes a file with KOs and their corresponding fold changes for every combination of condition and KEGG_Pathway_id.
Parameters: - ko_path_dict – dict in the format {KEGG_Pathway_id:{KO_id:{cond1:value1, cond2:value2...}...}...}
- all_conds – list of conditions (list of strings)
- out_dir – relative output directory path
Each output file has:
- the following path: out_dir/condX/
- the following name: KEGG_Pathway_id.txt
- the following header: # KO KEGG_Pathway_id
- the following format: KO_id corresponding_fold_change
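The file layout described above could be produced roughly as follows. This is a sketch under several assumptions: columns are separated by tabs, KEGG_Pathway_id in the header stands for the actual pathway identifier, and every KO already holds a value for every condition (for example after low_change()). The name mapper_write_sketch is hypothetical.

```python
import os
import tempfile


def mapper_write_sketch(ko_path_dict, all_conds, out_dir):
    """Write out_dir/<condition>/<KEGG_Pathway_id>.txt for every combination.

    Assumptions: tab-separated columns, and every KO has a value for
    every condition in all_conds.
    """
    for cond in all_conds:
        cond_dir = os.path.join(out_dir, cond)
        os.makedirs(cond_dir, exist_ok=True)
        for pathway_id, kos in ko_path_dict.items():
            path = os.path.join(cond_dir, pathway_id + ".txt")
            with open(path, "w") as handle:
                handle.write("# KO\t%s\n" % pathway_id)
                for ko_id in sorted(kos):
                    handle.write("%s\t%s\n" % (ko_id, kos[ko_id][cond]))


ko_path_dict = {"ko00910": {"K12345": {"pH5": 1.41, "pH6": 1.73}}}
with tempfile.TemporaryDirectory() as out_dir:
    mapper_write_sketch(ko_path_dict, ["pH5", "pH6"], out_dir)
    with open(os.path.join(out_dir, "pH6", "ko00910.txt")) as handle:
        content = handle.read()
print(content)
```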
-
metatranscriptomics(opts)
Performs an analysis of metagenomic data.
See also
- For more information please refer to:
-
out_content(filelist, kopath_values, path_names, method='DESeq2')
For every item in the ‘kopath_values’ dictionary and for every file in ‘filelist’, writes to the output file a line with the KOs that are common to the item's value and the set of KOs obtained from the file.
Parameters: - filelist – list of paths to tab-delimited .txt files in which the first column is a KO identifier.
- kopath_values – dict in the format {KEGG_Pathway_id:set[KO identifiers]}. For example:
{ko12345:set([K12345, K12346,...]),...}
- path_names – dict in the format {KEGG_Pathway_id:Name}. For example:
{ 'ko04060': 'Cytokine-cytokine receptor interaction', 'ko00910': 'Nitrogen metabolism' }
- method – argument used only as part of the output file name
The output file has the following name:
(method+'_'+filename.replace('txt', 'path_counts.csv'))
where:
- filename = filepath.split('\\')[-1], if '\\' in filepath.
- filename = filepath.split('/')[-1], if '/' in filepath.
- filename = filepath, in other cases.
and the following header line:
ko_path_id;ko_path_name;percent common;common KOs
Writes only lines with non-zero common KOs.
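The naming rule and the selection of common KOs can be sketched as below. The sketch assumes that the first filename case refers to the Windows backslash separator, that "percent common" means the share of a pathway's KOs found in the file, and that common KOs are joined with spaces; none of these is confirmed by the documentation. The names output_name and common_ko_lines are hypothetical.

```python
def output_name(filepath, method="DESeq2"):
    """Derive the output file name from an input path, as documented."""
    if "\\" in filepath:
        filename = filepath.split("\\")[-1]
    elif "/" in filepath:
        filename = filepath.split("/")[-1]
    else:
        filename = filepath
    return method + "_" + filename.replace("txt", "path_counts.csv")


def common_ko_lines(file_kos, kopath_values, path_names):
    """Build one ';'-separated line per pathway sharing KOs with the file.

    Assumption: the percentage is relative to the size of the pathway.
    """
    lines = []
    for pathway_id in sorted(kopath_values):
        pathway_kos = kopath_values[pathway_id]
        common = sorted(pathway_kos & file_kos)
        if not common:
            continue  # only lines with non-zero common KOs are written
        percent = 100.0 * len(common) / len(pathway_kos)
        lines.append("%s;%s;%.1f;%s" % (
            pathway_id, path_names[pathway_id], percent, " ".join(common)))
    return lines


kopath_values = {"ko00910": {"K12345", "K12346"}, "ko04060": {"K99999"}}
path_names = {
    "ko00910": "Nitrogen metabolism",
    "ko04060": "Cytokine-cytokine receptor interaction",
}
print(output_name("deseq/A_vs_B.up.txt"))
# DESeq2_A_vs_B.up.path_counts.csv
print(common_ko_lines({"K12345", "K00001"}, kopath_values, path_names))
# ['ko00910;Nitrogen metabolism;50.0;K12345']
```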
-
pickle_or_db(pickle, db)
Reads a pickle or an SQL database, then makes a dict.
If an appropriate pickle (a dict) is available, it is read. Otherwise, the function reads the ‘kogenes’ table from the SQL database and creates the missing pickle. Finally, it returns the dict.
Parameters: - pickle – path to a pickled dict in the following format:
{KEGG GENES identifier : set[KO identifiers]}
- db – Cursor object to an SQL database with a ‘kogenes’ table
(KO identifier, KEGG GENES identifier)
Returns: dict in {KEGG GENES identifier: set[KO identifiers]} format.
Some information for Bipype’s developers (delete this before the final version): the code of this function was not a function in the previous version, and ‘args’ was hardcoded to ‘kogenes.pckl’ and c (the variable with the db’s cursor).
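The cache-or-rebuild logic can be sketched with the standard pickle and sqlite3 modules. The column names of the demonstration table and the gene identifiers are assumptions; only the column order (KO identifier, KEGG GENES identifier) is taken from the description above. The name pickle_or_db_sketch is hypothetical.

```python
import os
import pickle
import sqlite3
import tempfile


def pickle_or_db_sketch(pickle_path, cursor):
    """Return {KEGG GENES id: set of KO ids}, cached in a pickle file."""
    if os.path.exists(pickle_path):
        with open(pickle_path, "rb") as handle:
            return pickle.load(handle)
    multi_id = {}
    cursor.execute("SELECT * FROM kogenes")
    # Assumption: each row is (KO identifier, KEGG GENES identifier).
    for ko_id, gene_id in cursor.fetchall():
        multi_id.setdefault(gene_id, set()).add(ko_id)
    with open(pickle_path, "wb") as handle:
        pickle.dump(multi_id, handle)  # create the missing pickle
    return multi_id


with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "kogenes.pckl")
    cursor = sqlite3.connect(":memory:").cursor()
    cursor.execute("CREATE TABLE kogenes (ko TEXT, gene TEXT)")
    cursor.executemany(
        "INSERT INTO kogenes VALUES (?, ?)",
        [("K00161", "gene:1"), ("K00382", "gene:1"), ("K00627", "gene:2")],
    )
    first = pickle_or_db_sketch(pickle_path, cursor)  # built from the table
    second = pickle_or_db_sketch(pickle_path, None)   # read back from the pickle
print(first == second)  # True
print(first["gene:1"] == {"K00161", "K00382"})  # True
```

The second call passes None as the cursor to show that the database is not touched once the pickle exists.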
-
progress(what, estimated_percentage=None, done=True)
Prints specially formatted information about progress.
Parameters: - what – a string with the name of the operation that was just performed and should be reported to standard output as done or failed,
- estimated_percentage – (int) the percentage should be calculated as a part of the whole execution; the first and last 5 percent should be reserved for the programs that run ‘metatranscriptomics’, for their pre- and postprocessing,
- done – indicates whether the operation named in the ‘what’ argument succeeded or failed.
-
rapsearch2(input_file, threads)
Runs rapsearch2 for input_file in fasta format. Writes the outputs to the “m8/” directory.
- GLOBALS:
- path to RAPSearch2 program: PATH_RAPSEARCH
- path to similarity search database: PATH_REF_PROT_KO
-
run_SARTools()
Runs SARTools in R.
- HARDCODED:
- R templates:
- edger: template_script_edgeR.r
- deseq: template_script_DESeq2.r
-
run_fastq_to_fasta(fastqs)
Runs fastq_to_fasta() for every .fastq in fastqs.
-
run_ko_csv(ko_dict_deseq, ko_dict_edger, all_conds, kopath_keys, path_names, ref_cond)
For the given ko_dicts, writes CSV files with pathways and fold changes.
Parameters: - ko_dict_deseq, ko_dict_edger – dicts in the format {KO_id:{cond1:value1, cond2:value2...}...}
- all_conds – list of conditions (list of strings)
- kopath_keys – dict in the format {KO identifier:set[KEGG_Pathway_ids]}
- path_names – dict in the format {KEGG_Pathway_id:Name}
- filepath – output filepath
- Output files have the following format (and header):
- KO_id;Gene_name;paths ids;paths names;FC vs cond1;FC vs cond2;...;
- HARDCODED:
- Output files paths:
- deseq: ‘deseq.csv’
- edger: ‘edger.csv’
-
run_ko_map()
Runs m8_to_ko() for every .m8 file in cwd.
- GLOBALS:
- path to KO database: PATH_KO_DB
- pickle to dict from KO GENES table from KO database: PATH_KO_PCKL
-
run_ko_remap(deseq_files, edger_files, kopath_values, path_names)
Runs out_content(files, kopath_values, path_names (,'edgeR')) for the files from edger_files and deseq_files.
Parameters: - deseq_files – list of DESeq output paths
- edger_files – list of edgeR output paths
- kopath_values – dict in the format {KEGG_Pathway_id: set[KO identifiers]}
- path_names – dict in the format {KEGG_Pathway_id: Name}
-
run_new_ko_remap(deseq_files, edger_files, kopath_values, all_conds, ref_cond)
Runs get_ko_fc(), low_change(), mapper() and mapper_write() in the appropriate way for the files from deseq_files and edger_files.
Parameters: - deseq_files – list of DESeq output paths
- edger_files – list of edgeR output paths
- ref_cond – reference condition (group), a string
- kopath_values – dict in the format {KEGG_Pathway_id:set[KO identifiers]}
- all_conds – list of conditions (list of strings)
Returns: - ko_dict_deseq: dict in the format {KO_id:{cond1:value1, cond2:value2...}...}
- ko_dict_edger: dict in the format {KO_id:{cond1:value1, cond2:value2...}...}
- HARDCODED:
- Output directories paths:
- deseq: ‘new_ko_remap/deseq/’
- edger: ‘new_ko_remap/edger/’
-
run_pre_ko_remap()
Prepares args for run_ko_remap() or run_new_ko_remap().
Returns: - path_names: dict in the format {KEGG_Pathway_id:Name}
- kopath_keys: dict in the format {KO identifier:set[KEGG_Pathway_ids]}
- kopath_values: dict in the format {KEGG_Pathway_id:set[KO identifiers]}
- edger_files: list of edgeR output paths
- deseq_files: list of DESeq output paths
- HARDCODED:
- Paths to files from SARTools:
- edger: ‘edger/*[pn].txt’
- deseq: ‘deseq/*[pn].txt’
- GLOBALS:
- path to KO database: PATH_KO_DB
-
run_rapsearch(threads)
Runs rapsearch2() for every .tmp.fasta in cwd.