Metatranscriptomics bipype¶

Module functions¶

auto_tax_read(db_loc)[source]¶

Reads pickled {KEGG GENES number: set[KO identifiers]} dict.

config_from_file(_file)[source]¶

Reads parameters from configuration _file. Prepares target.txt and templates for SARTools.

Parameters:	_file – configuration file for metatranscriptomic pipeline
Returns:

ref_cond: reference condition defined by user
all_conds: set of conditions (groups) from target.txt
fastqs: list of fastq files on which analysis will be done

Return type:	(ref_cond, all_conds, fastqs)

connect_db(db)[source]¶

Connects to the database.

Parameters:	db – Path to SQL database
Returns:	Cursor object to database
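
As a rough illustration of what returning a cursor means in practice, here is a minimal sketch. It assumes the database is SQLite (as the other functions on this page suggest) and uses the standard `sqlite3` module; `connect_db_sketch` is a hypothetical name, not bipype's own code.

```python
import sqlite3


def connect_db_sketch(db):
    """Open the database at path `db` and return a cursor to it."""
    connection = sqlite3.connect(db)
    # The cursor keeps a reference to its connection (cursor.connection),
    # so the connection stays alive as long as the cursor does.
    return connection.cursor()


# ':memory:' creates a throwaway in-memory database for the demonstration.
cursor = connect_db_sketch(':memory:')
cursor.execute('SELECT 1')
row = cursor.fetchone()
```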

dicto_reduce(present, oversized)[source]¶

Removes from both dictionaries every element whose key is not present in both.

Parameters:

present – dict
oversized – dict

Returns:	tuple of dicts
Return type:	(oversized, present)

Warning

The order of the parameters is the opposite of the order of the results.

Example

>>> dict_1={'a':1,'c':3,'d':4}
>>> dict_2={'a':3,'b':4,'c':4}
>>> dicto_reduce(dict_1, dict_2)
({'a': 3, 'c': 4}, {'a': 1, 'c': 3})
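
The behavior in the example above can be sketched as follows. This is an illustrative re-implementation under the assumption that the function only filters both dicts down to their shared keys; `dicto_reduce_sketch` is a hypothetical name, and it builds new dicts, whereas the real function may modify its arguments in place.

```python
def dicto_reduce_sketch(present, oversized):
    """Keep only the entries whose keys occur in both dicts."""
    common = present.keys() & oversized.keys()
    reduced_present = {key: present[key] for key in common}
    reduced_oversized = {key: oversized[key] for key in common}
    # The return order is reversed relative to the parameter order.
    return reduced_oversized, reduced_present


dict_1 = {'a': 1, 'c': 3, 'd': 4}
dict_2 = {'a': 3, 'b': 4, 'c': 4}
result = dicto_reduce_sketch(dict_1, dict_2)
```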

fastq_to_fasta(fastq)[source]¶

Runs fastq_to_fasta on fastq.

GLOBAL:

path to fastq_to_fasta program: PATH_FQ2FA

get_kegg_name(ko)[source]¶

Returns name assigned to given KO identifier (from kegg.jp)

Parameters:	ko – KO identifier (string)
Returns:	name assigned to ko (string)

get_ko_fc(ko_dict, ref_cond, filepath, deseq=False)[source]¶

From the given table file (SARTools output), adds the fold changes found to ko_dict.

Parameters:

ko_dict – `{KO_id:{cond1:value1, cond2:value2...}...}` dict
ref_cond – reference condition (string)
filepath – filepath to output table file from edgeR or DESeq2
deseq – True if filepath points to a DESeq2 table file, False if it points to an edgeR table file

Returns:

ko_dict with added fold changes from table file

get_kopathways(database)[source]¶

Makes two dictionaries from the kopathways table of an SQLite3 database.

Parameters: database – Cursor object to SQLite3 database.

Returns:

id -> pathways mapping:

{KO identifier: set[KEGG_Pathway_ids]}

For example:

{
    'K01194': set(['ko00500','ko00600',...]),
    'K04501': set(['ko04390',...])
}

pathway -> ids mapping:

{KEGG_Pathway_id: set[KO identifiers]}

For example:

{'ko12345': set(['K12345', 'K12346',...]),...}

Return type: Two dictionaries
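
A minimal sketch of how the two mappings could be built in one pass over the table. The column names `koid` and `pathid` are assumptions made for the illustration (the actual schema is not documented here), and `get_kopathways_sketch` is a hypothetical name.

```python
import sqlite3
from collections import defaultdict


def get_kopathways_sketch(cursor):
    """Build {KO: set(pathways)} and {pathway: set(KOs)} from 'kopathways'."""
    ko_to_pathways = defaultdict(set)
    pathway_to_kos = defaultdict(set)
    # Assumed schema: one (KO identifier, pathway identifier) pair per row.
    cursor.execute('SELECT koid, pathid FROM kopathways')
    for ko_id, pathway_id in cursor.fetchall():
        ko_to_pathways[ko_id].add(pathway_id)
        pathway_to_kos[pathway_id].add(ko_id)
    return dict(ko_to_pathways), dict(pathway_to_kos)


# Demonstration on an in-memory database with three rows.
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE kopathways (koid TEXT, pathid TEXT)')
cursor.executemany(
    'INSERT INTO kopathways VALUES (?, ?)',
    [('K01194', 'ko00500'), ('K01194', 'ko00600'), ('K04501', 'ko04390')],
)
kopath_keys, kopath_values = get_kopathways_sketch(cursor)
```

Elsewhere on this page the first dictionary is called kopath_keys and the second kopath_values.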

get_pathways(database)[source]¶

Makes a dictionary from the pathways table of an SQLite3 database.

Parameters:	database – Cursor object to SQLite3 database.
Returns:

Dictionary in the following format:

{KEGG_Pathway_id:Name}

For example:

{
    'ko04060': 'Cytokine-cytokine receptor interaction',
    'ko00910': 'Nitrogen metabolism'
}

Return type:	dict

get_tables(database)[source]¶

Prints all tables included in SQLite3 database.

Parameters:	database – Cursor object to SQLite3 database.

low_change(ko_dict, all_conds)[source]¶

For every KO, adds an entry condition: 0 for each condition that is missing.

Parameters:

ko_dict – {KO_id:{cond1:value1, cond2:value2...}...} dict
all_conds – list of conditions (list of strings)

Returns:

supplemented ko_dict

For example:

low_change(
    {
        'K12345': {'pH5': 1.41, 'pH6': 1.73},
        'K23456': {'pH6': 2.0, 'pH8': 2.24}
    },
    ['pH5', 'pH6', 'pH8']
)

gives:

{
    'K12345': {'pH5': 1.41, 'pH6': 1.73, 'pH8': 0.0},
    'K23456': {'pH5': 0.0, 'pH6': 2.0, 'pH8': 2.24}
}
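
The same behavior can be sketched in a few lines. This is an illustrative re-implementation, not bipype's code: `low_change_sketch` is a hypothetical name, and filling the dicts in place with `setdefault` is an assumption about how the supplementing is done.

```python
def low_change_sketch(ko_dict, all_conds):
    """Give every KO a 0.0 entry for each condition it lacks."""
    for conditions in ko_dict.values():
        for cond in all_conds:
            # setdefault leaves existing fold changes untouched.
            conditions.setdefault(cond, 0.0)
    return ko_dict


filled = low_change_sketch(
    {
        'K12345': {'pH5': 1.41, 'pH6': 1.73},
        'K23456': {'pH6': 2.0, 'pH8': 2.24},
    },
    ['pH5', 'pH6', 'pH8'],
)
```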

m8_to_ko(file_, multi_id)[source]¶

Assigns and counts KEGG GENES identifiers from BLAST Tabular (flag: -m 8) output format file, for every KO from multi_id.

After mapping, writes data to output file.

Parameters:

file_ – Path to BLAST Tabular (flag: -m 8) format file
multi_id – Dict `{KEGG GENES identifier : set[KO identifiers]}`

Output file (outname) has following name:

outname = file_.replace('txt.m8', 'count')

and following format:

K00161  2
K00627  0
K00382  11
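
A minimal sketch of the counting step. It relies on two assumptions that are not stated above: the subject (hit) identifier sits in the second tab-separated column, as in standard BLAST tabular output, and lines starting with `#` are comments. `m8_to_ko_sketch`, the tab separator in the output and the gene identifiers are all hypothetical.

```python
import os
import tempfile


def m8_to_ko_sketch(file_, multi_id):
    """Count hits per KO and write them to the derived '.count' file."""
    # Start every known KO at zero so KOs without hits are still reported.
    counts = {ko: 0 for kos in multi_id.values() for ko in kos}
    with open(file_) as handle:
        for line in handle:
            if line.startswith('#') or not line.strip():
                continue
            subject_id = line.rstrip('\n').split('\t')[1]
            for ko in multi_id.get(subject_id, ()):
                counts[ko] += 1
    outname = file_.replace('txt.m8', 'count')
    with open(outname, 'w') as out:
        for ko in sorted(counts):
            out.write('{}\t{}\n'.format(ko, counts[ko]))
    return outname  # returned only to make the demonstration easier


# Demonstration with a tiny hand-written m8 file.
workdir = tempfile.mkdtemp()
m8_path = os.path.join(workdir, 'sample.txt.m8')
with open(m8_path, 'w') as handle:
    handle.write('# comment line\n')
    handle.write('read1\tgene_a\t90.0\n')
    handle.write('read2\tgene_a\t85.0\n')
    handle.write('read3\tgene_unknown\t70.0\n')
multi_id = {'gene_a': {'K00161'}, 'gene_b': {'K00627'}}
count_path = m8_to_ko_sketch(m8_path, multi_id)
with open(count_path) as handle:
    count_lines = handle.read().splitlines()
```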

mapper(ko_dict, ko_set)[source]¶

Assigns every KO_id from ko_dict to its KEGG_Pathway_id from ko_set.

Parameters:

ko_dict – {KO_id:{cond1:value1, cond2:value2...}...} dict
ko_set – {KEGG_Pathway_id:set[KO identifiers]} dict

Returns:

Dict with structure:

{KEGG_Pathway_id:{KO_id:{cond1:value1, cond2:value2...}...}...}

Return type:

dict
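
The regrouping can be sketched as below. This is an illustration only: `mapper_sketch` is a hypothetical name, and omitting pathways that contain none of the KOs from ko_dict is an assumption.

```python
def mapper_sketch(ko_dict, ko_set):
    """Group the per-KO condition dicts under each pathway that lists the KO."""
    ko_path_dict = {}
    for pathway_id, ko_ids in ko_set.items():
        members = {ko: ko_dict[ko] for ko in ko_ids if ko in ko_dict}
        if members:  # assumption: pathways without any matching KO are skipped
            ko_path_dict[pathway_id] = members
    return ko_path_dict


ko_dict = {
    'K01194': {'pH5': 1.5, 'pH6': 0.0},
    'K04501': {'pH5': 0.0, 'pH6': 2.0},
}
ko_set = {
    'ko00500': {'K01194', 'K99999'},
    'ko04390': {'K04501'},
    'ko00910': {'K88888'},
}
ko_path_dict = mapper_sketch(ko_dict, ko_set)
```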

mapper_write(ko_path_dict, all_conds, out_dir)[source]¶

For every combination of condition and KEGG_Pathway_id, writes a file with KOs and their corresponding fold changes.

Parameters:

ko_path_dict – `{KEGG_Pathway_id:{KO_id:{cond1:value1, cond2:value2...}...}...}`
all_conds – list of conditions (list of strings)
out_dir – relative output directory path

Output file has the following path:

out_dir/condX/

the following name:

KEGG_Pathway_id.txt

the following header:

# KO KEGG_Pathway_id

and the following format:

KO_id corresponding_fold_change
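
A minimal sketch of the resulting file layout. It assumes that KEGG_Pathway_id in the header is a placeholder for the actual pathway identifier, that fields are separated by a single space, and that every KO already has a value for every condition. `mapper_write_sketch` is a hypothetical name.

```python
import os
import tempfile


def mapper_write_sketch(ko_path_dict, all_conds, out_dir):
    """Write one file per (condition, pathway) pair under out_dir/<cond>/."""
    for cond in all_conds:
        cond_dir = os.path.join(out_dir, cond)
        os.makedirs(cond_dir, exist_ok=True)
        for pathway_id, kos in ko_path_dict.items():
            path = os.path.join(cond_dir, pathway_id + '.txt')
            with open(path, 'w') as out:
                out.write('# KO {}\n'.format(pathway_id))
                for ko_id in sorted(kos):
                    out.write('{} {}\n'.format(ko_id, kos[ko_id][cond]))


# Demonstration in a temporary directory.
out_dir = tempfile.mkdtemp()
mapper_write_sketch(
    {'ko00500': {'K01194': {'pH5': 1.5, 'pH6': 0.0}}},
    ['pH5', 'pH6'],
    out_dir,
)
with open(os.path.join(out_dir, 'pH5', 'ko00500.txt')) as handle:
    written = handle.read()
```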

metatranscriptomics(opts)[source]¶

Performs analysis of metagenomic data.

See also

For more information please refer to:

run_fastq_to_fasta()
run_rapsearch()
run_ko_map()
run_SARTools()
run_pre_ko_remap()
run_ko_remap()
run_new_ko_remap()
run_ko_csv()

out_content(filelist, kopath_values, path_names, method='DESeq2')[source]¶

For every item in the ‘kopath_values’ dictionary and for every file in ‘filelist’, writes to the output file a line with the KOs that are common to the item’s value and the set of KOs obtained from the file.

Parameters:

filelist – List of paths to tab-delimited .txt files, where first column is a KO identifier.
kopath_values –

{KEGG_Pathway_id:set[KO identifiers]} dict.

For example:

{'ko12345': set(['K12345', 'K12346',...]),...}

path_names –

Dictionary in {KEGG_Pathway_id:Name} format.

For example:

{
    'ko04060': 'Cytokine-cytokine receptor interaction',
    'ko00910': 'Nitrogen metabolism'
}

method – Argument used only as a part of output file name

Output file has the following name:

    (method+'_'+filename.replace('txt', 'path_counts.csv'))
where:
    filename = filepath.split('\\')[-1], if '\\' in filepath.
    filename = filepath.split('/')[-1],  if '/' in filepath.
    filename = filepath,                 in other cases.

and the following header line:

ko_path_id;ko_path_name;percent common;common KOs

Writes only lines with non-zero common KOs.
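
The output file name rule can be sketched as a small helper. The first separator is assumed to be a backslash (Windows-style paths), which the slash-based second rule suggests; `output_name_sketch` is a hypothetical name used only for this illustration.

```python
def output_name_sketch(filepath, method='DESeq2'):
    """Derive the output file name from an input path and the method label."""
    if '\\' in filepath:  # assumed: Windows-style separator
        filename = filepath.split('\\')[-1]
    elif '/' in filepath:
        filename = filepath.split('/')[-1]
    else:
        filename = filepath
    return method + '_' + filename.replace('txt', 'path_counts.csv')


name_posix = output_name_sketch('deseq/pH5vspH6.up.txt')
name_windows = output_name_sketch('C:\\out\\a.txt', 'edgeR')
name_plain = output_name_sketch('a.txt')
```

Because `str.replace` substitutes every occurrence of 'txt', a file name that contains 'txt' somewhere other than its extension would have that occurrence replaced as well.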

pickle_or_db(pickle, db)[source]¶

Reads a pickle or an SQL database, then makes a dict.

If the appropriate pickle (a dict) is available, it is read. Otherwise, the function reads the ‘kogenes’ table from the SQL database and creates the missing pickle. In either case, it returns the dict.

Parameters:

pickle – Path to pickled dict in the following format: `{KEGG GENES identifier : set[KO identifiers]}`
db – Cursor object to SQL database with ‘kogenes’ table `(KO identifier KEGG GENES identifier)`

Returns:

Dict in `{KEGG GENES identifier: set[KO identifiers]}` format.

Some information for Bipype’s developers (delete this before final version): The code of this function was not a function in the previous version, and ‘args’ were hardcoded to ‘kogenes.pckl’ and c (the variable holding the db’s cursor).
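
The read-or-rebuild logic described for pickle_or_db() can be sketched as follows. The column names `koid` and `geneid` are assumptions (only the column order is documented), the gene identifiers are invented for the demonstration, and `pickle_or_db_sketch` is a hypothetical name.

```python
import os
import pickle
import sqlite3
import tempfile


def pickle_or_db_sketch(pickle_path, cursor):
    """Return {gene id: set(KO ids)}, loading a cached pickle if one exists."""
    if os.path.exists(pickle_path):
        with open(pickle_path, 'rb') as handle:
            return pickle.load(handle)
    multi_id = {}
    # Assumed schema: (KO identifier, KEGG GENES identifier) per row.
    cursor.execute('SELECT koid, geneid FROM kogenes')
    for ko_id, gene_id in cursor.fetchall():
        multi_id.setdefault(gene_id, set()).add(ko_id)
    # Cache the result so the next call can skip the database.
    with open(pickle_path, 'wb') as handle:
        pickle.dump(multi_id, handle)
    return multi_id


connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE kogenes (koid TEXT, geneid TEXT)')
cursor.executemany(
    'INSERT INTO kogenes VALUES (?, ?)',
    [('K00161', 'gene_a'), ('K00627', 'gene_b'), ('K00382', 'gene_b')],
)
pickle_path = os.path.join(tempfile.mkdtemp(), 'kogenes.pckl')
first = pickle_or_db_sketch(pickle_path, cursor)   # built from the database
second = pickle_or_db_sketch(pickle_path, cursor)  # read back from the pickle
```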

progress(what, estimated_percentage=None, done=True)[source]¶

Prints specially formatted information about progress.

Parameters:

what – a string with the name of the operation that was just performed and should be reported to standard output as done or failed,
estimated_percentage – (int) Percentage should be calculated as a part of the whole execution; the first and last 5 percent should be reserved for programs that run ‘metatranscriptomics’, for pre- and postprocessing,
done – informs whether the operation from the ‘what’ argument failed or was successfully done.

rapsearch2(input_file, threads)[source]¶

Runs rapsearch2 for input_file in fasta format.

Writes outputs in “m8/” directory.

GLOBALS:

path to RAPSearch2 program: PATH_RAPSEARCH
path to similarity search database: PATH_REF_PROT_KO

run_SARTools()[source]¶

Runs SARTools in R.

HARDCODED:

R templates:

edger: template_script_edgeR.r
deseq: template_script_DESeq2.r

run_cat_pairing()[source]¶

Merges fasta files with paired-end reads in cwd.

run_fastq_to_fasta(fastqs)[source]¶

Runs fastq_to_fasta() for every .fastq in fastqs.

run_ko_csv(ko_dict_deseq, ko_dict_edger, all_conds, kopath_keys, path_names, ref_cond)[source]¶

For the given ko_dicts, writes CSV files with pathways and fold changes.

Parameters:

ko_dict_deseq, ko_dict_edger – `{KO_id:{cond1:value1, cond2:value2...}...}` dicts
all_conds – list of conditions (list of strings)
kopath_keys – `{KO identifier:set[KEGG_Pathway_ids]}` dict
path_names – `{KEGG_Pathway_id:Name}` dict
ref_cond – reference condition (string)

Output files have the following format (and header):

KO_id;Gene_name;paths ids;paths names;FC vs cond1;FC vs cond2;...;

HARDCODED:

Output files paths:

deseq: ‘deseq.csv’
edger: ‘edger.csv’

run_ko_map()[source]¶

Runs m8_to_ko() for every .m8 file in cwd.

GLOBALS:

path to KO database: PATH_KO_DB
path to pickled dict made from the KO GENES table of the KO database: PATH_KO_PCKL

run_ko_remap(deseq_files, edger_files, kopath_values, path_names)[source]¶

Runs out_content(files, kopath_values, path_names (,'edgeR')) for files from edger_files and deseq_files.

Parameters:

deseq_files – list of DESeq output paths
edger_files – list of edgeR output paths
kopath_values – `{KEGG_Pathway_id: set[KO identifiers]}` dict
path_names – `{KEGG_Pathway_id: Name}` dict

run_new_ko_remap(deseq_files, edger_files, kopath_values, all_conds, ref_cond)[source]¶

Runs get_ko_fc(), low_change(), mapper() and mapper_write() in appropriate way for files from deseq_files and edger_files.

Parameters:

deseq_files – list of DESeq output paths
edger_files – list of edgeR output paths
ref_cond – reference condition (group), string
kopath_values – `{KEGG_Pathway_id:set[KO identifiers]}` dict
all_conds – list of conditions (list of strings)

Returns:

ko_dict_deseq: `{KO_id:{cond1:value1, cond2:value2...}...}` dict
ko_dict_edger: `{KO_id:{cond1:value1, cond2:value2...}...}` dict

Return type:	(ko_dict_deseq, ko_dict_edger)

HARDCODED:

Output directories paths:

deseq: ‘new_ko_remap/deseq/’
edger: ‘new_ko_remap/edger/’

run_pre_ko_remap()[source]¶

Prepares args for run_ko_remap() or run_new_ko_remap()

Returns:

path_names: `{KEGG_Pathway_id:Name}` dict
kopath_keys: `{KO identifier:set[KEGG_Pathway_ids]}` dict
kopath_values: `{KEGG_Pathway_id:set[KO identifiers]}` dict
edger_files: list of edgeR output paths
deseq_files: list of DESeq output paths

Return type:	(path_names, kopath_keys, kopath_values, edger_files, deseq_files)

HARDCODED:

Paths to files from SARTools:

edger: ‘edger/*[pn].txt’
deseq: ‘deseq/*[pn].txt’

GLOBALS:

path to KO database: PATH_KO_DB

run_rapsearch(threads)[source]¶

Runs rapsearch2() for every .tmp.fasta in cwd.