bin/mcmusec
===============================
Usage: mcmusec -f [-q ] -s -g -d -k [-o ] [-n ] [-v]

  -v: print the number of gene clusters per size to the console [false]
  -s: minimal number of COGs in a gene cluster [2]
  -d: unified maxgap for all genomes [1000]
  -g: unit of maxgap, base pairs (bp) or gene insertions (gi) [bp]
  -k: minimal number of genomes in which a gene cluster must occur [2]
  -f: file listing the input genomes' ptt file names (or "filename maxgap", if -d is not used), one per line
  -q: file of gene clusters to query (COG names separated by spaces), one cluster per line, optional
  -o: output file, optional [output.mc]
  -n: simple output file, optional [output.mcs]

Example:

1. bin/mcmusec -f data/Francesca_Science06_133_genomes.path.txt -g gi -d 2 -k 3 -o result/Francesca_Science06_133_genomes/output_gi_d2_k3.mc -n result/Francesca_Science06_133_genomes/output_gi_d2_k3.mcs

   will output all max-gap clusters in the genomes listed in data/Francesca_Science06_133_genomes.path.txt, with minsize=2, maxgap=2gi, minsupp=3, to the file result/Francesca_Science06_133_genomes/output_gi_d2_k3.mc. The file result/Francesca_Science06_133_genomes/output_gi_d2_k3.mcs contains the same results in a colon-delimited (:) column format with summarized information only.

2. bin/mcmusec -f data/Francesca_Science06_133_genomes.path.txt -q data/RegulonDB_operon.cog -g gi -d 2 -o result/RegulonDB_operon/operon_gi_d2.mc -n result/RegulonDB_operon/operon_gi_d2.mcs

   will scan the known operons' COGs in data/RegulonDB_operon.cog and output all of their occurrences in the genomes listed in data/Francesca_Science06_133_genomes.path.txt, with maxgap=2gi, to the file result/RegulonDB_operon/operon_gi_d2.mc. The file result/RegulonDB_operon/operon_gi_d2.mcs contains the same results in a colon-delimited (:) column format with summarized information only.

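To make the -g gi / -d options more concrete, here is a minimal Python sketch of a max-gap test measured in gene insertions. It is only an illustration, not mcmusec's actual code: the function name within_maxgap is hypothetical, and it assumes that the "gap" between two cluster genes is the number of other genes lying between them in the genome's gene order.

```python
def within_maxgap(positions, maxgap):
    """Check a max-gap condition in gene insertions (illustrative only).

    positions: gene-order indices (0, 1, 2, ...) of the cluster's genes
               in one genome. ASSUMPTION: the gap is counted as the
               number of intervening genes between neighbouring members.
    maxgap:    largest allowed number of intervening genes (the -d value).
    """
    ordered = sorted(positions)
    # For neighbours at indices a < b there are b - a - 1 genes in between.
    return all(b - a - 1 <= maxgap for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(within_maxgap([10, 12, 15], 2))  # gaps of 1 and 2 intervening genes
    print(within_maxgap([10, 14], 2))      # gap of 3 intervening genes
```

With -g bp the same idea would apply to base-pair distances between neighbouring genes instead of index differences; how mcmusec measures that distance is not stated here.
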
bin/bls
=================================
Usage: bls

Example:

bin/bls data/tree_Feb15_midpoint_rooted.txt data/Francesca_Science06_133_genomes.info.txt result/RegulonDB_operon/operon_gi_d2.mcs result/RegulonDB_operon/operon_gi_d2.bls

   will compute the bls for the clusters in result/RegulonDB_operon/operon_gi_d2.mcs, using the phylogeny in data/tree_Feb15_midpoint_rooted.txt and the mapping of tax ID (column 1) to NCBI's sequence accession ID (column 2) in data/Francesca_Science06_133_genomes.info.txt. The bls is appended as an additional column to the input cluster file, and the result is written to result/RegulonDB_operon/operon_gi_d2.bls.

bin/randomsample
==================================
Usage: randomsample -f [ptt path file] -s [cluster size] -g [bp|gi] -d [maxgap] -r [number of samples]

Example:

bin/randomsample -f data/Francesca_Science06_133_genomes.path.txt -s result/RegulonDB_operon/operon_gi_d2.gcs -g gi -d 2 -r 1000

   will randomly sample 1000 clusters for each genome-size pair listed in result/RegulonDB_operon/operon_gi_d2.gcs. For each input genome, the output goes to the default file NC_#.rdm, where NC_# is NCBI's sequence accession ID.

bin/getClusterSizes.pl
===================================
Usage: getClusterSizes.pl > output

Example:

bin/getClusterSizes.pl result/RegulonDB_operon/operon_gi_d2.mcs > result/RegulonDB_operon/operon_gi_d2.gcs

   will scan the result file result/RegulonDB_operon/operon_gi_d2.mcs, compute the sizes of the gene clusters obtained for each genome, and write them to result/RegulonDB_operon/operon_gi_d2.gcs.

Note: This output is used later by bin/randomsample to randomly sample clusters for each genome-size pair.

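Because bin/randomsample reads the .gcs file that bin/getClusterSizes.pl produces, the two commands have to run in that order. The following Python sketch only assembles the two command lines from the examples above so the dependency is explicit. The function name sampling_steps and its parameters are hypothetical helpers, and it assumes that the .mcs and .gcs files share one path prefix, as they do in the examples.

```python
def sampling_steps(prefix, ptt_list, gap_unit="gi", maxgap=2, samples=1000):
    """Return (argument list, stdout file or None) pairs in run order.

    ASSUMPTION: <prefix>.mcs already exists (written by bin/mcmusec -n)
    and <prefix>.gcs is where the cluster sizes should be stored.
    """
    mcs = prefix + ".mcs"
    gcs = prefix + ".gcs"
    return [
        # Step 1: cluster sizes per genome, captured from standard output.
        (["bin/getClusterSizes.pl", mcs], gcs),
        # Step 2: random clusters for each genome-size pair listed in the .gcs file.
        (["bin/randomsample", "-f", ptt_list, "-s", gcs,
          "-g", gap_unit, "-d", str(maxgap), "-r", str(samples)], None),
    ]


if __name__ == "__main__":
    steps = sampling_steps(
        "result/RegulonDB_operon/operon_gi_d2",
        "data/Francesca_Science06_133_genomes.path.txt",
    )
    for args, stdout_path in steps:
        redirect = f" > {stdout_path}" if stdout_path else ""
        print(" ".join(args) + redirect)
```

Each pair could then be handed to subprocess.run, opening the second element as the stdout file when it is not None.
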
bin/cmptBLSpval.pl
===================================
Usage: cmptBLSpval.pl > output

Example:

bin/cmptBLSpval.pl result/RegulonDB_operon/operon_gi_d2.bls result/RegulonDB_operon/rdm_gi_d2_bls > result/RegulonDB_operon/operon_gi_d2.pval

   will compute the p-value for the clusters in result/RegulonDB_operon/operon_gi_d2.bls, based on the bls distribution files of the randomly sampled clusters in the folder result/RegulonDB_operon/rdm_gi_d2_bls. The p-value is appended as an additional column to the input cluster bls file, and the result is written to result/RegulonDB_operon/operon_gi_d2.pval.
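As a rough illustration of how a p-value can be derived from a randomly sampled bls distribution, here is a minimal Python sketch of an empirical p-value. It is not taken from cmptBLSpval.pl, whose exact formula is not described here. The function name empirical_pvalue is hypothetical, and two assumptions are built in: a higher bls counts as more extreme, and an add-one correction keeps the p-value above zero.

```python
def empirical_pvalue(observed, random_scores):
    """Empirical p-value of one observed score against random scores.

    ASSUMPTIONS (not confirmed by the tool's documentation):
      - a random score is "at least as extreme" when it is >= observed;
      - (k + 1) / (n + 1) is used so the result is never exactly 0.
    """
    n_extreme = sum(1 for score in random_scores if score >= observed)
    return (n_extreme + 1) / (len(random_scores) + 1)


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not real results.
    print(empirical_pvalue(3.0, [1.0, 2.0, 3.5, 0.5]))
```

With 1000 random samples per genome-size pair, as in the randomsample example, the smallest p-value this estimator can return is 1/1001.
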