Version: 2.2
Developer: Yu-Wei Wu (ywwei@lbl.gov)
Affiliation: Joint BioEnergy Institute, Lawrence Berkeley National Laboratory
---Plotting the number of each marker gene in each contig---
MaxBin can plot the single-copy marker genes in each bin using R (with the
gplots package installed). Please install R beforehand and make sure that both
R and Rscript can be executed by typing the two commands on the command line.
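The check described above can be scripted. The following is a minimal sketch, not part of MaxBin: the helper name missing_tools is hypothetical, and it only tests whether R and Rscript are found on the PATH. It does not verify that the gplots package is installed.

```python
import shutil


def missing_tools(tools=("R", "Rscript")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("R and Rscript are both available.")
```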
=== Run MaxBin ===
To run MaxBin, please type in "perl run_MaxBin.pl" or just "run_MaxBin.pl". You
will see the available options, which are:
(required) -contig (contig filename)
(required) -out (output file header)
--- at least one of the following parameters is needed
(semi-required) -abund (contig abundance file. Explained in the Abundance section below.)
(semi-required) -reads (reads file in fasta or fastq format. Explained in the Abundance section below.)
(semi-required) -abund_list (a list file of all contig abundance files.)
(semi-required) -reads_list (a list file of all reads files.)
---
(optional) -thread (number of threads; default 1)
(optional) -reassembly (specify this option if you want to reassemble the bins.
Note that at least one reads file needs to be designated.)
(optional) -prob_threshold (minimum probability for EM algorithm; default 0.8)
(optional) -plotmarker (specify this option if you want to plot the markers in each contig. Installing R is a must for this option to work.)
(optional) -verbose (as is. Warning: output log will be LOOOONG.)
(optional) -markerset (choose between 107 marker genes by default or 40 marker genes. See Marker Gene Note for more information.)
Example commands:
run_MaxBin.pl -contig mycontig -abund myabund -out myout
run_MaxBin.pl -contig mycontig -reads myreads -reads2 my2ndreads -reads3 my3rdreads -out myout
run_MaxBin.pl -contig mycontig -abund myabund -abund2 my2ndabund -abund3 my3rdabund -reads myreads -reads2 my2ndreads -out myout -thread 4
run_MaxBin.pl -contig mycontig -abund_list abund_list_file -reads_list reads_list_file -out myout -prob_threshold 0.5 -markerset 40
(Warning: Please do NOT specify abundance and reads that BELONG TO THE SAME
METAGENOME together. MaxBin will treat them as two separate sources of
information and will thus count this metagenome TWICE!)
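The example commands number additional files as -reads2, -reads3, -abund2, and so on. The sketch below assembles such a command line from lists of files. It is an illustration only: the helper names numbered_options and build_command are hypothetical, only options shown in this README are used, and it cannot detect the double-counting case described in the warning.

```python
def numbered_options(flag, filenames):
    """Expand files into -flag, -flag2, -flag3, ... pairs, as in the example commands."""
    args = []
    for index, filename in enumerate(filenames, start=1):
        suffix = "" if index == 1 else str(index)
        args += [f"-{flag}{suffix}", filename]
    return args


def build_command(contig, out, abund=(), reads=(), thread=None):
    """Build the argument list for run_MaxBin.pl (hypothetical helper)."""
    # At least one abundance or reads file is needed, per the option list.
    if not abund and not reads:
        raise ValueError("at least one abundance or reads file is needed")
    command = ["run_MaxBin.pl", "-contig", contig]
    command += numbered_options("abund", abund)
    command += numbered_options("reads", reads)
    command += ["-out", out]
    if thread is not None:
        command += ["-thread", str(thread)]
    return command


if __name__ == "__main__":
    print(" ".join(build_command("mycontig", "myout", reads=["myreads", "my2ndreads"])))
```

The returned list can be passed to subprocess.run(command, check=True) to launch MaxBin.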
=== Reassembly Note ===
The reassembly option is still highly experimental. To use it, you need to feed
MaxBin an "interleaved paired-end" fastq or fasta file. An example of an
interleaved paired-end fasta file is as follows:
>reads1.1
AAAAA
>reads1.2
CCCCC
>reads2.1
TTTTT
>reads2.2
GGGGG
>reads3.1
AAAAA
>reads3.2
TTTTT
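If your paired-end reads are stored in two separate fasta files, they can be interleaved into the layout shown above. The following is a minimal sketch, not part of MaxBin: the function names read_fasta and interleave are hypothetical, and it assumes both files list the mates in the same order.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def interleave(forward, reverse):
    """Alternate mates as forward 1, reverse 1, forward 2, reverse 2, ..."""
    if len(forward) != len(reverse):
        raise ValueError("both files must contain the same number of reads")
    parts = []
    for (fwd_header, fwd_seq), (rev_header, rev_seq) in zip(forward, reverse):
        parts.append(f">{fwd_header}\n{fwd_seq}\n")
        parts.append(f">{rev_header}\n{rev_seq}\n")
    return "".join(parts)


if __name__ == "__main__":
    fwd = read_fasta([">reads1.1", "AAAAA", ">reads2.1", "TTTTT"])
    rev = read_fasta([">reads1.2", "CCCCC", ">reads2.2", "GGGGG"])
    print(interleave(fwd, rev), end="")
```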
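=== Abundance ===
The abundance file is a tab-delimited text file with one contig per line, in the
format
(contig header)\t(abundance)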
For example, assume I have three contigs named A0001, A0002, and A0003, then my
abundance file will look like
A0001	30.89
A0002	20.02
A0003	78.93
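The same three-contig file can be written and read back with a few lines of code. This is a minimal sketch, not part of MaxBin: the function names format_abundance and parse_abundance are hypothetical, and the only format assumption is the tab-separated layout described above.

```python
def format_abundance(abundances):
    """Return abundance-file text with one tab-separated line per contig."""
    return "".join(f"{contig}\t{value}\n" for contig, value in abundances.items())


def parse_abundance(text):
    """Parse abundance-file text into a {contig header: abundance} dict."""
    result = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 tab-separated fields")
        result[fields[0]] = float(fields[1])
    return result


if __name__ == "__main__":
    text = format_abundance({"A0001": 30.89, "A0002": 20.02, "A0003": 78.93})
    print(text, end="")
```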
---if you don't have abundance information---
Don't worry, MaxBin will generate this information for you from sequencing
reads. Please specify the reads file (in fasta or fastq format) with -reads.
---Providing more than one reads or abundance files---
Multiple reads and abundance files can be provided through list files that name
one reads or abundance file per line. For example, suppose you have 3 abundance
files and 5 reads files and want to provide them all. You can create two files,
"abund_list" and "reads_list", as follows:
$ cat abund_list
(first abundance file)
(second abundance file)
(third abundance file)
$ cat reads_list
(first reads file)
(second reads file)
(third reads file)
(fourth reads file)
(fifth reads file)
Then you can feed all of this information into MaxBin using the -abund_list and
-reads_list options. This is handy when you have a large number of reads or
abundance files.
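With many files, the list files can be generated rather than typed by hand. The following is a minimal sketch, not part of MaxBin: the helper name write_list_file and the file names in the example are hypothetical. It writes one file name per line, as in the listings above.

```python
from pathlib import Path


def write_list_file(list_path, filenames):
    """Write one file name per line to list_path and return the line count."""
    names = [str(name) for name in filenames]
    Path(list_path).write_text("".join(f"{name}\n" for name in names))
    return len(names)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    write_list_file("abund_list", ["sample1.abund", "sample2.abund", "sample3.abund"])
    write_list_file("reads_list", [f"sample{i}.fastq" for i in range(1, 6)])
```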
===MaxBin Output===
Assume your output file header is (out). MaxBin will generate information using
this file header as follows.
(out).0XX.fasta -- the XX bin. XX are numbers, e.g. out.001.fasta
(out).summary -- a summary file describing which contigs are being classified into which bin.
(out).log -- a log file recording the core steps of MaxBin algorithm
(out).marker -- marker gene presence numbers for each bin. This table is ready to be plotted by R or other 3rd-party software.
(out).marker.pdf -- visualization of the marker gene presence numbers using R. Will only appear if -plotmarker is specified.
(out).noclass -- this file stores all sequences that pass the minimum length threshold but are not classified successfully.
(out).tooshort -- this file stores all sequences that do not meet the minimum length threshold.
(out).marker_of_each_gene.tar.gz -- this tarball file stores all markers predicted from the individual bins. Use "tar -zxvf (out).marker_of_each_gene.tar.gz" to extract the markers [(out).0XX.marker.fasta].
(if -reassembly is given)
(out)_reassem/(out).reads.0XX -- the collected reads for the 0XX bin.
(out)_reassem/(out).reads.noclass -- reads that cannot be assigned to any bin.
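To collect the bin files after a run, the naming pattern above can be matched directly. The following is a minimal sketch, not part of MaxBin: the helper name find_bin_files is hypothetical. It assumes bin files are named (out), a dot, a run of digits, then ".fasta", and it skips the (out).0XX.marker.fasta files.

```python
import re


def find_bin_files(filenames, out):
    """Return sorted (bin_number, filename) pairs for names like <out>.<digits>.fasta."""
    pattern = re.compile(re.escape(out) + r"\.(\d+)\.fasta")
    bins = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            bins.append((int(match.group(1)), name))
    return sorted(bins)


if __name__ == "__main__":
    import os

    # "myout" is the output file header used in the example commands.
    for number, name in find_bin_files(os.listdir("."), "myout"):
        print(number, name)
```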
===Bugs or problems===
Encountered bugs or problems, or have any suggestions? Please contact Yu-Wei Wu
(ywwei@lbl.gov).