|Software / GenALA|
Table 1. Short description of GenALA tools
Hint to CONSED users: tools are available to swap between GAP4 and CONSED.
NAME
    genbank2gap - transforms a genbank file for import into GAP4 projects

USAGE
    genbank2gap [OPTIONS]

OPTIONS
    -f GENBANKFILE, --file GENBANKFILE
        Generates GAP4-readable tag output from the genbank features.
        NOTE: May be bound to an existing GAP4 project (-g) to take pads into account.
        The genbank file should have the same sequence as the GAP4 project, or the tags will be placed at the wrong positions.
        - Not meant to update a GAP4 project with new annotations; in that case you must use -u.
    or
    -n GENBANKFILE, --new GENBANKFILE
        Generates files for a directed assembly into a GAP4 project.
        Unless -i is given, the identifier of the genbank file will be used as an artificial "readname".
    or
    -u GENBANKFILE, --update GENBANKFILE
        Genbank features will replace the existing GAP4 tags of the corresponding database reference (same GC2ID).
        NOTE: Requires the -g PROJECT option!
        - Not meant for import of new db_xref numbers! (The lack of corresponding db_refs in GAP4 will result in loss of new db_xref entries.)
        - GC2IDs not found in the project will be lost!
        - Project tags without a corresponding GC2ID in the genbank file will remain untouched.
        (--not TAG)
            List of tags that shall remain as they are in the gap project and must not be overwritten by the update.
            Multiple names must be comma separated, no blanks.

    Optional switches
    (-c GENBANK_TAG=GAP_TAG, --convert GENBANK_TAG=GAP_TAG)
        If there are more or different tags to be included in the process (see below "Converted tags"), they must be listed here, each one with its own -c/--convert.
        The program's settings for this tag will be overwritten!
        NOTE: Use GENBANK_TAG=undef to turn off a conversion.
    (--delete_tags MIN_LEN)
        Delete tags above the given minimum length.
        NOTE: This option can be used with -u, and it will run gapdeletag for you.
    (-g PROJECT, --gap_project PROJECT)
        Name of the target gap project.
    (-i TEXT, --id TEXT)
        Contig ID (name of first read). If there are multiple IDs, enclose them in "" and separate them with blanks.
NAME
    gap2genbank - generates a genbank file out of a GAP4 project

OPTIONS
    -g FILE, --gap_project FILE
        Name of the GAP4 project from which to generate the new data file(s).
        Occurring pads will be stripped; add the -s option to keep them.
    or
    -f FILE, --file FILE
        Name of the experiment file, if there already is one you wish to use.
        Make sure that you saved only non-cutoff reading annotations.
    -a NUM, --accvers NUM
        Version number of this accession.
    (-c GAP_TAG=GENBANK_TAG, --convert GAP_TAG=GENBANK_TAG)
        If there are more or different tags to be included in the process (see "Converted tags"), they must be listed here, each one with its own --convert.
        Existing values will be overwritten!
        NOTE: Use GAP_TAG=undef to turn off a conversion.
    (-e FILE, --edit FILE)
        There are entries that cannot be found in the GAP4 projects. Name the file containing the data if you have one.
        Otherwise you will be asked to enter some data (e.g.):
            ORGANISM = Borrelia garinii
            strain/ssp = PBi
            ORG_Lineage = Bacteria; Spirochaetes; ...; Borrelia.
            codon_table = 11
            locusID = BGC
            division = BCT
            DEFINITION = linear chromosome
            ACCESSION = AC00000 (or a name, e.g. IMB_PBil)
            KEYWORDS =
            mol_type = genomic DNA
        One line per entry! Lineage separated by ; from Kingdom -> Species.
    (-h, --help)
        Print this help.
    (-l NUM, --low_limit NUM)
        Define a minimum length for the contigs to be used in the genbank output.
    (-q NUM, --qual_cov NUM)
        Add quality and coverage files.
    (-r, --readtags)
        Also add tags from reads to the genbank file.
        Beware: tags on separate reads will result in separate entries!
    (-s, --strip_no)
        Don't strip pads.
    (-w FILENAME, --write FILENAME)
        Write to file.

VERSION: 3.74, DATE: 27.07.2006

DESCRIPTION
    gap2genbank is designed to generate a genbank file out of an existing GAP4 project.
    As input it needs either the name of an existing GAP4 project or the name of an existing experiment file.
    For each contig there will be a separate genbank entry.
    The entries will be printed to stdout as a single stream (unless -w is used).
CONVERTED TAGS
    GAP tag => genbank qualifier
    CDS_    => feature CDS tag.
    RRNA    => feature rRNA tag.
    TRNA    => feature tRNA tag.
    REPT    => feature repeat_region tag.

    The CDS tag will be translated into proteins.
    If you need more tags in the genbank file, use --convert.

    The comments in the GAP4 tags must follow this naming convention (one comment per line):
    GAP comment in tag       => genbank qualifier
    >tag                     => /tag
    >tag=comment for tag     => /tag="comment for tag"
    tag="comment for tag"    => /tag="comment for tag"
    All other comments will be collected in "note" fields:
    any text here            => /note="any text here"

OUTPUT
    The output is the genbank record (please redirect it to a file).
    If there is more than one contig in the project, every contig gets its own entry, but all will be printed in a single stream.
    They can be separated at the "end of sequence tag":
        //
    A program doing this is available from: mbs at fli-leibniz.de.
    Program messages will be printed to STDERR. Errors are saved.

KNOWN BUGS
    Tags will not be sorted by position. The program relies on the output of the GAP4 project.
    Long lines from Genbank files will be broken when imported into GAP4; the broken lines then lack the qualifier tag and will be added as a note.

EXAMPLE
    1) Making a genbank file out of a GAP4 project:
        gap2genbank -g gap_p.0 > gap_p.gb
       or
        gap2genbank -g gap_p.0 -w gap_p.gb
    2) Making a genbank file out of an experiment file, where there is a saved data file (-e) and you want the ENZ5 tag to be a misc_feature in the genbank file:
        gap2genbank -f cons.exp -e Org.data -c ENZ5=misc_feature -w gap_p.gb

AUTHOR
    Markus B Schilhabel
    mail: mbs at fli-leibniz.de
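The comment naming convention listed under CONVERTED TAGS can be illustrated with a short Python sketch. This is not the code of gap2genbank itself; the function name comment_to_qualifier and the pattern used to recognise a tag name are assumptions made for illustration only.

```python
import re

# Third rule: tag="comment for tag". The \w+ tag-name pattern is an assumption.
_QUOTED = re.compile(r'^(\w+)="(.*)"$')


def comment_to_qualifier(line: str) -> str:
    """Map one GAP4 tag comment line to a genbank qualifier string."""
    text = line.strip()
    if text.startswith(">"):
        body = text[1:]
        if "=" in body:
            # >tag=comment for tag  =>  /tag="comment for tag"
            tag, value = body.split("=", 1)
            value = value.strip().strip('"')
            return f'/{tag.strip()}="{value}"'
        # >tag  =>  /tag
        return f"/{body.strip()}"
    match = _QUOTED.match(text)
    if match:
        # tag="comment for tag"  =>  /tag="comment for tag"
        return f'/{match.group(1)}="{match.group(2)}"'
    # Everything else is collected in a note field.
    return f'/note="{text}"'
```

Each comment line is handled on its own, which matches the "one comment per line" requirement; how the real program treats malformed lines is not documented here.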
NAME
    bbgap2genbank v.1.1 - writing of different types of sequence files (*.fa / *.gb) of a genome assembly project (GAP4) - reference projects

SYNOPSIS
    bbgap2genbank
    mandatory:
        [ -p 'GAP4_project_name.Version' ]
        [ -b 'common_root_of_all_reference_reading_names' ]
    optional:
        [ -h ]  print this online help
        [ -e 'resource_file' ]
        [ -d 'directory_of_result_files' ]

OUTPUT
    bbgap2genbank writes a collection of different sequence files:
    1. consensus sequence of the reference assembly project with reference sequence in case of gaps in target sequence (*_h.fa, *_h.gb)
    2. consensus sequence of the reference assembly project with masked reference sequence (N) in case of gaps in target sequence (*_n.fa, *_n.gb)
    3. consensus sequence - only target sequence (*_t.fa, *_t.gb)
    4. consensus sequence - only reference sequence (*_r.fa, *_r.gb)
    5. reference sequence with pads (*_rp.fa)
    6. target sequence with pads (*_tp.fa)
    7. msf-alignment of reference and target sequence
    8. quality files (tab delimited table) of hybrid-, target- and reference-project (*_h.gcc, *_t.gcc, *_r.gcc)

    File name explanation
    <project>_h.fa/_h.gb/_h.gcc    (1)+(8)
    <project>_n.fa/_n.gb           (2)
    <project>_t.fa/_t.gb/_t.gcc    (3)+(8)
    <project>_r.fa/_r.gb/_r.gcc    (4)+(8)
    <project>_rp.fa                (5)
    <project>_tp.fa                (6)
    <project>.msf                  Alignment of reference and target (7)
NAME
    gap2annotation v.1.1 - concatenates GAP4 consensus sequences for external feature predictions

SYNOPSIS
    gap2annotation
    mandatory:
        [ -p 'GAP4 project name' ]
    optional:
        [ -h ]  this online help
        [ -c 'lower limit of contig length' ]
        [ -g 'spacer length between 2 joined contig sequences' ]
            default value: 1000 x n
            This is necessary because GeneMarkS accepts only FASTA files with one single sequence with a minimum length of 1 Mb.
        [ -l 'maximum length of sequence in FASTA format' ]
            default value: 15 Mb
            Currently there is no known upper limit for GeneMarkS.

OUTPUT
    FASTA-Files test_
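The joining step described above (drop short contigs, put a spacer of n between neighbours, respect a maximum length) can be sketched in Python as follows. This is an illustration under assumptions, not the gap2annotation source: the function name join_contigs is hypothetical, and starting a new output sequence once the maximum length would be exceeded is an assumed behaviour that the help text does not spell out.

```python
from typing import Iterable, List


def join_contigs(
    contigs: Iterable[str],
    min_len: int = 0,              # corresponds to -c (lower limit of contig length)
    spacer_len: int = 1000,        # corresponds to -g (default 1000 x n)
    max_len: int = 15_000_000,     # corresponds to -l (default 15 Mb)
) -> List[str]:
    """Concatenate contig sequences, separated by runs of 'n'."""
    spacer = "n" * spacer_len
    joined: List[str] = []
    current = ""
    for seq in contigs:
        if len(seq) < min_len:
            continue  # contig is below the lower length limit
        candidate = seq if not current else current + spacer + seq
        if current and len(candidate) > max_len:
            # Assumed behaviour: close the current sequence and start a new one.
            joined.append(current)
            current = seq
        else:
            current = candidate
    if current:
        joined.append(current)
    return joined
```

With the defaults, a call such as join_contigs(list_of_contig_strings) returns a list holding one long sequence whenever the joined length stays under 15 Mb.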
NAME
    annotation2gap v.1.1 - parses annotations from a simple tabular format into a GAP4 tag file

SYNOPSIS
    annotation2gap
    mandatory:
        [ -p 'GAP4 project name' ]
        [ -d 'directory of result files' ]
        [ -f 'space delimited table' ]
            File from GeneMarkS with CDS positions, found in the mailbox as 'GeneMarkS: Gene Listing: <input file>'.
            Copy and paste is the easiest way to create it.
            default value: test_pos.txt
    optional:
        [ -h ]  this online help

    Example of required table:
        Gene   Strand   LeftEnd   RightEnd   Gene     Class
         #                                   Length
        1      +        <3        1403       1401     1
        2      -        1751      3946       2196     2
        3      +        4267      6708       2442     1

OUTPUT
    tag-file: <project name>.<version>_cds_tags.txt in Experiment-File-Format (q.v. STADEN-Package)
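To make the expected input table concrete, here is a minimal Python sketch that reads a gene listing in the layout shown in the example. It is not the annotation2gap parser; the names Gene and parse_gene_listing are hypothetical, and treating a leading "<" or ">" on a coordinate as a marker for a partial gene is an assumption about the GeneMarkS output.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    number: int
    strand: str
    left: int
    right: int
    length: int
    gene_class: int
    partial: bool  # True if a coordinate carried a '<' or '>' marker


def parse_gene_listing(text: str) -> List[Gene]:
    """Parse the space-delimited gene table; header lines are skipped."""
    genes: List[Gene] = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six fields and start with the gene number.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        number, strand, left, right, length, gene_class = fields
        partial = left.startswith("<") or right.startswith(">")
        genes.append(
            Gene(
                int(number),
                strand,
                int(left.lstrip("<>")),
                int(right.lstrip("<>")),
                int(length),
                int(gene_class),
                partial,
            )
        )
    return genes
```

In the example table, each gene length equals RightEnd - LeftEnd + 1, which is a convenient sanity check after parsing.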
NAME
    trna2gap v.1.4 - produces a tag file with padded positions of tRNA genes in a GAP4 project (based on a tRNAscan-SE analysis)

SYNOPSIS
    trna2gap
    mandatory:
        [ -p 'GAP4 project name.version' ]
    optional:
        [ -c 'lower limit of contig length' ]
        [ -d 'directory of result files' ]
        [ -h ]  this online help

OUTPUT
    tag-file: <project name>.<version>_trs.tags in Experiment-File-Format (q.v. STADEN-Package)
NAME
    gapdeletag.tcl - deletes consensus or read tags

USAGE
    gapdeletag.tcl -g gap_project.v -r [-c or -C NAME] [-l -a NUM -t TAG1,TAG2,... or all tags]

OPTIONS
    -g FILE.V    name of gap_project
    -r           delete tags from reads
    -c           delete tags from consensus (all)
    or
    -C NAME      delete tags from the listed consensus (comma separated, no blanks)
    (-a NUM      delete above contig length (useless for read tags!))
    (-t TAGS     TAG1,TAG2,TAG3,... names of the tags to be deleted, comma separated, no blanks)
    (-l          only looking, not doing anything yet)
    (-v          be verbose)
    -h           print a help
NAME
    gapsequence.tcl - saves consensus like GAP4

USAGE
    gapsequence.tcl -g gap_project.v [-c NAME -f [F|X|S] -o [filename] -s [y|n]]

OPTIONS
    -g FILE.V      name of gap_project
    (-c NAME(S)    selected contigs in C1,C2,C3 (default: all))
    (-f FORMAT     X(periment), F(asta), S(taden) (default: F))
    (-o FILE       output filename (default: cons))
    (-s y/n        strip pads [y or n] (default: n))
    (-h            print a help)