Table of contents:
- Introduction
- Screendumps
- Pipeline, Example 1
- Pipeline, Example 2
- Pipeline, Example 3
- News/History
- Sources and other links
1) Introduction
These tools convert GS20 or FLX assemblies (454Contigs.ace) into STADEN format
so that they can be viewed and edited correctly within the Staden package (gap4).
This gives you:
- a tool chain in which all further programs are open source
- a full graphical overview of the assembly
- exactly aligned traces with their positions, base values, etc.
- quality clipping information that is respected
- access to "hidden data"
- display of the associated flowgram traces in SFF format (tested with staden-1-7-0)
For a description and the goals, please take a look at Poster.pdf.
2) Screendumps:
2.1) Assembly with "show cutoffs" enabled: assembly_with_cutoffs.gif
2.2) Assembly with "show differences by dots" enabled: gap4_by_dots.gif
2.3) Trace view: trace_example.gif
3) Pipeline example 1:
1) Create traces with runPhoenix and create the assembly with runAssembly and/or runMapping.
This results in one or more EIKxxx.sff trace files and one 454Contigs.ace.
It is best to create a new directory containing these files (copy or link them):
454Contigs.ace
EIK12345a.sff
EIK12345b.sff
2) Create FASTA and quality files from each trace file,
e.g. with a bash one-liner:
% for f in *.sff; do F=${f%.sff}; sff_dump -f "$F.fna" -q "$F.qual" "$f"; done
You now have:
454Contigs.ace
EIK12345a.sff
EIK12345b.sff
EIK12345a.fna <---
EIK12345a.qual <---
EIK12345b.fna <---
EIK12345b.qual <---
...
3) Check or modify the lines with the file patterns 'glob(E*.fna)' and 'glob(E*.qual)' within roche454ace2caf.pl,
and also check that the RPATH shell variable gives the correct path to your external executable 'align_to_scf'.
4) Convert assembly from ACE into CAF format:
% roche454ace2caf.pl -i 1 -c 5 >454.caf 2>454.err
( -i 1 = enable partial trace names like EIXXXX_to1010 and EIXXXX_fm1212 )
( -c 5 = do not convert traces shorter than 5 bases (only seen in older roche454 versions) )
Also available:
( -h = help )
( -a = add/duplicate the contig as an additional trace, because STADEN generates its own consensus quality values )
( -f xx = read contigs from a different ACE file )
( -q xx = read quality values from a different source )
5) Create GAP4 database from CAF file:
% caf2gap -project 454 -version 0 < 454.caf >x.out 2>x.err
6) Optionally, to speed up the display of traces in gap4, convert the SFF files into hashed SFF files:
% for f in *[A-Z0-9].sff; do F=${f%.sff}; hash_sff -o "${F}_hash.sff" "$f"; done
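7) Create a script gap4sff like this:
#!/bin/bash
# Build TRACE_PATH from the hashed SFF files; adjust sff/ to the directory that holds them.
TRACE_PATH=$(ls sff/*_hash.sff | sed 's/^/:HASH=/' | tr -d '\012')
export TRACE_PATH
if [ -f /opt/staden/staden.profile ]
then
    # Load the Staden environment and start gap4 with all arguments passed to this script.
    . /opt/staden/staden.profile
    exec /opt/staden/${MACHINE}-bin/gap4 ${@+"$@"}
else
    echo "Can't find any suitable staden environment" >&2
    exit 1
fi
and finally run it: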
% ./gap4sff 454.0
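The TRACE_PATH value that gap4sff builds in step 7 is a single string in which every hashed SFF file is prefixed with ':HASH='. The following is a minimal sketch, not part of the distribution, that builds the same string with a shell glob instead of parsing ls output. The function name build_trace_path and its directory argument are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: build_trace_path is a hypothetical helper, not shipped with the tools.
# It prints the same ":HASH=file" string that the gap4sff script assembles.
build_trace_path() {
    local dir=${1:-sff}   # directory with the *_hash.sff files (assumed default: sff)
    local path="" f
    for f in "$dir"/*_hash.sff; do
        [ -e "$f" ] || continue        # skip the unexpanded pattern if nothing matches
        path="${path}:HASH=${f}"
    done
    printf '%s\n' "$path"
}
```

For example, 'TRACE_PATH=$(build_trace_path sff); export TRACE_PATH' could stand in for the first two commands of the script. An empty result means that no hashed SFF file was found in the given directory.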
4) Pipeline example 2:
When foreign or remote services are used for the assembly, often only the quality file 454Contigs.qual is available.
Then you can replace step 4 of example 1 with:
% roche454ace2caf.pl -q 454Contigs.qual -f 454Contigs.ace >454.caf 2>454.err
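Which variant of step 4 applies depends on whether per-read quality files are present. The sketch below only prints the command line it would choose, using the two invocations shown in examples 1 and 2. The function name choose_caf_command and the test for E*.qual files are assumptions for illustration, not part of the distribution.

```shell
#!/bin/bash
# Sketch only: choose_caf_command is a hypothetical helper.
# It prints (does not run) the conversion command that fits the given directory.
choose_caf_command() {
    local dir=${1:-.} f
    for f in "$dir"/E*.qual; do
        if [ -e "$f" ]; then
            # Per-read quality files exist: use the call from example 1, step 4.
            echo "roche454ace2caf.pl -i 1 -c 5 >454.caf 2>454.err"
            return 0
        fi
    done
    if [ -e "$dir/454Contigs.qual" ] && [ -e "$dir/454Contigs.ace" ]; then
        # Only the contig quality file is available: use the call from example 2.
        echo "roche454ace2caf.pl -q 454Contigs.qual -f 454Contigs.ace >454.caf 2>454.err"
        return 0
    fi
    echo "no quality files found in $dir" >&2
    return 1
}
```

Calling 'choose_caf_command .' in the working directory shows which of the two calls matches the files at hand, and returns a non-zero status if neither kind of quality file is present.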
5) Pipeline example 3:
You can also run the all-in-one utility script roche2gap:
% roche2gap -d gap4 -p HKI -v 1
News / History
roche454ace2gap.pl - V1.10 (08.12.2010):
- Added option -a to enable/disable adding (or duplicating) the contig as an additional read.
This feature was introduced in version 1.09, where it was always on.
align_to_scf - V1.06 (04.09.2009):
- Some bugs were reported with large sequences; they were apparently caused by a rare compiler bug
in old binaries and/or an optimization problem, and recompiling solved them. The cause remains obscure.
No other changes.
align_to_scf - V1.05 (29.01.2009):
- bug fixed with large line length
align_to_scf - V1.04 (24.12.2008):
- bug fixed in case of large stretched sequences filled by too many dashes
sff_dump - V1.04 (11.11.2008):
- Bug fixed that occurred under rare compiler optimization options (gcc -O2 ...)
- Speed-up: now runs 30% faster
roche454ace2gap.pl - V1.09 (21.10.2008):
- Bug fix: in special cases a negative starting position was calculated for a few traces.
- Add-on: the original contig sequence is added to the assembly as a single read, because STADEN
calculates its own and sometimes different consensus.
- The former 'illegal' traces are now called 'partial' traces in the help text.
roche454ace2gap.pl - V1.08a (20.05.2008):
Thanks to helpful feedback from James Bonfield (jb) from Sanger, the executable 'align_to_scf' (v1.02)
now runs so much faster (from hours down to seconds) that the parallel variant and its
subroutines are no longer needed and have been removed from the code.
roche454ace2gap.pl - V1.08 (14.04.2008):
Due to some changes in the Roche(R) off-instrument program 'runAssembler',
the following changes are incorporated:
There are several variations in where the quality clipping information is hidden:
- no clipping information (QL == 1, QR == length of the good clipped sequence)
- clipping information on the trim line (e.g. TRIM=5-123)
- clipping information now on the QA line (QL=2, QR=123)
The time-consuming 'align_to_scf' is run via the script run_align2scf.sh, which first splits 454contigs.qry into
several 454contigs.qrx files and then calls make to run align_to_scf in parallel (make -j8).
At the end, all results are pasted back into 454contigs.aln.
6) Download & Sources:
You will mostly use these conversion tools on Linux (at least on the x86 architecture), so a tar archive is now provided.
You should download the latest version, roche454ace2gap-2010-12-08.tgz.
Decompress and unpack it with GNU tar:
% mkdir -p /usr/local/roche2gap
% cd /usr/local/
% gtar -xvf roche454ace2gap-2010-12-08.tgz
It contains binaries, scripts and sources and should be extracted into /usr/local/roche2gap/.
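Before unpacking into /usr/local it can be worth confirming where the archive will put its files. This is a minimal sketch, assuming that all members of the archive live under a single top-level directory (the text above suggests roche2gap, but that is not verified here). The function name archive_top_dirs is hypothetical.

```shell
#!/bin/bash
# Sketch only: archive_top_dirs is a hypothetical helper.
# It lists the distinct top-level names inside a gzipped tar archive without extracting it.
archive_top_dirs() {
    # tar -t lists member names; drop a leading "./" and empty lines,
    # then keep only the first path component of each member.
    tar -tzf "$1" | sed -e 's|^\./||' -e '/^$/d' | cut -d/ -f1 | sort -u
}
```

Running 'archive_top_dirs roche454ace2gap-2010-12-08.tgz' should print one directory name if the assumption holds. If it prints several names, unpacking into a scratch directory first avoids scattering files in /usr/local.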
If you set RPATH, you can use any other local directory for the following programs:
- bin/sff_dump
- bin/align_to_scf
- bin/roche454ace2caf.pl
- bin/caf2gap
- bin/ace_contig_coverage.pl
- bin/ace2caf_minimal.pl
- bin/ace_split.pl
- bin/acestatus.pl
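Whether the programs needed for the pipeline are reachable can be checked before starting a conversion. The sketch below assumes that RPATH names the installation root containing the bin/ directory (how the scripts interpret RPATH is not stated here, so treat this as an assumption) and falls back to /usr/local/roche2gap. The function name missing_tools is hypothetical, and only the four programs used in pipeline example 1 are checked.

```shell
#!/bin/bash
# Sketch only: missing_tools is a hypothetical helper.
# Assumption: RPATH points at the install root that contains bin/.
missing_tools() {
    local root=${1:-${RPATH:-/usr/local/roche2gap}} tool status=0
    for tool in sff_dump align_to_scf roche454ace2caf.pl caf2gap; do
        if [ ! -x "$root/bin/$tool" ]; then
            echo "missing or not executable: $root/bin/$tool"
            status=1
        fi
    done
    return $status
}
```

Called without an argument, 'missing_tools' uses RPATH if it is set. It prints one line per program that cannot be executed and returns a non-zero status in that case.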
What else do you get?
- Other Perl utilities:
- ace2caf_minimal.pl - Converts ACE into CAF without any trace/quality information, so it is very fast.
- ace_split.pl - Splits one big ACE file into separate ACE files with one contig each.
- ace_contig_coverage.pl - Generates a thumbnail graphic (PNG) of the coverage of each contig; requires external gnuplot.
- ace_status.pl - Some statistics (padded, unpadded) about an ACE file.
- External links for convenience: