Gap.pl User Manual


Introduction
============

[...]


Standard Options
================

switch argument types:
 B:=boolean
 F:=floating point/scientific
 N:=integer
 S:=string
 X:=varying type

 -SlcCnum=N    select contigs/scaffolds with a minimum number of clones
 -SlcID=S      effective only with -SlcData=clone or -SlcData=read
 -SlcLen=N     select contigs with a minimum length
 -SlcLen=N1..N2
               select contigs within a length range (N1:=minimum,
               N2:=maximum).
 -SlcRnum=N    select contigs/scaffolds with a minimum number of readings
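
The length selection (-SlcLen=N or -SlcLen=N1..N2) can be pictured with the
Python sketch below. It is not taken from Gap.pl: it assumes that both bounds
of the range are inclusive, and the helpers parse_len_range and
select_by_length as well as the contig records with a "length" key are
hypothetical.

```python
import re


def parse_len_range(spec):
    """Parse a -SlcLen value, either "N" or "N1..N2", into (minimum, maximum)."""
    match = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", spec)
    if match is None:
        raise ValueError("invalid length specifier: %r" % spec)
    minimum = int(match.group(1))
    # A single value sets only the minimum; the maximum stays unbounded.
    maximum = int(match.group(2)) if match.group(2) else float("inf")
    return minimum, maximum


def select_by_length(contigs, spec):
    """Keep contigs whose length lies in the range (bounds assumed inclusive)."""
    minimum, maximum = parse_len_range(spec)
    return [c for c in contigs if minimum <= c["length"] <= maximum]


# Hypothetical contig records, for illustration only.
contigs = [
    {"name": "ctg1", "length": 300},
    {"name": "ctg2", "length": 800},
    {"name": "ctg3", "length": 5000},
]
print([c["name"] for c in select_by_length(contigs, "500..2000")])  # prints ['ctg2']
```

With -SlcLen=500 the same records would yield ctg2 and ctg3, since a single
value leaves the maximum open.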


Gap.pl -index
=============

This program mode extracts the basic topological information from an
assembly (currently a GAP4 database). It generates two table files. The first
table lists one reading per line, with its name, contig position, visible
length, orientation, etc. The second table lists one contig per line, with
attributes like total length, number of readings, etc. Together, the two
tables describe the assembly sufficiently to serve as input to downstream
analysis programs, e.g. `Gap.pl -ScaffdMap ...'.

If you intend to use the tables as input to your own programs, refer to the
data by column labels rather than by column positions. This keeps your
programs working with future Gap.pl versions that may add columns or change
the column order.
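
As an illustration of reading the tables by column label, here is a minimal
Python sketch. It assumes that a table is tab-delimited and that its first
line holds the column labels; the labels "id", "length", "reads" and "clones"
below are hypothetical and not necessarily those written by Gap.pl.

```python
import csv
import io


def read_table(handle):
    """Return one dict per data row, keyed by the column labels of the header line.

    Assumption: the table is tab-delimited and its first line holds the labels.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


# Two hypothetical versions of a contig table whose columns differ in order.
old_format = "id\tlength\treads\nctg1\t1520\t12\nctg2\t830\t4\n"
new_format = "id\treads\tclones\tlength\nctg1\t12\t3\t1520\nctg2\t4\t1\t830\n"

for text in (old_format, new_format):
    rows = read_table(io.StringIO(text))
    # Look up values by label, so the extra column and the new order do not matter.
    print({row["id"]: int(row["length"]) for row in rows})
```

For a real table, open the file with newline="" (as the csv module
recommends) and pass the file handle to read_table.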

COMMAND LINE SYNTAX
 Gap.pl -index <gapdb.v>

arguments:
 gapdb.v       GAP4 database: (folder/)database.version

options:
 -OutStump=S   path stump for multi-file output. The default is derived from
               the local time and the name of the program mode.
 -SlcCnum=N    select contigs/scaffolds with a minimum number of clones
 -SlcData=S    case-insensitive data specifier:
               contig  index reads in contigs and contigs in projects
                       (default).
               scaffd  perform scaffold analysis, index reads in scaffolds
                       and scaffolds in projects. This is a slim version
                       of program mode `Gap.pl -ScaffdIndex ...'.
 -SlcLen=N     select contigs with a minimum length
 -SlcLen=N1..N2
               select contigs within a length range (N1:=minimum,
               N2:=maximum).
 -SlcRnum=N    select contigs/scaffolds with a minimum number of readings

EXAMPLES

 Gap.pl -index .project/mygapdb.0

   Analyse the assembly in GAP4 database ".project/mygapdb.0" and create the
   table files
   .project/200505261509_indexRead.tab  and
   .project/200505261509_indexContig.tab

 Gap.pl -index -SlcCnum=2 -OutStump=~/analyse/Idx .project/mygapdb.0

   From the assembly in GAP4 database ".project/mygapdb.0", select contigs that
   are made up of at least two independent sequencing templates. Create the
   table files
   ~/analyse/IdxRead.tab  and
   ~/analyse/IdxContig.tab
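
The naming of the two output files can be summarised in a short Python
sketch. It is not Gap.pl code: the rule for the default stump (database
folder, a YYYYMMDDhhmm local-time stamp, then "_index") is only inferred from
the first example above, and index_output_paths is a hypothetical helper.

```python
import os
from datetime import datetime


def index_output_paths(db_path, out_stump=None, now=None):
    """Return the (read table, contig table) paths for Gap.pl -index.

    Assumption inferred from the examples: without -OutStump, the stump is the
    database folder plus a YYYYMMDDhhmm time stamp and "_index".
    """
    if out_stump is None:
        stamp = (now or datetime.now()).strftime("%Y%m%d%H%M")
        out_stump = os.path.join(os.path.dirname(db_path), stamp + "_index")
    # Both table names share the stump; only the suffix differs.
    return out_stump + "Read.tab", out_stump + "Contig.tab"


print(index_output_paths(".project/mygapdb.0", now=datetime(2005, 5, 26, 15, 9)))
print(index_output_paths(".project/mygapdb.0", out_stump="~/analyse/Idx"))
```

On a POSIX system the two calls reproduce the file names listed in the
examples above.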


Gap.pl -repair
==============

See the German description in manual_Gap_GER.txt.


Gap.pl -QualAdjust
==================

This program mode adjusts the quality (confidence) values of bases in an
assembly. It operates on Experiment files that have been exported from a GAP4
database, i.e. files that make up a directed assembly or pre-assembled data.
The program distinguishes two kinds of sequence sites whose quality values it
adjusts:
- sequence sites that have been edited, identified by lower-case letters
- all sites in sequences that are not tied to trace data. Such sequences
  probably originate from external sources rather than from sequencing
  experiments.
The program prints a summary of the changes it has made.
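
The two adjustment rules, together with the defaults qed=50 and qext=2 given
below, can be sketched in Python as follows. This is not the Gap.pl
implementation: parsing of Experiment files is left out, and the input
structures (a sequence string, a list of integer quality values, and a
has_traces flag) as well as the function adjust_quality are assumptions made
for illustration.

```python
def adjust_quality(seq, qual, has_traces, qed=50, qext=2):
    """Return adjusted quality values for one sequence.

    seq: bases, edited sites in lower case; qual: one int per base;
    has_traces: whether the sequence is tied to trace data (assumed input).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not has_traces:
        # Sequence from an external source: every site gets the value qext.
        return [qext] * len(seq)
    # Edited sites (lower-case letters) are capped at qed; others stay as they are.
    return [min(q, qed) if base.islower() else q for base, q in zip(seq, qual)]


print(adjust_quality("ACgT", [40, 60, 70, 30], has_traces=True))   # [40, 60, 50, 30]
print(adjust_quality("ACGT", [40, 60, 70, 30], has_traces=False))  # [2, 2, 2, 2]
```

Note the difference between the two rules: qed is a maximum, so an edited
site that already has a lower value keeps it, whereas qext replaces every
value in an external sequence.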

COMMAND LINE SYNTAX
 Gap.pl -QualAdjust[=qed[,qext]] [exp1 [exp2 ...]]

program mode arguments, optional:
 qed           maximum quality value for edited symbols, default: 50
 qext          quality value for external sequences, default: 2

arguments, optional:
 exp           experiment file, default: files listed in ./fofn

EXAMPLES

 exportdirected.tcl .project/mygapdb.0 .project/my_export
 cd .project/my_export
 Gap.pl -QualAdjust
 cd -
 assembledirected.tcl .project/mygapdb2.0 .project/my_export/fofn

   # Export the directed assembly from GAP4 database ".project/mygapdb.0",
   # change to the export directory, and adjust the quality values in all
   # experiment files. Change back to the previous working directory and
   # re-import the directed assembly into a new GAP4 database,
   # ".project/mygapdb2.0".

 exportdirected.tcl .project/mygapdb.0 .project/my_export
 Gap.pl -QualAdjust=10 .project/my_export/mylib*
 assembledirected.tcl .project/mygapdb2.0 .project/my_export/fofn

   # Export the directed assembly from GAP4 database ".project/mygapdb.0".
   # Select the experiment files that match "mylib*" and adjust their quality
   # values so that all edited sites have a maximum quality value of 10.
   # Re-import the directed assembly into a new GAP4 database,
   # ".project/mygapdb2.0".


Gap.pl -seq
===========

This program mode retrieves sequences from a GAP4 database. The output is in
Staden Experiment file format.

COMMAND LINE SYNTAX
 Gap.pl -seq [gap4 [read1 ...]]

program mode arguments, optional:
 gap4          GAP4 database: (folder/)database.version
 read          contig specifier for selection

options:
 -RcCloneLen   custom clone-length rc file, effective only with -SlcData=scaffd
 -SlcData=S    case-insensitive data specifier:
               clone   clone sequences based on contig consensi, Experiment
                       file format, incl. pads, consensus mode 2.
               contig  index reads in contigs and contigs in projects
                       (default).
               scaffd  perform scaffold analysis, index reads in scaffolds
                       and scaffolds in projects. This is a slim version
                       of program mode `Gap.pl -ScaffdIndex ...'.
               read    sequences of readings, in the pre-assembled sub-format
                       of the Experiment file format.

EXAMPLES

The examples pipe the sequence output to SeqHandle.pl in order to convert it
to FASTA format.

  Gap.pl -seq "gap4" | SeqHandle.pl -
    # sequences of contig consensi (1st read becomes sequence ID)
    # with pads (asterisks)

  Gap.pl -seq "gap4" | SeqHandle.pl - -pure
    # sequences of contig consensi (1st read becomes sequence ID)
    # without pads (asterisks)

  Gap.pl -seq -slcdata=read "gap4" | SeqHandle.pl - -pure
    # sequences of readings (read name becomes sequence ID)
    # with hidden data
    # with pads (asterisks)

  Gap.pl -seq -slcdata=read "gap4" | SeqHandle.pl - -clipqual -pure
    # sequences of readings (read name becomes sequence ID)
    # hidden data clipped off
    # without pads (asterisks)
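
To show what removing pads and writing FASTA amounts to, here is a minimal
Python sketch. It is not the SeqHandle.pl implementation; the function
to_fasta, its pure switch (modelled on the effect described for -pure) and
the line width of 60 are assumptions for illustration.

```python
def to_fasta(seq_id, seq, pure=False, width=60):
    """Format one sequence as a FASTA record.

    pure=True removes pad characters ("*"); width is an assumed line length.
    """
    if pure:
        seq = seq.replace("*", "")
    lines = [">" + seq_id]
    # Wrap the sequence into lines of at most `width` characters.
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


print(to_fasta("read1", "ACG*TT*A"), end="")
print(to_fasta("read1", "ACG*TT*A", pure=True), end="")
```

The first call keeps the pads (">read1" followed by "ACG*TT*A"), the second
strips them ("ACGTTA").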

