Install Docker

Installation instructions for Docker are available on the Docker website: https://docs.docker.com/installation/

Using the Docker image

The following commands must be run as root.

Pull the Docker image.

docker pull sigenae/drap

Create the Docker container.

docker create --name drap --privileged -v /mnt/scratch:/docker/scratch -i -t sigenae/drap:latest /bin/bash

Start the Docker container and open a bash shell inside the running container.

docker start drap
docker exec -i -t drap /bin/bash

The Docker container is configured to run with 120G of RAM and 6 CPUs.
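Before launching a workflow, it can be worth comparing these figures with what the machine actually offers. The following is a minimal sketch, assuming a Linux shell (for example inside the container) where nproc and /proc/meminfo are available. The variable names are illustrative and the thresholds are simply the values quoted above.

```shell
# Thresholds taken from the sentence above; adjust them if your configuration differs.
required_cpus=6
required_mem_gb=120

# Number of CPUs visible to this shell (nproc is part of GNU coreutils).
available_cpus=$(nproc)

# Total RAM in GB, derived from the MemTotal line of /proc/meminfo (reported in kB).
available_mem_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)

if [ "$available_cpus" -lt "$required_cpus" ]; then
    echo "Only $available_cpus CPUs available, $required_cpus expected"
fi

if [ "$available_mem_gb" -lt "$required_mem_gb" ]; then
    echo "Only ${available_mem_gb}G of RAM available, ${required_mem_gb}G expected"
fi
```

If either message is printed, the resources requested in drap.cfg probably need to be lowered to match the host.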

The DRAP install directory in the Docker container is /usr/local/src/drap. Use that path as the INSTALL_FOLDER in the DRAP documentation.

Dependencies

Programming language interpreters and modules

├── bash
│   └── call
│       └── /bin/bash
├── csh
│   └── call
│       └── /bin/csh
├── perl
│   ├── call
│   │   └── /usr/bin/perl
│   ├── version
│   │   └── 5.*.*
│   └── non standard modules
│       ├── Bio::Search::Hit::GenericHit
│       ├── Bio::Search::Tiling::MapTiling
│       ├── Bio::SearchIO::Writer::HSPTableWriter
│       ├── Bio::SeqIO
│       ├── Bio::Tools::Run::StandAloneBlast
│       ├── IPC::Run
│       ├── JSON
│       ├── List::Util
│       └── Term::ANSIColor
└── python
    ├── call
    │   └── /usr/bin/env python
    ├── version
    │   └── 2.7
    └── non standard modules
        ├── Bio
        ├── NumPy
        └── SciPy
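The non-standard modules listed above can be checked by hand before running DRAP. The sketch below is one possible way to do it, not part of DRAP itself: the two helper functions are hypothetical names, and the Python import names numpy and scipy are assumed to correspond to the NumPy and SciPy entries of the tree.

```shell
# Return 0 if the given Perl module can be loaded, non-zero otherwise.
check_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Return 0 if the given Python module can be imported, non-zero otherwise.
# The interpreter is called the same way as in the tree above.
check_python_module() {
    /usr/bin/env python -c "import $1" >/dev/null 2>&1
}

# Perl modules copied from the dependency tree.
for module in \
    Bio::Search::Hit::GenericHit \
    Bio::Search::Tiling::MapTiling \
    Bio::SearchIO::Writer::HSPTableWriter \
    Bio::SeqIO \
    Bio::Tools::Run::StandAloneBlast \
    IPC::Run \
    JSON \
    List::Util \
    Term::ANSIColor
do
    if check_perl_module "$module"; then
        echo "perl   OK      $module"
    else
        echo "perl   MISSING $module"
    fi
done

# Python modules: Bio, NumPy and SciPy (import names are lowercase for the last two).
for module in Bio numpy scipy; do
    if check_python_module "$module"; then
        echo "python OK      $module"
    else
        echo "python MISSING $module"
    fi
done
```

A module is also reported as MISSING when the interpreter itself cannot be found, so check the interpreter first if every line of one language fails.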

External software

Newer versions of the external software can be used, but compatibility is not guaranteed.

Download source

Download source from git

git clone https://forgemia.inra.fr/cedric.cabau/drap.git

DockerFile

Having a look at the DockerFile may ease the installation.

drap.cfg

This configuration file, common to all modules, is divided into several sections:

[SCHEDULER]
type = local  # This value can be 'local', 'sge' or 'slurm'

If the pipeline is executed on an HPC cluster, change the scheduler type accordingly.
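The value can of course be changed in a text editor. For scripted installations, the sketch below shows one way to do it with sed. It works on a scratch file that only contains the lines shown above, as a stand-in for the real drap.cfg, and the choice of slurm is just an example.

```shell
# Scratch file standing in for drap.cfg (content limited to the excerpt above).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[SCHEDULER]
type = local  # This value can be 'local', 'sge' or 'slurm'

[SOFTWARES] # Softwares called by the workflow
EOF

# Rewrite the 'type' key, but only between [SCHEDULER] and the next section header,
# so that a 'type' key in another section would be left untouched.
sed '/^\[SCHEDULER\]/,/^\[/ s/^type[[:space:]]*=[[:space:]]*[a-z]*/type = slurm/' "$cfg" > "$cfg.new"
mv "$cfg.new" "$cfg"

# Show the result.
grep '^type' "$cfg"
```

Writing to a temporary file and moving it into place avoids relying on the in-place option of sed, which differs between implementations.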

[SOFTWARES] # Softwares called by the workflow
...

Users do not need to modify this section.

[PATH] # Folders added to the PATH
...

If one of the programs listed in the SOFTWARES section is not directly accessible, add its bin folder to the PATH section (see 'Check dependencies installation').

[ENV] # Commands to execute to set the environment (i.e. module load)
# Stages are:
# runDrap       [preprocess oases trinity merge clustering asm post_asm rmbt_editing rmbt_filtering postprocess reference]
# runMeta       [meta_merge meta_longest_orf meta_cluster_orf meta_longest_contig meta_cluster_contig meta_index meta_rmbt meta_filter meta_postprocess meta_reference]
# runAssessment [assemblyMetrics inclusion chimera orf mapping scoring prot busco]
# Syntax is <stageName>_env = command_1; [ command_2; ...]
preprocess_env = module load compiler/gcc-4.9.1 # for Khmer package
busco_env      = module load system/Python-3.6.3 bioinfo/augustus-3.3 system/R-3.4.3 bioinfo/busco-3.0.2
...

If one of the programs listed in the SOFTWARES section needs a specific environment, add the commands required to set this environment in this section. The example above shows how to load a module named compiler/gcc-4.9.1, which is needed to run scripts from the Khmer package. Scripts from this package are used to complete the preprocess step (see the DRAP 3rd Party Tools table), so the entry consists of the step name followed by _env and the command to execute at the start of that step.
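Because each entry is keyed by a stage name, a misspelt name is easy to overlook. The following sketch compares the <stageName>_env keys of a file against the stage names listed in the comment block above. It is an illustration rather than a DRAP tool: is_known_stage is a hypothetical helper, and the scratch file only reproduces the two example entries.

```shell
# Stage names copied from the comment block of the [ENV] section.
known_stages="preprocess oases trinity merge clustering asm post_asm rmbt_editing rmbt_filtering postprocess reference"
known_stages="$known_stages meta_merge meta_longest_orf meta_cluster_orf meta_longest_contig meta_cluster_contig"
known_stages="$known_stages meta_index meta_rmbt meta_filter meta_postprocess meta_reference"
known_stages="$known_stages assemblyMetrics inclusion chimera orf mapping scoring prot busco"

# Return 0 if the argument is one of the known stage names.
is_known_stage() {
    case " $known_stages " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Scratch file standing in for the [ENV] section of drap.cfg.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[ENV]
preprocess_env = module load compiler/gcc-4.9.1 # for Khmer package
busco_env      = module load system/Python-3.6.3 bioinfo/augustus-3.3 system/R-3.4.3 bioinfo/busco-3.0.2
EOF

# Extract the stage name of every <stageName>_env key and check it.
sed -n 's/^\([A-Za-z_]*\)_env[[:space:]]*=.*/\1/p' "$cfg" | while read -r stage; do
    if is_known_stage "$stage"; then
        echo "ok: $stage"
    else
        echo "unknown stage: $stage"
    fi
done
```

To check a real configuration, point cfg at INSTALL_FOLDER/cfg/drap.cfg instead of the scratch file.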

[DATABASE] # External data sources
...

Currently, this section only lists the database needed to perform the TSA validation procedure.

[SOFTWARE CONFIGURATION] # options of softwares called by the workflow
...

[SCHEDULER CONFIGURATION] # HPC configuration (queue, parallel environment)
...

[SCHEDULER RESOURCES] # resources requested for workflow specific task or step
...

Depending on the resources available on your computing infrastructure, adjust the resources requested for specific workflow tasks or steps.

Given a list of fastq files (or pairs of fastq files) as input, TransRate and snap-aligner try to produce a single output file in BAM format. With a large number of reads, snap-aligner cannot sort the BAM file without a very large amount of memory. To reduce the memory required, the TransRate patch provided with DRAP runs snap-aligner on each fastq file (or pair of fastq files) from the list and delegates merging and sorting to samtools. Make sure the samtools command is in your PATH, because the patched version of TransRate does not check that this new dependency is available.

cd TRANSRATE_INSTALL_DIR/lib/app/lib/transrate
patch -b < INSTALL_FOLDER/plugins/drap-transrate/transrate.patch
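Since the patched TransRate does not verify that samtools is present, a quick manual check is useful. This is a minimal sketch using the POSIX command -v builtin. The have_command helper is a hypothetical name introduced for the example.

```shell
# Return 0 if the named command can be found in PATH, non-zero otherwise.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command samtools; then
    echo "samtools found at $(command -v samtools)"
else
    echo "samtools not found in PATH: merging and sorting of BAM files is expected to fail" >&2
fi
```

If samtools is installed but not found, adding its bin folder to the PATH section of drap.cfg is the approach described earlier for other programs.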

Check dependencies installation

runCheck is a tool designed to check the dependencies of a workflow.

INSTALL_FOLDER/runCheck --cfg-file INSTALL_FOLDER/cfg/drap.cfg

Check workflows execution

Run the workflows on the demonstration data (see the quick start).