Customize Grape for your own datasets¶

Choosing a project name¶

Choose your project name wisely. Some examples:

CLL

ENCODE

HBM

Let’s say you have a project about Drosophila and you are interested in Selenoproteins, you choose the project name ‘Dsel’. We are going to use the project name ‘MyProject’ here.

Layout¶

There are a number of top level folders containing configuration files, like accessions, and profiles. We’ll come back to these later, but right now we will create a custom folder for our project inside the pipelines folder:

$ cd grape
$ cd pipelines
$ mkdir MyProject
$ cd MyProject

Configure the buildout.cfg¶

Write a buildout.cfg file for your project:

[buildout]
extends = ../dependencies.cfg
          ../../accessions/MyProject/db.cfg
          ../../profiles/MyProject/db.cfg

Adding a configuration file for the accessions¶

Add a project folder to the top level accessions folder, and add a db.cfg file:

cd grape cd accessions mkdir MyProject cd MyProject touch db.cfg

We’ll cover how to configure this file after taking care of the profile.

Adding a configuration file for the profile¶

Add a project folder to the top level profiles folder, and add a db.cfg file:

cd grape
cd profiles
mkdir MyProject
cd MyProject
touch db.cfg

Configuring the profile¶

Let’s copy over the configuration for the profile of the Test project:

cd grape
cd profiles
cd MyProject
cp ../Test/db.cfg .

We adapt this file to our case:

[runs]
parts = Male
        Female

[pipeline]
TEMPLATE   = ${buildout:directory}/src/pipeline/template3.0.txt
PROJECTID  = MyProject
DB         = MyProject_RNAseqPipeline
COMMONDB   = MyProject_RNAseqPipelineCommon
HOST       = pou
THREADS    = 2
MAPPER     = GEM
MISMATCHES = 2
CLUSTER    = mem_6
ANNOTATION = /users/yourusername/Drosophilas/Dwill/dwil_all_r1.3.gff
GENOMESEQ  = /users/yourusername/Genomes/Drosophila_willistoni/genome.fa

[Male]
recipe = grape.recipe.pipeline
accession = Male

[Female]
recipe = grape.recipe.pipeline
accession = Female

What we have done in this configuration is:

Decide how to call the pipeline runs: Male and Female

Configured the Databases in which to store the results: MyProject_RNAseqPipeline and MyProject_RNAseqPipelineCommon

Given the location of the annotation and the genome

Configured the pipelines to be run on the cluster with 2 threads

Configuring the accessions¶

Let’s copy over the configuration for the profile of the MyProject project:

cd grape
cd accessions
cd MyProject

Edit the db.cfg file we created earlier:

[Female]
file_location = /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read1_qseq.fastq
                /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read2_qseq.fastq
mate_id = Female.1
          Female.2
pair_id = Female
          Female
label = Female
        Female
gender = female
dataType=RNASeq
cell=CELL
rnaExtract=UNKNOWN
localization=CELL
replicate=1
lab=CRG
type=fastq
readType=2x96
qualities=phred
species=Drosophila willistoni

[Male]
file_location = /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read1_qseq.fastq
                /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read2_qseq.fastq
mate_id = Male.1
          Male.2
pair_id = Male
          Male
label = Male
        Male
gender = male
dataType=RNASeq
cell=CELL
rnaExtract=UNKNOWN
localization=CELL
replicate=1
lab=CRG
type=fastq
readType=2x96
qualities=phred
species=Drosophila willistoni

Now you have the two accessions defined and the profiles specify how to run the two pipelines. Now we need a database for storing the results of the pipeline runs.

Create databases for your project¶

You need two databases for the MyProject project:

MyProject_RNAseqPipeline

MyProject_RNAseqPipelineCommon

The permissions you need to ask for are:

rnaseqweb: read

yourusername: read and write

The rnaseqweb user needs read access in order to show the statistical results.

You needs to have read write access.

Then you need to modify your MySQL configuration file: ~/.my.cnf:

[client]
host=mysqlserver
port=3306
user=yourusername
password=123

Run the buildout¶

Run virtualenv:

cd grape
cd pipelines
cd MyProject
virtualenv --no-site-packages .

If you get an error, you may have to remove your .pydistutils.cfg file.

.pydistutils.cfg

Run the bootstrap.py file with the python binary that has been made available by virtualenv in the bin folder:

cd grape
cd pipelines
cd MyProject
./bin/python ../../bootstrap.py

Run the buildout:

cd grape
cd pipelines
cd MyProject
./bin/buildout

The parts folder now contains everything you need to run the two pipelines:

cd grape
cd pipelines
cd MyProject
cd parts/
tree
.
|-- Female
|   |-- GEMIndices -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/GEMIndices
|   |-- bin -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/bin
|   |-- clean.sh
|   |-- execute.sh
|   |-- lib -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/lib
|   |-- read.list.txt
|   |-- readData
|   |   |-- lane8_W_female_read1_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read1_qseq.fastq
|   |   `-- lane8_W_female_read2_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read2_qseq.fastq
|   |-- results -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/Female
|   `-- start.sh
|-- Male
|   |-- GEMIndices -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/GEMIndices
|   |-- bin -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/bin
|   |-- clean.sh
|   |-- execute.sh
|   |-- lib -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/lib
|   |-- read.list.txt
|   |-- readData
|   |   |-- lane8_W_male_read1_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read1_qseq.fastq
|   |   `-- lane8_W_male_read2_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read2_qseq.fastq
|   |-- results -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/Male
|   `-- start.sh
`-- buildout

Run the first pipeline¶

Now it is time to run the first pipeline so that the index files for the genome and annotation can be generated. Once these files are present we can run all the other pipelines in parallel.

Go to the parts folder and run the start script:

cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Female
./start.sh

If you get errors, you can store them into an error.log file like this:

cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Female
./start.sh 2> error.log

In case everything worked ok, you can run the execute script:

cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Female
./execute.sh

Run the other pipeline¶

The second pipeline is run exactly like the first one:

Go to the parts folder and run the start script:

cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Male
./start.sh

If you get errors, you can store them into an error.log file like this:

cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Male
./start.sh 2> error.log

In case everything worked ok, you can run the execute script:

cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Male
./execute.sh

Customize Grape for your own datasets¶

Choosing a project name¶

Layout¶

Configure the buildout.cfg¶

Adding a configuration file for the accessions¶

Adding a configuration file for the profile¶

Configuring the profile¶

Configuring the accessions¶

Create databases for your project¶

Run the buildout¶

Run the first pipeline¶

Run the other pipeline¶

Table Of Contents

Previous topic

Next topic

This Page

Feedback

Navigation

Customize Grape for your own datasets¶

Choosing a project name¶

Layout¶

Configure the buildout.cfg¶

Adding a configuration file for the accessions¶

Adding a configuration file for the profile¶

Configuring the profile¶

Configuring the accessions¶

Create databases for your project¶

Run the buildout¶

Run the first pipeline¶

Run the other pipeline¶

Table Of Contents

Previous topic

Next topic

This Page

Feedback

Quick search

Navigation