Choose your project name wisely. Some examples:
- CLL
- ENCODE
- HBM
Let’s say you have a project about Drosophila and you are interested in Selenoproteins, you choose the project name ‘Dsel’. We are going to use the project name ‘MyProject’ here.
There are a number of top level folders containing configuration files, like accessions, and profiles. We’ll come back to these later, but right now we will create a custom folder for our project inside the pipelines folder:
$ cd grape
$ cd pipelines
$ mkdir MyProject
$ cd MyProject
Write a buildout.cfg file for your project:
[buildout]
extends = ../dependencies.cfg
../../accessions/MyProject/db.cfg
../../profiles/MyProject/db.cfg
Add a project folder to the top level accessions folder, and add a db.cfg file:
cd grape cd accessions mkdir MyProject cd MyProject touch db.cfg
We’ll cover how to configure this file after taking care of the profile.
Add a project folder to the top level profiles folder, and add a db.cfg file:
cd grape
cd profiles
mkdir MyProject
cd MyProject
touch db.cfg
Let’s copy over the configuration for the profile of the Test project:
cd grape
cd profiles
cd MyProject
cp ../Test/db.cfg .
We adapt this file to our case:
[runs]
parts = Male
Female
[pipeline]
TEMPLATE = ${buildout:directory}/src/pipeline/template3.0.txt
PROJECTID = MyProject
DB = MyProject_RNAseqPipeline
COMMONDB = MyProject_RNAseqPipelineCommon
HOST = pou
THREADS = 2
MAPPER = GEM
MISMATCHES = 2
CLUSTER = mem_6
ANNOTATION = /users/yourusername/Drosophilas/Dwill/dwil_all_r1.3.gff
GENOMESEQ = /users/yourusername/Genomes/Drosophila_willistoni/genome.fa
[Male]
recipe = grape.recipe.pipeline
accession = Male
[Female]
recipe = grape.recipe.pipeline
accession = Female
What we have done in this configuration is:
- Decide how to call the pipeline runs: Male and Female
- Configured the Databases in which to store the results: MyProject_RNAseqPipeline and MyProject_RNAseqPipelineCommon
- Given the location of the annotation and the genome
- Configured the pipelines to be run on the cluster with 2 threads
Let’s copy over the configuration for the profile of the MyProject project:
cd grape
cd accessions
cd MyProject
Edit the db.cfg file we created earlier:
[Female]
file_location = /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read1_qseq.fastq
/users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read2_qseq.fastq
mate_id = Female.1
Female.2
pair_id = Female
Female
label = Female
Female
gender = female
dataType=RNASeq
cell=CELL
rnaExtract=UNKNOWN
localization=CELL
replicate=1
lab=CRG
type=fastq
readType=2x96
qualities=phred
species=Drosophila willistoni
[Male]
file_location = /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read1_qseq.fastq
/users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read2_qseq.fastq
mate_id = Male.1
Male.2
pair_id = Male
Male
label = Male
Male
gender = male
dataType=RNASeq
cell=CELL
rnaExtract=UNKNOWN
localization=CELL
replicate=1
lab=CRG
type=fastq
readType=2x96
qualities=phred
species=Drosophila willistoni
Now you have the two accessions defined and the profiles specify how to run the two pipelines. Now we need a database for storing the results of the pipeline runs.
You need two databases for the MyProject project:
- MyProject_RNAseqPipeline
- MyProject_RNAseqPipelineCommon
The permissions you need to ask for are:
- rnaseqweb: read
- yourusername: read and write
The rnaseqweb user needs read access in order to show the statistical results.
You needs to have read write access.
Then you need to modify your MySQL configuration file: ~/.my.cnf:
[client]
host=mysqlserver
port=3306
user=yourusername
password=123
Run virtualenv:
cd grape
cd pipelines
cd MyProject
virtualenv --no-site-packages .
If you get an error, you may have to remove your .pydistutils.cfg file.
.pydistutils.cfg
Run the bootstrap.py file with the python binary that has been made available by virtualenv in the bin folder:
cd grape
cd pipelines
cd MyProject
./bin/python ../../bootstrap.py
Run the buildout:
cd grape
cd pipelines
cd MyProject
./bin/buildout
The parts folder now contains everything you need to run the two pipelines:
cd grape
cd pipelines
cd MyProject
cd parts/
tree
.
|-- Female
| |-- GEMIndices -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/GEMIndices
| |-- bin -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/bin
| |-- clean.sh
| |-- execute.sh
| |-- lib -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/lib
| |-- read.list.txt
| |-- readData
| | |-- lane8_W_female_read1_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read1_qseq.fastq
| | `-- lane8_W_female_read2_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read2_qseq.fastq
| |-- results -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/Female
| `-- start.sh
|-- Male
| |-- GEMIndices -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/GEMIndices
| |-- bin -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/bin
| |-- clean.sh
| |-- execute.sh
| |-- lib -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/lib
| |-- read.list.txt
| |-- readData
| | |-- lane8_W_male_read1_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read1_qseq.fastq
| | `-- lane8_W_male_read2_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read2_qseq.fastq
| |-- results -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/Male
| `-- start.sh
`-- buildout
Now it is time to run the first pipeline so that the index files for the genome and annotation can be generated. Once these files are present we can run all the other pipelines in parallel.
Go to the parts folder and run the start script:
cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Female
./start.sh
If you get errors, you can store them into an error.log file like this:
cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Female
./start.sh 2> error.log
In case everything worked ok, you can run the execute script:
cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Female
./execute.sh
The second pipeline is run exactly like the first one:
Go to the parts folder and run the start script:
cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Male
./start.sh
If you get errors, you can store them into an error.log file like this:
cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Male
./start.sh 2> error.log
In case everything worked ok, you can run the execute script:
cd grape
cd pipelines
cd MyProject
cd parts/
cd parts/Male
./execute.sh