.. _custom_pipeline_run:

Customize Grape for your own datasets
=====================================

Choosing a project name
-----------------------

Choose your project name wisely. Some examples:

* CLL
* ENCODE
* HBM

For example, if you have a project about Drosophila and you are interested in
selenoproteins, you could choose the project name ``Dsel``. In this tutorial we
use the project name ``MyProject``.

Layout
------

There are a number of top level folders containing configuration files, such as
``accessions`` and ``profiles``. We'll come back to these later. Right now, we
create a custom folder for our project inside the ``pipelines`` folder::

    $ cd grape
    $ cd pipelines
    $ mkdir MyProject
    $ cd MyProject

Configure the buildout.cfg
--------------------------

Write a ``buildout.cfg`` file for your project::

    [buildout]
    extends = ../dependencies.cfg
              ../../accessions/MyProject/db.cfg
              ../../profiles/MyProject/db.cfg

Adding a configuration file for the accessions
----------------------------------------------

Add a project folder to the top level ``accessions`` folder, and add a
``db.cfg`` file::

    cd grape
    cd accessions
    mkdir MyProject
    cd MyProject
    touch db.cfg

We'll cover how to configure this file after taking care of the profile.

Adding a configuration file for the profile
-------------------------------------------

Add a project folder to the top level ``profiles`` folder, and add a ``db.cfg``
file::

    cd grape
    cd profiles
    mkdir MyProject
    cd MyProject
    touch db.cfg

Configuring the profile
-----------------------

Let's copy over the configuration for the profile of the Test project::

    cd grape
    cd profiles
    cd MyProject
    cp ../Test/db.cfg .
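
The folder and file setup so far (the project folder under ``pipelines`` with
its ``buildout.cfg``, the empty accessions ``db.cfg`` and the profile copied
from the Test project) can also be done in one go. The following is a minimal
sketch, not part of Grape itself: the function name ``scaffold_project`` is
hypothetical, and it assumes that ``profiles/Test/db.cfg`` exists in your
checkout.

.. code-block:: sh

    # Hypothetical helper, not part of Grape: create the project folders and
    # configuration files described in the sections above.
    # Usage: scaffold_project GRAPE_DIR PROJECT
    # The body runs in a subshell ( ... ) so its variables do not leak.
    scaffold_project() (
        grape_dir=$1
        project=$2

        # Custom pipeline folder with a buildout.cfg that extends the shared
        # dependencies plus the project's accessions and profile files
        mkdir -p "$grape_dir/pipelines/$project" || exit 1
        printf '%s\n' \
            '[buildout]' \
            'extends = ../dependencies.cfg' \
            "          ../../accessions/$project/db.cfg" \
            "          ../../profiles/$project/db.cfg" \
            > "$grape_dir/pipelines/$project/buildout.cfg" || exit 1

        # Empty accessions file, to be filled in later
        mkdir -p "$grape_dir/accessions/$project" || exit 1
        touch "$grape_dir/accessions/$project/db.cfg" || exit 1

        # Profile copied from the Test project as a starting point;
        # the function fails if the Test profile is missing
        mkdir -p "$grape_dir/profiles/$project" || exit 1
        cp "$grape_dir/profiles/Test/db.cfg" "$grape_dir/profiles/$project/db.cfg"
    )

For example, ``scaffold_project grape MyProject`` run from the directory that
contains your ``grape`` checkout reproduces the manual steps; you still have to
edit both ``db.cfg`` files afterwards.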
We adapt the copied ``db.cfg`` to our case::

    [runs]
    parts = Male Female

    [pipeline]
    TEMPLATE = ${buildout:directory}/src/pipeline/template3.0.txt
    PROJECTID = MyProject
    DB = MyProject_RNAseqPipeline
    COMMONDB = MyProject_RNAseqPipelineCommon
    HOST = pou
    THREADS = 2
    MAPPER = GEM
    MISMATCHES = 2
    CLUSTER = mem_6
    ANNOTATION = /users/yourusername/Drosophilas/Dwill/dwil_all_r1.3.gff
    GENOMESEQ = /users/yourusername/Genomes/Drosophila_willistoni/genome.fa

    [Male]
    recipe = grape.recipe.pipeline
    accession = Male

    [Female]
    recipe = grape.recipe.pipeline
    accession = Female

What we have done in this configuration is:

1. Decided what to call the pipeline runs: ``Male`` and ``Female``
2. Configured the databases in which to store the results:
   ``MyProject_RNAseqPipeline`` and ``MyProject_RNAseqPipelineCommon``
3. Given the location of the annotation and the genome
4. Configured the pipelines to be run on the cluster with 2 threads

Configuring the accessions
--------------------------

Go to the accessions folder of the MyProject project::

    cd grape
    cd accessions
    cd MyProject

Edit the ``db.cfg`` file we created earlier::

    [Female]
    file_location = /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read1_qseq.fastq
                    /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read2_qseq.fastq
    mate_id = Female.1
              Female.2
    pair_id = Female
              Female
    label = Female
            Female
    gender = female
    dataType = RNASeq
    cell = CELL
    rnaExtract = UNKNOWN
    localization = CELL
    replicate = 1
    lab = CRG
    type = fastq
    readType = 2x96
    qualities = phred
    species = Drosophila willistoni

    [Male]
    file_location = /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read1_qseq.fastq
                    /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read2_qseq.fastq
    mate_id = Male.1
              Male.2
    pair_id = Male
              Male
    label = Male
            Male
    gender = male
    dataType = RNASeq
    cell = CELL
    rnaExtract = UNKNOWN
    localization = CELL
    replicate = 1
    lab = CRG
    type = fastq
    readType = 2x96
    qualities = phred
    species = Drosophila willistoni

Now you have the two accessions defined, and the profile specifies how to run
the two pipelines. Next, we need databases for storing the results of the
pipeline runs.

Create databases for your project
---------------------------------

You need two databases for the MyProject project:

1. ``MyProject_RNAseqPipeline``
2. ``MyProject_RNAseqPipelineCommon``

The permissions you need to ask for are:

1. ``rnaseqweb``: read
2. ``yourusername``: read and write

The ``rnaseqweb`` user needs read access in order to display the statistical
results. You need read and write access so that your pipeline runs can store
their results.

Then you need to modify your MySQL configuration file, ``~/.my.cnf``, replacing
the host, user and password below with your own values::

    [client]
    host=mysqlserver
    port=3306
    user=yourusername
    password=123

Run the buildout
----------------

Run virtualenv::

    cd grape
    cd pipelines
    cd MyProject
    virtualenv --no-site-packages .

If you get an error, you may have to remove your ``~/.pydistutils.cfg`` file.

Run the ``bootstrap.py`` file with the python binary that virtualenv has made
available in the ``bin`` folder::

    cd grape
    cd pipelines
    cd MyProject
    ./bin/python ../../bootstrap.py

Run the buildout::

    cd grape
    cd pipelines
    cd MyProject
    ./bin/buildout

The ``parts`` folder now contains everything you need to run the two
pipelines::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/
    tree
    .
    |-- Female
    |   |-- GEMIndices -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/GEMIndices
    |   |-- bin -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/bin
    |   |-- clean.sh
    |   |-- execute.sh
    |   |-- lib -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/lib
    |   |-- read.list.txt
    |   |-- readData
    |   |   |-- lane8_W_female_read1_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read1_qseq.fastq
    |   |   `-- lane8_W_female_read2_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_female_read2_qseq.fastq
    |   |-- results -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/Female
    |   `-- start.sh
    |-- Male
    |   |-- GEMIndices -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/GEMIndices
    |   |-- bin -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/bin
    |   |-- clean.sh
    |   |-- execute.sh
    |   |-- lib -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/pipeline/lib
    |   |-- read.list.txt
    |   |-- readData
    |   |   |-- lane8_W_male_read1_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read1_qseq.fastq
    |   |   `-- lane8_W_male_read2_qseq.fastq -> /users/myusername/sequencing_drosophilas_saltans/RNAseq/fastq/lane8_W_male_read2_qseq.fastq
    |   |-- results -> /users/yourusername/Drosophilas/Dwill/Pipeline/pipelines/MyProject/var/Male
    |   `-- start.sh
    `-- buildout

Run the first pipeline
----------------------

Now it is time to run the first pipeline, so that the index files for the
genome and annotation are generated. Once these files are present, we can run
all the other pipelines in parallel.
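
Before starting the remaining pipelines in parallel, you may want to check
whether the first run has produced the index files. The following is a minimal
sketch, not part of Grape: the function name ``indices_ready`` is hypothetical,
and it assumes that the indices are written to the project's
``var/GEMIndices`` folder (the ``GEMIndices`` link in each part points there)
and that a non-empty folder means indexing has finished. Treat it as a rough
heuristic only.

.. code-block:: sh

    # Hypothetical helper, not part of Grape: succeed only if the shared
    # GEM index folder of a project exists and is not empty.
    # Usage: indices_ready PATH_TO_PROJECT   (e.g. grape/pipelines/MyProject)
    indices_ready() {
        index_dir="$1/var/GEMIndices"
        # The folder must exist ...
        [ -d "$index_dir" ] || return 1
        # ... and contain at least one entry (ls -A also lists hidden files)
        [ -n "$(ls -A "$index_dir")" ]
    }

For example, ``indices_ready grape/pipelines/MyProject && echo "indices found"``
prints a message only when the folder has content.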
Go to the folder of the first part and run the start script::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/Female
    ./start.sh

If you get errors, you can store them in an ``error.log`` file like this::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/Female
    ./start.sh 2> error.log

If everything worked, run the execute script::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/Female
    ./execute.sh

Run the other pipeline
----------------------

The second pipeline is run exactly like the first one. Go to its folder in
``parts`` and run the start script::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/Male
    ./start.sh

If you get errors, you can store them in an ``error.log`` file like this::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/Male
    ./start.sh 2> error.log

If everything worked, run the execute script::

    cd grape
    cd pipelines
    cd MyProject
    cd parts/Male
    ./execute.sh
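
With more than two runs, repeating these commands for every part gets tedious.
The loop below is a minimal sketch, not part of Grape: the function name
``run_parts`` is hypothetical, and it assumes that ``start.sh`` exits with a
non-zero status when it fails (the steps above only say to check for errors)
and that you call it from the project's ``parts`` folder.

.. code-block:: sh

    # Hypothetical helper, not part of Grape: for each part folder given,
    # run start.sh and, only if it succeeds, execute.sh.
    # Standard error of start.sh is stored in error.log inside the part folder.
    # Returns non-zero if any part failed.
    # Usage (from grape/pipelines/MyProject/parts): run_parts Male
    run_parts() {
        status=0
        for part in "$@"; do
            # Each step runs in a subshell so the cd does not affect the caller
            if (cd "$part" && ./start.sh 2> error.log); then
                (cd "$part" && ./execute.sh) || status=1
            else
                echo "start.sh failed for $part, see $part/error.log" >&2
                status=1
            fi
        done
        return $status
    }

The parts are processed one after another, so run the first pipeline on its
own, as described above, and use the loop only for the remaining parts.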