### Download reference data for test datasets

```
wget https://singlecelltestsets.s3.amazonaws.com/refdata.tar.gz
tar -xvf refdata.tar.gz
```

We recommend starting from a blank slate with a fresh conda install for each app. At the end of this guide we'll have 3 independent conda installs, each with its corresponding single cell pipeline app.

### Align

#### setup conda environment

Create a separate directory for alignment:

```
mkdir ALIGN && cd ALIGN
```

and install miniconda:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
```

Install the single cell pipeline (alignment) app under the miniconda install:

```
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c bioconda -c shahcompbio single_cell_pipeline_align
```

#### download test dataset:

```
wget https://singlecelltestsets.s3.amazonaws.com/alignment.tar.gz
tar -xvf alignment.tar.gz
```

#### generate inputs.yaml file:

```
SA1090-A96213A-R20-C28:
  column: 28
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R20-C28_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R20-C28_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 45
  index_i5: i5-20
  index_i7: i7-28
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: CTATCT
  row: 20
SA1090-A96213A-R20-C62:
  column: 62
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R20-C62_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R20-C62_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 11
  index_i5: i5-20
  index_i7: i7-62
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: AAGCTA
  row: 20
SA1090-A96213A-R22-C43:
  column: 43
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R22-C43_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R22-C43_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 30
  index_i5: i5-22
  index_i7: i7-43
  pick_met: C2
  primer_i5: GCTGTA
  primer_i7: ATTCCG
  row: 22
```

Update the `testdata` paths so that they point to the correct data directory.

#### launch the pipeline:

```
export PATH=$PWD/miniconda3/bin:$PATH
single_cell alignment --input_yaml inputs.yaml \
  --library_id A1234A --maxjobs 4 --nocleanup --sentinel_only \
  --submit local --loglevel DEBUG \
  --tmpdir temp --pipelinedir pipeline \
  --out_dir output --bams_dir bams \
  --config_override '{"refdir": "refdata"}'
```

### Hmmcopy

#### setup conda environment

Create a separate directory for hmmcopy:

```
mkdir HMMCOPY && cd HMMCOPY
```

and install miniconda:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
```

Install the single cell pipeline (hmmcopy) app under the miniconda install:

```
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c bioconda -c shahcompbio single_cell_pipeline_hmmcopy
```

#### download test dataset:

```
wget https://singlecelltestsets.s3.amazonaws.com/hmmcopy.tar.gz
tar -xvf hmmcopy.tar.gz
```

#### generate inputs.yaml file:

```
SA1090-A96213A-R20-C28:
  bam: testdata/SA1090-A96213A-R20-C28.bam
  column: 28
  condition: B
  img_col: 45
  index_i5: i5-20
  index_i7: i7-28
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: CTATCT
  row: 20
SA1090-A96213A-R20-C62:
  bam: testdata/SA1090-A96213A-R20-C62.bam
  column: 62
  condition: B
  img_col: 11
  index_i5: i5-20
  index_i7: i7-62
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: AAGCTA
  row: 20
SA1090-A96213A-R22-C43:
  bam: testdata/SA1090-A96213A-R22-C43.bam
  column: 43
  condition: B
  img_col: 30
  index_i5: i5-22
  index_i7: i7-43
  pick_met: C2
  primer_i5: GCTGTA
  primer_i7: ATTCCG
  row: 22
```

Update the `testdata` paths so that they point to the correct data directory.

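The path adjustment can be done by hand, or with a small helper along the lines of the sketch below. The function name `rewrite_testdata_paths` is hypothetical (it is not part of the pipeline), and the sketch assumes the tarball was extracted in the current directory and that the target path contains no `|` or `&` characters.

```shell
# Hypothetical helper (not part of single_cell_pipeline): rewrite relative
# "testdata/" paths in a YAML file so they point at the given directory.
rewrite_testdata_paths() {
    yaml_file=$1
    data_dir=$2
    # Edit in place; the original file is kept next to it as <file>.bak
    sed -i.bak "s|: testdata/|: ${data_dir}/|" "$yaml_file"
}

# Assumption: the test dataset was extracted into ./testdata
if [ -f inputs.yaml ]; then
    rewrite_testdata_paths inputs.yaml "$PWD/testdata"
fi
```

Using absolute paths means the pipeline can be launched from any working directory without the inputs going stale.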
#### launch the pipeline:

```
export PATH=$PWD/miniconda3/bin:$PATH
single_cell hmmcopy \
  --input_yaml inputs.yaml \
  --library_id A1234A --maxjobs 4 --nocleanup \
  --sentinel_only --submit local --loglevel DEBUG \
  --tmpdir temp --pipelinedir pipeline --out_dir output \
  --config_override '{"refdir": "refdata", "hmmcopy": {"chromosomes": ["6", "8", "17"]}}'
```

### Annotation

#### setup conda environment

Create a separate directory for annotation:

```
mkdir ANNOTATION && cd ANNOTATION
```

and install miniconda:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
```

Install the single cell pipeline (annotation) app under the miniconda install:

```
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c shahcompbio single_cell_pipeline_annotation
```

#### download test dataset:

```
wget https://singlecelltestsets.s3.amazonaws.com/annotation.tar.gz
tar -xvf annotation.tar.gz
```

#### generate inputs.yaml file:

```
hmmcopy_metrics: testdata/A96213A_hmmcopy_metrics.csv.gz
hmmcopy_reads: testdata/A96213A_reads.csv.gz
alignment_metrics: testdata/A96213A_alignment_metrics.csv.gz
gc_metrics: testdata/A96213A_gc_metrics.csv.gz
segs_pdf_tar: testdata/A96213A_segs.tar.gz
```

#### launch the pipeline:

```
export PATH=$PWD/miniconda3/bin:$PATH
single_cell annotation \
  --input_yaml inputs.yaml \
  --library_id A1234A --maxjobs 4 --nocleanup \
  --sentinel_only --submit local --loglevel DEBUG \
  --tmpdir temp --pipelinedir pipeline --out_dir output \
  --config_override '{"refdir": "refdata", "annotation": {"chromosomes": ["6", "8", "17"]}}'
```

### Switching to production runs:

#### Reference data

Before you switch over to production and start running real datasets, download the full reference dataset and use it to replace the test reference data from step 1.

```
wget https://singlecelltestsets.s3.amazonaws.com/refdata_full_genome.tar.gz
tar -xvf refdata_full_genome.tar.gz
```

#### Config override

Update the config overrides to run the pipeline over the full genome. For Hmmcopy and Annotation, the config override in the launch section should be:

```
--config_override '{"refdir": "refdata"}'
```

#### Run with HPC batch submit systems

##### nativespec

We need to figure out the nativespec first. This is a string that specifies the format for job submission. For instance, the following LSF job request (for the juno cluster at MSKCC)

```
bsub -n 1 -W 4:00 -R "rusage[mem=5]span[ptile=1]select[type==CentOS7]"
```

asks for 1 core, 5 GB of memory and a runtime of 4 hours. The corresponding pipeline nativespec is

```
-n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"
```

##### launch arguments

Add the following arguments to the launch command.

###### Juno cluster (LSF):

```
--submit lsf --nativespec ' -n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"'
```

The pipeline supports *SGE* with `--submit asyncqsub` and *LSF* with `--submit lsf`.
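
To sanity-check a nativespec before launching, it can help to expand the placeholders by hand and compare the result against a job request known to work on your cluster. The sketch below is only an illustration: `expand_nativespec` is a hypothetical helper, and it assumes the pipeline fills `{ncpus}`, `{walltime}` and `{mem}` by plain substitution, as the example above suggests.

```shell
# Hypothetical helper (not part of single_cell_pipeline): substitute the
# {ncpus}, {walltime} and {mem} placeholders in a nativespec template.
expand_nativespec() {
    # $1 = template, $2 = ncpus, $3 = walltime, $4 = mem
    printf '%s\n' "$1" | sed -e "s/{ncpus}/$2/g" \
                             -e "s/{walltime}/$3/g" \
                             -e "s/{mem}/$4/g"
}

NATIVESPEC='-n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"'

# Expand with the values from the juno example: 1 core, 4 hours, 5 GB.
EXPANDED=$(expand_nativespec "$NATIVESPEC" 1 4:00 5)
echo "bsub $EXPANDED"
```

If the printed line matches the `bsub` request you would normally submit, the template is likely in the right shape for `--nativespec`.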