Download reference data for test datasets¶
wget https://singlecelltestsets.s3.amazonaws.com/refdata.tar.gz
tar -xvf refdata.tar.gz
We recommend starting from a blank slate with a fresh conda install for each app. By the end of this guide you will have three independent conda installs, each containing its corresponding single cell pipeline app.
Align¶
Set up conda environment¶
Create a separate directory for alignment:
mkdir ALIGN && cd ALIGN
and install miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
Install the single cell pipeline (alignment) app under the miniconda install:
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c bioconda -c shahcompbio single_cell_pipeline_align
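To confirm that the `single_cell` command used later in this guide comes from the fresh miniconda install rather than from another environment on the machine, a small check like the following can help. This is a minimal sketch; `resolves_under` is a hypothetical helper name and is not part of the pipeline.

```shell
# Hypothetical helper: succeed only if CMD resolves to a file inside PREFIX.
# Usage: resolves_under CMD PREFIX
resolves_under() {
    local cmd="$1" prefix="${2%/}" path
    # command -v prints the resolved path, or fails if CMD is not found.
    path="$(command -v "$cmd")" || return 1
    case "$path" in
        "$prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `resolves_under single_cell "$PWD/miniconda3" && echo "using the local install"` prints the message only when the command is found under the local prefix.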
download test dataset:¶
wget https://singlecelltestsets.s3.amazonaws.com/alignment.tar.gz
tar -xvf alignment.tar.gz
generate inputs.yaml file:¶
SA1090-A96213A-R20-C28:
  column: 28
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R20-C28_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R20-C28_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 45
  index_i5: i5-20
  index_i7: i7-28
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: CTATCT
  row: 20
SA1090-A96213A-R20-C62:
  column: 62
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R20-C62_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R20-C62_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 11
  index_i5: i5-20
  index_i7: i7-62
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: AAGCTA
  row: 20
SA1090-A96213A-R22-C43:
  column: 43
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R22-C43_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R22-C43_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 30
  index_i5: i5-22
  index_i7: i7-43
  pick_met: C2
  primer_i5: GCTGTA
  primer_i7: ATTCCG
  row: 22
The testdata paths must be updated to point to the directory where the test dataset was extracted.
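One way to update the paths is a `sed` substitution over inputs.yaml. The following is a minimal sketch, assuming every path value in the file starts with `testdata/` and that the target directory name contains no `|` or `&` characters; `repoint_testdata` is a hypothetical helper name, not part of the pipeline.

```shell
# Hypothetical helper: replace the leading "testdata/" of every path value
# in a YAML file with DATA_DIR.
# Usage: repoint_testdata inputs.yaml /absolute/path/to/testdata
repoint_testdata() {
    local yaml="$1"
    local data_dir="${2%/}"   # drop one trailing slash, if present
    # '|' is the sed delimiter because the paths themselves contain '/'.
    sed "s|: testdata/|: ${data_dir}/|" "$yaml" > "${yaml}.tmp" &&
        mv "${yaml}.tmp" "$yaml"
}
```

For example, `repoint_testdata inputs.yaml "$PWD/testdata"` rewrites the relative paths into absolute ones, which keeps the file valid regardless of the directory the pipeline is launched from.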
launch the pipeline:¶
export PATH=$PWD/miniconda3/bin:$PATH
single_cell alignment --input_yaml inputs.yaml \
--library_id A1234A --maxjobs 4 --nocleanup --sentinel_only \
--submit local --loglevel DEBUG \
--tmpdir temp --pipelinedir pipeline \
--out_dir output --bams_dir bams \
--config_override '{"refdir": "refdata"}'
Hmmcopy¶
Set up conda environment¶
Create a separate directory for hmmcopy:
mkdir HMMCOPY && cd HMMCOPY
and install miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
Install the single cell pipeline (hmmcopy) app under the miniconda install:
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c bioconda -c shahcompbio single_cell_pipeline_hmmcopy
download test dataset:¶
wget https://singlecelltestsets.s3.amazonaws.com/hmmcopy.tar.gz
tar -xvf hmmcopy.tar.gz
generate inputs.yaml file:¶
SA1090-A96213A-R20-C28:
  bam: testdata/SA1090-A96213A-R20-C28.bam
  column: 28
  condition: B
  img_col: 45
  index_i5: i5-20
  index_i7: i7-28
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: CTATCT
  row: 20
SA1090-A96213A-R20-C62:
  bam: testdata/SA1090-A96213A-R20-C62.bam
  column: 62
  condition: B
  img_col: 11
  index_i5: i5-20
  index_i7: i7-62
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: AAGCTA
  row: 20
SA1090-A96213A-R22-C43:
  bam: testdata/SA1090-A96213A-R22-C43.bam
  column: 43
  condition: B
  img_col: 30
  index_i5: i5-22
  index_i7: i7-43
  pick_met: C2
  primer_i5: GCTGTA
  primer_i7: ATTCCG
  row: 22
The testdata paths must be updated to point to the directory where the test dataset was extracted.
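Before launching, a quick preflight check can confirm that every BAM or FASTQ path listed in inputs.yaml actually exists. The following is a minimal sketch, assuming paths contain no whitespace and appear as values of `bam:`, `fastq_1:` or `fastq_2:` keys as in the examples in this guide; `missing_inputs` is a hypothetical helper name.

```shell
# Hypothetical helper: print each bam/fastq path in a YAML file that does not
# exist on disk. Returns non-zero if at least one path is missing.
# Usage: missing_inputs inputs.yaml
missing_inputs() {
    local yaml="$1" path status=0
    # Strip the key and leading indentation, leaving only the path value.
    for path in $(sed -n -E 's/^[[:space:]]*(bam|fastq_[12]):[[:space:]]*//p' "$yaml"); do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}
```

Running `missing_inputs inputs.yaml && echo "all inputs found"` from the launch directory reports any path that still needs fixing.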
launch the pipeline:¶
export PATH=$PWD/miniconda3/bin:$PATH
single_cell hmmcopy \
--input_yaml inputs.yaml \
--library_id A1234A --maxjobs 4 --nocleanup \
--sentinel_only --submit local --loglevel DEBUG \
--tmpdir temp --pipelinedir pipeline --out_dir output \
--config_override '{"refdir": "refdata", "hmmcopy": {"chromosomes": ["6", "8", "17"]}}'
Annotation¶
Set up conda environment¶
Create a separate directory for annotation:
mkdir ANNOTATION && cd ANNOTATION
and install miniconda:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
Install the single cell pipeline (annotation) app under the miniconda install:
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c shahcompbio single_cell_pipeline_annotation
download test dataset:¶
wget https://singlecelltestsets.s3.amazonaws.com/annotation.tar.gz
tar -xvf annotation.tar.gz
generate inputs.yaml file:¶
hmmcopy_metrics: testdata/A96213A_hmmcopy_metrics.csv.gz
hmmcopy_reads: testdata/A96213A_reads.csv.gz
alignment_metrics: testdata/A96213A_alignment_metrics.csv.gz
gc_metrics: testdata/A96213A_gc_metrics.csv.gz
segs_pdf_tar: testdata/A96213A_segs.tar.gz
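Unlike the per-cell files used for alignment and hmmcopy, this file is a flat mapping of five keys. A small check such as the one below can catch a missing entry before launch. It is a minimal sketch that only tests for the five keys shown above; `check_annotation_yaml` is a hypothetical helper name, and the pipeline may well perform its own validation.

```shell
# Hypothetical helper: report any of the five keys from the example
# annotation inputs.yaml that are absent from the given file.
# Usage: check_annotation_yaml inputs.yaml
check_annotation_yaml() {
    local yaml="$1" key status=0
    for key in hmmcopy_metrics hmmcopy_reads alignment_metrics gc_metrics segs_pdf_tar; do
        # Keys are expected at the start of a line, followed by a colon.
        if ! grep -q "^${key}:" "$yaml"; then
            echo "missing key: $key"
            status=1
        fi
    done
    return "$status"
}
```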
launch the pipeline:¶
export PATH=$PWD/miniconda3/bin:$PATH
single_cell annotation \
--input_yaml inputs.yaml \
--library_id A1234A --maxjobs 4 --nocleanup \
--sentinel_only --submit local --loglevel DEBUG \
--tmpdir temp --pipelinedir pipeline --out_dir output \
--config_override '{"refdir": "refdata", "annotation": {"chromosomes": ["6", "8", "17"]}}'
Switching to production runs:¶
Reference data¶
Before you switch over to production and start running real datasets, download the full reference dataset and use it in place of the test reference data downloaded at the start of this guide.
wget https://singlecelltestsets.s3.amazonaws.com/refdata_full_genome.tar.gz
tar -xvf refdata_full_genome.tar.gz
Config override¶
Update the config overrides so the pipeline runs over the full genome instead of only the chromosomes listed in the test commands.
For Hmmcopy and Annotation, drop the "chromosomes" entry so that the config override in the launch section becomes:
--config_override '{"refdir": "refdata"}'
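The difference between the test and production overrides is only the optional chromosome list. The sketch below builds either form of the JSON string; `make_override` is a hypothetical helper for illustration, not a pipeline command, and it assumes the section name and chromosome names contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a JSON string suitable for --config_override.
# Usage: make_override SECTION REFDIR [CHROM...]
# With no chromosomes, only "refdir" is emitted (the production form).
make_override() {
    local section="$1" refdir="$2" chroms="" c
    shift 2
    if [ "$#" -eq 0 ]; then
        printf '{"refdir": "%s"}\n' "$refdir"
        return 0
    fi
    # Build a comma-separated list of quoted chromosome names.
    for c in "$@"; do
        chroms="${chroms:+$chroms, }\"$c\""
    done
    printf '{"refdir": "%s", "%s": {"chromosomes": [%s]}}\n' "$refdir" "$section" "$chroms"
}

make_override hmmcopy refdata 6 8 17
# prints {"refdir": "refdata", "hmmcopy": {"chromosomes": ["6", "8", "17"]}}
make_override hmmcopy refdata
# prints {"refdir": "refdata"}
```

The output can be passed directly, for example `--config_override "$(make_override hmmcopy refdata)"`.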
Run with HPC batch submit systems¶
nativespec¶
First, work out the nativespec for your cluster. This is a template string that specifies how resource requests are formatted when jobs are submitted to the scheduler.
For instance, the following LSF job request (for the juno cluster at MSKCC)
bsub -n 1 -W 4:00 -R "rusage[mem=5]span[ptile=1]select[type==CentOS7]"
asks for 1 core, 5 GB of memory, and a runtime of 4 hours.
The corresponding pipeline nativespec replaces the fixed values with placeholders:
-n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"
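To see how the template relates to the concrete request, the placeholders can be expanded by hand. The sketch below only mimics the substitution with `sed` for illustration; how the pipeline fills in `{ncpus}`, `{walltime}` and `{mem}` internally is not described here, and `render_nativespec` is a hypothetical helper name.

```shell
# Illustration only: expand {ncpus}, {walltime} and {mem} in a nativespec
# template. Assumes the values contain no '/' or '&' characters.
# Usage: render_nativespec TEMPLATE NCPUS WALLTIME MEM
render_nativespec() {
    local spec="$1" ncpus="$2" walltime="$3" mem="$4"
    printf '%s\n' "$spec" |
        sed -e "s/{ncpus}/${ncpus}/g" \
            -e "s/{walltime}/${walltime}/g" \
            -e "s/{mem}/${mem}/g"
}

template='-n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"'
render_nativespec "$template" 1 4:00 5
# prints -n 1 -W 4:00 -R "rusage[mem=5]span[ptile=1]select[type==CentOS7]"
```

With 1 core, a 4:00 walltime and 5 for memory, the rendered string matches the arguments of the example `bsub` request, which is a quick way to sanity-check a nativespec written for a different cluster.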