### Download reference data for test datasets
```
wget https://singlecelltestsets.s3.amazonaws.com/refdata.tar.gz
tar -xvf refdata.tar.gz
```

We recommend starting from a blank slate with a fresh conda install for each app. By the end of this guide there will be 3 independent conda installs, each holding its corresponding single cell pipeline app.

### Align

#### setup conda environment

create a separate directory for alignment:
```
mkdir ALIGN && cd ALIGN
```

and install miniconda:
```
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
```

install single cell pipeline (alignment) app under the miniconda install:
```
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c bioconda -c shahcompbio single_cell_pipeline_align
```
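Because each app lives in its own miniconda install, it can help to confirm which install the current shell is actually using before going further. The sketch below defines `check_tool_prefix`, a hypothetical helper that is not part of the pipeline; it only reports whether a command resolves to a path under a given directory.

```shell
# Hypothetical helper (not part of single_cell_pipeline): succeed only if
# the command named in $1 resolves to an executable under directory $2.
check_tool_prefix() {
  tool_path=$(command -v "$1") || { echo "$1: not found on PATH" >&2; return 1; }
  case "$tool_path" in
    "${2%/}"/*) echo "$1 -> $tool_path" ;;
    *) echo "$1 resolves outside $2: $tool_path" >&2; return 1 ;;
  esac
}

# Example: after the export above, conda should come from this directory's install.
# check_tool_prefix conda "$PWD/miniconda3"
```

The same check can be repeated for `single_cell` once the app is installed.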

#### download test dataset:
```
wget https://singlecelltestsets.s3.amazonaws.com/alignment.tar.gz
tar -xvf alignment.tar.gz
```

#### generate inputs.yaml file:
```
SA1090-A96213A-R20-C28:
  column: 28
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R20-C28_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R20-C28_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 45
  index_i5: i5-20
  index_i7: i7-28
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: CTATCT
  row: 20
SA1090-A96213A-R20-C62:
  column: 62
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R20-C62_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R20-C62_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 11
  index_i5: i5-20
  index_i7: i7-62
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: AAGCTA
  row: 20
SA1090-A96213A-R22-C43:
  column: 43
  condition: B
  fastqs:
    HHCJ7CCXY_5.HGTJJCCXY_8.HYG5LCCXY_6.HYG5LCCXY_7.HYG5LCCXY_5:
      fastq_1: testdata/SA1090-A96213A-R22-C43_1.fastq.gz
      fastq_2: testdata/SA1090-A96213A-R22-C43_2.fastq.gz
      sequencing_center: TEST
      sequencing_instrument: ILLUMINA
  img_col: 30
  index_i5: i5-22
  index_i7: i7-43
  pick_met: C2
  primer_i5: GCTGTA
  primer_i7: ATTCCG
  row: 22
```

Update the `testdata` paths so that they point to the directory where the test dataset was extracted.
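If the dataset was extracted somewhere other than the launch directory, the paths can be rewritten in one pass rather than by hand. This is a minimal sketch: `set_testdata_dir` is a hypothetical helper, and it assumes every path to change starts with `testdata/` directly after a `key: ` prefix, as in the example above.

```shell
# Hypothetical helper: rewrite every "testdata/..." value in a YAML file so it
# points at the directory given in $2. A backup is kept as <file>.bak.
# Assumes the directory name contains no "|" or "&" characters.
set_testdata_dir() {
  yaml_file=$1
  data_dir=${2%/}   # drop a trailing slash, if any
  sed -i.bak "s|: testdata/|: ${data_dir}/|" "$yaml_file"
}

# Example, assuming alignment.tar.gz was extracted in the current directory:
# set_testdata_dir inputs.yaml "$PWD/testdata"
```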

#### launch the pipeline:

```
export PATH=$PWD/miniconda3/bin:$PATH
single_cell alignment --input_yaml inputs.yaml \
--library_id A1234A --maxjobs 4 --nocleanup --sentinel_only \
--submit local --loglevel DEBUG \
--tmpdir temp --pipelinedir pipeline \
--out_dir output --bams_dir bams \
--config_override '{"refdir": "refdata"}'
```

### Hmmcopy

#### setup conda environment

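create a separate directory for hmmcopy, alongside `ALIGN` rather than inside it (return to the top-level directory first if you are still in `ALIGN`):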
```
mkdir HMMCOPY && cd HMMCOPY
```

and install miniconda:
```
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
```

install single cell pipeline (hmmcopy) app under the miniconda install:
```
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c bioconda -c shahcompbio single_cell_pipeline_hmmcopy
```

#### download test dataset:
```
wget https://singlecelltestsets.s3.amazonaws.com/hmmcopy.tar.gz
tar -xvf hmmcopy.tar.gz
```

#### generate inputs.yaml file:
```
SA1090-A96213A-R20-C28:
  bam: testdata/SA1090-A96213A-R20-C28.bam
  column: 28
  condition: B
  img_col: 45
  index_i5: i5-20
  index_i7: i7-28
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: CTATCT
  row: 20
SA1090-A96213A-R20-C62:
  bam: testdata/SA1090-A96213A-R20-C62.bam
  column: 62
  condition: B
  img_col: 11
  index_i5: i5-20
  index_i7: i7-62
  pick_met: C1
  primer_i5: GTATAG
  primer_i7: AAGCTA
  row: 20
SA1090-A96213A-R22-C43:
  bam: testdata/SA1090-A96213A-R22-C43.bam
  column: 43
  condition: B
  img_col: 30
  index_i5: i5-22
  index_i7: i7-43
  pick_met: C2
  primer_i5: GCTGTA
  primer_i7: ATTCCG
  row: 22
```

Update the `testdata` paths so that they point to the directory where the test dataset was extracted.
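A quick pre-flight check can catch a wrong path before the pipeline starts. The sketch below is an assumption-laden illustration, not a pipeline feature: `check_input_files` is a hypothetical helper that only looks at `fastq_1`, `fastq_2` and `bam` keys written one per line, and resolves relative paths against the current directory.

```shell
# Hypothetical pre-flight check: report every fastq_1, fastq_2 or bam path in
# a YAML file that does not exist on disk; return non-zero if any is missing.
# Assumes one "key: path" pair per line and paths without spaces.
check_input_files() {
  missing=0
  for input_path in $(awk '$1 ~ /^(fastq_1|fastq_2|bam):$/ { print $2 }' "$1"); do
    if [ ! -f "$input_path" ]; then
      echo "missing: $input_path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, run from the directory that holds inputs.yaml:
# check_input_files inputs.yaml && echo "all input files found"
```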

#### launch the pipeline:

```
export PATH=$PWD/miniconda3/bin:$PATH
single_cell hmmcopy \
--input_yaml inputs.yaml \
--library_id A1234A --maxjobs 4 --nocleanup \
--sentinel_only --submit local --loglevel DEBUG \
--tmpdir temp --pipelinedir pipeline --out_dir output \
--config_override '{"refdir": "refdata", "hmmcopy": {"chromosomes": ["6", "8", "17"]}}'
```
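The `--config_override` value is easy to break with a stray quote or bracket. Its syntax suggests JSON, so, assuming the pipeline parses it as such, the string can be checked before launching. `is_valid_json` below is a hypothetical helper that relies only on the `json.tool` module from the Python standard library.

```shell
# Hypothetical helper: succeed only if $1 parses as JSON.
# Returns 2 if no Python interpreter is available.
is_valid_json() {
  py=$(command -v python3 || command -v python) || { echo "python not found" >&2; return 2; }
  printf '%s' "$1" | "$py" -m json.tool > /dev/null 2>&1
}

override='{"refdir": "refdata", "hmmcopy": {"chromosomes": ["6", "8", "17"]}}'
if is_valid_json "$override"; then
  echo "override is valid JSON"
else
  echo "override is not valid JSON" >&2
fi
```

Keeping the override in a single-quoted shell variable, as here, also avoids the shell interpreting the inner double quotes.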


### Annotation

#### setup conda environment

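create a separate directory for annotation, alongside the other app directories rather than inside one of them (return to the top-level directory first if needed):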
```
mkdir ANNOTATION && cd ANNOTATION
```

and install miniconda:
```
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p $PWD/miniconda3
```

install single cell pipeline (annotation) app under the miniconda install:
```
export PATH=$PWD/miniconda3/bin:$PATH
conda update -n base -c defaults conda -y
conda install -c shahcompbio single_cell_pipeline_annotation
```

#### download test dataset:
```
wget https://singlecelltestsets.s3.amazonaws.com/annotation.tar.gz
tar -xvf annotation.tar.gz
```

#### generate inputs.yaml file:
```
hmmcopy_metrics: testdata/A96213A_hmmcopy_metrics.csv.gz
hmmcopy_reads: testdata/A96213A_reads.csv.gz
alignment_metrics: testdata/A96213A_alignment_metrics.csv.gz
gc_metrics: testdata/A96213A_gc_metrics.csv.gz
segs_pdf_tar: testdata/A96213A_segs.tar.gz
```

#### launch the pipeline:

```
export PATH=$PWD/miniconda3/bin:$PATH
single_cell annotation \
--input_yaml inputs.yaml \
--library_id A1234A --maxjobs 4 --nocleanup \
--sentinel_only --submit local --loglevel DEBUG \
--tmpdir temp --pipelinedir pipeline --out_dir output \
--config_override '{"refdir": "refdata", "annotation": {"chromosomes": ["6", "8", "17"]}}'
```


### Switching to production runs:

#### Reference data
Before you switch over to production and start running the real datasets, please download the full reference dataset and replace the test dataset from step 1.

```
wget https://singlecelltestsets.s3.amazonaws.com/refdata_full_genome.tar.gz
tar -xvf refdata_full_genome.tar.gz
```
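This guide does not state which directory name the full archive extracts to, so it is worth checking that it matches the `refdir` value used in the launch commands. The sketch below lists the top-level entries of an archive without extracting it; `tar_top_level` is a hypothetical helper name.

```shell
# Hypothetical helper: list the distinct top-level entries of a .tar.gz archive.
tar_top_level() {
  tar -tzf "$1" |
    sed -e 's|^\./||' -e '/^$/d' |   # strip a leading "./" and drop empty names
    cut -d/ -f1 |                    # keep only the first path component
    sort -u
}

# Example (archive name taken from the download step above):
# tar_top_level refdata_full_genome.tar.gz
```

If the name printed differs from `refdata`, either rename the extracted directory or adjust the `refdir` override accordingly.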

#### Config override
Update the config overrides to run the pipeline over the full genome.

For Hmmcopy and Annotation, drop the `chromosomes` restriction so that the config override in the launch section becomes:
```
--config_override '{"refdir": "refdata"}'
```

#### Run with HPC batch submit systems

##### nativespec
First work out the nativespec. This is a template string that tells the pipeline how to format the resource request for each job it submits.

For instance, the following LSF job request (for the Juno cluster at MSKCC)
```
bsub -n 1 -W 4:00 -R "rusage[mem=5]span[ptile=1]select[type==CentOS7]"
```

will ask for 1 core, 5 GB of memory and a runtime of 4 hours.

The corresponding pipeline nativespec replaces the fixed values with the `{ncpus}`, `{walltime}` and `{mem}` placeholders:
```
-n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"
```
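To make the relationship between the template and the final request concrete, the sketch below fills in the placeholders by hand. The pipeline performs its own substitution when it submits jobs; `render_nativespec` is a hypothetical helper for illustration only, and it assumes the three placeholders are replaced verbatim, as the example above suggests.

```shell
# Hypothetical helper: substitute {ncpus}, {walltime} and {mem} in a template.
# Arguments: template, ncpus, walltime, mem.
render_nativespec() {
  printf '%s\n' "$1" |
    sed -e "s/{ncpus}/$2/g" -e "s/{walltime}/$3/g" -e "s/{mem}/$4/g"
}

template='-n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"'
render_nativespec "$template" 1 4:00 5
# prints: -n 1 -W 4:00 -R "rusage[mem=5]span[ptile=1]select[type==CentOS7]"
```

Rendering the template this way and passing the result to `bsub` by hand is a cheap way to confirm the cluster accepts the request before launching the pipeline.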


##### launch arguments
Please add the following arguments to the launch command, in place of `--submit local`:

###### Juno cluster (LSF):
```
--submit lsf --nativespec ' -n {ncpus} -W {walltime} -R "rusage[mem={mem}]span[ptile={ncpus}]select[type==CentOS7]"'
```

The pipeline supports *SGE* with `--submit asyncqsub` and *LSF* with `--submit lsf`.
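When the same launch script is shared between machines, the scheduler choice can be kept in one place. This is only a sketch: `submit_flag` is a hypothetical helper, and the short names `local`, `lsf` and `sge` are labels chosen here; only the `--submit` values themselves come from this guide.

```shell
# Hypothetical helper: map a scheduler label to the --submit argument
# documented in this guide.
submit_flag() {
  case "$1" in
    local) echo "--submit local" ;;
    lsf)   echo "--submit lsf" ;;
    sge)   echo "--submit asyncqsub" ;;
    *)     echo "unsupported scheduler: $1" >&2; return 1 ;;
  esac
}

# Example: choose the flag once, then reuse it in each launch command.
# SUBMIT_ARGS=$(submit_flag lsf)
```

A matching `--nativespec` still has to be supplied for each batch system; this guide only gives one for LSF.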