Organism Filter

The pipeline uses FastQ Screen to classify and filter non-human reads.

The QC pipeline runs FastQ Screen on each single-cell fastq pair. FastQ Screen takes fastq files as input and outputs fastq files with tags added to the read names. Each read in a pair is classified independently. Reads are classified against the human, mouse and salmon genomes. Reads in the bam files generated by the pipeline carry a FastQ Screen tag that specifies the species they map to.

| FastQ Screen Flag | Explanation |
|---|---|
| 0 | Read does not map |
| 1 | Read maps uniquely |
| 2 | Read multi-maps |

Fastq format

Flag format: the flag information is appended to the read ID in the fastq file. The very first read also lists the reference names and has the following format:

@<Read-id>#FQST:grch37:mm10:salmon:100

In this example, the digits of 100 correspond, in order, to grch37, mm10 and salmon: the read maps uniquely to the human genome (1) and does not align to the mouse or salmon genome at all (0).

All subsequent reads will have the following format:

@<Read-id>#FQST:100
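
The two header formats above can be decoded with a small helper. This is a minimal sketch, not the pipeline's own code: the function name `parse_fqst_header` is hypothetical, and it assumes the tag sits at the end of the read name with exactly one flag digit per reference, in the order given by the first read.

```python
def parse_fqst_header(read_name, references=None):
    """Split a tagged read name into (read_id, {reference: flag}).

    The first read in a file names the references; later reads carry only
    the flag digits, so the caller passes in the references seen earlier.
    Assumption: one flag digit per reference, in the same order.
    """
    if read_name.startswith("@"):
        read_name = read_name[1:]
    read_id, sep, tag = read_name.rpartition("#FQST:")
    if not sep:
        raise ValueError(f"no FQST tag in read name: {read_name!r}")

    fields = tag.split(":")
    flags = fields[-1]
    if len(fields) > 1:
        # First-read format: reference names precede the flag digits.
        references = fields[:-1]
    if references is None:
        raise ValueError("reference names are unknown for this read")
    if len(flags) != len(references):
        raise ValueError("flag count does not match reference count")

    return read_id, dict(zip(references, (int(f) for f in flags)))


# Usage: the first header yields the references needed for later headers.
read_id, flags = parse_fqst_header("@read1#FQST:grch37:mm10:salmon:100")
_, later_flags = parse_fqst_header("@read2#FQST:012", references=list(flags))
```

Keeping the reference names out of all but the first header keeps read names short, at the cost of requiring the parser to remember them.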

Bam format

Each read in the bam file will contain the following tag:

FS:Z:mm10_0,salmon_0,grch37_1
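
The value of the FS tag shown above is a comma-separated list of `<reference>_<flag>` items. The following sketch converts between that string and a dictionary; the helper names `format_fs_tag` and `parse_fs_tag` are hypothetical and are not part of the pipeline.

```python
def format_fs_tag(flags):
    """Render {reference: flag} as an FS tag value such as 'mm10_0,salmon_0'."""
    return ",".join(f"{ref}_{flag}" for ref, flag in flags.items())


def parse_fs_tag(value):
    """Parse an FS tag value back into {reference: flag}."""
    flags = {}
    for item in value.split(","):
        # Split on the last underscore so reference names may contain '_'.
        ref, _, flag = item.rpartition("_")
        flags[ref] = int(flag)
    return flags


flags = parse_fs_tag("mm10_0,salmon_0,grch37_1")
is_human_only = flags["grch37"] > 0 and all(
    flag == 0 for ref, flag in flags.items() if ref != "grch37"
)
```

When reading bam files with pysam, the tag value would typically be obtained with `read.get_tag("FS")` before being passed to a parser like this one.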

Pipeline features:

Metrics:

Detailed Metrics:

The pipeline generates a csv file with detailed counts for every combination of flag values. The counts are also split by read end. The table columns depend on the references being checked. For instance, a run against the human, mouse and salmon genomes produces a table with the following columns:

cell_id: ID of the cell
read_end: end 1 or 2 of the read pair
Human: flag for the human genome, one of {0,1,2}. See the FastQ Screen flag table above for details
Mouse: flag for the mouse genome, one of {0,1,2}. See the FastQ Screen flag table above for details
Salmon: flag for the salmon genome, one of {0,1,2}. See the FastQ Screen flag table above for details
count: number of reads with this combination of values
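
The detailed table described above amounts to counting reads per cell, read end and flag combination. A minimal sketch of that tally follows; the function name `detailed_counts` and the input layout (tuples of cell ID, read end and a flag dictionary) are assumptions made for illustration, not the pipeline's implementation.

```python
from collections import Counter


def detailed_counts(reads, references):
    """Count reads per (cell_id, read_end, flag per reference).

    `reads` yields (cell_id, read_end, {reference: flag}) tuples.
    Returns sorted rows shaped like the csv: key columns, then the count.
    """
    counts = Counter()
    for cell_id, read_end, flags in reads:
        key = (cell_id, read_end) + tuple(flags[ref] for ref in references)
        counts[key] += 1
    return [key + (n,) for key, n in sorted(counts.items())]


references = ["grch37", "mm10", "salmon"]
reads = [
    ("cellA", 1, {"grch37": 1, "mm10": 0, "salmon": 0}),
    ("cellA", 1, {"grch37": 1, "mm10": 0, "salmon": 0}),
    ("cellA", 2, {"grch37": 0, "mm10": 0, "salmon": 0}),
]
rows = detailed_counts(reads, references)
```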

Summary Metrics:

The pipeline also adds summary metrics to the main alignment metrics table. The column names depend on the references. For instance, a run against the human, mouse and salmon genomes adds the following columns:

human: count of reads that align to human genome (uniquely or multi-map)
human_multihit: count of reads that align to human genome (uniquely or multi-map) and also align to another genome at the same time (uniquely or multi-map)
mouse: count of reads that align to mouse genome (uniquely or multi-map)
mouse_multihit: count of reads that align to mouse genome (uniquely or multi-map) and also align to another genome at the same time (uniquely or multi-map)
salmon: count of reads that align to salmon genome (uniquely or multi-map)
salmon_multihit: count of reads that align to salmon genome (uniquely or multi-map) and also align to another genome at the same time (uniquely or multi-map)
nohit: count of reads that do not align to any genome
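
The summary columns above can be derived from per-combination counts. The sketch below is one reading of the definitions and not the pipeline's own code: the function name `summarize` and the input layout are hypothetical, and it assumes that a read counted under `<reference>_multihit` is also counted under `<reference>`.

```python
def summarize(rows, references):
    """Collapse (flags, count) rows into summary metrics.

    `rows` yields ({reference: flag}, count) pairs. A flag of 1 or 2
    counts as a hit, whether the alignment is unique or multi-mapped.
    """
    summary = {ref: 0 for ref in references}
    summary.update({f"{ref}_multihit": 0 for ref in references})
    summary["nohit"] = 0

    for flags, count in rows:
        hits = [ref for ref in references if flags[ref] > 0]
        if not hits:
            summary["nohit"] += count
        for ref in hits:
            summary[ref] += count
            if len(hits) > 1:
                # The read also aligns to at least one other reference.
                summary[f"{ref}_multihit"] += count
    return summary


references = ["grch37", "mm10", "salmon"]
rows = [
    ({"grch37": 1, "mm10": 0, "salmon": 0}, 5),
    ({"grch37": 2, "mm10": 1, "salmon": 0}, 2),
    ({"grch37": 0, "mm10": 0, "salmon": 0}, 3),
]
summary = summarize(rows, references)
```

In this example the keys are the reference names as they appear in the read tags; mapping them to the column names human, mouse and salmon is left out.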

Options

Default functionality:

By default, the pipeline does not filter the files at all. The output bam files carry the classification in their read tags.