ScaleHD - Automated Huntington Disease genotyping pipeline

ScaleHD is a bioinformatics pipeline for broad-scope automated genotyping of parallel sequencing data of the HTT CAG/CCG repeat, which is associated with Huntington Disease (HD). Existing software that aims to profile disease-causing nucleotide repeat loci (such as the HD-causing HTT CAG repeat) typically functions on a generalised level, without focusing on any one particular locus. In the case of the HTT CAG repeat, this can result in inaccurate genotypes, because a number of biological phenomena present at this locus complicate genotyping. ScaleHD addresses these problems directly, in an unsupervised, automated manner.

ScaleHD takes a configuration XML document as input, which contains all required information for the instance of ScaleHD to run to completion. The details of this XML input are specified in Specifying User Input(s). Raw data should be in FastQ format; both forward (_R1) and reverse (_R2) sequence reads are required for ScaleHD to function. ScaleHD will perform quality control, sequence alignment and genotyping on all FastQ file pairs presented by the user as input. If a sample fails to produce a genotype at any given stage, for whatever reason, a debug log is created so the user can (hopefully) understand why.
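To illustrate the forward/reverse pairing requirement described above, here is a minimal sketch of how a set of FastQ file names could be grouped into (_R1, _R2) pairs before being handed to the pipeline. This is not ScaleHD's own code: the function names, the accepted file extensions, and the assumption that _R1/_R2 sits immediately before the extension are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed extensions; ScaleHD's actual accepted extensions may differ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def split_read_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (sample_name, 'R1' or 'R2'), or None if the name does not match."""
    for suffix in FASTQ_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            for tag in ("_R1", "_R2"):
                if stem.endswith(tag):
                    return stem[: -len(tag)], tag[1:]
    return None


def pair_fastq_files(
    filenames: Iterable[str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Group file names into (sample, forward, reverse) triples.

    Also returns the sample names that have only one of the two reads,
    since both are needed for a pair to be usable.
    """
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    for name in filenames:
        parsed = split_read_name(name)
        if parsed is None:
            continue  # not a recognisable FastQ read file
        sample, read = parsed
        if read == "R1":
            forward[sample] = name
        else:
            reverse[sample] = name
    paired = sorted((s, forward[s], reverse[s]) for s in forward if s in reverse)
    unpaired = sorted(set(forward) ^ set(reverse))
    return paired, unpaired
```

A check of this kind makes it possible to report samples with a missing forward or reverse file up front, rather than discovering the problem partway through a run.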

Currently, we are on Version 1.0. While the base algorithm has been implemented, there is still much room for improvement. As such, there will be continual development of ScaleHD for the foreseeable future. For more information on version changes, please check the Developer Documentation.

As of April 2020, I am leaving the university for a new challenge. ScaleHD has been updated to Python 3.7 for future support; feel free to fork and maintain the spaghetti as you desire.

The documentation for this software is organised into the following sections.

Info

Development is undertaken at the University of Glasgow, Scotland, and is funded by the CHDI Foundation: http://www.chdifoundation.org/