Version ChangeLog

As this software is in continual development, changes will be listed here. Typically a new release will consist of a collection of minor bugfixes, but occasionally we have to release major overhauls to modules, include new features, or release hotfixes because I was hungry and mistakenly pushed an update without thorough enough testing. Don’t write code at lunchtime.

Version 1.0

Python 3.7 port
Goodbye!

Version 0.324.2

Hotfix for demultiplexing I/O bug that was introduced somehow? (update BatchAdapt if you use demultiplexing)
Hotfix for genotyping zygosity bug where predictions were not correctly overruled by heuristics
Minor bug fix with HTML templates rendering incorrect version strings / missing templates
Hotfix for my dumbass deployment script for updates breaking version strings

Version 0.324

Further minor bugfixes regarding deployment of build scripts incorrectly providing sources in 0.323.

Version 0.323

Minor bugfix for SNP calling where data was in an unexpected vector shape
Minor bugfix for HTML report generation where certain exceptions were preventing correct data scraping

Version 0.322

Removed novel atypical flag indicator from sequences with no intervening sequence at all
Added alignment statistics to default ScaleHD output for samples which aligned, but could not be processed further
Swapped standard HTML summary in genHTML for JavaScript element (filterable etc)
Wrote brief help section on genHTML output
Fixed some minor genotyping bugs with rare, atypical structures
Fixed un-prompted read count subsampling in samples with atypical allele structures
Fixed MSAViewer alignment showing certain reads off-position by 1 base pair
Improved genHTML handling of failed samples (more information as to why, within detailed view)
Removed choice between SNP calling algorithms; freebayes used exclusively

Version 0.321

Fixed some syntax errors with array handling due to a dependency update changing interactions
Added --simple flag for command line interface, providing more literally interpretable genotyping output
Fixed minor demultiplexing error (path finding)
Added entire HTML5 based output, extracting information from ScaleHD instance objects

Version 0.320

Updated dependencies to latest versions (see _sect_reqpack)
Minor tweaks to (syntax) interaction with updated versions of dependencies
Fixed Matplotlib font missing warning spam on certain systems
Fixed SKLearn ConvergenceWarnings spam
Fixed Samtools memory block merging spam
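
The entries above do not say how the warning spam was silenced. As a minimal sketch of one common approach, assuming the noisy call can be wrapped, Python's standard `warnings` module can suppress a single warning category locally. `NoisyConvergenceWarning`, `fit_model` and `fit_quietly` are hypothetical stand-ins for illustration, not ScaleHD or scikit-learn names.

```python
import warnings


class NoisyConvergenceWarning(UserWarning):
    """Hypothetical stand-in for a third-party warning category."""


def fit_model(values):
    # Hypothetical routine that warns on every call, like a noisy dependency.
    warnings.warn("model did not converge", NoisyConvergenceWarning)
    return sum(values) / len(values)


def fit_quietly(values):
    # Suppress only the named category, and only inside this block;
    # the previous filter state is restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=NoisyConvergenceWarning)
        return fit_model(values)
```

Scoping the filter to one category and one block keeps unrelated warnings visible, which matters when the log is also used for troubleshooting.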

Version 0.318

Fixed minor distribution scraping errors for homozygous haplotypes
Logging bugfix with file going missing because I’m bad at my job
SNP Calling masking for ScaleHD-ALSPAC
Framework for simplified 95% C.I. output (feature not implemented in this version; undergoing testing)

Version 0.317

Minor genotype graph render bugfixes
Added file I/O of u.x. stdout log for easier troubleshooting
Fixed minor bugs to do with SNP calling I/O paths and me being a bad programmer when hungry
Added sanitisation stage to check for a user attempting to demultiplex files which have already been demultiplexed
Minor tweaks for Windows 10 Linux Subsystem support
Refactoring config backend interpreter to make it less dumpster-fire-awful

Version 0.316

Added some minor documentation for SNP Calling (_sect_genotyping)
Heuristic allele filtering engine has been completely rewritten to not be absolute garbage.
Parallelised the DSP module within ScaleHD to execute on multiple contigs of data at once, if enabled.
Parallelisation introduced an issue where allele structure incrementing objects would behave improperly; this is now fixed.
Disabled subsampling of aligned assemblies (due to multi-threading speedup; no longer required).
Implemented broad error catching around SNP calling libraries, instead of just exiting upon failure.
Fixed bug with PDF rendering of result distributions utilising an incorrect value for aligned read counts.
Fixed bug where atypical alleles which changed from CCG-homozygous to CCG-heterozygous were not identified.
Fixed error where the heuristic filtering engine suspected an expanded allele, but ended up calling a homozygous haplotype.
Resolved casting issue where two alleles returned differently shaped arrays for FOD genotype calling.
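
The parallelisation entries above can be illustrated with a minimal sketch, assuming each contig can be processed independently. This is not ScaleHD's DSP module: `process_contig`, `process_all` and the contig names in the usage note are hypothetical, and a thread pool is used only for brevity (CPU-bound work would more usually go to a process pool).

```python
from concurrent.futures import ThreadPoolExecutor


def process_contig(item):
    # Hypothetical per-contig work: find the position with the most reads.
    name, read_counts = item
    peak_index = max(range(len(read_counts)), key=read_counts.__getitem__)
    return name, peak_index


def process_all(contigs, workers=4):
    # Each task returns its own result rather than incrementing a shared
    # object, so no state is mutated from more than one worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_contig, contigs.items()))
```

For example, `process_all({"contig_a": [0, 3, 9, 2], "contig_b": [1, 8, 2, 0]})` gives one peak position per contig. Returning results instead of updating shared counters is one way to avoid the kind of incrementing problem described in the list.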

Version 0.314/5

Fixed homozygous haplotype casting error

Fixed diminished alleles being skipped (or not flagged) in particular cases of read drop-off in homozygous expansions

Version 0.313

Fixed a rare error wherein graphs would not be rendered where an atypical allele rewrote the CCG-zygosity from heterozygous to homozygous.

Added a flag for when the two core genotyping algorithms cannot agree on the status of one allele; this manifests as an expanded allele being missed due to significantly low read count.

Allele sorting algorithm has been tweaked to correct some mistakes in my garbage code.

Fixed rare error where FastQC would be executed on incorrect data.

Fixed certain genotyping flags being applied on a sample wide basis as opposed to an individual allele basis.

Version 0.312

Added an additional (optional) pre-processing stage, including sequence demultiplexing via Batchadapt.

CCG first-order differential bugfix for situations where peak-calling unexpectedly returned multiple variables.

Added Batchadapt to the required Python package list for ScaleHD. Installed automatically from pip where possible.
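
For readers unfamiliar with the first-order differential mentioned above, here is a minimal sketch of the general idea, assuming a read-count distribution indexed by repeat count. It is not ScaleHD's implementation; `first_order_differential` and `call_peaks` are hypothetical names.

```python
def first_order_differential(distribution):
    # Difference between each neighbouring pair of read counts.
    return [b - a for a, b in zip(distribution, distribution[1:])]


def call_peaks(distribution):
    # A peak is a position where the differential changes from positive
    # to non-positive: counts rise into it and do not rise after it.
    diffs = first_order_differential(distribution)
    return [
        i + 1
        for i, (left, right) in enumerate(zip(diffs, diffs[1:]))
        if left > 0 and right <= 0
    ]
```

Note that `call_peaks` always returns a list, which may hold zero, one or several positions. Callers that assume exactly one peak need to handle the other cases explicitly.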

Version 0.311

Moron hotfix for dumb reverse aggregate distribution bug I introduced with v0.310

Version 0.310

This is a minor update to ScaleHD. SNP calling implementation is now in alpha.

Fixed a bug where genotyping would complete, but raise an exception at the end of the genotyping module, due to particular arrays not being flattened.

Implemented Picard/GATK/Freebayes into the SNP calling module of ScaleHD.

Added PyVCF as a Python library requirement for scraping data from variant calls.

Modified the requirements for Picard/GATK to be integrated with ScaleHD on the user’s system $PATH.

Added Freebayes to the list of required binaries in __backend; additional user $PATH check.

Added new XML flag for user to specify a strictness value, for determining legitimate SNP calls.

Minor codebase re-arranging in preparation for Digital Signal Processing to be replaced by a C++ binary, for performance.

Version 0.300

We now consider version 0.300 a “release-candidate alpha”, if such a thing exists; i.e. the functionality performs as desired 99% of the time (figure not accurate, and I am not legally liable for any repercussions of assuming ScaleHD is 99% accurate haHAa). From this point onwards, new releases will contain new features or a large collection of bug fixes. Minor iterations are (hopefully) over.

Removed Rpy2 and R-interface codebase in preparation for switching Bayesian confirmation model to a native Python library.

Added additional flag for ScaleHD output, describing how many reads that mapped to multiple references were removed (if enabled by the user).

Switched output rendering pipeline from Prettyplotlib to Seaborn (PPL is no longer supported).

Minor backend modifications in relation to the above.

Fixes for SKLearn label encoder deprecation

Minor genotyping fixes (thresholds)

Version 0.252

Modified the N-Aligned distribution logic to utilise pre-smoothing data distribution as opposed to post-smoothing.

Bugfix with label in (a)typical allele being assigned an estimated CAG attribute which was not an integer.

FastQ subsampling workflow modified to remove possibility of incorrect percentages applying to genotyping confidence.

Fixed the algorithm which calculates Somatic Mosaicism for each allele (i.e. no longer reading from incorrect attributes).

Some other stuff that I forgot.

Version 0.251

Removed the redundant workflow codebase for Assembly processing (i.e. using BAM as input; feature not required/desired anymore).

Refactored the input method that the user can specify to subsample input reads, or not.

Scope fix for instances that do not use SeqQC.

Alternative shell pathing check for requisite binaries fix (e.g. using zsh instead of bash)
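
The binary check in the last entry is not described in detail. A minimal sketch of a shell-independent check, assuming only that the required binaries must be on the user's $PATH, can use the standard library's `shutil.which`. `missing_binaries` is a hypothetical helper, not a ScaleHD function.

```python
import shutil


def missing_binaries(names):
    # shutil.which searches the directories in $PATH directly, so the
    # result does not depend on which shell (bash, zsh, ...) is in use.
    return [name for name in names if shutil.which(name) is None]
```

For example, `missing_binaries(["samtools", "freebayes"])` returns whichever of the two cannot be found, and an empty list when both are available.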

Version 0.250

CCG distribution cleanup threshold tweaks

Added handler for atypical-typical 50:50 read ratio assembly contigs.

Added a threshold context manager for Neighbouring Allele Peak algorithm.

Added differential confusion flag for samples which ScaleHD cannot sort via heuristics.

Began implementing polymorphism detection.