Resources for Gene Patent Searches

Frequently Asked Questions

Explore the GenomeQuest solution further, including details on the GQ platform and some frequently asked questions from next-generation customers.

Case Study: Next-generation quality assurance for a next-generation sequence service facility


Case Study: GenomeQuest platform provides access to annotated next-generation sequence data for immediate analysis by researchers




GenomeQuest's High-Speed Sequence Search Suite (HS3) Algorithms

HS3 is GenomeQuest's suite of sequence search methods and the entry point for any next-generation sequencing data file. The suite allows local and global alignment methods to be deployed in tandem to increase the yield of useful sequence data from next-generation sequencing runs. HS3 lets users take high-performance computing for granted and focus instead on the biology of the question at hand.

The centerpiece of HS3 is a high-speed, word-based algorithm that identifies highly similar sequences quickly. The algorithm has no read-length limitation, allows user-definable word lengths and mismatch stringencies, and handles gaps and sequence reads from any sequencing vendor platform. The scalability and ultra-high throughput of HS3 make it well suited for high-volume, "all-against-all" sequence comparisons.
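The internals of HS3 are not described on this page. As a generic illustration of what "word-based" searching means, the following minimal Python sketch indexes every fixed-length word (k-mer) in a set of subject sequences and then reports the subjects that share enough words with a query. The function names, the `word_length` parameter, and the `min_shared_words` threshold (a crude stand-in for a stringency setting) are all hypothetical and are not part of the GenomeQuest product.

```python
from collections import defaultdict


def build_word_index(subjects, word_length):
    """Map each word (k-mer) to the (subject id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for subject_id, sequence in subjects.items():
        for offset in range(len(sequence) - word_length + 1):
            word = sequence[offset:offset + word_length]
            index[word].append((subject_id, offset))
    return index


def find_candidates(query, index, word_length, min_shared_words):
    """Return {subject id: shared word count} for subjects above the threshold."""
    shared = defaultdict(int)
    for offset in range(len(query) - word_length + 1):
        word = query[offset:offset + word_length]
        # Count each subject at most once per query word position.
        for subject_id in {sid for sid, _ in index.get(word, ())}:
            shared[subject_id] += 1
    return {sid: n for sid, n in shared.items() if n >= min_shared_words}


# Toy data, purely illustrative.
subjects = {"s1": "ACGTACGTGA", "s2": "TTTTGGGGCC"}
index = build_word_index(subjects, word_length=4)
hits = find_candidates("CGTACGTG", index, word_length=4, min_shared_words=3)
print(hits)
```

In a scheme like this, shorter words tolerate more mismatches at the cost of more spurious candidates, which is why word length and stringency are usually left to the user. Candidates found this way would normally be passed on to a full alignment step.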

 

How fast is HS3?

We recently completed a project involving a metagenomics-scale comparison of the output from several next-generation sequencing runs against a RefSeq genome collection. The details of one specific run, given below, provide a better sense of the speed of the HS3 algorithm.

The Sequences:

The query set consisted of 375,661 sequences totaling about 103.4M base pairs. The subject set contained about 19M sequences totaling about 66B base pairs.

The Hardware:

This sequence comparison run was conducted on a small cluster of 7 compute nodes and a head node. The compute nodes were identical, each with dual quad-core CPUs running at 2.33 GHz, for a total of 56 cores. Each compute node had 16 GB of RAM and 500 GB of local storage.

The head node was also a dual quad-core system, running at 2.66 GHz, with 64 GB of RAM and 4 TB of local storage.

The Results:
Comparing the query set to the subject set required 7.14T (7.14 × 10^12) comparisons, completed in a total wall-clock time of 8 hours and 5 minutes. This represents 883B (8.83 × 10^11) comparisons per hour, or 15.8B (1.58 × 10^10) comparisons per core per hour.
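The quoted throughput figures can be reproduced from the numbers given above. This short Python sketch assumes that one "comparison" is one query sequence paired with one subject sequence, and that only the 56 compute-node cores are counted (the head node is excluded). Both assumptions are inferred from the figures, not stated outright.

```python
# Figures quoted in the text.
QUERY_SEQUENCES = 375_661
SUBJECT_SEQUENCES = 19_000_000      # "about 19M", so results are approximate
COMPUTE_NODES = 7
CORES_PER_NODE = 8                  # dual quad-core CPUs
WALL_CLOCK_HOURS = 8 + 5 / 60       # 8 hours 5 minutes


def throughput(queries, subjects, nodes, cores_per_node, hours):
    """Return (total comparisons, comparisons per hour, comparisons per core-hour)."""
    total = queries * subjects      # all-against-all: every query vs. every subject
    per_hour = total / hours
    per_core_hour = per_hour / (nodes * cores_per_node)
    return total, per_hour, per_core_hour


total, per_hour, per_core_hour = throughput(
    QUERY_SEQUENCES, SUBJECT_SEQUENCES, COMPUTE_NODES, CORES_PER_NODE, WALL_CLOCK_HOURS
)
print(f"total comparisons:         {total:.3g}")
print(f"comparisons per hour:      {per_hour:.3g}")
print(f"comparisons per core-hour: {per_core_hour:.3g}")
```

Under these assumptions the results round to the 7.14 × 10^12, 8.83 × 10^11, and 1.58 × 10^10 quoted in the text.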

 
