Gene-IT Unveils Sequence-Centric Data Integration Approach To Overcome Challenges of Genome-Scale Research

Tuesday, August 9th, 2005

Worcester, Massachusetts – August 9, 2005 – As part of its continuing mission to help end-users turn raw sequence data into actionable scientific knowledge and business intelligence, Gene-IT today unveiled its strategy to tightly couple sequence retrieval features with large-scale sequence comparison for inter- and cross-species mining of up-to-date biological and patent reference data.

One of the basic challenges of genome-scale research is keeping reference sequence databases up-to-date as new sequences pour in from sequencing labs around the world: the process requires extensive human effort, and the volume of information often exceeds the capacity of existing database technologies. The resulting gaps in reference data can lead to misinterpretation of experimental outcomes or bad decisions on research IP – in turn leading to lost time, money, and erosion of confidence. Gene-IT technology and managed service offerings demonstrate how this can be overcome by allowing previously undiscovered sequence knowledge to be continuously updated and re-indexed ‘on-the-fly’ – a sequence-centric approach to data integration.

Sequence-centric data integration allows the end-user to use the sequence itself to select, sort, and filter previously undiscovered sequence records from mounting stores of sequence reference data – on their own, as soon as new information is available – without the need for database technology. As a result, researchers can simultaneously compare virtually any number of sequences against millions of sequence records from multiple databases to obtain best-fit answers in minutes. In turn, this allows the researcher to apply her own expertise to determine relevance, increasing confidence and participation in the research decision-making process about which sequence targets to fail and which ones to pursue.

Gene-IT offers GenomeQuest, a genomic information managed service and software application that provides solutions for biological and patent investigators. The system provides automated search capability and context-sensitive views of sequence information. The components of GenomeQuest are:

  1. GenomeCast – A managed service that continuously aggregates and mirrors the most widely-used biological and patented sequence databases directly to end-user servers, ready to be searched.
  2. GenomeQuest – A software application that couples sequence retrieval with sequence comparison for inter- and cross-species comparison of up-to-date biological and patent reference data. End-users initiate questions by retrieving sequence records they know using text identifiers, gene names, patent numbers, and other keywords, and immediately cross-compare them with new information contained in and linked to sequence alignments. Results are available in context-sensitive “views” convenient for end-users, including:

    • A Patent Search view that applies large-scale percent identity Alignment methods for IP investigations to the vast stores of patented and biological sequences and allows Result Analysis based on priority dates, patent assignees, and other business-intelligence-oriented data linked to sequences and their annotations.
    • A Functional Search view that applies large-scale biological Alignment methods to aid biological understanding based on the evolutionary linkage between two or more sequences. Result Analysis allows selection, sorting, and filtering of huge data sets of alignments and annotations based on species, gene names, protein families, and other scientific understanding linked to sequences and their annotations.
    • A Surveillance view that continuously monitors all sequence data sources, and automatically sends e-mail alerts when new information is discovered.

Ron Ranauro, CEO of Gene-IT, said, “The way to solve the sequence data integration challenge is to let users use identifiers of sequences they know and dynamically index through the sequence itself to discover new information. Past approaches that tried to anticipate biological questions and pre-compute the answer for later retrieval simply did not scale.”

He added, “The problem will get worse as sequencing technology improves to produce exponentially larger datasets. We think researchers will want to use that information and we have the automation to help them.”