How do DNA databases work?

1 Answer
Aug 24, 2016

They work just like any other database


DNA databases work just like any other database. That is, they store data.

The data on each piece of DNA will be stored as a record, and the record will contain fields that hold the data.

The fields present depend on the specific database. At a minimum, though, there will be a unique identifier field (this gives a unique name or code for each piece of DNA) and a field containing the DNA sequence. There may also be fields containing the name of the organism from which the DNA was taken, and other fields giving relevant information such as the date the sample was sequenced, the names of the researchers who sequenced it, any associated scientific papers, etc.
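To make the record/field idea concrete, here is a minimal sketch in Python. The class name `DnaRecord`, the field names, the `add_record` helper and the sample values are all made up for illustration; real DNA databases define their own schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DnaRecord:
    """One record in a hypothetical DNA database (field names are assumptions)."""
    record_id: str                          # unique identifier for this piece of DNA
    sequence: str                           # the DNA sequence itself, e.g. "ATGGCC..."
    organism: Optional[str] = None          # organism the DNA was taken from
    date_sequenced: Optional[str] = None    # when the sample was sequenced
    researchers: List[str] = field(default_factory=list)  # who sequenced it
    papers: List[str] = field(default_factory=list)       # associated publications


# The "database" here is just a dictionary keyed by the unique identifier.
database: Dict[str, DnaRecord] = {}


def add_record(record: DnaRecord) -> None:
    """Store a record, refusing duplicates so identifiers stay unique."""
    if record.record_id in database:
        raise ValueError(f"duplicate identifier: {record.record_id}")
    database[record.record_id] = record


# Made-up example record.
add_record(DnaRecord("REC001", "ATGGCCATTG", organism="example organism A"))
```

Only the identifier and the sequence are required in this sketch; every other field is optional, which mirrors the "at a minimum" description above.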

The clever bit is how the DNA database is searched.

In most cases, for example when we search Google, we search databases with keywords. That is, we are looking for an exact match to the word (string of letters) that we typed in. With a DNA database, however, if we only searched with keywords we would only ever find exact matches, i.e. we would only get back the same piece of DNA from the same organism. We would only match like-for-like.

This "like-for-like" matching would mean that we would only ever find a piece of DNA that was an exact match. We wouldn't find any related sequences from different organisms, and we wouldn't find any related DNA within the same organism (for example, the genes that encode a family of related proteins).
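The limitation of exact matching can be shown with a small Python sketch. The record identifiers and sequences are invented for illustration, and `exact_search` is a hypothetical helper, not part of any real database.

```python
# Three made-up records: REC002 differs from REC001 by a single base,
# so biologically it would be considered closely related.
records = {
    "REC001": "ATGGCCATTGTAATGGGCCGC",
    "REC002": "ATGGCCATTGTTATGGGCCGC",
    "REC003": "TTTTTTTTTTTTTTTTTTTTT",
}


def exact_search(query, records):
    """Return the identifiers of records whose sequence equals the query exactly."""
    return [record_id for record_id, seq in records.items() if seq == query]


hits = exact_search("ATGGCCATTGTAATGGGCCGC", records)
print(hits)  # only REC001 is returned; the near-identical REC002 is missed
```

Even though REC002 differs by just one letter, the exact search treats it the same as the completely unrelated REC003.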

To get around this "like-for-like" match, DNA databases are searched with a "similarity" algorithm. That is, it is not a search for an exact match. Please see the comments below for an example of this (the system wouldn't allow me to type it here).
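As a rough illustration of a similarity search, here is a short Python sketch that scores each record with a basic local alignment (the Smith-Waterman approach). This is an assumption chosen for illustration: the answer does not say which algorithm any particular database uses, and real tools are far more optimised. The scoring values, the `min_score` threshold and the sample records are all made up.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman style)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ...or open a gap in either sequence; never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity_search(query, records, min_score):
    """Return (identifier, score) pairs scoring at least min_score, best first."""
    scored = [(rid, local_alignment_score(query, seq)) for rid, seq in records.items()]
    hits = [(rid, score) for rid, score in scored if score >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up records: REC002 differs from REC001 by one base, REC003 is unrelated.
records = {
    "REC001": "ATGGCCATTGTAATGGGCCGC",
    "REC002": "ATGGCCATTGTTATGGGCCGC",
    "REC003": "TTTTTTTTTTTTTTTTTTTTT",
}

hits = similarity_search("ATGGCCATTGTAATGGGCCGC", records, min_score=30)
print(hits)  # [('REC001', 42), ('REC002', 39)]
```

The identical sequence scores highest, the one-base variant scores slightly lower but is still returned, and the unrelated sequence falls below the threshold. An exact-match search would have returned only the first of these.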