Formal Definition of Sequence Alignment
Consider an alphabet and two sequences and on . Let where is some symbol not in , it’s called the gap symbol. . We have a gap penalties functions . The functions are monotonic and are at . The four boundary gap penalty coefficient . We want to find the alignment score between the two sequences.
A gap is the maximal substring consist of only ’s. The gap sequence of a string is a sequence of lengths of each gap. Define be the gap sequence of string .
, where is a sequence of length .
Define and be the set of all strings that can be formed by inserting into and respectively.
The alignment score is defined as
Now, once one write an algorithm for this problem, it can be used for many sequence alignment problems on Rosalind.
- Hamming Distance: , and otherwise. All other function are .
- Finding a Shared Spliced Motif, longest common subsequence. for , otherwise. Everything else are .
- Edit distance: Levenshtein distance, for , otherwise. All other are .
- Global alignment: , .
- Overlap alignment: , , .
- Fitting alignment: , .
- Semiglobal alignment: .
- Substring Matching: , .