# Formal Definition of Sequence Alignment

Consider an alphabet $Σ$ and two sequences $s$ and $t$ on $Σ$. Let $Σˉ=Σ∪{⋄}$ where $⋄$ is some symbol not in $Σ$, it’s called the gap symbol. $M:Σˉ×Σˉ→Z$. We have a gap penalties functions $g_{s},g_{t}:N→Z$. The functions are monotonic and are $0$ at $0$. The four boundary gap penalty coefficient $b_{s},b_{t},e_{s},e_{t}∈{0,1}$. We want to find the alignment score between the two sequences.

A gap is the maximal substring consist of only $⋄$’s. The gap sequence of a string is a sequence of lengths of each gap. Define $s_{⋄}$ be the gap sequence of string $s$.

$G(x,g,y,a)=xg(a_{1})+∑_{i=2}g(a_{i})+yg(a_{n})$, where $a$ is a sequence of length $n$.

$A(u,v)=i=1∑∣u∣ M(u_{i},v_{i})+G(b_{s},g_{s},e_{s},u_{⋄})+G(b_{t},g_{t},e_{t},v_{⋄})$

Define $S$ and $T$ be the set of all strings that can be formed by inserting $⋄$ into $s$ and $t$ respectively.

The alignment score is defined as $max{A(u,v):∣u∣=∣v∣,u∈S,v∈T}$

Now, once one write an algorithm for this problem, it can be used for many sequence alignment problems on Rosalind.

- Hamming Distance: $M(a,a)=−1$, and $0$ otherwise. All other function are $−∞$.
- Finding a Shared Spliced Motif, longest common subsequence. $M(a,a)=1$ for $a∈Σ$, $0$ otherwise. Everything else are $0$.
- Edit distance: Levenshtein distance, $M(a,a)=−1$ for $a∈Σˉ$, $0$ otherwise. All other are $−∞$.
- Global alignment: $b_{s}=e_{s}=g_{s}$, $b_{t}=e_{t}=g_{t}$.
- Overlap alignment: $b_{s}=e_{t}=0$, $b_{t}=g_{t}$, $e_{s}=g_{s}$.
- Fitting alignment: $b_{s}=e_{s}=0$, $b_{t}=e_{t}=g_{t}$.
- Semiglobal alignment: $b_{s}=e_{s}=b_{t}=e_{t}=0$.
- Substring Matching: $b_{s}=e_{s}=0$, $g_{s},g_{t},b_{t},e_{t}=−∞$.