Repeated DNA Sequences

Desicription

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

1
2
3
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class Solution {
public:
vector<string> findRepeatedDnaSequences(string s) {
vector<string> ans;
if(s.size() <= 10)
return ans;
unordered_map<int, int>hashMap;
int hashNum = 0;
for(int i = 0; i < 9; i++)
hashNum = hashNum << 2 | (s[i] - 'A' + 1) % 5;
for(int i = 9; s[i]; i++)
if(hashMap[hashNum = (hashNum << 2 | (s[i] - 'A' + 1) % 5) & 0xfffff]++ == 1)
ans.push_back(s.substr(i-9, 10));
return ans;
}
};