Saturday, January 23, 2021

Python Regex Pattern - Genetics

I have the following two strings, and I'd like to use a regex to pull out the matches that start with AUG and end with one of UAA, UAG, or UGA.

string1 = 'AGCCAUGUAGCUAACUCAGGUUACAUGGGGAUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAAGCCUGAAUGAUCCGAGUAGCAUCUCAG'
string2 = 'CUGAGAUGCUACUCGGAUCAUUCAGGCUUAUUCCAAAAGAGACUCUAAUCCAAGUCGCGGGGUCAUCCCCAUGUAACCUGAGUUAGCUACAUGGCU'

The matches I'm looking for are:

'AUGUAG'
'AUGGGGAUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAA'
'AUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAA'
'AUGCUACUCGGAUCAUUCAGGCUUAUUCCAAAAGAGACUCUAAUCCAAGUCGCGGGGUCAUCCCCAUGUAACCUGAGUUAG'

I tried the following pattern, but it didn't return those matches. Can anyone explain why?

import re
pattern = re.compile(r'AUG\w*(UAA|UAG|UGA)')
matches1 = pattern.finditer(string1)
matches2 = pattern.finditer(string2)
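For anyone trying to reproduce the problem, here is a minimal sketch of what the attempted pattern returns on string1. The name `matches` is only illustrative. Because `\w*` is greedy, the match runs from the first AUG to the last stop codon in the string, and because `finditer` reports only non-overlapping matches, the shorter candidates nested inside that span never show up.

```python
import re

string1 = 'AGCCAUGUAGCUAACUCAGGUUACAUGGGGAUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAAGCCUGAAUGAUCCGAGUAGCAUCUCAG'

# The pattern from the question, unchanged.
pattern = re.compile(r'AUG\w*(UAA|UAG|UGA)')

# group(0) is the whole match; group(1) would only be the stop codon.
matches = [m.group(0) for m in pattern.finditer(string1)]

print(len(matches))   # number of matches found
print(matches[0])     # the single long, greedy match
```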

While I'm at it, I'm also curious whether a list such as ['UAA', 'UAG', 'UGA'] can be built into a regex pattern instead of writing (UAA|UAG|UGA) by hand. Thanks so much!
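One possible approach, sketched here under assumptions rather than offered as a definitive answer: join the list with `|` to build the alternation, wrap the whole pattern in a lookahead so that overlapping matches are reported, and consume the sequence three bases at a time so the match stops at the first in-frame stop codon. The in-frame reading is an assumption inferred from the expected matches. The names `stop_codons`, `orf_pattern`, and `find_orfs` are hypothetical. Under this reading, string2 also yields the short match 'AUGUAA', which is not in the list of expected matches.

```python
import re

string1 = 'AGCCAUGUAGCUAACUCAGGUUACAUGGGGAUGACCCCGCGACUUGGAUUAGAGUCUCUUUUGGAAUAAGCCUGAAUGAUCCGAGUAGCAUCUCAG'
string2 = 'CUGAGAUGCUACUCGGAUCAUUCAGGCUUAUUCCAAAAGAGACUCUAAUCCAAGUCGCGGGGUCAUCCCCAUGUAACCUGAGUUAGCUACAUGGCU'

stop_codons = ['UAA', 'UAG', 'UGA']

# Build the alternation from the list. re.escape changes nothing for these
# codons but keeps the join safe if the list ever holds special characters.
stops = '|'.join(map(re.escape, stop_codons))

# (?=(...)) is a zero-width lookahead with a capturing group, so a match can
# be reported at every start position, including ones nested inside another.
# (?:[ACGU]{3})*? lazily consumes whole codons (assumption: in-frame reading),
# so each match ends at the first in-frame stop codon.
orf_pattern = re.compile(rf'(?=(AUG(?:[ACGU]{{3}})*?(?:{stops})))')

def find_orfs(rna):
    """Return every AUG...stop stretch read in frame, overlaps included."""
    return [m.group(1) for m in orf_pattern.finditer(rna)]

for orf in find_orfs(string1) + find_orfs(string2):
    print(orf)
```

The lazy `*?` matters here: a plain lazy `\w*?` would stop at an out-of-frame stop codon, such as the UAG inside UUAGAG, which is why the sketch steps through the sequence in groups of three.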

https://stackoverflow.com/questions/65866208/python-regex-pattern-genetics January 24, 2021 at 09:05AM
