Finding a Motif in DNA #18

Merged: danielle-pinto merged 3 commits into main from 2026-02-12-subs on Feb 24, 2026

@danielle-pinto (Collaborator) commented Feb 17, 2026

BioJulia solution for the Rosalind "Finding a Motif in DNA" (haystack) problem: https://rosalind.info/problems/subs/

@github-actions

Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR18/

but since we want to find all matches,
we will use `findnext`.

Currently, there isn't a `findall` function that allows us to avoid a loop.
Collaborator Author

That is according to this documentation: https://biojulia.dev/BioSequences.jl/v2.0/sequence_search/#Exact-search-1

However, I'm eager to hear if I missed something!
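
For readers following along, the `findnext` loop described in the text under review can be sketched roughly as below. This is a minimal illustration using plain Julia strings from Base, not the code in the PR; the helper name `motif_starts` is hypothetical, and it assumes a non-empty ASCII motif. The inputs are the sample dataset from the Rosalind problem page.

```julia
# Hypothetical helper (not from the PR): collect the 1-based start position of
# every occurrence of `motif` in `seq`, including overlapping occurrences.
# Assumes `motif` is non-empty and both arguments are ASCII.
function motif_starts(motif::AbstractString, seq::AbstractString)
    starts = Int[]
    hit = findfirst(motif, seq)          # a UnitRange, or `nothing` if absent
    while hit !== nothing
        push!(starts, first(hit))
        # Resume one position past the previous start so overlapping hits are kept.
        hit = findnext(motif, seq, first(hit) + 1)
    end
    return starts
end

starts = motif_starts("ATAT", "GATATATGCATATACTT")
println(join(starts, " "))   # prints "2 4 10"
```

Restarting the search at `first(hit) + 1` rather than `last(hit) + 1` is what lets the loop report overlapping matches, which the problem requires.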

Member

It's not documented there, but it works. You need to use `ExactSearchQuery`: https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/BioSequences.jl#L261-L316

```julia
julia> findall(ExactSearchQuery(dna"ATCA"), dna"ATCATCA")
2-element Vector{UnitRange{Int64}}:
 1:4
 4:7

julia> findall(ExactSearchQuery(dna"ATCA"), dna"ATCATCA"; overlap=false)
1-element Vector{UnitRange{Int64}}:
 1:4
```
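
Since the Rosalind problem asks for start positions rather than ranges, the call above could be adapted along these lines. This is a sketch only: it assumes a BioSequences version where `findall` accepts an `ExactSearchQuery` and the `overlap` keyword as shown in the REPL session, and the helper name `motif_starts_findall` is hypothetical rather than part of the package.

```julia
using BioSequences

# Hypothetical helper (not part of BioSequences): return the 1-based start
# position of every occurrence of `motif` in `seq`, overlaps included.
function motif_starts_findall(motif, seq)
    # `overlap=true` is passed explicitly so the intent is visible, even though
    # the REPL output above suggests it is already the default.
    hits = findall(ExactSearchQuery(motif), seq; overlap=true)
    return first.(hits)   # keep only the start of each matched range
end

# Sample dataset from the Rosalind problem page.
starts = motif_starts_findall(dna"ATAT", dna"GATATATGCATATACTT")
println(join(starts, " "))
```

Broadcasting `first` over the returned ranges avoids an explicit loop, which is the point raised in this thread.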

Collaborator Author

Awesome, I've added this function! It would be great if it were added to the documentation.

Collaborator Author

It looks like this was documented at one point in the past: https://github.com/BioJulia/BioSequences.jl/pull/236/changes

Member

Make a PR!



```julia
function haystack_findnext(substring, string)
```

Collaborator Author

This solution is very similar to the one above. However, I thought it was worth keeping both, since this one introduces the reader to the `findnext` function.


### Biojulia solution

Lastly, we can leverage some functions in the Kmers Biojulia package to help us!
Collaborator Author

Of all the BioJulia packages, I thought Kmers might offer a solution, but I wasn't able to find a relevant function. Technically, the `findnext` function is part of BioSequences, so perhaps that is the "BioJulia solution." Please let me know if there are any other Julia functions that could be helpful here!

@danielle-pinto marked this pull request as ready for review February 22, 2026 03:22
@danielle-pinto merged commit 89a8779 into main Feb 24, 2026
1 check failed
@danielle-pinto deleted the 2026-02-12-subs branch February 24, 2026 02:36