Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
8f25085
initial port to KDTree based HBond map
BradyAJohnston Dec 20, 2025
a3f9ac4
change distance_matrix -> query_ball_tree
BradyAJohnston Dec 22, 2025
617e7de
tweak comments
BradyAJohnston Dec 22, 2025
234ad87
tweak commments and layout
BradyAJohnston Dec 22, 2025
93906df
run black
BradyAJohnston Dec 22, 2025
f578aa9
add name to AUTHORS
BradyAJohnston Dec 22, 2025
d831ba2
add changelog and return comment
BradyAJohnston Dec 22, 2025
86a045d
add test for donor_mask
BradyAJohnston Dec 22, 2025
26c4ca8
run black
BradyAJohnston Dec 22, 2025
2105bd6
change KDTree -> capped_distances
BradyAJohnston Dec 23, 2025
e81d2e2
run black
BradyAJohnston Dec 23, 2025
bdaadab
move changelog entry
BradyAJohnston Dec 23, 2025
f0b2e23
use before distance calculations
BradyAJohnston Dec 29, 2025
4faa47c
warning for init of default box dimensions
BradyAJohnston Dec 29, 2025
4ccfeb6
Merge branch 'develop' into dssp-kdtree
BradyAJohnston Mar 18, 2026
dcf7f89
address PR feedback
BradyAJohnston Mar 20, 2026
ebf118f
black format testsuite
BradyAJohnston Mar 20, 2026
bfd812d
fix is_proline mask (bool instead of idx)
BradyAJohnston Mar 20, 2026
31db5cb
test coverage for box dimensions warning
BradyAJohnston Mar 20, 2026
217575b
Update package/MDAnalysis/analysis/dssp/dssp.py
BradyAJohnston Mar 20, 2026
bb4f60a
Update package/MDAnalysis/analysis/dssp/dssp.py
BradyAJohnston Mar 20, 2026
fe63869
remaining docs comments
BradyAJohnston Mar 20, 2026
1694c1d
asv benchmark for dssp
BradyAJohnston Mar 20, 2026
e3eb6ec
Merge branch 'develop' into dssp-kdtree
BradyAJohnston Mar 24, 2026
f00c5ef
Update testsuite/MDAnalysisTests/analysis/test_dssp.py
orbeckst Mar 24, 2026
c1fc511
Merge branch 'develop' into dssp-kdtree
orbeckst Mar 31, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions benchmarks/benchmarks/analysis/dssp.py
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the benchmark.

Idea for the future: test different backends via test parameters; I tried looking into it but was getting stymied by lack of distopia conda-forge packages for Python 3.14 MDAnalysis/distopia#190 — until we have those it won't run cleanly in our nightly benchmarking environment that looks at 3.11 and 3.14 https://github.com/MDAnalysis/benchmarks/blob/32322d3cc11167f3d5cb5a493db2bfcc2703de43/config/asv_c3potato.conf.json#L36

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I initially tried against different backends but ran into issues with distopia & I haven't had any experience with it so left it. Glad it wasn't just me!

Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
import MDAnalysis as mda

try:
from MDAnalysisTests.datafiles import TPR, XTC, PDB_janin
except:
pass

try:
from MDAnalysis.analysis.dssp import DSSP
from MDAnalysis.analysis.dssp.pydssp_numpy import assign
except:
pass


class DsspBench(object):
"""Benchmarks for MDAnalysis.analysis.dssp.DSSP over a trajectory."""

params = ([1, 5, 10], [True, False])
param_names = ["n_frames", "guess_hydrogens"]

def setup(self, n_frames, guess_hydrogens):
self.u = mda.Universe(TPR, XTC)
self.dssp = DSSP(self.u, guess_hydrogens=guess_hydrogens)
self.n_frames = n_frames

def time_dssp_run(self, n_frames, guess_hydrogens):
"""Benchmark DSSP.run() over n_frames of a trajectory."""
self.dssp.run(stop=self.n_frames)


class DsspAssignBench(object):
"""Benchmarks for the core DSSP assign() function on a single frame.

Uses PDB_janin as a large test system for number of residues
"""

params = [50, 100, 214, 500]
param_names = ["n_residues"]

def setup(self, n_residues):
self.u = mda.Universe(PDB_janin)
dssp = DSSP(self.u)
coords = dssp._get_coords()
self.coords = coords[:n_residues]

def time_assign(self, n_residues):
"""Benchmark single-frame assign() call."""
assign(self.coords)
12 changes: 6 additions & 6 deletions package/AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,7 @@ Chronological list of authors
- Mingyi Xue
- Meghan Osato
- Anirvinya G
- Rishabh Shukla
- Rishabh Shukla
- Manish Kumar
- Aditi Tripathi
- Sukeerti T
Expand Down Expand Up @@ -247,22 +247,22 @@ Chronological list of authors
- Jia-Xin Zhu
- Tanish Yelgoe
2025
- Joshua Raphael Uy
- Joshua Raphael Uy
- Namir Oues
- Pavel Buslaev
- Pavel Buslaev
- Lexi Xu
- BHM-Bob G
- BHM-Bob G
- Yu-Yuan (Stuart) Yang
- James Rowe
- Debasish Mohanty
- Debasish Mohanty
- Abdulrahman Elbanna
- Tulga-Erdene Sodjargal
- Gareth Elliott
- Marc Schuh
- Sirsha Ganguly
- Amruthesh Thirumalaiswamy
- Ch Zhang
- Raúl Lois-Cuns
- Raúl Lois-Cuns
- Pranay Pelapkar
- Shreejan Dolai
- Tanisha Dubey
Expand Down
8 changes: 7 additions & 1 deletion package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ The rules for this file:
* 2.11.0

Fixes
* DSSP now correctly handles periodic boundary conditions in hydrogen bond
detection; previously, hydrogen bonds across periodic boundaries were
missed (PR #5182)
* Fix mixed-case atom types in guess_bonds (Issue #5342, PR #5343)
* Fixes msd for non-linear frames, when non_linear is not explicitly
provided (Issue #5100, PR #5254)
Expand All @@ -46,6 +49,9 @@ Fixes
DSSP by porting upstream PyDSSP 0.9.1 fix (Issue #4913)

Enhancements
* DSSP uses ``capped_distance`` and ``calc_bonds`` for faster distance
calculations and supports ``backend`` selection (e.g. ``"distopia"``)
(PR #5182)
* Added documentation for all keyword in select_atoms() and
selections.rst (Issue #5317, PR #5325)
* Added HydrogenBondAnalysis benchmark for performance tracking (PR #5309)
Expand All @@ -55,7 +61,7 @@ Enhancements
* Reduces duplication of code in _apply() function (Issue #5247, PR #5294)
* Added new top-level `MDAnalysis.fetch` module (PR #4943)
* Added new function `MDAnalysis.fetch.from_PDB` to download structure files from wwPDB
using `pooch` as optional dependency (Issue #4907, PR #4943)
using `pooch` as optional dependency (Issue #4907, PR #4943)
* Added benchmarks for package.MDAnalysis.analysis.msd.EinsteinMSD (PR #5277)
* Improved performance of `AtomGroup.wrap()` with compounds (PR #5220)
* Adds support for parsing `.tpr` files produced by GROMACS 2026.0
Expand Down
27 changes: 26 additions & 1 deletion package/MDAnalysis/analysis/dssp/dssp.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,7 @@
.. autofunction:: translate
"""

import warnings
from typing import Optional, Union

import numpy as np
Expand Down Expand Up @@ -236,6 +237,14 @@ class DSSP(AnalysisBase):
(e.g., a :exc:`ValueError` is raised) you must customize `hydrogen_name`
for your specific case.

backend : str, optional
:ref:`Backend for distance calculations<selection-of-acceleration-backend>`, by default ``"serial"``.
Can be set to ``"distopia"`` if `distopia`_ is installed.

.. versionadded:: 2.11.0

.. _distopia: https://www.mdanalysis.org/distopia/


Raises
------
Expand Down Expand Up @@ -308,8 +317,10 @@ def __init__(
*,
heavyatom_names: tuple[str] = ("N", "CA", "C", "O O1 OT1"),
hydrogen_name: str = "H HN HT1 HT2 HT3",
backend: str = "serial",
):
self._guess_hydrogens = guess_hydrogens
self._backend = backend

ag: AtomGroup = atoms.select_atoms("protein")
super().__init__(ag.universe.trajectory)
Expand All @@ -325,6 +336,15 @@ def __init__(
for t in heavyatom_names
}
self._donor_mask: Optional[np.ndarray] = ag.residues.resnames != "PRO"
self._box = self._trajectory.ts.dimensions
if self._box is not None and np.allclose(
self._box, [1, 1, 1, 90, 90, 90]
):
self._box = None
warnings.warn(
"Box dimensions are (1, 1, 1, 90, 90, 90), not using "
"periodic boundary conditions in DSSP calculations"
)
self._hydrogens: list["AtomGroup"] = [
res.atoms.select_atoms(f"name {hydrogen_name}")
for res in ag.residues
Expand Down Expand Up @@ -407,7 +427,12 @@ def _get_coords(self) -> np.ndarray:

def _single_frame(self):
coords = self._get_coords()
dssp = assign(coords, donor_mask=self._donor_mask)
dssp = assign(
coords,
donor_mask=self._donor_mask,
box=self._box,
backend=self._backend,
)
self.results.dssp_ndarray.append(dssp)

def _conclude(self):
Expand Down
124 changes: 88 additions & 36 deletions package/MDAnalysis/analysis/dssp/pydssp_numpy.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,13 @@

import numpy as np

from MDAnalysis.lib.distances import calc_bonds, capped_distance

CONST_Q1Q2 = 0.084
CONST_F = 332
DEFAULT_CUTOFF = -0.5
DEFAULT_MARGIN = 1.0
HBOND_SEARCH_CUTOFF = 5.0


def _upsample(a: np.ndarray, window: int) -> np.ndarray:
Expand Down Expand Up @@ -118,10 +121,12 @@ def _get_hydrogen_atom_position(coord: np.ndarray) -> np.ndarray:

def get_hbond_map(
coord: np.ndarray,
donor_mask: np.ndarray = None,
donor_mask: np.ndarray | None = None,
cutoff: float = DEFAULT_CUTOFF,
margin: float = DEFAULT_MARGIN,
return_e: bool = False,
box: np.ndarray | None = None,
backend: str = "serial",
) -> np.ndarray:
"""Returns hydrogen bond map

Expand All @@ -145,6 +150,20 @@ def get_hbond_map(
return_e : bool, optional
if to return energy instead of hbond map, by default False

box : array_like, optional
The unitcell dimensions of the system, which can be orthogonal or
triclinic and must be provided in the same format as returned by
:attr:`MDAnalysis.coordinates.timestep.Timestep.dimensions`:
``[lx, ly, lz, alpha, beta, gamma]``.

.. versionadded:: 2.11.0

backend : str, optional
Backend for distance calculations, by default ``"serial"``.
Can be set to ``"distopia"`` if :mod:`distopia` is installed.

.. versionadded:: 2.11.0

Returns
-------
np.ndarray
Expand All @@ -156,6 +175,9 @@ def get_hbond_map(
.. versionchanged:: 2.10.0
Support masking of hydrogen donors via `donor_mask` (especially needed
for ignoring HN on proline residues). Backport of PRO fix from pydssp 0.9.1.
.. versionchanged:: 2.11.0
Speedup with ``capped_distance`` and support periodic boundary checking
via `box` param.
"""
n_residues, n_atom_types, _xyz = coord.shape
assert n_atom_types in (
Expand All @@ -176,58 +198,69 @@ def get_hbond_map(
# h.shape == (n_residues, 3)
# coord.shape == (n_residues, 4, 3)

# distance matrix
n_1, c_0, o_0 = coord[1:, 0], coord[0:-1, 2], coord[0:-1, 3]

n = n_residues - 1
cmap = np.tile(c_0, (n, 1, 1))
omap = np.tile(o_0, (n, 1, 1))
nmap = np.tile(n_1, (1, 1, n)).reshape(n, n, 3)
hmap = np.tile(h_1, (1, 1, n)).reshape(n, n, 3)
n_atoms, c_atoms, o_atoms = coord[1:, 0], coord[:-1, 2], coord[:-1, 3]

# We can reduce the need for a complete pairwise distance matrix by first searching for
# candidate pairs within a certain distance cutoff and only computing the energy
# for these relevant pairs rather than "potential" HBonds between far apart residues
pairs, d_on = capped_distance(
n_atoms,
o_atoms,
max_cutoff=HBOND_SEARCH_CUTOFF,
box=box,
backend=backend,
)

d_on = np.linalg.norm(omap - nmap, axis=-1)
d_ch = np.linalg.norm(cmap - hmap, axis=-1)
d_oh = np.linalg.norm(omap - hmap, axis=-1)
d_cn = np.linalg.norm(cmap - nmap, axis=-1)
# Exclude local pairs (i, i), (i, i+1), (i, i+2) that are too close for SS HBonds
local_mask = abs(pairs[:, 0] - pairs[:, 1]) >= 2
pairs = pairs[local_mask]
d_on = d_on[local_mask]

# Exclude donor H absence (Proline)
if donor_mask is not None:
donor_indices = np.where(np.array(donor_mask) == 0)[0]
mask = ~np.isin(pairs[:, 0], donor_indices - 1)
pairs = pairs[mask]
d_on = d_on[mask]

# compute distances and energy as previously but only for the potential pairs
# still returning the same energy matrix that would have otherwise been made
o_indices = pairs[:, 1]
n_indices = pairs[:, 0]

d_ch = calc_bonds(
c_atoms[o_indices], h_1[n_indices], box=box, backend=backend
)
d_oh = calc_bonds(
o_atoms[o_indices], h_1[n_indices], box=box, backend=backend
)
d_cn = calc_bonds(
c_atoms[o_indices], n_atoms[n_indices], box=box, backend=backend
)

# electrostatic interaction energy
# e[i, j] = e(CO_i) - e(NH_j)
e = np.pad(
e = np.zeros((n_residues, n_residues))
e[n_indices + 1, o_indices] = (
CONST_Q1Q2
* (1.0 / d_on + 1.0 / d_ch - 1.0 / d_oh - 1.0 / d_cn)
* CONST_F,
[[1, 0], [0, 1]],
* CONST_F
)

if return_e: # pragma: no cover
return e

# mask for local pairs (i,i), (i,i+1), (i,i+2)
local_mask = ~np.eye(n_residues, dtype=bool)
local_mask *= ~np.diag(np.ones(n_residues - 1, dtype=bool), k=-1)
local_mask *= ~np.diag(np.ones(n_residues - 2, dtype=bool), k=-2)
# mask for donor H absence (Proline)
donor_mask = (
np.array(donor_mask).astype(float)
if donor_mask is not None
else np.ones(n_residues, dtype=float)
)
donor_mask = np.tile(donor_mask[:, np.newaxis], (1, n_residues))
# hydrogen bond map (continuous value extension of original definition)
hbond_map = np.clip(cutoff - margin - e, a_min=-margin, a_max=margin)
hbond_map = (np.sin(hbond_map / margin * np.pi / 2) + 1.0) / 2

assert hbond_map.shape == local_mask.shape == donor_mask.shape

hbond_map *= local_mask
hbond_map *= donor_mask

return hbond_map


def assign(
coord: np.ndarray,
donor_mask: np.ndarray = None,
donor_mask: np.ndarray | None = None,
box: np.ndarray | None = None,
backend: str = "serial",
) -> np.ndarray:
"""Assigns secondary structure for a given coordinate array,
either with or without assigned hydrogens
Expand All @@ -240,12 +273,26 @@ def assign(
(N, CA, C, O) atoms coordinates (if k=4), or (N, CA, C, O, H) coordinates
(when k=5).

donor_mask : np.array
donor_mask : np.array | None, optional
Mask out any hydrogens that should not be considered (in particular HN
in PRO). If ``None`` then all H will be used (behavior up to 2.9.0).

.. versionadded:: 2.10.0

box : array_like, optional
The unitcell dimensions of the system, which can be orthogonal or
triclinic and must be provided in the same format as returned by
:attr:`MDAnalysis.coordinates.timestep.Timestep.dimensions`:
``[lx, ly, lz, alpha, beta, gamma]``.

.. versionadded:: 2.11.0

backend : str, optional
Backend for distance calculations, by default ``"serial"``.
Can be set to ``"distopia"`` if `distopia`_ is installed.

.. versionadded:: 2.11.0

Returns
-------
np.ndarray
Expand All @@ -257,10 +304,15 @@ def assign(

.. versionchanged:: 2.10.0
Support masking of donors.
.. versionchanged:: 2.11.0
Speedup with ``capped_distance`` and support periodic boundary checking
via `box` param.

"""
# get hydrogen bond map
hbmap = get_hbond_map(coord, donor_mask=donor_mask)
hbmap = get_hbond_map(
coord, donor_mask=donor_mask, box=box, backend=backend
)
hbmap = np.swapaxes(hbmap, -1, -2) # convert into "i:C=O, j:N-H" form

# identify turn 3, 4, 5
Expand Down
Loading
Loading