This repository was archived by the owner on Aug 29, 2022. It is now read-only.

Specifying the composition of the system to be simulated #10

@jchodera

We need to decide how to specify the chemical matter in the system to be simulated. Given this information, we can establish software tools, pipelines, and best practices that try to "do the right thing" by making use of available structural data to determine good initial positions, protonation states, etc. Deciding how this information will be represented (either as output from a helper utility or as direct user input) is the first step.

We also think it will be useful to let expert users supply "hints" or extra information that help the pipeline make choices (or constrain the choices it can make in order to achieve desired outcomes), but that is a separate question.

Essential information

The essential information we need to capture is:

  • What biopolymers are present in the system? How many copies of each?
  • What small molecules are present in the system, and how many (or at what concentration)?
  • What additional salts/cofactors/buffers are present?
  • What are the relevant thermodynamic parameters? (e.g. temperature, pressure, pH, redox potential)
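As a rough sketch of how the "how many (or at what concentration)" question could be handled, the snippet below lets a component carry either an explicit copy number or a molar concentration, and converts the latter to a count for a given box volume via N = c · N_A · V. The names `Component`, `Conditions`, and `copies` are hypothetical and only illustrate the idea; nothing here is an agreed format.

```python
from dataclasses import dataclass
from typing import Optional

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
LITERS_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


@dataclass
class Component:
    """One chemical species in the system (hypothetical schema)."""
    name: str
    count: Optional[int] = None                  # explicit number of copies
    concentration_molar: Optional[float] = None  # or a target concentration in mol/L

    def copies(self, box_volume_nm3: float) -> int:
        """Return the number of copies to place in a box of the given volume."""
        if self.count is not None:
            return self.count
        if self.concentration_molar is None:
            raise ValueError(f"{self.name}: give either count or concentration_molar")
        volume_liters = box_volume_nm3 * LITERS_PER_NM3
        return round(self.concentration_molar * AVOGADRO * volume_liters)


@dataclass
class Conditions:
    """Thermodynamic parameters; optional fields may simply be unknown."""
    temperature_kelvin: float
    pressure_bar: float
    pH: Optional[float] = None
    redox_potential_volts: Optional[float] = None


salt = Component("NaCl", concentration_molar=0.15)
protein = Component("example_protein", count=1)
conditions = Conditions(temperature_kelvin=300.0, pressure_bar=1.0, pH=7.4)
```

For example, `salt.copies(8.0 ** 3)` gives the number of NaCl formula units for a cubic box with 8 nm edges at 150 mM.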

Biopolymers

For biopolymers, there are a multitude of ways to specify the system, but fundamentally we need to capture the following information:

  • It's critical that we know exactly what construct (sequence of amino acids or nucleic acids) is used
  • Any post-translational modifications must be known
  • If there are non-natural or synthetic residues, we need some way of specifying these
  • We need a way of specifying that more than one biomolecule is present in the system

Some thoughts on specifying this information:

  • One-letter codes are convenient but restricted to the 20 standard amino acids
  • Three-letter codes in principle give access to all of the residue components in the PDB (available via Ligand Expo), but they may make it hard to describe branched topologies or chemically modified amino acids, where a chain residue is connected to two natural amino acids and, via its sidechain, to a HETATM modified residue. They also can't necessarily encode protonation states, so we may need some other way to describe protomer/tautomer variants.
  • A Topology-like object that has atomic elements, connectivity, and bond orders or formal charges would be more flexible, but harder to produce.
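To make the trade-off between the first two options concrete, here is a minimal sketch that expands a one-letter sequence into three-letter codes and lets the caller override individual positions with a PDB chemical component ID (for instance `SEP`, phosphoserine). The function name `expand_sequence` and the `modifications` argument are assumptions for illustration only. Note that this still cannot express branching or protonation states.

```python
# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def expand_sequence(one_letter, modifications=None):
    """Expand a one-letter sequence into a list of three-letter codes.

    `modifications` (hypothetical) maps a 0-based residue index to the
    component ID that should replace the standard residue there.
    """
    modifications = modifications or {}
    for index in modifications:
        if not 0 <= index < len(one_letter):
            raise IndexError(f"modification index {index} is outside the sequence")
    residues = []
    for index, letter in enumerate(one_letter.upper()):
        if letter not in ONE_TO_THREE:
            raise ValueError(f"unsupported one-letter code {letter!r} at position {index}")
        residues.append(modifications.get(index, ONE_TO_THREE[letter]))
    return residues


# A three-residue peptide with the serine at index 1 phosphorylated.
peptide = expand_sequence("GSY", modifications={1: "SEP"})
```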

Format

A representation that can be converted to a Python dict may be useful.
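One possible shape for such a dict is sketched below. Every key name (`biopolymers`, `small_molecules`, `solutes`, `conditions`, and the fields inside them) is a placeholder chosen for this example, and using a SMILES string to identify the small molecule is likewise just an assumption. The point is only that a plain dict of lists, strings, and numbers round-trips through JSON unchanged and is easy to check for missing sections.

```python
import json

# Hypothetical system specification; all key names are illustrative.
system = {
    "biopolymers": [
        {"name": "example_protein", "residues": ["MET", "GLY", "SEP", "LYS"], "copies": 1},
    ],
    "small_molecules": [
        {"name": "ethanol", "smiles": "CCO", "copies": 1},
    ],
    "solutes": [
        {"name": "NaCl", "concentration_molar": 0.15},
    ],
    "conditions": {"temperature_kelvin": 300.0, "pressure_bar": 1.0, "pH": 7.4},
}

REQUIRED_KEYS = {"biopolymers", "small_molecules", "solutes", "conditions"}


def missing_keys(spec):
    """Return the top-level sections that a specification dict lacks, sorted by name."""
    return sorted(REQUIRED_KEYS - spec.keys())


# Serialize to JSON text and read it back; the result is an equal dict.
restored = json.loads(json.dumps(system, indent=2))
```

Because the structure contains only JSON-compatible types, the same content could also be written by hand in JSON or YAML, or emitted by a helper utility.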
