Specifying the composition of the system to be simulated #10

We need to decide how to specify the chemical composition of the system to be simulated. Given this information, we can build software tools, pipelines, and best practices that try to "do the right thing" by making use of available structural data to determine good initial positions, protonation states, etc. The first step is deciding how this information will be represented, whether it is produced by a helper utility or specified by a user.

We also think it will be useful to give expert users a means of supplying "hints" or extra information that help the pipeline make choices (or constrain the choices it can make in order to achieve desired outcomes), but that is a separate question.

## Essential information

The essential information we need to capture is:
* What biopolymers are present in the system? How many copies of each?
* What small molecules are present in the system, and how many (or at what concentration)?
* What additional salts/cofactors/buffers are present?
* What are the relevant thermodynamic parameters? (e.g. temperature, pressure, pH, redox potential)
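As a minimal sketch, assuming a plain nested-`dict` layout, the four categories above could be captured as shown below, with a small check that each category is present. Every key name, unit suffix, and the use of SMILES for small molecules is a hypothetical choice for illustration, not an agreed schema.

```python
# Hypothetical schema: key names, units, and the SMILES field are illustrative assumptions.
REQUIRED_SECTIONS = ("biopolymers", "small_molecules", "additives", "thermodynamic_state")

system_spec = {
    "biopolymers": [
        {"name": "kinase_domain", "sequence": "MGSSHHHHHH", "copies": 1},
    ],
    "small_molecules": [
        # Either a copy count or a concentration could be given here.
        {"name": "ligand_1", "smiles": "c1ccccc1O", "count": 1},
    ],
    "additives": [
        {"name": "NaCl", "concentration_mM": 150.0},
    ],
    "thermodynamic_state": {
        "temperature_K": 300.0,
        "pressure_bar": 1.0,
        "pH": 7.4,
        "redox_potential_mV": None,  # left unspecified
    },
}


def missing_sections(spec: dict) -> list[str]:
    """Return the names of required top-level sections absent from `spec`."""
    return [key for key in REQUIRED_SECTIONS if key not in spec]


print(missing_sections(system_spec))          # []
print(missing_sections({"biopolymers": []}))  # ['small_molecules', 'additives', 'thermodynamic_state']
```

Keeping each category as its own top-level key makes it easy to report which part of a specification is incomplete before any structure preparation starts.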

### Biopolymers

For biopolymers, there are a multitude of ways to specify the system, but fundamentally we need to capture the following information:
* It's critical we know exactly what *construct* (sequence of amino acids or nucleic acids) is used
* Any post-translational modifications must be known
* If there are non-natural or synthetic residues, we need some way of specifying these
* We need a way of specifying that more than one biomolecule is present in the system
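One way to cover these requirements is sketched below, assuming a hypothetical `Biopolymer` record: the exact construct is stored as a one-letter sequence, post-translational modifications are stored as a map from 1-based residue index to a PDB chemical component ID (here `SEP`, phosphoserine), and multiple biomolecules are simply a list of records with copy counts. The class and field names are illustrative, not an existing API.

```python
from dataclasses import dataclass, field

# One-letter -> three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


@dataclass
class Biopolymer:
    """Hypothetical record for one protein construct."""
    name: str
    sequence: str  # exact construct, in one-letter codes
    copies: int = 1
    # 1-based residue index -> PDB chemical component ID of the modified residue
    modifications: dict[int, str] = field(default_factory=dict)

    def residues(self) -> list[str]:
        """Return three-letter residue names with modifications substituted in."""
        names = [THREE_LETTER[code] for code in self.sequence]
        for position, component_id in self.modifications.items():
            if not 1 <= position <= len(names):
                raise ValueError(f"modification at {position} is outside the sequence")
            names[position - 1] = component_id
        return names


# Two copies of a short peptide with Ser3 phosphorylated, plus one partner chain.
peptide = Biopolymer("peptide", "GASK", copies=2, modifications={3: "SEP"})
system = [peptide, Biopolymer("partner", "MKV")]
print(peptide.residues())             # ['GLY', 'ALA', 'SEP', 'LYS']
print(sum(p.copies for p in system))  # 3
```

Validating modification positions against the construct catches a common mismatch, where modifications are numbered against a different (e.g. full-length) sequence than the one actually simulated.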

Some thoughts on specifying this information:
* One-letter codes are convenient but restricted to the 20 naturally occurring amino acids
* Three-letter codes in principle allow access to all of the residue components in the PDB, made available via the [ligand expo](http://ligand-expo.rcsb.org), but they may make it hard to describe branched topologies or chemically modified amino acids, e.g. a chain residue that is bonded to its two natural neighbors along the backbone and also, via its side chain, to a modified residue recorded as a HETATM. We also can't necessarily encode protonation states this way, and may need some other way to describe protomer/tautomer variants.
* A `Topology`-like object that records atomic elements, connectivity, and bond orders or formal charges would be more flexible, but harder to produce.
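To illustrate the last point, here is a minimal sketch of a `Topology`-like graph, assuming hypothetical `Atom` and `MolecularGraph` classes (not any existing library's API). It builds acetic acid and the acetate anion, two protomers that a residue or component name alone would not distinguish; the same idea would apply to, say, a carboxylate side chain.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str
    formal_charge: int = 0


@dataclass
class MolecularGraph:
    """Hypothetical minimal Topology-like object: elements, bond orders, formal charges."""
    atoms: list[Atom] = field(default_factory=list)
    # Each bond is (atom index i, atom index j, bond order).
    bonds: list[tuple[int, int, int]] = field(default_factory=list)

    def add_atom(self, element: str, formal_charge: int = 0) -> int:
        """Append an atom and return its index."""
        self.atoms.append(Atom(element, formal_charge))
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        self.bonds.append((i, j, order))

    def net_charge(self) -> int:
        return sum(atom.formal_charge for atom in self.atoms)


def acetate_protomer(protonated: bool) -> MolecularGraph:
    """Build acetic acid (protonated=True) or the acetate anion (protonated=False)."""
    mol = MolecularGraph()
    c_methyl = mol.add_atom("C")
    c_carboxyl = mol.add_atom("C")
    o_double = mol.add_atom("O")
    o_single = mol.add_atom("O", formal_charge=0 if protonated else -1)
    mol.add_bond(c_methyl, c_carboxyl)
    mol.add_bond(c_carboxyl, o_double, order=2)
    mol.add_bond(c_carboxyl, o_single)
    for _ in range(3):
        mol.add_bond(c_methyl, mol.add_atom("H"))
    if protonated:
        # The extra proton is an explicit atom, so the protomer is unambiguous.
        mol.add_bond(o_single, mol.add_atom("H"))
    return mol


acid, anion = acetate_protomer(True), acetate_protomer(False)
print(acid.net_charge(), len(acid.atoms))    # 0 8
print(anion.net_charge(), len(anion.atoms))  # -1 7
```

Because every atom, bond order, and formal charge is explicit, the two protomers differ in the data itself rather than in a side annotation; the cost is that something has to generate these graphs for every residue and ligand.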

## Format

A representation that can be converted to a Python `dict` may be useful, since a `dict` maps directly onto common serialization formats such as JSON or YAML.
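A minimal sketch of this idea, assuming hypothetical `SystemSpec` and `ThermodynamicState` dataclasses: the standard-library `dataclasses.asdict` turns the nested objects into plain `dict`s, which `json` can write out and read back without loss.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ThermodynamicState:
    temperature_K: float = 300.0
    pressure_bar: float = 1.0
    pH: float = 7.0


@dataclass
class SystemSpec:
    """Hypothetical top-level container; field names are illustrative."""
    biopolymers: list[dict] = field(default_factory=list)
    small_molecules: list[dict] = field(default_factory=list)
    additives: list[dict] = field(default_factory=list)
    thermodynamic_state: ThermodynamicState = field(default_factory=ThermodynamicState)

    @classmethod
    def from_dict(cls, data: dict) -> "SystemSpec":
        """Rebuild a SystemSpec from the plain dict produced by asdict()."""
        data = dict(data)
        data["thermodynamic_state"] = ThermodynamicState(**data.get("thermodynamic_state", {}))
        return cls(**data)


spec = SystemSpec(
    biopolymers=[{"name": "receptor", "sequence": "MKV", "copies": 1}],
    additives=[{"name": "NaCl", "concentration_mM": 150.0}],
    thermodynamic_state=ThermodynamicState(pH=7.4),
)

text = json.dumps(asdict(spec), indent=2)  # dict -> JSON text for storage or exchange
restored = SystemSpec.from_dict(json.loads(text))
print(restored == spec)  # True
```

Typed objects inside the code and plain `dict`s at the boundary let a helper utility and a hand-written user file produce the same representation.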
