Pull request overview
This PR introduces a consolidated output.yml artifact to centralize ARC run results (paths, energies, thermo/statmech, metadata) into a single downstream-friendly file, and extends parsers/helpers to support that output.
Changes:
- Added `arc/output.py` to atomically write `output/output.yml` with per-species/TS/reaction summaries and run metadata.
- Extended Arkane parsing and species/scheduler data plumbing (e.g., conformer statmech parsing, TS NEB log paths, storing harmonic frequencies).
- Added new helper scripts/tests and parser capabilities (ESS version parsing, opt-step parsing, point group batch computation).
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| arc/output.py | New consolidated output.yml writer and supporting extraction helpers (ESS versions, point groups, corrections, etc.). |
| arc/output_test.py | Extensive unit/integration tests for consolidated output writer and helper behaviors. |
| arc/main.py | Calls write_output_yml() at end of run; wires NEB level + scale-factor provenance logic. |
| arc/scheduler.py | Captures NEB log paths, preserves coarse-opt log path, stores parsed freqs, adjusts convergence logic for E0 failures. |
| arc/species/species.py | Adds freqs storage, persists TSGuess log_path, extends ThermoData with NASA/Cp tabulation fields + update helper. |
| arc/statmech/arkane.py | Improves Arkane input logic for AEC/BAC enablement, parses conformer statmech and kinetics uncertainty metadata, parses scalar thermo values as floats. |
| arc/statmech/arkane_test.py | Adds unit tests for new Arkane parsing routines (parse_e0, conformer statmech, kinetics comment uncertainties, scalar parsing). |
| arc/scripts/save_arkane_thermo.py | Enhances thermo export to include NASA polynomial coefficients + tabulated Cp data. |
| arc/scripts/get_point_groups.py | New helper script to batch-compute point groups via Patchkovskii symmetry binary. |
| arc/scripts_test.py | Subprocess tests for helper scripts in rmg_env (skipped when env unavailable). |
| arc/parser/parser.py | Adds top-level parse_opt_steps and parse_ess_version parser functions. |
| arc/parser/parser_test.py | Adds tests for opt-step counting and ESS version parsing, including ORCA 6 fixture. |
| arc/parser/adapter.py | Adds adapter API stubs for parse_opt_steps and parse_ess_version. |
| arc/parser/adapters/gaussian.py | Implements Gaussian opt-step counting + Gaussian version parsing. |
| arc/parser/adapters/orca.py | Adds ORCA version parsing. |
| arc/parser/adapters/qchem.py | Adds Q-Chem version parsing. |
| arc/parser/adapters/molpro.py | Adds Molpro version parsing. |
| arc/job/adapters/ts/orca_neb.py | Stores NEB output log path on TSGuess for reporting. |
| arc/testing/freq/orca6_example.out | Adds ORCA 6 output fixture for version parsing tests. |
| arc/testing/statmech/thermo/RMG_libraries/thermo.py | Adds a test RMG thermo library fixture for script/tests. |
Thanks!!
Force-pushed from 95f3f7a to 0d73cbb.
Sure, added one now:
Force-pushed from 0d73cbb to 367885e.
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #853 | +/- |
|---|---|---|---|
| Coverage | 58.93% | 59.24% | +0.30% |
| Files | 97 | 98 | +1 |
| Lines | 29502 | 30023 | +521 |
| Branches | 7831 | 7929 | +98 |
| Hits | 17387 | 17786 | +399 |
| Misses | 9894 | 9985 | +91 |
| Partials | 2221 | 2252 | +31 |

Flags with carried forward coverage won't be shown.
raise SchedulerError('Called check_freq_job with no output file')
vibfreqs = parser.parse_frequencies(log_file_path=str(job.local_path_to_output_file))
freq_ok = self.check_negative_freq(label=label, job=job, vibfreqs=vibfreqs)
if freq_ok and vibfreqs is not None:
I don't think we have freqs defined in ARCSpecies as_dict/from_dict. Looks like we need it now. Would you agree?
I'm now threading `freqs` through `ARCSpecies`, including `as_dict`/`from_dict`.
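A minimal sketch of what threading an optional `freqs` field through `as_dict`/`from_dict` can look like while keeping restart dictionaries written by older ARC versions (which lack the key) loadable. `SpeciesRecord` is a hypothetical stand-in reduced to two fields, not the real `ARCSpecies` implementation.

```python
from typing import Any, Dict, List, Optional


class SpeciesRecord:
    """Hypothetical stand-in for ARCSpecies, reduced to the fields needed here."""

    def __init__(self, label: str, freqs: Optional[List[float]] = None):
        self.label = label
        # Harmonic frequencies in cm^-1; None until a freq job has been parsed.
        self.freqs = freqs

    def as_dict(self) -> Dict[str, Any]:
        data: Dict[str, Any] = {'label': self.label}
        if self.freqs is not None:
            # Convert to plain Python floats so the dict dumps cleanly to YAML.
            data['freqs'] = [float(f) for f in self.freqs]
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'SpeciesRecord':
        # .get() keeps older restart dicts (no 'freqs' key) loadable.
        return cls(label=data['label'], freqs=data.get('freqs'))
```

Omitting the key when `freqs` is `None` keeps restart files from species that never ran a freq job unchanged.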
yml_path = os.path.join(ARC_PATH, 'data', 'freq_scale_factors.yml')
try:
    with open(yml_path, 'r', encoding='utf-8') as f:
        raw = f.read()
Why not read it as a YAML file using ARC's functionalities?
Because of how `freq_scale_factors.yml` is structured, reading it with ARC's YAML functionalities would discard the citations.
Maybe we should include the citation as part of the YAML? Add another key?
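One way the "citation as another key" suggestion could work, sketched under assumptions: if each entry carried `value` and `citation` keys, a plain YAML load would keep the citation and no raw-text parsing would be needed. The dict below stands in for what `yaml.safe_load` would return for such a restructured file; the level strings, numbers, and citation text are placeholders, not ARC's data, and `lookup_scale_factor` is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical parsed content of a restructured freq_scale_factors.yml.
# Keys and numbers are placeholders for illustration only.
SCALE_FACTORS: Dict[str, Any] = {
    'method-a/basis-a': {'value': 0.970, 'citation': 'Reference A'},
    'method-b/basis-b': 0.985,  # legacy form: bare number, no citation
}


def lookup_scale_factor(level: str,
                        table: Dict[str, Any]) -> Tuple[Optional[float], Optional[str]]:
    """Return (scale factor, citation) for a level string, or (None, None) if absent."""
    entry = table.get(level.lower())
    if entry is None:
        return None, None
    if isinstance(entry, dict):
        return float(entry['value']), entry.get('citation')
    # Accept the legacy bare-number form so existing entries keep working.
    return float(entry), None
```

Accepting both forms would allow migrating entries to the new structure one at a time.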
Force-pushed from 367885e to 725104a.
Force-pushed from 4d96ca1 to 1e374ec.
Introduces a new module to generate a consolidated `output.yml` file at the end of an ARC run. This file gathers all result data (species properties, reaction kinetics, thermochemistry, and levels of theory) into a single source of truth with run-relative paths. This format supersedes the existing `status.yml` and project info files, simplifying data consumption for downstream tools like TCKDB and custom analysis scripts. The file is written atomically to ensure data integrity even if a run is interrupted.
- Removed bare empty and unused import
- Enhance `output.yml` generation with additional parameters and improved error handling
- Refactor frequency scale factor source resolution and update test patches for output module
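The atomic write mentioned above is commonly done by writing to a temporary file in the same directory and then renaming it over the target, so a reader sees either the old file or the complete new one. A minimal sketch, assuming the YAML text has already been serialized; `atomic_write_text` is a hypothetical name, and the actual writer in `arc/output.py` may be structured differently.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text to path so readers never observe a partially written file.

    Hypothetical helper for illustration; ARC's actual writer may differ.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Create the temp file next to the target so os.replace() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.output.', suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX systems
    except BaseException:
        # On any failure, drop the temp file; the previous output.yml is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the run is killed mid-write, only the hidden `.tmp` file is affected and the last complete `output.yml` remains in place.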
Invokes the consolidated output writer at the end of the ARC execution loop to generate a single source of truth for project results. The integration includes logic to determine whether frequency scaling factors were user-supplied or automatically looked up, and extracts level of theory details for NEB calculations if performed. It also refines the Arkane level of theory check to ensure warnings are issued correctly when a level cannot be determined.
- Ensure older versions of ARC still work
- Refactor NEB level handling and streamline output parameters in ARC class
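The scale-factor provenance check described above amounts to a precedence rule: a user-supplied value wins, then a value looked up from the scale-factor data, then a fallback. A minimal sketch; the function name, the provenance labels `'user'`, `'database'`, `'default'`, and the 1.0 fallback are assumptions for illustration, not ARC's actual strings or defaults.

```python
from typing import Optional, Tuple


def resolve_freq_scale_factor(user_value: Optional[float],
                              lookup_value: Optional[float],
                              default: float = 1.0) -> Tuple[float, str]:
    """Return (scale factor, provenance) for reporting.

    Hypothetical helper; labels and the fallback value are assumptions.
    """
    if user_value is not None:
        return float(user_value), 'user'
    if lookup_value is not None:
        return float(lookup_value), 'database'
    # Nothing found: fall back to an unscaled value and record that explicitly.
    return float(default), 'default'
```

Recording the provenance next to the value lets downstream tools tell a deliberate user choice from an automatic lookup.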
Introduces `parse_ess_version` and `parse_opt_steps` methods to the `ESSAdapter` base class, with specific implementations for Gaussian, Molpro, ORCA, and Q-Chem. This allows ARC to extract software provenance and optimization history from log files for inclusion in project results.
- Fallback parser for Gaussian
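A rough sketch of regex-based version and opt-step parsing. The patterns assume header lines like ` Gaussian 16:  ES64L-G16RevC.01  3-Jul-2019` and `Program Version 6.0.0  -   RELEASE  -` (ORCA), and Gaussian's `Step number N out of a maximum of M` lines; real logs vary between versions. These functions take log text rather than a path and are illustrative, not ARC's adapter methods.

```python
import re
from typing import Optional

# Assumed header formats; verify against real log files before relying on them.
_VERSION_PATTERNS = {
    # e.g. " Gaussian 16:  ES64L-G16RevC.01  3-Jul-2019"
    'gaussian': re.compile(r'^\s*Gaussian\s+(\d+):\s+\S*?Rev([A-Z]\.\d+)', re.MULTILINE),
    # e.g. "                 Program Version 6.0.0  -   RELEASE  -"
    'orca': re.compile(r'Program Version\s+(\d+\.\d+\.\d+)'),
}


def parse_ess_version(text: str, ess: str) -> Optional[str]:
    """Return a human-readable version string, or None if the ESS or header is unknown."""
    pattern = _VERSION_PATTERNS.get(ess)
    if pattern is None:
        return None
    match = pattern.search(text)
    if match is None:
        return None
    if ess == 'gaussian':
        return f'Gaussian {match.group(1)} Rev. {match.group(2)}'
    return f'ORCA {match.group(1)}'


def count_gaussian_opt_steps(text: str) -> int:
    """Count optimization steps via Gaussian's 'Step number N out of a maximum of M' lines."""
    return len(re.findall(r'Step number\s+\d+\s+out of a maximum of\s+\d+', text))
```

Returning `None` for an unrecognized program mirrors a base-class default that individual adapters override.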
Improves the extraction of species and reaction data from Arkane output files to ensure comprehensive information is available for project results. This includes parsing external symmetry, optical isomers, kinetics uncertainties (dA, dn, dEa), and additional thermochemistry parameters like NASA polynomials. The update also introduces a check to disable and warn about atom and bond energy corrections when the level of theory is not recognized by Arkane or ARC, preventing the application of incorrect corrections to thermo and kinetics results.
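For the kinetics uncertainties, a minimal sketch of pulling dA, dn, and dEa out of a rate comment. It assumes the comment follows the style RMG uses when fitting Arrhenius expressions, e.g. `Fitted to 50 data points; dA = *|/ 1.2, dn = +|- 0.03, dEa = +|- 0.1 kJ/mol`; check this against actual Arkane output. `parse_kinetics_uncertainty` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Assumes RMG-style Arrhenius fit comments; the format is an assumption to verify.
_UNCERTAINTY_RE = re.compile(
    r'dA\s*=\s*\*\|/\s*(?P<dA>[-+0-9.eE]+),\s*'
    r'dn\s*=\s*\+\|-\s*(?P<dn>[-+0-9.eE]+),\s*'
    r'dEa\s*=\s*\+\|-\s*(?P<dEa>[-+0-9.eE]+)\s*(?P<unit>\S+)?'
)


def parse_kinetics_uncertainty(comment: str) -> Optional[Dict[str, object]]:
    """Return dA (multiplicative), dn and dEa (additive), or None if absent."""
    match = _UNCERTAINTY_RE.search(comment)
    if match is None:
        return None
    return {
        'dA': float(match['dA']),    # multiplicative factor on A
        'dn': float(match['dn']),    # additive uncertainty on n
        'dEa': float(match['dEa']),  # additive uncertainty on Ea
        'dEa_units': match['unit'],
    }
```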
Updates the `save_arkane_thermo.py` script to extract NASA polynomial coefficients and tabulated heat capacity data at predefined temperatures. This ensures that the generated YAML file contains a more comprehensive representation of the species' thermochemical properties beyond standard enthalpy and entropy.
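To show what the tabulated heat capacities represent, here is a sketch that evaluates Cp directly from NASA-7 coefficients, where Cp/R = a1 + a2·T + a3·T² + a4·T³ + a5·T⁴ in each temperature range. The actual script likely queries RMG thermo objects instead, and the temperature list in the usage note is a hypothetical choice, not the script's predefined set.

```python
from typing import Dict, Iterable, Sequence

R = 8.314462618  # gas constant, J/(mol*K)


def nasa7_cp(coeffs_low: Sequence[float], coeffs_high: Sequence[float],
             t_mid: float, temperature: float) -> float:
    """Cp in J/(mol*K) from NASA-7 coefficients (only a1..a5 enter Cp)."""
    a = coeffs_low if temperature <= t_mid else coeffs_high
    t = temperature
    return R * (a[0] + a[1] * t + a[2] * t ** 2 + a[3] * t ** 3 + a[4] * t ** 4)


def tabulate_cp(coeffs_low: Sequence[float], coeffs_high: Sequence[float],
                t_mid: float, temperatures: Iterable[float]) -> Dict[float, float]:
    """Map each temperature (K) to Cp, choosing the polynomial range by t_mid."""
    return {t: nasa7_cp(coeffs_low, coeffs_high, t_mid, t) for t in temperatures}
```

For example, `tabulate_cp(low, high, 1000.0, [300.0, 500.0, 1000.0, 1500.0])` yields a small Cp table that can be written next to the coefficients.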
Introduces a utility script to determine point groups for multiple species using Patchkovskii's symmetry program. The script processes species geometries from a YAML input and writes the identified point groups to a YAML output, supporting the consolidation of species metadata.
- Removed bare empty excepts
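The batch loop can be sketched independently of the symmetry binary: iterate over labeled geometries, catch only specific errors (no bare `except`), and record `None` for failures so one bad geometry does not abort the batch. Here `compute` is a hypothetical callable wrapping one run of the symmetry program, and plain dicts stand in for the YAML input and output.

```python
import logging
import subprocess
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def batch_point_groups(geometries: Dict[str, str],
                       compute: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Map each species label to its point group, or None if the computation failed.

    `compute` is a hypothetical wrapper around the symmetry binary for one geometry.
    """
    results: Dict[str, Optional[str]] = {}
    for label, xyz in geometries.items():
        try:
            results[label] = compute(xyz)
        except (OSError, ValueError, RuntimeError, subprocess.SubprocessError) as exc:
            # Specific exceptions only: log the reason and keep processing the batch.
            logger.warning('Point group failed for %s: %s', label, exc)
            results[label] = None
    return results
```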
…lidation Updates the ARCSpecies, TSGuess, and ThermoData classes to capture additional metadata, including vibrational frequencies, ESS log paths for transition state guesses, and detailed thermochemical parameters like NASA polynomials and tabulated heat capacity. This supports the generation of a comprehensive consolidated output file.
- Fixed return order
- Add frequency data handling to ARCSpecies serialization
Ensure that older ARC versions still work
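A sketch of the kind of update helper the ThermoData change describes: copy known fields from parsed Arkane results, skip `None` values so existing data is not overwritten with blanks, and ignore unexpected keys so older or newer parser outputs do not break loading. `ThermoRecord` and its field names are hypothetical, not ARC's `ThermoData` attributes.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class ThermoRecord:
    """Hypothetical stand-in for ThermoData with NASA/Cp fields."""
    H298: Optional[float] = None
    S298: Optional[float] = None
    nasa_low: Optional[List[float]] = None
    nasa_high: Optional[List[float]] = None
    cp_table: Optional[Dict[float, float]] = None

    def update(self, data: Dict[str, Any]) -> None:
        """Copy known keys from parsed output; ignore unknown keys and None values."""
        known = {f.name for f in fields(self)}
        for key, value in data.items():
            if key in known and value is not None:
                setattr(self, key, value)
```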
Updates the Scheduler class to capture additional metadata, including coarse optimization and NEB log paths, as well as vibrational frequencies. The update also refines convergence logic for transition states to ensure they are correctly marked as unconverged if they fail E0 checks and no valid alternatives are found, supporting more comprehensive results consolidation.
- Adjusted scheduler test
- Updated the scheduler test expectation for the geo coarse path
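The refined TS convergence rule can be read as a three-way decision after the E0 check. A minimal sketch under assumptions: `GuessStatus`, its flags, and the returned strings are hypothetical and only mirror the logic described in the commit message, not the Scheduler's actual data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GuessStatus:
    """Hypothetical summary of one TS guess."""
    index: int
    success: bool  # the guess optimized and passed its checks
    tried: bool    # the guess was already used and rejected


def mark_ts_convergence(e0_check_passed: bool, guesses: List[GuessStatus]) -> str:
    """Return 'converged', 'retry', or 'unconverged' for a TS after the E0 check."""
    if e0_check_passed:
        return 'converged'
    # A successful guess that has not been tried yet is a valid alternative.
    if any(g.success and not g.tried for g in guesses):
        return 'retry'
    # Failed E0 with no valid alternative: report the TS as unconverged
    # rather than leaving a stale converged status in the output.
    return 'unconverged'
```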
ARC's output has been complex: finding the data you need requires knowing how to navigate its folder structure and files.
This consolidation is an attempt to bring all that information into one place, in an easy-to-read format. It also paves the way for easier use with TCKDB.