Consolidating Output #853

Merged
alongd merged 15 commits into main from consolidate_output
Apr 5, 2026
calvinp0 (Member) commented Apr 1, 2026

ARC's output has been very complex: finding the data you need requires knowing how to navigate its folder structure and files.

This consolidation is an attempt to bring all of that information into one place in an easy-to-read format. It also paves the way for easier use with TCKDB.

@calvinp0 calvinp0 requested review from alongd, Copilot and kfir4444 April 1, 2026 14:02
Comment thread arc/output.py Fixed
Comment thread arc/output.py Fixed
Comment thread arc/output.py Fixed
Comment thread arc/output.py Fixed
Comment thread arc/output.py Fixed
Comment thread arc/output_test.py Fixed
Comment thread arc/scripts/get_point_groups.py Fixed
Comment thread arc/scripts/get_point_groups.py Fixed

Copilot AI left a comment

Pull request overview

This PR introduces a consolidated output.yml artifact to centralize ARC run results (paths, energies, thermo/statmech, metadata) into a single downstream-friendly file, and extends parsers/helpers to support that output.

Changes:

  • Added arc/output.py to atomically write output/output.yml with per-species/TS/reaction summaries and run metadata.
  • Extended Arkane parsing and species/scheduler data plumbing (e.g., conformer statmech parsing, TS NEB log paths, storing harmonic frequencies).
  • Added new helper scripts/tests and parser capabilities (ESS version parsing, opt-step parsing, point group batch computation).
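
The first bullet mentions writing output/output.yml atomically. A common way to get that property is to write to a temporary file in the same directory and then rename it over the target. The sketch below illustrates that general pattern only; the function name `atomic_write_text` and the temp-file prefix are hypothetical and are not taken from arc/output.py.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text to path so a reader never sees a half-written file.

    Hypothetical helper illustrating the temp-file-then-rename pattern;
    it is not ARC's implementation.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.output_', suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # replaces the old file in a single step
    except BaseException:
        # On any failure, remove the temp file and leave the old target untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is interrupted before `os.replace` runs, the previous output.yml (if any) is still intact, which is the property the PR description is after.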

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| arc/output.py | New consolidated output.yml writer and supporting extraction helpers (ESS versions, point groups, corrections, etc.). |
| arc/output_test.py | Extensive unit/integration tests for consolidated output writer and helper behaviors. |
| arc/main.py | Calls write_output_yml() at end of run; wires NEB level + scale-factor provenance logic. |
| arc/scheduler.py | Captures NEB log paths, preserves coarse-opt log path, stores parsed freqs, adjusts convergence logic for E0 failures. |
| arc/species/species.py | Adds freqs storage, persists TSGuess log_path, extends ThermoData with NASA/Cp tabulation fields + update helper. |
| arc/statmech/arkane.py | Improves Arkane input logic for AEC/BAC enablement, parses conformer statmech and kinetics uncertainty metadata, parses scalar thermo values as floats. |
| arc/statmech/arkane_test.py | Adds unit tests for new Arkane parsing routines (parse_e0, conformer statmech, kinetics comment uncertainties, scalar parsing). |
| arc/scripts/save_arkane_thermo.py | Enhances thermo export to include NASA polynomial coefficients + tabulated Cp data. |
| arc/scripts/get_point_groups.py | New helper script to batch-compute point groups via Patchkovskii symmetry binary. |
| arc/scripts_test.py | Subprocess tests for helper scripts in rmg_env (skipped when env unavailable). |
| arc/parser/parser.py | Adds top-level parse_opt_steps and parse_ess_version parser functions. |
| arc/parser/parser_test.py | Adds tests for opt-step counting and ESS version parsing, including ORCA 6 fixture. |
| arc/parser/adapter.py | Adds adapter API stubs for parse_opt_steps and parse_ess_version. |
| arc/parser/adapters/gaussian.py | Implements Gaussian opt-step counting + Gaussian version parsing. |
| arc/parser/adapters/orca.py | Adds ORCA version parsing. |
| arc/parser/adapters/qchem.py | Adds Q-Chem version parsing. |
| arc/parser/adapters/molpro.py | Adds Molpro version parsing. |
| arc/job/adapters/ts/orca_neb.py | Stores NEB output log path on TSGuess for reporting. |
| arc/testing/freq/orca6_example.out | Adds ORCA 6 output fixture for version parsing tests. |
| arc/testing/statmech/thermo/RMG_libraries/thermo.py | Adds a test RMG thermo library fixture for script/tests. |
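
The parser entries above describe extracting the ESS version from log files. As a rough illustration of the idea, the sketch below searches log text with one regular expression per program. The function name `parse_version`, the pattern table, and the sample header lines are all assumptions made for this example; real log headers vary between releases, and this is not the code in arc/parser/adapters.

```python
import re
from typing import Optional

# Hypothetical patterns keyed by ESS name; actual headers may differ by release.
VERSION_PATTERNS = {
    'orca': re.compile(r'Program Version\s+(\d+(?:\.\d+)*)'),
    'gaussian': re.compile(r'Gaussian\s+(\d{2}),\s+Revision\s+([A-Z]\.\d+)'),
}


def parse_version(ess: str, log_text: str) -> Optional[str]:
    """Return a version string found in log_text, or None if nothing matches."""
    pattern = VERSION_PATTERNS.get(ess.lower())
    if pattern is None:
        return None  # unknown program: report "not found" rather than guess
    match = pattern.search(log_text)
    if match is None:
        return None
    # Join the captured groups, e.g. the major version and the revision tag.
    return ' '.join(match.groups())
```

Returning `None` instead of raising keeps a missing version from aborting the rest of a report, at the cost of the caller having to handle the empty case.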

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread arc/output.py Outdated
Comment thread arc/output.py Outdated
Comment thread arc/parser/adapters/gaussian.py
Comment thread arc/main.py Outdated
Comment thread arc/species/species.py Outdated
alongd (Member) commented Apr 1, 2026

Thanks!!
Can you add a schematic output tree overview?

calvinp0 (Member, Author) commented Apr 1, 2026

> Thanks!! Can you add a schematic output tree overview?

Sure, added one now:
0d73cbb

@calvinp0 calvinp0 force-pushed the consolidate_output branch from 0d73cbb to 367885e Compare April 1, 2026 16:55
codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.24%. Comparing base (881a614) to head (1e374ec).
⚠️ Report is 16 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #853      +/-   ##
==========================================
+ Coverage   58.93%   59.24%   +0.30%     
==========================================
  Files          97       98       +1     
  Lines       29502    30023     +521     
  Branches     7831     7929      +98     
==========================================
+ Hits        17387    17786     +399     
- Misses       9894     9985      +91     
- Partials     2221     2252      +31     
| Flag | Coverage Δ |
| --- | --- |
| functionaltests | 59.24% <ø> (+0.30%) ⬆️ |
| unittests | 59.24% <ø> (+0.30%) ⬆️ |


alongd (Member) left a comment

Thanks @calvinp0 for creating this single output file consolidating all ARC run results!
The implementation looks good. I left some comments.

Comment thread arc/main.py
Comment thread arc/output.py
Comment thread arc/output.py Outdated
Comment thread arc/output.py Outdated
Comment thread arc/scheduler.py
raise SchedulerError('Called check_freq_job with no output file')
vibfreqs = parser.parse_frequencies(log_file_path=str(job.local_path_to_output_file))
freq_ok = self.check_negative_freq(label=label, job=job, vibfreqs=vibfreqs)
if freq_ok and vibfreqs is not None:
Member

I don't think we have freqs defined in ARCSpecies as_dict/from_dict. Looks like we need it now. Would you agree?

Member Author

I am now threading it through ARCSpecies.
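
The exchange above is about round-tripping `freqs` through `as_dict`/`from_dict`. A minimal sketch of that kind of round trip follows, using a hypothetical `SpeciesSketch` class rather than ARC's actual ARCSpecies. It assumes the key should be optional, so that dictionaries saved before the attribute existed still load.

```python
from typing import List, Optional


class SpeciesSketch:
    """Hypothetical stand-in for a species object; not ARC's ARCSpecies."""

    def __init__(self, label: str, freqs: Optional[List[float]] = None):
        self.label = label
        self.freqs = freqs

    def as_dict(self) -> dict:
        data = {'label': self.label}
        if self.freqs is not None:
            # Emit the key only when there is data, and store plain floats.
            data['freqs'] = [float(f) for f in self.freqs]
        return data

    @classmethod
    def from_dict(cls, data: dict) -> 'SpeciesSketch':
        # .get() keeps older dictionaries without a 'freqs' key loadable.
        return cls(label=data['label'], freqs=data.get('freqs'))
```

Reading the key with `.get()` is one way to keep files from older runs usable, which matches the "Ensure older versions of ARC still work" note in the commit list.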

Comment thread arc/output.py Outdated
yml_path = os.path.join(ARC_PATH, 'data', 'freq_scale_factors.yml')
try:
    with open(yml_path, 'r', encoding='utf-8') as f:
        raw = f.read()
Member

Why not read it as a YAML file using ARC's existing functionality?

Member Author

Because of how freq_scale_factors.yml is structured, reading it with ARC's YAML utilities would discard the citations.

Member

Maybe we should include the citation as part of the YAML? Add another key?
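
One possible reading of this thread is that the citations sit in YAML comments, which a YAML loader drops when it builds the data structure. Under that assumption (the thread does not show the file layout), the raw-text route could look like the sketch below. The function name `extract_comment_lines` and the sample content are hypothetical.

```python
from typing import List

# Hypothetical file content; the real freq_scale_factors.yml layout is not shown in the thread.
SAMPLE = """\
# [1] Hypothetical citation text, J. Example 2020
freq_scale_factors:
  'level-a/basis-a': 0.99  # [1]
"""


def extract_comment_lines(raw: str) -> List[str]:
    """Return the text of full-line '#' comments from raw YAML text."""
    comments = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.startswith('#'):
            # Drop the leading '#' characters and surrounding whitespace.
            comments.append(stripped.lstrip('#').strip())
    return [c for c in comments if c]
```

Storing each citation under its own key, as suggested in the last comment, would make this text scan unnecessary, because the citation would then survive an ordinary YAML load.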

Comment thread arc/output.py
Comment thread arc/output.py
Comment thread arc/output.py Outdated
Comment thread arc/output.py
@calvinp0 calvinp0 force-pushed the consolidate_output branch from 367885e to 725104a Compare April 3, 2026 13:53
@calvinp0 calvinp0 force-pushed the consolidate_output branch 2 times, most recently from 4d96ca1 to 1e374ec Compare April 3, 2026 21:14
calvinp0 added 11 commits April 4, 2026 00:14
Introduces a new module to generate a consolidated `output.yml` file at the end of an ARC run. This file gathers all result data—including species properties, reaction kinetics, thermochemistry, and levels of theory—into a single source of truth with run-relative paths.

This format supersedes the existing `status.yml` and project info files, simplifying data consumption for downstream tools like TCKDB and custom analysis scripts. The file is written atomically to ensure data integrity even if a run is interrupted.

Removed bare empty except and unused import

Enhance output.yml generation with additional parameters and improved error handling

Refactor frequency scale factor source resolution and update test patches for output module
Invokes the consolidated output writer at the end of the ARC execution loop to generate a single source of truth for project results.

The integration includes logic to determine whether frequency scaling factors were user-supplied or automatically looked up and extracts level of theory details for NEB calculations if performed. It also refines the Arkane level of theory check to ensure warnings are issued correctly when a level cannot be determined.

Ensure older versions of ARC still work

Refactor NEB level handling and streamline output parameters in ARC class
Introduces `parse_ess_version` and `parse_opt_steps` methods to the `ESSAdapter` base class, with specific implementations for Gaussian, Molpro, Orca, and Q-Chem. This allows ARC to extract software provenance and optimization history from log files for inclusion in project results.

Fallback parser for Gaussian
Improves the extraction of species and reaction data from Arkane output files to ensure comprehensive information is available for project results. This includes parsing external symmetry, optical isomers, kinetics uncertainties (dA, dn, dEa), and additional thermochemistry parameters like NASA polynomials.

The update also introduces a check to disable and warn about atom and bond energy corrections when the level of theory is not recognized by Arkane or ARC, preventing the application of incorrect corrections to thermo and kinetics results.
Updates the `save_arkane_thermo.py` script to extract NASA polynomial coefficients and tabulated heat capacity data at predefined temperatures. This ensures that the generated YAML file contains a more comprehensive representation of the species' thermochemical properties beyond standard enthalpy and entropy.
Introduces a utility script to determine point groups for multiple species using Patchkovskii's symmetry program. The script processes species geometries from a YAML input and writes the identified point groups to a YAML output, supporting the consolidation of species metadata.

Removed bare empty excepts
…lidation

Updates the ARCSpecies, TSGuess, and ThermoData classes to capture additional metadata, including vibrational frequencies, ESS log paths for transition state guesses, and detailed thermochemical parameters like NASA polynomials and tabulated heat capacity. This supports the generation of a comprehensive consolidated output file.

Fixed return order

Add frequency data handling to ARCSpecies serialization
Ensure that older ARC versions still work
Updates the Scheduler class to capture additional metadata, including coarse optimization and NEB log paths, as well as vibrational frequencies. The update also refines convergence logic for transition states to ensure they are correctly marked as unconverged if they fail E0 checks and no valid alternatives are found, supporting more comprehensive results consolidation.

Adjusted scheduler test

Updated the scheduler test expectation for the geo coarse path
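
The save_arkane_thermo.py commit message above mentions NASA polynomial coefficients and heat capacity tabulated at predefined temperatures. For reference, the standard 7-coefficient NASA form gives Cp/R = a1 + a2·T + a3·T² + a4·T³ + a5·T⁴. The sketch below evaluates it for one coefficient set; the function names and the temperature list in the usage note are assumptions for illustration, not the script's code, and it ignores the fact that real NASA fits use separate coefficient sets per temperature range.

```python
from typing import Dict, Sequence

R = 8.314462618  # gas constant, J/(mol*K)


def cp_from_nasa(coeffs: Sequence[float], temperature: float) -> float:
    """Heat capacity in J/(mol*K) from the first five NASA-7 coefficients."""
    a1, a2, a3, a4, a5 = coeffs[:5]
    t = temperature
    return R * (a1 + a2 * t + a3 * t ** 2 + a4 * t ** 3 + a5 * t ** 4)


def tabulate_cp(coeffs: Sequence[float], temperatures: Sequence[float]) -> Dict[float, float]:
    """Map each temperature in K to Cp in J/(mol*K) for a single coefficient set."""
    return {t: cp_from_nasa(coeffs, t) for t in temperatures}
```

For example, `tabulate_cp(coeffs, [300.0, 400.0, 500.0])` returns a dictionary with one Cp value per temperature, which is the shape of data a tabulated-Cp field would hold.
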
@calvinp0 calvinp0 requested a review from alongd April 5, 2026 07:35
alongd (Member) left a comment

Thanks!!

@alongd alongd merged commit e8ce91a into main Apr 5, 2026
8 checks passed
@alongd alongd deleted the consolidate_output branch April 5, 2026 10:23

4 participants