
feat(native): add rebuild-on-change for NativeModule #1599

Closed
jeff-hykin wants to merge 93 commits into dev from jeff/feat/native_rebuild

Conversation

@jeff-hykin
Member

@jeff-hykin jeff-hykin commented Mar 17, 2026

Problem

Editing source code doesn't cause native modules to rebuild. The manual rebuild step is very easy to forget, and it is not friendly to AI edits.

Solution

Some generic utils:

  • dimos/utils/change_detect.py: Content-hash-based file change detection using xxhash.
  • NativeModuleConfig.rebuild_on_change: Optional list[str|Path|Glob]
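
To illustrate why a content hash can be a more reliable rebuild trigger than a modification time, here is a minimal sketch. It uses `hashlib.blake2b` from the standard library as a stand-in for xxhash so that it runs without extra dependencies, and the helper name `content_digest` is hypothetical rather than part of this PR.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_digest(path: Path) -> str:
    """Hash the file's bytes (blake2b here; the PR itself uses xxhash)."""
    return hashlib.blake2b(path.read_bytes(), digest_size=16).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "main.cpp"
    src.write_text("int main() { return 0; }\n")
    before_digest = content_digest(src)
    before_mtime = src.stat().st_mtime_ns

    # Bump the mtime by one second without touching the content.
    bumped = before_mtime + 1_000_000_000
    os.utime(src, ns=(bumped, bumped))
    mtime_changed = src.stat().st_mtime_ns != before_mtime
    digest_changed_after_touch = content_digest(src) != before_digest

    # A real edit changes the digest.
    src.write_text("int main() { return 1; }\n")
    digest_changed_after_edit = content_digest(src) != before_digest
```

In this sketch a timestamp-only change would look like a change to an mtime-based check, while the digest only moves when the bytes do.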

Breaking Changes

None

How to Test

```bash
# Change detection tests
pytest dimos/utils/test_change_detect.py -v

# Native module rebuild tests
pytest dimos/core/test_native_rebuild.py -v

# Native module crash/thread leak test
pytest dimos/core/test_native_module.py::test_process_crash_triggers_stop -v

# LCM isolation tests
pytest dimos/protocol/pubsub/test_pattern_sub.py -v
pytest dimos/protocol/pubsub/impl/test_lcmpubsub.py -v

# Full fast suite
pytest -m 'not (tool or slow or mujoco)' dimos/
```

Contributor License Agreement

  • I have read and approved the CLA.

spomichter and others added 17 commits January 23, 2026 07:34
… Unitree Go2 Navigation & Exploration Beta

Pre-Release v0.0.8: Unitree Go2 Navigation & Exploration Beta, Transport Updates, Documentation updates, Rerun fixes, Person follow, Readme updates

## What's Changed
* Small docs clarification about stream getters by @leshy in #1043
* Fix split view on wide monitors by @jeff-hykin in #1048
* Docs: Install & Develop  by @jeff-hykin in #1022
* Add uv to nix and fix resulting problems by @jeff-hykin in #1021
* v0.0.8 by @paul-nechifor in #1050
* Style changes in docs by @paul-nechifor in #1051
* Revert "Add uv to nix and fix resulting problems" by @leshy in #1053
* Transport benchmarks + Raw ros transport by @leshy in #1038
* feat: default to rerun-web and auto-open browser on startup (browser … by @Nabla7 in #1019
* bbox detections visual check by @leshy in #1017
* fix: only auto-open browser for rerun-web viewer backend by @Nabla7 in #1066
* move slow tests to integration by @paul-nechifor in #1063
* Streamline transport start/stop methods by @Kaweees in #1062
* Person follow skill with EdgeTAM by @paul-nechifor in #1042
* fix: increase costmap floor z_offset to avoid z-fighting by @Nabla7 in #1073
* Fixed issue #1074 by @alexlin2 in #1075
* ROS transports initial by @leshy in #1057
* Fix System Config Values for LCM on MacOS and Refactor by @jeff-hykin in #1065
* SHM Transport basic fixes by @leshy in #1041
* commented out Mem Transport test case by @leshy in #1077
* Docs/advanced streams update 2 by @leshy in #1078
* Fix more tests by @paul-nechifor in #1071
* feat: navigation docker updates from bona_local_dev by @baishibona in #1081
* Fix missing dependencies by @Kaweees in #1085
* Release readme fixes by @spomichter in #1076

## New Contributors
* @baishibona made their first contribution in #1081

**Full Changelog**: v0.0.7...v0.0.8
…HTTPS from SSH, get_data change, LFS changes

v0.0.9 Release Patch: Git clone change to HTTPS from SSH, get_data change, LFS changes
Release v0.0.10: Manipulation Stack, MuJoCo Simulation, DDS Transport, Web and Native Visualization via Rerun


## Highlights

88+ commits, 20 contributors, 700+ files changed.

The TLDR: **a complete manipulation stack**, **MuJoCo simulation**, **DDS transport**, and **a rewritten visualization pipeline**. Agents are no longer bolted on top — they're refactored as native modules with direct stream access. The entire ROS message dependency has been removed from core DimOS, and we've added VR, phone, and arm teleoperation stacks. You can now vibecode a pick-and-place task from natural language to motor commands. Installation has been significantly streamlined — no more direnv, simpler setup, and the web viewer is now the default.

---

## 🚀 New Features

### Simulation
- **MuJoCo simulation module** — Run any DimOS blueprint in simulation with no hardware. Supports xArm and Unitree embodiments, parses MJCF/URDF for robot properties, monotonic clock timing (no `time.sleep`). `dimos --simulation run unitree-go2` ([#1035](#1035)) by @jca0
- **Simulation teleop blueprints** — Added simulation teleop blueprints for Piper, xArm6, and xArm7. ([#1308](#1308)) by @mustafab0

### Manipulation
- **Modular manipulation stack** — Full planning stack with Drake: FK/IK solvers (Jacobian + Drake optimization), RRT path planning, world model with obstacle monitoring, multi-robot management. xArm6/7 and Piper support. ([#1079](#1079)) by @mustafab0
- **Joint servo and cartesian controllers** — Joint position/velocity controllers and cartesian IK task with Pinocchio solver. PoseStamped stream input for real-time control. ([#1116](#1116)) by @mustafab0
- **GraspGen integration** — Grasp generation via Docker-hosted GPU model. Lazy container startup, thread-safe init, RPC `generate_grasps()` returns ranked PoseArray. ([#1119](#1119), [#1234](#1234)) by @JalajShuklaSS
- **Gripper control** — Gripper RPC methods on control coordinator, exposed adapter property for custom implementations. ([#1213](#1213)) by @mustafab0
- **Detection3D and Object support** — Object input topics, TF support on manipulation module, pointcloud-to-convex-hull for Drake imports. ([#1236](#1236)) by @mustafab0
- **Agentic pick and place** — Reimplemented manipulation skills for agent-driven pick-and-place workflows. ([#1237](#1237)) by @mustafab0

### Teleoperation
- **Quest VR teleoperation** — Full WebXR + Deno bridge stack. Quest controller data (pose, trigger, grip) streamed to DimOS modules. Monitor-style locking for control loops. ([#1215](#1215)) by @ruthwikdasyam
- **Phone teleoperation** — Control Go2 from your phone with a web-based teleop interface. ([#1280](#1280)) by @ruthwikdasyam
- **Arm teleop with Pinocchio IK** — Single and dual arm teleoperation using Pinocchio inverse kinematics. Blueprints for xArm, Piper, and dual configurations. ([#1246](#1246)) by @ruthwikdasyam

### Transports & Infrastructure
- **DDS transport protocol** — CycloneDDS transport with configurable QoS (high-throughput and reliable profiles). Optional install, benchmark integration. ([#1174](#1174)) by @Kaweees
- **Pubsub pattern subscriptions** — Glob and regex pattern matching for topic subscriptions. `subscribe_all()` for bridge-style consumers. Topic type encoding in channel strings (`/topic#module.ClassName`). ([#1114](#1114)) by @leshy
- **LCM raw bytes passthrough** — Skip `lcm_encode()` when message is already bytes. ([#1223](#1223)) by @leshy
- **Unified TimeSeriesStore** — Pluggable backends (InMemory, SQLite, Pickle, PostgreSQL) with SortedKeyList for O(log n) operations. Replaces the old replay system and TimestampedCollection. Collection API with slice, range, and streaming methods. ([#1080](#1080)) by @leshy
- **DimosROS benchmark tests** — Benchmark suite for ROS transport performance. ([#1087](#1087)) by @leshy

### Navigation
- **FASTLIO2 support** — Hardware-verified localization with arm64 support. Docker deployment with FAR Planner, terrain analysis, and bagfile playback mode. Builds or-tools from source on arm64. ([#1149](#1149)) by @baishibona
- **Native Livox + FASTLIO2 module** — First-class DimOS native module for Livox Mid-360 lidar with FASTLIO2 localization. ([#1235](#1235)) by @leshy

### Visualization
- **RerunBridge module and CLI** — New bridge that subscribes to all LCM messages and logs those with `to_rerun()` to Rerun viewer. GlobalConfig singleton, web viewer support. Replaces the old rerun initialization system. ([#1154](#1154)) by @leshy
- **Webcam rerun visualization** — Camera module logs to Rerun with pinhole projection for 3D visualization. ([#1117](#1117)) by @ruthwikdasyam
- **Default viewer switched to rerun-web** — Browser-based viewer is now the default for broader compatibility. No native viewer install needed. ([#1324](#1324)) by @spomichter

### Agents
- **Agent refactor** — Restructured agent module with cleaner imports and global config integration. ([#1211](#1211)) by @paul-nechifor
- **Timestamp knowledge** — Agents now have timestamp awareness in prompts for temporal reasoning. ([#1093](#1093)) by @ClaireBookworm
- **Observe skill** — Go2 can now observe (capture and describe) its environment via agent skill. ([#1109](#1109)) by @paul-nechifor

### Platform & Hardware
- **G1 without ROS** — Unitree G1 blueprints decoupled from ROS dependency. Lazy imports for fast startup. ([#1221](#1221)) by @jeff-hykin
- **ARM (aarch64) support** — DimOS runs on ARM hardware. Platform-conditional dependencies, open3d source builds for arm64. ([#1229](#1229)) by @jeff-hykin
- **Universal joint/hardware schema** — `HardwareComponent` dataclass with `JointState`, `JointName` type aliases. Backend registry with auto-discovery for SDK adapters. ([#1040](#1040), [#1067](#1067)) by @mustafab0

---

## 🔧 Improvements

- **Optional Dask** — Start without Dask using `--no-dask` flag. Startup time reduced from ~60s to ~45s. ([#1111](#1111), [#1232](#1232)) by @paul-nechifor
- **RPC rework** — Renamed `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream`. Added `ModuleRef`, improved type hints throughout. ([#1143](#1143)) by @jeff-hykin
- **Image class simplification** — Rewritten as pure NumPy dataclass. Removed CUDA backend, unused methods (solve_pnp, csrt_tracker), and image_impls/ directory. ([#1161](#1161)) by @leshy
- **Odometry message cleanup** — Simplified Odometry message type. ([#1256](#1256)) by @leshy
- **Remove all ROS message dependencies** — Purged ROS message types from core DimOS. Refactored rosnav to use ROSTransport. Removed dead ROS bridge code. ([#1230](#1230)) by @alexlin2
- **Removed bad function serialization** — Eliminated unnecessary serialization of Python functions. ([#1121](#1121)) by @paul-nechifor
- **Benchmark IEC units** — Switched bandwidth benchmarks from SI to IEC units for accuracy. ([#1147](#1147)) by @leshy
- **Pubsub typing improvements** — Thread-safety locks on `subscribe_new_topics` and `subscribe_all`. Proper type params across pubsub stack. ([#1153](#1153)) by @leshy
- **Autogenerated blueprint list** — Blueprints are now auto-discovered and listed. ([#1100](#1100)) by @paul-nechifor
- **Generic Buttons message** — Renamed `QuestButtons` to `Buttons` with generic field names for cross-platform teleop. ([#1261](#1261)) by @ruthwikdasyam
- **Dev container uses ros-dev image** — `./bin/dev` now runs the ROS-enabled dev image. ([#1170](#1170)) by @leshy
- **LSP support** — Added python-lsp-server and python-lsp-ruff to dev dependencies. ([#1169](#1169)) by @leshy
- **Lazy-load pyrealsense2** — RealSense camera module uses lazy imports to avoid errors in simulation environments without the SDK. ([#1309](#1309)) by @spomichter
- **Removed unused mmcv and mmengine** — Dead Detic dependencies removed, eliminating slow source builds from install. ([#1319](#1319)) by @spomichter
- **Simplified installation** — Removed direnv requirement, streamlined install instructions across all platforms. ([#1315](#1315)) by @spomichter
- **DDS extra excluded from --all-extras** — `cyclonedds` requires a source build, so `dds` is now excluded from `uv sync --all-extras` by default. ([#1318](#1318)) by @spomichter
- **Nix pre-commit skip** — Skip pre-commit install if hooks already exist. ([#1162](#1162)) by @leshy
- **Removed base-requirements** — Consolidated dependency management. ([#1098](#1098)) by @paul-nechifor
- **Removed old graspnet** — Cleaned up deprecated graspnet version. ([#1248](#1248)) by @paul-nechifor
- **Code cleanup** — Removed `tofix` markers ([#1216](#1216)), fixed ruff issues ([#1112](#1112)), removed old README_installation.md ([#1101](#1101)) by @paul-nechifor

---

## 🐛 Bug Fixes

- Fix LFS updating (move from .local to venv) ([#1090](#1090)) by @jeff-hykin
- Launch hotfixes: git clone HTTPS, get_data main branch ([#1091](#1091)) by @spomichter
- Fix camera demo not showing in Rerun ([#1148](#1148)) by @jeff-hykin
- Default to rerun native viewer ([#1099](#1099)) by @Nabla7
- Fix exploration blocking agent loop ([#1258](#1258)) by @paul-nechifor
- Fix person-follow blocking agent loop ([#1278](#1278)) by @paul-nechifor
- Skip metric3d tests on unsupported xformers GPUs (Blackwell compute capability >9.0) ([#1225](#1225)) by @leshy
- Fix manipulation tests ([#1218](#1218), [#1247](#1247)) by @jeff-hykin, @paul-nechifor
- Fix control coordinator e2e test ([#1212](#1212)) by @mustafab0
- Fix xarm7-sim broken e2e tests ([#1294](#1294)) by @paul-nechifor
- Pin langchain to restore supported providers ([#1241](#1241)) by @spomichter
- Fix missing library dependencies in Nix flake ([#1240](#1240)) by @Kaweees
- Fix discord invite link ([#1122](#1122)) by @spomichter
- macOS edgecase fix ([#1096](#1096)) by @jeff-hykin
- Fix second N in logo ([#1250](#1250)) by @jeff-hykin
- Fix Unitree Go2 minor issues ([#1307](#1307)) by @paul-nechifor
- Fix broken tests ([#1305](#1305)) by @ruthwikdasyam
- Fix `uv sync` for some macOS systems ([#1322](#1322)) by @jeff-hykin
- Fix mmcv install ([#1313](#1313)) by @paul-nechifor
- Fix mypy issues ([#1150](#1150), [#1167](#1167), [#1257](#1257)) by @leshy, @paul-nechifor, @jeff-hykin
- Fix Nix install uv pip extras ([#1321](#1321)) by @spomichter

---

## 📚 Documentation

- **Major docs overhaul** — New README with feature grid, hardware table, quickstart. Navigation, transports, data streams, and agent docs. ([#1295](#1295)) by @leshy
- **Day 1 docs** — Comprehensive getting started guide, development docs, contributing guide, architecture overview. Executable blueprint docs via md-babel-py. ([#1064](#1064)) by @jeff-hykin
- **Arm integration guide** — How-to for integrating new robotic arms with DimOS. ([#1238](#1238)) by @mustafab0
- **MCP documentation update** — Updated MCP install and usage instructions. ([#1251](#1251)) by @Kaweees
- **Docker docs** — First pass on Docker deployment documentation. ([#1151](#1151)) by @leshy
- **Transports documentation** — Encode/decode mixins, SHM examples, ROS/DDS transport docs. ([#1107](#1107)) by @leshy
- **Rerun API examples** — Updated examples for the new RerunBridge API. ([#1262](#1262)) by @jeff-hykin
- **PR template** added ([#1172](#1172)) by @christiefhyang
- **Simplified install instructions** — Removed direnv, streamlined across all platforms. ([#1315](#1315)) by @spomichter
- **Python example restored** — Added back the Python usage example. ([#1317](#1317)) by @jeff-hykin
- **Nix install updated** — Replaced uv with pip for Nix compatibility. ([#1326](#1326)) by @ruthwikdasyam
- **README improvements** ([#1311](#1311)) by @paul-nechifor
- **Simplified writing docs** — Consolidated writing_docs to a single markdown file. ([#1254](#1254)) by @jeff-hykin

---

## 🏗️ CI & Build

- **ci-complete gate** — Dynamic branch protection via single aggregated status check. MD-only PRs no longer blocked. ([#1279](#1279)) by @spomichter
- **Path-based test filtering** — Test jobs fully skip (no container spin-up) when no relevant code changed. ([#1284](#1284), [#1286](#1286)) by @spomichter
- **Navigation docker build workflow** — CI builds for the ROS navigation stack. ([#1259](#1259)) by @spomichter
- **CUDA test marker** — `@pytest.mark.cuda` for GPU-dependent tests. ([#1220](#1220)) by @jeff-hykin
- **e2e test marker** — Marked end-to-end tests for selective CI runs. ([#1110](#1110)) by @paul-nechifor
- **pytest stdin fix** — Added `-s` to default addopts for LCM autoconf compatibility. ([#1320](#1320)) by @spomichter

---

## ⚠️ Breaking Changes

- **RPC renames**: `ModuleBlueprint` → `_BlueprintAtom`, `ModuleBlueprintSet` → `Blueprint`, `ModuleConnection` → `Stream` ([#1143](#1143))
- **Image class rewrite**: `CudaImage` and `NumpyImage` removed. Image is now a pure NumPy dataclass. Methods like `solve_pnp`, `csrt_tracker`, `from_depth`, `to_depth_meters` removed. ([#1161](#1161))
- **ROS messages removed from core**: All `to_ros`/`from_ros` conversion methods removed. Use `ROSTransport` instead. ([#1230](#1230))
- **QuestButtons → Buttons**: Renamed with generic field names. ([#1261](#1261))
- **RerunBridge replaces old rerun init**: `dimos.dashboard.rerun_init` removed. Use `RerunBridgeModule` or the `rerun-bridge` CLI. ([#1154](#1154))
- **Unitree directory restructuring**: `unitree_go2` → `unitree/go2`, `unitree_g1` → `unitree/g1`. Blueprint names updated. ([#1221](#1221))
- **Default viewer is now rerun-web**: Use `--viewer-backend rerun` to restore native viewer. ([#1324](#1324))

---

## Quickstart

```bash
# Install
uv pip install dimos[base,unitree]

# Try it (no hardware needed)
# NOTE: First run downloads ~2.4 GB from LFS
dimos --replay run unitree-go2

# Simulate
uv pip install dimos[base,unitree,sim]
dimos --simulation run unitree-go2
```

---

## New Contributors 🎉

- @ruthwikdasyam — Quest VR teleoperation, phone teleop, arm teleop, webcam rerun viz
- @JalajShuklaSS — GraspGen integration
- @jca0 — MuJoCo simulation module
- @christiefhyang — PR template

---

**Full Changelog**: [v0.0.9...v0.0.10](v0.0.9...v0.0.10)
docs(go2): add getting started guide (#1339)
Release v0.0.11

82 PRs, 10 contributors, 396 files changed.

This release brings a production CLI, MCP tooling, temporal memory, and first-class support for coding agents. Dask has been removed. The entire stack now runs from `dimos run` through `dimos stop`.

### Agent-Native Development

DimOS is now built to be driven by coding agents. Point OpenClaw, Claude Code, or Cursor at [AGENTS.md](AGENTS.md) and they can build, run, and debug Dimensional applications using the CLI and MCP interfaces directly.

- **AGENTS.md** — comprehensive onboarding doc: architecture, CLI reference, skill rules, blueprint quick-reference. Your agent reads this and starts coding.
- **MCP server** — all `@skill` methods exposed as HTTP tools. External agents call `dimos mcp call relative_move --arg forward=0.5` or connect via JSON-RPC.
- **MCP CLI** — `dimos mcp list-tools`, `dimos mcp call`, `dimos mcp status`, `dimos mcp modules`
- **Agent context logging** — MCP tool calls and agent messages logged to per-run JSONL for debugging and replay.

### CLI & Daemon

Full process lifecycle — no more Ctrl-C in tmux.

- `dimos run --daemon` — background execution with health checks and run registry
- `dimos stop [--force]` — graceful shutdown with SIGTERM → SIGKILL fallback
- `dimos restart` — replays the original CLI arguments
- `dimos status` — PID, blueprint, uptime, MCP port
- `dimos log -f` — structured per-run logs with follow, JSON output, filtering
- `dimos show-config` — resolved GlobalConfig with source tracing
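
The SIGTERM → SIGKILL fallback mentioned for `dimos stop` can be sketched with the standard library alone. This is an illustration of the general pattern, not the DimOS implementation; `stop_process` and `grace_seconds` are hypothetical names.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, then force it if it does not comply in time."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()


# A throwaway child process standing in for a running blueprint.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
```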

### Temporal-Spatial Memory

Robots in physical space ingest hours of video and lidar. Temporal-spatial memory gives them a human-like understanding of the world — causal object relationships, entity tracking through time and physical space, and the ability to answer complex temporal queries:

*Who spends the most time in the kitchen? What time on average do I wake up? Which set of switches toggles the main lights? Who was at the office at 9am last Thursday?*

Traditional frame-level embeddings (CLIP, ViT) lose temporal context and don't scale beyond a handful of frames. Video transformers are expensive and don't operate in RGB-D. Dimensional agents work with video + lidar natively, tracking entities across hours and days.

```bash
dimos --replay --replay-dir unitree_go2_office_walk2 run unitree-go2-temporal-memory
```

### Interactive Viewer

Custom Rerun fork (`dimos-viewer`) is now the default. Click-to-navigate: click a point in the 3D view → PointStamped → A* planner → robot moves.

- Camera | 3D split layout on Go2, G1, and drone blueprints
- Native keyboard teleop in the viewer
- `--viewer rerun|rerun-web|rerun-connect|foxglove|none`

### Drone Support

Drone blueprints modernized to match Go2 composition pattern. `drone-basic` and `drone-agentic` work with replay, Rerun, and the full CLI.

```bash
dimos --replay run drone-basic
dimos --replay run drone-agentic
```

### More

- **Go2 fleet control** — multi-robot with `--robot-ips` (#1487)
- **Replay `--replay-dir`** — select dataset, loops by default (#1519, #1494)
- **Interactive install** — `curl -fsSL .../install.sh | bash` (#1395)
- **Nix on non-Debian Linux** (#1472)
- **Remove Dask** — native worker pool (#1365)
- **Remove asyncio dependency** (#1367)
- **Perceive loop** — continuous observation module for agents (#1411)
- **Worker resource monitor** — `dtop` TUI (#1378)
- **G1 agent wiring fix** (#1518)
- **Rerun rate limiting** — prevents viewer OOM on continuous streams (#1509, #1521)
- **RotatingFileHandler** — prevents unbounded log growth (#1492)
- **Test coverage** (#1397), draft PR CI skip (#1398), manipulation test fixes (#1522)

### Breaking Changes

- `--viewer-backend` renamed to `--viewer`
- Dask removed — blueprints using Dask workers need migration to native worker pool
- Default viewer changed from `rerun-web` to `rerun` (native dimos-viewer)

### Contributors

@spomichter, @PaulNechifor, @ruthwikdasyam, @summeryang, @MustafaBhadsorawala, @leshy, @sambull, @JeffHykin, @RadientBrain

## Contributor License Agreement

- [x] I have read and approved the [CLA](https://github.com/dimensionalOS/dimos/blob/main/CLA.md).
chore: bump version to 0.0.11 (#1529)
docs: README additions: temporal-memory fix, formatting (#1535)
docs(readme): add Trendshift trending badge
Add a generic file change detection utility (dimos/utils/change_detect.py)
that tracks content hashes via xxhash and integrate it into NativeModule so
it can automatically rebuild when watched source files change.

- change_detect.did_change() hashes file content, stores per-cache-name
  hash files in the venv, and returns True when files differ
- NativeModuleConfig gains rebuild_on_change: list[str] | None
- NativeModule._maybe_build() deletes stale executables when sources change
- 11 tests for change_detect, 3 integration tests for native rebuild
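
The behavior described in this commit message can be sketched roughly as follows. This is not the code from `dimos/utils/change_detect.py`: blake2b stands in for xxhash, the `cache_dir` argument stands in for the venv location, and returning True on the very first call is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def did_change(cache_name, paths, cache_dir):
    """Return True when the combined content hash differs from the stored one."""
    hasher = hashlib.blake2b(digest_size=16)
    for path in sorted(paths):
        hasher.update(str(path).encode())  # so renames also count as changes
        hasher.update(path.read_bytes())
    new_hash = hasher.hexdigest()

    hash_file = Path(cache_dir) / f"{cache_name}.hash"
    old_hash = hash_file.read_text() if hash_file.exists() else None
    hash_file.write_text(new_hash)  # the cache is rewritten on every call
    return new_hash != old_hash


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "lib.c"
    src.write_text("int f(void) { return 1; }\n")

    first = did_change("demo", [src], root)   # no cache yet
    second = did_change("demo", [src], root)  # unchanged
    src.write_text("int f(void) { return 2; }\n")
    third = did_change("demo", [src], root)   # edited
```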
…avoid unlinking Nix store executables

- Add `cwd` parameter to `did_change()` and `_resolve_paths()` so relative
  glob patterns in `rebuild_on_change` are resolved against the module's
  working directory instead of the process cwd.
- Replace `exe.unlink()` with a `needs_rebuild` flag so executables that
  live in read-only locations (e.g. Nix store) are not deleted; instead
  the build command is re-run which handles the output path itself.
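
A rough sketch of resolving relative glob patterns against a module's working directory, as this commit describes. The function below is a hypothetical stand-in for `_resolve_paths()`, written under the assumption that a relative pattern without a `cwd` is rejected with `ValueError`.

```python
import glob
import tempfile
from pathlib import Path


def resolve_paths(patterns, cwd=None):
    """Expand glob patterns; relative ones are anchored at `cwd`."""
    resolved = set()
    for pattern in patterns:
        candidate = Path(pattern)
        if not candidate.is_absolute():
            if cwd is None:
                raise ValueError(f"relative pattern {pattern!r} requires cwd")
            candidate = Path(cwd) / candidate
        resolved.update(Path(m) for m in glob.glob(str(candidate), recursive=True))
    return sorted(resolved)


with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp)
    (module_dir / "src" / "sub").mkdir(parents=True)
    (module_dir / "src" / "a.cpp").write_text("// a\n")
    (module_dir / "src" / "sub" / "b.cpp").write_text("// b\n")

    found = [p.name for p in resolve_paths(["src/**/*.cpp"], cwd=module_dir)]

    try:
        resolve_paths(["src/**/*.cpp"])  # relative pattern, no cwd
        raised = False
    except ValueError:
        raised = True
```

Anchoring at the module directory means the result does not depend on where the process happened to be launched.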
@jeff-hykin jeff-hykin marked this pull request as draft March 17, 2026 22:56
@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR adds a content-hash-based file change detection utility (dimos/utils/change_detect.py) and integrates it into NativeModule so native executables automatically rebuild when watched source files change. It also fixes a thread-leak ordering bug in NativeModule.stop(), improves LCM test isolation with dedicated multicast addresses, and resolves the missing xxhash dependency.

Key changes and findings:

  • xxhash dependency added to pyproject.toml — resolves the previously-flagged missing dependency.
  • Thread-leak fix: super().stop() (which joins the asyncio loop thread) is now correctly called before _process = None, preventing a race condition in CI.
  • LCM test isolation: test_lcmpubsub.py and test_pattern_sub.py now use dedicated multicast groups (239.255.76.98:7698 and 239.255.76.99:7699) to prevent cross-test contamination.
  • mesh_utils.py regression: did_change is called without cwd using str(urdf_path). If urdf_path is relative (the function signature accepts Path | str with no absolute constraint), _resolve_paths raises ValueError. Previously the mtime-based approach handled relative paths transparently via the process CWD.
  • Two issues flagged in prior review rounds remain open: the pre-build did_change call writes the new hash before confirming build success (stale-build-on-failure scenario), and the post-build seeding call does not forward cwd for relative path sets.

Confidence Score: 3/5

  • Core feature is functional on the happy path but has known reliability gaps: a build failure leaves the cache in a state that permanently suppresses future rebuild attempts, and a new regression in mesh_utils.py breaks relative URDF paths.
  • Several improvements from prior rounds are addressed (xxhash dependency, thread-leak, LCM isolation), but the two central reliability bugs in the rebuild-on-change feature (cache written before build success, missing cwd in post-build seed call) remain unaddressed from earlier review rounds. The new relative-path regression in mesh_utils.py adds one more concrete fix required before merge.
  • dimos/core/native_module.py (pre-build hash write / missing cwd in post-build seed) and dimos/manipulation/planning/utils/mesh_utils.py (relative urdf_path raises ValueError)
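
One way to avoid the stale-cache-on-failure scenario described above is to separate the read-only check from the cache write and record the hash only after a successful build. The sketch below shows that idea under stated assumptions: `has_changed`, `update_cache`, and `maybe_build` are hypothetical helpers, blake2b stands in for xxhash, and the build step is simulated with a callable.

```python
import hashlib
import tempfile
from pathlib import Path


def current_hash(paths):
    hasher = hashlib.blake2b(digest_size=16)
    for path in sorted(paths):
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


def has_changed(hash_file, paths):
    """Read-only check: never writes to the cache."""
    return not hash_file.exists() or hash_file.read_text() != current_hash(paths)


def update_cache(hash_file, paths):
    hash_file.write_text(current_hash(paths))


def maybe_build(hash_file, paths, build):
    """Run `build` when sources changed; record the hash only on success."""
    if not has_changed(hash_file, paths):
        return "skipped"
    if not build():
        return "failed"  # cache untouched, so the next call retries
    update_cache(hash_file, paths)
    return "built"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "node.cpp"
    src.write_text("// v1\n")
    cache = root / "node.hash"

    results = [
        maybe_build(cache, [src], build=lambda: False),  # build fails
        maybe_build(cache, [src], build=lambda: True),   # retried, succeeds
        maybe_build(cache, [src], build=lambda: True),   # nothing changed
    ]
```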

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/utils/change_detect.py | New content-hash change detection utility using xxhash. Core logic is sound; thread + file locking implemented correctly. The known design trade-off (cache always written on call, not just on success) is the root cause of the previously-flagged build-failure regression, but is otherwise well-documented. |
| dimos/core/native_module.py | Integrates rebuild-on-change into _maybe_build. Thread-leak fix (super().stop() before _process=None) is correct. Two previously-flagged issues remain open: the pre-build did_change call writes the hash before confirming build success, and the post-build seeding call (line 314) does not pass cwd, both already tracked in earlier review rounds. |
| dimos/manipulation/planning/utils/mesh_utils.py | Switches URDF cache invalidation from mtime-in-cache-key to did_change. Introduces a behavioral regression: relative urdf_path values now raise ValueError because did_change is called without cwd, whereas the previous mtime approach worked with relative paths. |
| dimos/core/test_native_rebuild.py | New test file covering happy-path rebuild-on-change scenarios. Uses autouse tmp_path cache redirect for proper isolation. All tests use try/finally to call mod.stop(). |
| dimos/utils/test_change_detect.py | Good unit coverage for change_detect.py. One test (test_nonexistent_path_warns) captures caplog but never asserts its contents, previously flagged in an earlier round. |
| dimos/protocol/pubsub/impl/test_lcmpubsub.py | LCM test isolation fix using a dedicated multicast address — clean and correct improvement. |
| dimos/protocol/pubsub/test_pattern_sub.py | Same LCM isolation fix as test_lcmpubsub.py, using a separate isolated multicast group. |
| pyproject.toml | Adds xxhash>=3.0.0 to project dependencies, resolving the missing dependency flagged in a prior review round. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[NativeModule._maybe_build] --> B{rebuild_on_change\nset AND exe exists?}
    B -- No --> C{exe exists?}
    B -- Yes --> D["did_change(cache_name, paths, cwd)\n⚠ writes hash to cache immediately"]
    D -- False\nno change --> C
    D -- True\nfiles changed --> E[needs_rebuild = True\nlog 'Source files changed']
    C -- Yes and not needs_rebuild --> F[return early\nno build]
    C -- No OR needs_rebuild --> G{build_command set?}
    G -- No --> H[raise FileNotFoundError]
    G -- Yes --> I[subprocess.Popen build_command]
    I --> J{returncode == 0?}
    J -- No --> K[raise RuntimeError\n⚠ cache already updated\nfuture rebuilds blocked]
    J -- Yes --> L{exe exists\nafter build?}
    L -- No --> M[raise FileNotFoundError]
    L -- Yes --> N["did_change(cache_name, paths, cwd)\nseed cache post-build\n⚠ cwd arg present here"]
    N --> O[build complete]
```

Reviews (2): Last reviewed commit: "fix(test): isolate LCM multicast in flak..."

jeff-hykin and others added 6 commits March 18, 2026 17:44
* removed redundant rerun teleop methods

* teleop blueprints rename

* pre-commit fixes

* fix: phone teleop import

* fix: comments
* event based sub callback collector for tests

* shorter wait for no msg

* fix(tests): raise AssertionError on CallbackCollector timeout

Instead of silently returning when messages never arrive, wait() now
raises with a clear message showing expected vs received count.
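
An event-based collector of the kind these commits describe might look like the following. This is a sketch, not the test helper from the repository: the constructor argument, the callback signature, and the exact error message are assumptions.

```python
import threading


class CallbackCollector:
    """Collects callback invocations and signals once enough have arrived."""

    def __init__(self, expected: int):
        self.expected = expected
        self.results = []
        self._lock = threading.Lock()
        self._done = threading.Event()

    def __call__(self, msg, topic=None):
        with self._lock:
            self.results.append((msg, topic))
            if len(self.results) >= self.expected:
                self._done.set()

    def wait(self, timeout: float = 2.0) -> None:
        # Fail loudly instead of returning silently on timeout.
        if not self._done.wait(timeout):
            raise AssertionError(
                f"expected {self.expected} messages, received {len(self.results)}"
            )


collector = CallbackCollector(expected=2)
collector("hello", "/a")

timed_out = False
try:
    collector.wait(timeout=0.05)  # only one of two messages has arrived
except AssertionError as exc:
    timed_out = True
    message = str(exc)

collector("world", "/a")
collector.wait(timeout=0.05)  # returns immediately now
```

Waiting on an event lets a test proceed as soon as the last message arrives, rather than sleeping for a fixed interval.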
* feat: adding arm_ip and can_port to env

* feat: using env variables in blueprints

* arm_ip env variables

* misc: control blueprints cleanup

* refactor: hardware factories

* fix: pre-commit checks

* fix: gripper check + comments

* fix: gripper addition

* fix: no init needed, blueprint path

* CI code cleanup

* check trigger commit

* fix: unwanted changes

* fix: blueprint path

* fix: remove duplicates

* feat: env var from globalconfig
@jeff-hykin jeff-hykin force-pushed the jeff/feat/native_rebuild branch from bfab461 to e01688c Compare March 19, 2026 21:09
paul-nechifor and others added 14 commits April 5, 2026 06:48
If the build_command changes (e.g. pointing at a new flake URL or version),
the native module should rebuild even if source files haven't changed.
extra_hash folds an arbitrary string into the file hash so any change to it
invalidates the cache.
Passes extra_hash=self.config.build_command to did_change() and
update_cache() so that changing the build command (e.g. a new flake
URL or version tag) triggers a rebuild even if source files are
unchanged.
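
Folding an extra string into the file hash could be done along these lines. This is a sketch under assumptions: blake2b stands in for xxhash, `combined_hash` and the separator bytes are hypothetical, and the build command strings are made-up examples.

```python
import hashlib
from typing import Optional


def combined_hash(file_contents, extra_hash: Optional[str] = None) -> str:
    """Hash file contents, optionally mixing in an arbitrary extra string."""
    hasher = hashlib.blake2b(digest_size=16)
    for content in file_contents:
        hasher.update(content)
    if extra_hash is not None:
        hasher.update(b"\0extra:")  # hypothetical separator
        hasher.update(extra_hash.encode())
    return hasher.hexdigest()


sources = [b"int main() { return 0; }\n"]
old = combined_hash(sources, extra_hash="nix build .#node-v1")
same = combined_hash(sources, extra_hash="nix build .#node-v1")
new = combined_hash(sources, extra_hash="nix build .#node-v2")
```

With identical sources, only the change in the extra string distinguishes `old` from `new`, which is what invalidates the cache.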
Allows subclasses to map Python field names to different CLI arg names
passed to the native binary. E.g. robot_dimension -> robot_dim. Without
this, the field name is passed as-is.
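
The field-name mapping might work roughly like this. The config class, the `CLI_NAME_OVERRIDES` dictionary, and the `--name value` argument format are all hypothetical; only the `robot_dimension` → `robot_dim` example comes from the commit message.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    robot_dimension: float = 0.5
    max_speed: float = 1.0


# Hypothetical override table: Python field name -> CLI arg name.
CLI_NAME_OVERRIDES = {"robot_dimension": "robot_dim"}


def to_cli_args(config, overrides):
    """Build CLI args, falling back to the field name when no override exists."""
    args = []
    for field in fields(config):
        name = overrides.get(field.name, field.name)
        args += [f"--{name}", str(getattr(config, field.name))]
    return args


cli_args = to_cli_args(DemoConfig(), CLI_NAME_OVERRIDES)
```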
Co-authored-by: Sam Bull <Sam.B@snowfalltravel.com>
Co-authored-by: Paul Nechifor <paul@nechifor.net>
@jeff-hykin jeff-hykin force-pushed the jeff/feat/native_rebuild branch from 7987aba to bc92dc3 Compare April 12, 2026 01:49
@jeff-hykin
Member Author

Closing in favor of #1780 because the history needs to be rewritten, thanks to Claude commits.

@jeff-hykin jeff-hykin closed this Apr 12, 2026
auto-merge was automatically disabled April 12, 2026 02:40

Pull request was closed

8 participants