InsertSync: support explicit subview multibuffer autosync #517
TaoTao-real wants to merge 22 commits into hw-native-sys:main
Code Review
This pull request implements auto-sync support for ping/pong buffers derived from a single root workspace via pto.subset or memref.subview. It introduces the PTOEnableMultiBuffer pass to materialize buffer selection and adds dynamic synchronization operations (SetFlagDynOp, WaitFlagDynOp) to handle loop-carried dependencies. The InsertSync analysis is enhanced with slot-aware logic to prove non-overlap between buffer partitions. Review feedback suggests deduplicating interval overlap logic, removing a redundant (void) cast, and optimizing SmallVector capacity in the loop nest traversal.

    auto isOverlap = [](const BaseMemInfo *a, const BaseMemInfo *b, int i,
                        int j) -> bool {
      uint64_t aStart = a->baseAddresses[static_cast<size_t>(i)];
      uint64_t bStart = b->baseAddresses[static_cast<size_t>(j)];
      uint64_t aEnd = aStart + a->allocateSize;
      uint64_t bEnd = bStart + b->allocateSize;
      uint64_t maxStart = std::max(aStart, bStart);
      uint64_t minEnd = std::min(aEnd, bEnd);
      return maxStart < minEnd;
    };
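
The lambda above is a half-open interval overlap test. As a rough illustration of the deduplication the review summary suggests, the check could live in one shared helper. This is a minimal sketch, not code from the PR: `intervalsOverlap` and its parameter names are hypothetical, and it assumes `start + size` does not wrap around `uint64_t`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical shared helper (name not taken from the PR).
// Returns true when [aStart, aStart + aSize) and [bStart, bStart + bSize)
// share at least one byte. Assumes start + size does not overflow.
inline bool intervalsOverlap(uint64_t aStart, uint64_t aSize,
                             uint64_t bStart, uint64_t bSize) {
  uint64_t maxStart = std::max(aStart, bStart);
  uint64_t minEnd = std::min(aStart + aSize, bStart + bSize);
  return maxStart < minEnd;
}
```

Because the intervals are half-open, two buffers that merely touch (one ends exactly where the other starts) are reported as non-overlapping, which is the behavior the original lambda has as well.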

    bool InsertSyncAnalysis::AreSlotwiseNonOverlapping(
        const DepBaseMemInfoPairVec &depBaseMemInfosVec, int factor) const {
      (void)factor;

    Value one = rewriter.create<arith::ConstantIndexOp>(loc, 1);

    // Collect loop nest from inner to outer (baseLoop, parent, ...).
    SmallVector<scf::ForOp> loops;

Codex Review
This comment is updated automatically by the review bot.
Summary: Review failed at stage Findings. No structured findings were generated because the review process failed early.
Log Tail
8bb9f55 to 5c0c3d6
- Add pto.multi_buffer=2 attr plumbing into PlanMemory (alloc_tile -> memref.alloc).
- Detect ping/pong via planned address overlap-matrix and emit dynamic event-id set/wait.
- Add EnableMultiBuffer pass to materialize loop-local ping/pong selection.
- Add Sync sample + runop guard; fix PTOViewToMemref typed accessor crash for bitcast/treshape.
emitc.call_opaque requires an IntegerAttr placeholder to print SSA operands. Add the operand placeholder for event_id so set_flag/wait_flag receive the dynamic event argument, and extend the Sync multibuf runop guard to catch missing 3rd argument.
- Track view-like alias closures (bind_tile/subview/casts) from multi-address pointer_cast.
- Build ping/pong selector in the LCA loop and rematerialize loop-local alias ops so tile allocations also switch ping/pong.
- Update multibuf pingpong sample to use alloc_tile + TLOAD/TSTORE so the generated C++ builds on A2/A3 pto-isa.
Materialize ping/pong by selecting between i64 addresses and building a loop-local PointerCastOp. This keeps bind_tile lowering able to trace the defining PointerCastOp and avoids generating C++ that casts Tile<> to __ubuf__ pointers (which breaks A2/A3 compilation). Update the multibuf runop guard accordingly.
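
The address selection described in this commit can be pictured as a per-iteration choice between two integer addresses. The sketch below is an assumption-laden illustration, not the pass's actual output: the function name is hypothetical, and selecting by iteration parity is assumed here because the commit message does not spell out the selector.

```cpp
#include <cstdint>

// Hypothetical illustration: choose the buffer address for one loop iteration.
// Assumption: even iterations use the ping address, odd iterations the pong
// address. The real selector built by the pass may differ.
inline int64_t selectPingPongAddress(int64_t iteration, int64_t pingAddr,
                                     int64_t pongAddr) {
  return (iteration % 2 == 0) ? pingAddr : pongAddr;
}
```

Selecting between plain integers and casting the result once per iteration mirrors the commit's point: the pointer cast stays a single, traceable operation instead of a select between already-cast tiles.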
The ping/pong base addresses are an implementation detail of PlanMemory. Check for >=2 distinct int64 constants + ternary address selection, rather than hard-coding 0/512.
a800533 to 2d1e338
…fer-pingpong-slotaware
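Follow-up note: in this round I re-checked every current multibuffer-related case one by one, rather than only looking at …
Conclusion
Cases that now genuinely achieve parallelized synchronization
Cases that correctly fall back and do not form multibuffer parallelization
One point that is easy to confuse
This check covers all multibuffer-related basic/sample cases within the scope of the current PR. The conclusion can be summarized as: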
Summary
- … pto.subview / lowered memref.subview leaf buffers.
- … (root, group, slot) metadata so ping/pong and N-slot selector patterns reuse the correct event id across iterations.

Motivation
- … pto.multi_buffer intent, EnableMultiBuffer auto expansion/materialization, or heuristic double-alloc inference.

Design
- docs/designs/ptoas-auto-sync-subview-multibuffer-v1.md
- docs/designs/multibuffer-root-group-slot-demo.pto
- … pto.subview / memref.subview leaves participate in slot-aware multibuffer.
- … pto.multi_buffer_factor, pto.multi_buffer_slot, optional pto.multi_buffer_group.
- … EnableMultiBuffer-style pass is included in this PR.

Testing
- source ~/Workspace/PTO/env.sh && ninja -C build ptoas
- … main:
- bash test/samples/runop.sh -t Sync
- bash test/samples/runop.sh --enablebc -t Sync
- test/basic/issue517_subset_slot_binding_regression.pto
- test/lit/pto/issue428_cube_sync_regression.pto
- test/lit/pto/issue454_nested_loop_same_pipe_pair_regression.pto
- … multibuffer_subset_pingpong_a3.py), but A3 remote validation has not been rerun after the final scope-cleanup + rebase step.

Risk / Rollback
Review Focus
- PTOIRTranslator propagation/validation of explicit subview multibuffer metadata.
- MemoryDependentAnalyzer / InsertSyncAnalysis use of (root, group, slot) for dependency pruning and event-lane decisions.
- SyncEventIdAllocation / SyncCodegen correctness for selector-mode dynamic event-id binding.
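
To make the review focus concrete, the sketch below shows one way (root, group, slot) metadata could drive dependency pruning and event-lane selection. It is a simplified model under stated assumptions, not the PR's implementation: `SlotInfo`, `provablyDisjointSlots`, and `eventIdForIteration` are hypothetical names, roots and groups are modeled as plain integers, and the event id is assumed to be a base id plus the slot index.

```cpp
#include <cstdint>

// Hypothetical metadata record; all field names are assumptions.
struct SlotInfo {
  int root;    // identifies the root workspace buffer
  int group;   // identifies the multibuffer group within the root
  int factor;  // number of slots in the group (2 for ping/pong)
  int slot;    // slot index in [0, factor)
};

// Returns true only when both accesses carry metadata, belong to the same
// root/group/factor, and sit in different slots. Anything else is treated
// conservatively as "cannot prove disjoint" (nullptr means no metadata).
inline bool provablyDisjointSlots(const SlotInfo *a, const SlotInfo *b) {
  if (a == nullptr || b == nullptr)
    return false;
  if (a->root != b->root || a->group != b->group || a->factor != b->factor)
    return false;
  return a->slot != b->slot;
}

// Assumed event-lane rule: iteration i of an N-slot pattern uses lane
// i % N, so the same slot sees the same event id on every revisit.
// Requires iteration >= 0 and factor > 0.
inline int eventIdForIteration(int baseEventId, int64_t iteration, int factor) {
  return baseEventId + static_cast<int>(iteration % factor);
}
```

Returning false for mismatched or missing metadata keeps the sketch conservative: a dependency is pruned only when the slots are known to differ, which matches the fallback behavior described for cases that do not form multibuffer parallelization.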