
Fix: skip local ready buffer on PENDING dispatch to reduce same-thread re-claim #623

Open
zhusy54 wants to merge 1 commit into hw-native-sys:main from zhusy54:sched-policy



zhusy54 (Contributor) commented Apr 21, 2026

Summary

  • Add a skip_local parameter to PTO2SchedulerState::get_ready_tasks_batch() and AicpuExecutor::pop_ready_tasks_batch()
  • When a thread dispatches PENDING tasks (is_pending == true), it now passes skip_local=true to avoid immediately re-consuming tasks it just released into the local ready buffer
  • This gives newly-released downstream tasks a window to be flushed to the global queue and picked up by other threads, preventing greedy same-thread re-claim
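To make the policy concrete, here is a minimal single-threaded sketch of what skip_local changes. It is not the PR's code: the struct SchedulerSketch and its members local_ready, global_ready and flush_local are hypothetical stand-ins, since the real PTO2SchedulerState layout is not shown on this page. The point is only the ordering: with skip_local=true the batch is served from the global queue alone, and the local buffer is left for a later flush.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, simplified per-thread view of the scheduler. Names are
// illustrative only and do not mirror PTO2SchedulerState.
struct SchedulerSketch {
    std::vector<int> local_ready;  // tasks this thread just released
    std::deque<int> global_ready;  // shared queue visible to all threads

    // Pop up to max_count ready task ids into out and return how many were
    // popped. When skip_local is true the local buffer is not touched, so its
    // tasks stay available to be flushed to the global queue.
    std::size_t get_ready_tasks_batch(std::vector<int>& out, std::size_t max_count,
                                      bool skip_local) {
        std::size_t popped = 0;
        if (!skip_local) {
            // Local buffer first, most recently released task first.
            while (popped < max_count && !local_ready.empty()) {
                out.push_back(local_ready.back());
                local_ready.pop_back();
                ++popped;
            }
        }
        // Then fall back to (or, with skip_local, only use) the global queue.
        while (popped < max_count && !global_ready.empty()) {
            out.push_back(global_ready.front());
            global_ready.pop_front();
            ++popped;
        }
        return popped;
    }

    // Move every locally released task to the shared global queue.
    void flush_local() {
        for (int id : local_ready) global_ready.push_back(id);
        local_ready.clear();
    }
};
```

In this sketch a thread dispatching PENDING tasks would call get_ready_tasks_batch(out, n, /*skip_local=*/is_pending). The tasks it just released (say 10 and 11) stay in local_ready, and after flush_local() another thread calling with skip_local=false picks them up from global_ready instead of the releasing thread re-claiming them.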

Testing

  • All tensormap_and_ringbuffer L3 hardware tests passed (test_l3_child_memory, test_l3_group, test_l3_dependency)
  • Pre-commit hooks passed (check-headers, clang-format, clang-tidy, cpplint)


gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a skip_local parameter to the task scheduler to improve task distribution across threads by giving locally released tasks a window to be flushed. It also refactors the pop_ready_tasks_batch function to accept profiling counters as references. A critical issue was identified in the profiling logic: the pop_hit counter is incremented by one per batch instead of by the number of tasks actually popped, so the metric undercounts whenever a batch returns more than one task.
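The fix the reviewer describes can be sketched as follows. This is an illustration under assumptions, not the code in aicpu_executor.cpp: the function pop_ready_tasks_batch_sketch, the PopCounters struct and the pop_calls and pop_miss counters are hypothetical; only pop_hit and the pass-counters-by-reference idea come from the review. The counters are plain integers here, which is only safe if each thread owns its own set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical per-thread profiling counters.
struct PopCounters {
    uint64_t pop_calls = 0;  // batch pop attempts
    uint64_t pop_hit = 0;    // tasks actually popped (not batches)
    uint64_t pop_miss = 0;   // attempts that returned no task
};

// Pops up to max_count tasks and reports profiling data through references.
std::size_t pop_ready_tasks_batch_sketch(std::deque<int>& queue, std::vector<int>& out,
                                         std::size_t max_count, uint64_t& pop_calls,
                                         uint64_t& pop_hit, uint64_t& pop_miss) {
    ++pop_calls;
    std::size_t count = 0;
    while (count < max_count && !queue.empty()) {
        out.push_back(queue.front());
        queue.pop_front();
        ++count;
    }
    if (count == 0) {
        ++pop_miss;
    } else {
        // Add the batch size; "++pop_hit" would count batches, not tasks.
        pop_hit += count;
    }
    return count;
}
```

With three queued tasks and max_count = 2, two calls pop 2 and then 1 task, leaving pop_hit at 3 rather than the 2 that a per-batch increment would report.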

Review comment on src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (outdated)
zhusy54 force-pushed the sched-policy branch 3 times, most recently from 7f844b0 to d04ca1b (April 22, 2026 06:03)
…d re-claim

Add skip_local parameter to get_ready_tasks_batch (and pop_ready_tasks_batch)
so that when a thread is dispatching PENDING tasks, it does not immediately
re-consume tasks it just released into its local ready buffer. This gives
downstream tasks a window to be picked up by other threads rather than being
greedily re-claimed by the same thread's PENDING slots.