Adopt latest BS_thread_pool libary version v5.1.0#2053
Conversation
|
Thanks for the changes By default we use OPENMP_RUNTIME=INTEL which does not execute this code path, including in the unit test. Can you please build with -DOPENMP_RUNTIME=NONE and do a benchmark to see that there is no regression in output or performance? |
Hi, apart from |
BenchmarksResults (cpu, float32)
Summary
Build & run specificscmake -DCMAKE_INSTALL_PREFIX=$PWD/install -DBUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release -DOPENMP_RUNTIME=NONE ..for op in gather transpose split layer_norm softmax masked_softmax topk dequantize conv1d median_filter; do echo "=== benchmark_ops cpu $op float32 ==="; tests/benchmark_ops $op cpu; doneBenchamrk FixI also applied a fix on |
|
Thanks so much |
BS_thread_pool
Adopt
BS_thread_pool @version 5.1.0and the update relevant code.A noticeable feature in the pool are the various flavors one can use in the respective ctor. That is:
Thread pool
light_thread_poolwas currently adopted.ReplicaPool
Regarding
include/ctranslate2/thread_pool.h&include/ctranslate2/replica_pool.h:It doesn't make sense to replace
ReplicaPoolwithBS::thread_pooldirectly. TheCTranslate2design is intentionally tighter than a general thread pool — the coupling between thread identity and model replica is load-bearing for GPU correctness and memory management.Where
BS::thread_poolwould make sense is if we were replacing the internalThreadPoolprimitive (the lower-level classReplicaPoolbuilds on), but we'd still need to re-implement the worker lifecycle on top of it. The gain would be modest (BS has nicer ergonomics and optional task priorities), but the migration cost would be non-trivial.Benchmark
#2053 (comment)
resolves #2052