fix(provider): resolve vLLM embedding compatibility and dimension inf… #7509
Open
Creeper3222 wants to merge 2 commits into AstrBotDevs:master from
Conversation
Contributor
Hey - I've found 4 issues, and left some high level feedback:
- In `AstrBotConfig.vue`, the line `[已禁用] 不再自动写入配置文件,仅显示提示` is not commented out and will cause a syntax error in the script block; it should either be removed or turned into a proper comment.
- The vLLM detection based on error message substrings like `"matryoshka"`/`"dimensions"` in `openai_embedding_source.py` is brittle; consider centralizing this logic and tightening the checks (e.g., by inspecting error types or status codes) to avoid misclassifying unrelated 400 errors.
- In `config.py#get_embedding_dim`, returning the string sentinel `"vLLM-Adaptive"` in `embedding_dimensions` changes the expected type from numeric to string; double-check that all consumers (not just this Vue component) can safely handle this non-numeric value or introduce a separate flag for vLLM compatibility mode.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `AstrBotConfig.vue`, the line `[已禁用] 不再自动写入配置文件,仅显示提示` is not commented out and will cause a syntax error in the script block; it should either be removed or turned into a proper comment.
- The vLLM detection based on error message substrings like `"matryoshka"`/`"dimensions"` in `openai_embedding_source.py` is brittle; consider centralizing this logic and tightening the checks (e.g., by inspecting error types or status codes) to avoid misclassifying unrelated 400 errors.
- In `config.py#get_embedding_dim`, returning the string sentinel `"vLLM-Adaptive"` in `embedding_dimensions` changes the expected type from numeric to string; double-check that all consumers (not just this Vue component) can safely handle this non-numeric value or introduce a separate flag for vLLM compatibility mode.
## Individual Comments
### Comment 1
<location path="dashboard/src/components/shared/AstrBotConfig.vue" line_range="114" />
<code_context>
if (response.data.status != "error" && response.data.data?.embedding_dimensions) {
console.log(response.data.data.embedding_dimensions)
- providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
+ [已禁用] 不再自动写入配置文件,仅显示提示
+ // providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
useToast().success("获取成功: " + response.data.data.embedding_dimensions)
</code_context>
<issue_to_address>
**issue (bug_risk):** The `[已禁用]` line is not valid JavaScript and will break the component.
Because this is raw text in the function body, it makes the script invalid and will cause a syntax error. If it’s meant to be a comment, prefix it with `//` or wrap it in `/* ... */`, e.g. `// [已禁用] 不再自动写入配置文件,仅显示提示`.
</issue_to_address>
### Comment 2
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="68-69" />
<code_context>
+ if "vllm" in api_base:
+ logger.info(f"[OpenAI Embedding] Detected vLLM by api_base keyword")
+ return True
+ # Method 3: check common vLLM ports (8000, 8001, etc.)
+ if ":8000" in api_base or ":8001" in api_base or ":8002" in api_base:
+ logger.info(f"[OpenAI Embedding] Detected vLLM by common port in api_base: {api_base}")
+ return True
</code_context>
<issue_to_address>
**question (bug_risk):** Port-based vLLM detection may cause false positives for non-vLLM services on these ports.
Using ports 8000–8002 as a vLLM signal can wrongly treat any OpenAI-compatible service on those ports as vLLM, silently skipping the `dimensions` parameter and altering behavior. Please narrow this heuristic (for example by combining it with hostname patterns or a dedicated config flag) to avoid misclassification.
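One possible narrowing, sketched here purely as an assumption (not this PR's actual code), is to count common ports as a vLLM signal only on local or private hosts, and otherwise require the `vllm` keyword or an explicit flag:

```python
from urllib.parse import urlparse

def looks_like_vllm(api_base: str, explicit_flag: bool = False) -> bool:
    """Heuristic vLLM check (hypothetical): an explicit flag or a 'vllm'
    keyword always wins; ports 8000-8002 only count on local/private hosts."""
    if explicit_flag:
        return True
    lowered = api_base.lower()
    if "vllm" in lowered:
        return True
    parsed = urlparse(lowered)
    host = parsed.hostname or ""
    local_hosts = ("localhost", "127.0.0.1", "0.0.0.0")
    is_private = host in local_hosts or host.startswith(("10.", "192.168."))
    return is_private and parsed.port in (8000, 8001, 8002)
```

This keeps the convenience of port-based detection for typical local vLLM deployments while no longer misclassifying a public OpenAI-compatible endpoint that happens to listen on port 8000.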
</issue_to_address>
### Comment 3
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="93-97" />
<code_context>
+ # If the error mentions "matryoshka" or "dimensions", vLLM does not support the parameter
+ # Retry without the dimensions parameter
+ error_msg = str(e).lower()
+ if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
+ logger.warning(
+ f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
+ )
+ kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
+ try:
+ embedding = await self.client.embeddings.create(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Error-string–based vLLM detection is brittle and may misclassify unrelated errors.
Relying on generic substrings like "dimensions" means any error that happens to include that word will trigger the vLLM fallback and potentially mask real problems. Consider matching more specific vLLM error text or codes, or making this behavior configurable so users can explicitly opt in when using vLLM.
Suggested implementation:
```python
except Exception as e:
# Compatibility logic for vLLM not supporting the dimensions parameter:
# 1. Only enabled when the config flag is explicitly set (to avoid affecting other error scenarios)
# 2. Match more specific error fragments instead of the generic "dimensions"
error_msg = str(e).lower()
# Whether the vLLM dimensions compatibility retry logic is enabled (off by default)
retry_on_dimensions_error = getattr(self, "retry_on_dimensions_error", False)
if retry_on_dimensions_error and kwargs.get("dimensions"):
vllm_dimensions_error_markers = (
"matryoshka embedding",
"matryoshka",
"does not support the 'dimensions' parameter",
"does not support the \"dimensions\" parameter",
"dimensions is not supported for this model",
"unsupported dimensions parameter",
)
if any(marker in error_msg for marker in vllm_dimensions_error_markers):
logger.warning(
"[OpenAI Embedding] Detected possible vLLM 'dimensions' incompatibility, "
"retrying without dimensions parameter: %s",
e,
)
kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
embedding = await self.client.embeddings.create(
input=text,
model=self.model,
**kwargs_retry,
)
return embedding.data[0].embedding
# No vLLM compatibility rule matched or the flag is off; re-raise the original exception to avoid hiding real errors
raise
```
1. Add a config attribute on the corresponding provider/source class (the one defining `self.client` and `self.model`), for example:
- add `self.retry_on_dimensions_error = config.get("retry_on_dimensions_error", False)` in `__init__`, or read it from a global config/environment variable.
2. If the project has a unified config system, document the switch and include it in the default config:
- it only needs to be enabled when using vLLM and hitting the dimensions incompatibility;
- keep the default `False` to avoid masking unrelated errors as vLLM compatibility issues.
</issue_to_address>
### Comment 4
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="79" />
<code_context>
+ self._is_vllm_detected = True
+ logger.info("[OpenAI Embedding] Marked as vLLM (runtime detection via error)")
async def get_embedding(self, text: str) -> list[float]:
"""获取文本的嵌入"""
kwargs = self._embedding_kwargs()
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared helpers for the retry-without-dimensions flow and dimension resolution so `get_embedding`/`get_embeddings` and related methods become simpler and less duplicated.
You can keep all current behavior while reducing duplication and making the control flow easier to follow by extracting two small helpers and centralizing dimension logic.
### 1. Extract shared retry-without-dimensions helper
`get_embedding` and `get_embeddings` have almost identical try/except logic. Move that into a single private async helper that takes `input` and a post-processing callback:
```python
async def _create_embeddings_with_dim_fallback(
self,
input_data: Any,
postprocess: Callable[[Any], Any],
) -> Any:
kwargs = self._embedding_kwargs()
try:
embeddings = await self.client.embeddings.create(
input=input_data,
model=self.model,
**kwargs,
)
return postprocess(embeddings)
except Exception as e:
error_msg = str(e).lower()
if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
logger.warning(
f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
)
kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
try:
embeddings = await self.client.embeddings.create(
input=input_data,
model=self.model,
**kwargs_retry,
)
logger.info(
"[OpenAI Embedding] Successfully retrieved embeddings without dimensions parameter, marking as vLLM"
)
self._mark_as_vllm()
return postprocess(embeddings)
except Exception as retry_error:
logger.error(
f"[OpenAI Embedding] Retry without dimensions also failed: {retry_error}"
)
raise retry_error
raise
```
Then `get_embedding` and `get_embeddings` become very simple:
```python
async def get_embedding(self, text: str) -> list[float]:
return await self._create_embeddings_with_dim_fallback(
text,
postprocess=lambda res: res.data[0].embedding,
)
async def get_embeddings(self, text: list[str]) -> list[list[float]]:
return await self._create_embeddings_with_dim_fallback(
text,
postprocess=lambda res: [item.embedding for item in res.data],
)
```
This keeps the exact retry behavior and logging, but removes duplicated branches.
### 2. Centralize dimension parsing / inference
`_embedding_kwargs` and `get_dim` both parse `embedding_dimensions` and log. You can centralize that into one helper that returns an `int | None` and is reused:
```python
def _resolve_dimensions(self) -> int | None:
provider_id = self.provider_config.get("id", "unknown")
embedding_dim_config = self.provider_config.get("embedding_dimensions", "")
if embedding_dim_config and embedding_dim_config != "":
try:
dim = int(embedding_dim_config)
if dim > 0:
logger.info(
f"[OpenAI Embedding] {provider_id}: Dimension from config: {dim}"
)
return dim
except (ValueError, TypeError):
logger.warning(
f"[OpenAI Embedding] {provider_id}: embedding_dimensions is not a valid integer: '{embedding_dim_config}'"
)
# fall through to model inference
model = self.provider_config.get("embedding_model", "").lower()
model_dims = {
"bge-m3": 1024,
"bge-large-en-v1.5": 1024,
"bge-large-zh-v1.5": 1024,
"text-embedding-3-small": 1536,
"text-embedding-3-large": 3072,
"text-embedding-ada-002": 1536,
}
for model_key, dim in model_dims.items():
if model_key in model:
logger.info(
f"[OpenAI Embedding] {provider_id}: Inferred dimension {dim} from model: {model}"
)
return dim
logger.warning(
f"[OpenAI Embedding] {provider_id}: Could not determine dimension (model: {model}, config: '{embedding_dim_config}')"
)
return None
```
Then:
```python
def _embedding_kwargs(self) -> dict:
kwargs: dict[str, Any] = {}
provider_id = self.provider_config.get("id", "unknown")
if self._is_vllm():
logger.info(
f"[OpenAI Embedding] {provider_id}: Detected vLLM, skipping dimensions parameter"
)
return kwargs
dim = self._resolve_dimensions()
if dim is not None:
kwargs["dimensions"] = dim
else:
logger.info(
f"[OpenAI Embedding] {provider_id}: No explicit/inferred embedding_dimensions, API will use default"
)
return kwargs
def get_dim(self) -> int:
dim = self._resolve_dimensions()
return dim or 0
```
This removes duplicated parsing, try/except, and logging, while preserving the current behavior (config first, then model inference, then fallback).
### 3. Slightly simplify vLLM detection state
You already have `_is_vllm_detected` plus heuristics; wrapping both into `_is_vllm()` is fine, but you can keep it narrowly focused and avoid repeating `.lower()`:
```python
def _is_vllm(self) -> bool:
if self._is_vllm_detected:
return True
provider_id = (self.provider_config.get("id", "") or "").lower()
api_base = (self.api_base_normalized or "").lower()
if "vllm" in provider_id:
logger.info(f"[OpenAI Embedding] Detected vLLM by provider id: {provider_id}")
return True
if "vllm" in api_base:
logger.info("[OpenAI Embedding] Detected vLLM by api_base keyword")
return True
if any(port in api_base for port in (":8000", ":8001", ":8002")):
logger.info(
f"[OpenAI Embedding] Detected vLLM by common port in api_base: {api_base}"
)
return True
return False
```
This keeps all your current heuristics and logging, but makes the method more compact and easier to scan.
</issue_to_address>
Contributor
Code Review
This pull request implements vLLM compatibility for the OpenAI embedding provider by detecting vLLM environments and omitting the unsupported 'dimensions' parameter. It adds runtime error detection, model-based dimension inference, and dashboard updates. A critical syntax error was found in the Vue component where raw text was added without comment markers, and the static port-based vLLM detection was noted as potentially too broad.
if (response.data.status != "error" && response.data.data?.embedding_dimensions) {
  console.log(response.data.data.embedding_dimensions)
  providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
  [已禁用] 不再自动写入配置文件,仅显示提示
Contributor
Comment on lines +69 to +71
if ":8000" in api_base or ":8001" in api_base or ":8002" in api_base:
    logger.info(f"[OpenAI Embedding] Detected vLLM by common port in api_base: {api_base}")
    return True
Contributor
Creeper3222 commented Apr 13, 2026
@@ -51,24 +51,25 @@ def __init__(self, provider_config: dict, provider_settings: dict) -> None:
    self._is_vllm_detected = False

def _is_vllm(self) -> bool:
Author

我尝试将living memory的embedding模型bge-m3从使用ollama部署的改为用docker/vllm 部署的bge-m3,但是现在最新的vllm不再接受主动传入的向量维度作为参数来访问,主动传入向量维度作为参数会导致保错且测试不可用,而astrbot的embedding模型提供商得向量维度这一栏留空则可以测试成功,但又会导致living memory报错向量维度不一致导致无法重建索引。换回ollama是可以正常使用的。







I first checked whether this was caused by differing bge-m3 versions; script tests showed that Ollama and vLLM both pull bge-m3:latest from the same image, so there is no incompatibility, as shown in the screenshots.
So I confirmed the problem is that AstrBot passes the dimensions parameter to vLLM by default. I modified three files:
AstrBot\astrbot\core\provider\sources\openai_embedding_source.py
AstrBot\astrbot\dashboard\routes\config.py
AstrBot\dashboard\src\components\shared\AstrBotConfig.vue
adding logic that checks whether the embedding model provider is vLLM. When the provider is identified as vLLM, the dimensions parameter is not sent to it, but the dimension is still passed to living memory for validation. This way living memory sees consistent dimensions, while vLLM no longer receives the dimensions parameter and no longer errors.


现在测试得到ollama和vllm都可以作为astrbot的embedding模型提供商
Summary by Sourcery
Improve embedding provider compatibility with vLLM by adapting dimension handling and enhancing dimension inference and UX.
New Features:
Bug Fixes:
Enhancements: