
fix(provider): resolve vLLM embedding compatibility and dimension inf… #7509

Open
Creeper3222 wants to merge 2 commits into AstrBotDevs:master from Creeper3222:fix/vllm-embedding-compatibility

Conversation


@Creeper3222 Creeper3222 commented Apr 13, 2026

I tried switching Living Memory's embedding model bge-m3 from an Ollama deployment to a docker/vLLM deployment. The latest vLLM, however, no longer accepts an explicitly supplied vector dimension: passing it causes an error and the connectivity test fails. If the vector-dimension field of AstrBot's embedding provider is left empty, the test succeeds, but Living Memory then reports a vector-dimension mismatch and cannot rebuild its index. Switching back to Ollama works normally.
I first ruled out a bge-m3 version mismatch: script testing confirmed that Ollama and vLLM both pull bge-m3:latest from the same image, so there is no incompatibility between the two deployments.

So I concluded that the problem is AstrBot passing the dimensions parameter to vLLM by default. I modified these three files:

astrbot/core/provider/sources/openai_embedding_source.py

astrbot/dashboard/routes/config.py

dashboard/src/components/shared/AstrBotConfig.vue

adding logic to check whether the embedding provider is vLLM. When the provider is identified as vLLM, the dimensions parameter is not sent to it, but the dimension is still passed to Living Memory for its consistency check. Living Memory therefore sees matching dimensions, while vLLM never receives the dimensions parameter and no longer errors.
With these changes, both Ollama and vLLM now work as AstrBot embedding providers in testing.

Summary by Sourcery

Improve embedding provider compatibility with vLLM by adapting dimension handling and enhancing dimension inference and UX.

New Features:

  • Automatically detect vLLM embedding backends and omit unsupported dimensions parameters while still supporting other OpenAI-style providers.
  • Infer embedding vector dimensions from common model names when they are not explicitly configured, enabling better Living Memory integration.

Bug Fixes:

  • Prevent vLLM embedding requests from failing due to unsupported dimensions parameters in both single and batch modes.
  • Avoid backend errors during embedding-dimension probing by returning a synthetic success response for known vLLM error patterns.

Enhancements:

  • Add runtime fallback logic to transparently retry embedding calls without dimensions when vLLM-specific errors are detected and cache that capability for subsequent calls.
  • Refine dashboard behavior to no longer auto-write detected embedding dimensions into provider configuration, instead surfacing them to the user for manual confirmation.

@auto-assign auto-assign bot requested review from Fridemn and LIghtJUNction April 13, 2026 09:13
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 13, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • In AstrBotConfig.vue, the line [已禁用] 不再自动写入配置文件,仅显示提示 is not commented out and will cause a syntax error in the script block; it should either be removed or turned into a proper comment.
  • The vLLM detection based on error message substrings like "matryoshka"/"dimensions" in openai_embedding_source.py is brittle; consider centralizing this logic and tightening the checks (e.g., by inspecting error types or status codes) to avoid misclassifying unrelated 400 errors.
  • In config.py#get_embedding_dim, returning the string sentinel "vLLM-Adaptive" in embedding_dimensions changes the expected type from numeric to string; double-check that all consumers (not just this Vue component) can safely handle this non-numeric value or introduce a separate flag for vLLM compatibility mode.
## Individual Comments

### Comment 1
<location path="dashboard/src/components/shared/AstrBotConfig.vue" line_range="114" />
<code_context>
     if (response.data.status != "error" && response.data.data?.embedding_dimensions) {
       console.log(response.data.data.embedding_dimensions)
-      providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
+      [已禁用] 不再自动写入配置文件,仅显示提示
+      // providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
       useToast().success("获取成功: " + response.data.data.embedding_dimensions)
</code_context>
<issue_to_address>
**issue (bug_risk):** The `[已禁用]` line is not valid JavaScript and will break the component.

Because this is raw text in the function body, it makes the script invalid and will cause a syntax error. If it’s meant to be a comment, prefix it with `//` or wrap it in `/* ... */`, e.g. `// [已禁用] 不再自动写入配置文件,仅显示提示`.
</issue_to_address>

### Comment 2
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="68-69" />
<code_context>
+        if "vllm" in api_base:
+            logger.info(f"[OpenAI Embedding] Detected vLLM by api_base keyword")
+            return True
+        # Method 3: check common vLLM ports (8000, 8001, etc.)
+        if ":8000" in api_base or ":8001" in api_base or ":8002" in api_base:
+            logger.info(f"[OpenAI Embedding] Detected vLLM by common port in api_base: {api_base}")
+            return True
</code_context>
<issue_to_address>
**question (bug_risk):** Port-based vLLM detection may cause false positives for non-vLLM services on these ports.

Using ports 8000–8002 as a vLLM signal can wrongly treat any OpenAI-compatible service on those ports as vLLM, silently skipping the `dimensions` parameter and altering behavior. Please narrow this heuristic (for example by combining it with hostname patterns or a dedicated config flag) to avoid misclassification.
</issue_to_address>
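A narrowed heuristic along the lines suggested here might drop the port signal entirely and rely on the URL keyword plus an explicit opt-in. This is a sketch with a hypothetical `vllm_compat_flag` config option, not the PR's code:

```python
def looks_like_vllm(api_base: str, vllm_compat_flag: bool = False) -> bool:
    """Decide whether to skip the `dimensions` parameter.

    A dedicated config flag always wins; otherwise require the "vllm"
    keyword in the URL. A common port (8000-8002) alone is deliberately
    NOT treated as evidence, since many OpenAI-compatible servers use
    those ports.
    """
    if vllm_compat_flag:
        return True
    return "vllm" in api_base.lower()
```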

### Comment 3
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="93-97" />
<code_context>
+            # An error mentioning "matryoshka" or "dimensions" means vLLM does not support the parameter
+            # Retry without dimensions
+            error_msg = str(e).lower()
+            if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
+                logger.warning(
+                    f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
+                )
+                kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
+                try:
+                    embedding = await self.client.embeddings.create(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Error-string–based vLLM detection is brittle and may misclassify unrelated errors.

Relying on generic substrings like "dimensions" means any error that happens to include that word will trigger the vLLM fallback and potentially mask real problems. Consider matching more specific vLLM error text or codes, or making this behavior configurable so users can explicitly opt in when using vLLM.

Suggested implementation:

```python
        except Exception as e:
            # Compatibility logic for vLLM not supporting the dimensions parameter:
            # 1. Only enabled when explicitly configured (avoids misclassifying other error scenarios)
            # 2. Match more specific error fragments instead of the generic "dimensions"
            error_msg = str(e).lower()

            # Whether to enable the vLLM dimensions-compatibility retry (disabled by default)
            retry_on_dimensions_error = getattr(self, "retry_on_dimensions_error", False)

            if retry_on_dimensions_error and kwargs.get("dimensions"):
                vllm_dimensions_error_markers = (
                    "matryoshka embedding",
                    "matryoshka",
                    "does not support the 'dimensions' parameter",
                    "does not support the \"dimensions\" parameter",
                    "dimensions is not supported for this model",
                    "unsupported dimensions parameter",
                )

                if any(marker in error_msg for marker in vllm_dimensions_error_markers):
                    logger.warning(
                        "[OpenAI Embedding] Detected possible vLLM 'dimensions' incompatibility, "
                        "retrying without dimensions parameter: %s",
                        e,
                    )
                    kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
                    embedding = await self.client.embeddings.create(
                        input=text,
                        model=self.model,
                        **kwargs_retry,
                    )
                    return embedding.data[0].embedding

            # No vLLM compatibility rule matched, or the feature is disabled:
            # re-raise the original exception to avoid hiding the real error
            raise

```

1. Add a config attribute on the corresponding provider/source class (the class that defines `self.client` and `self.model`), for example:
   - in `__init__`, add `self.retry_on_dimensions_error = config.get("retry_on_dimensions_error", False)`, or read it from global config or an environment variable.
2. If the project has a unified config system, document the option there and include it in the default config:
   - the switch only needs to be enabled when using vLLM and hitting a dimensions incompatibility;
   - keep the default at `False` so that other errors are not masked as vLLM compatibility issues.
</issue_to_address>

### Comment 4
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="79" />
<code_context>
+        self._is_vllm_detected = True
+        logger.info("[OpenAI Embedding] Marked as vLLM (runtime detection via error)")

     async def get_embedding(self, text: str) -> list[float]:
         """获取文本的嵌入"""
         kwargs = self._embedding_kwargs()
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared helpers for the retry-without-dimensions flow and dimension resolution so `get_embedding`/`get_embeddings` and related methods become simpler and less duplicated.

You can keep all current behavior while reducing duplication and making the control flow easier to follow by extracting two small helpers and centralizing dimension logic.

### 1. Extract shared retry-without-dimensions helper

`get_embedding` and `get_embeddings` have almost identical try/except logic. Move that into a single private async helper that takes `input` and a post-processing callback:

```python
async def _create_embeddings_with_dim_fallback(
    self,
    input_data: Any,
    postprocess: Callable[[Any], Any],
) -> Any:
    kwargs = self._embedding_kwargs()
    try:
        embeddings = await self.client.embeddings.create(
            input=input_data,
            model=self.model,
            **kwargs,
        )
        return postprocess(embeddings)
    except Exception as e:
        error_msg = str(e).lower()
        if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
            logger.warning(
                f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
            )
            kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
            try:
                embeddings = await self.client.embeddings.create(
                    input=input_data,
                    model=self.model,
                    **kwargs_retry,
                )
                logger.info(
                    "[OpenAI Embedding] Successfully retrieved embeddings without dimensions parameter, marking as vLLM"
                )
                self._mark_as_vllm()
                return postprocess(embeddings)
            except Exception as retry_error:
                logger.error(
                    f"[OpenAI Embedding] Retry without dimensions also failed: {retry_error}"
                )
                raise retry_error
        raise
```

Then `get_embedding` and `get_embeddings` become very simple:

```python
async def get_embedding(self, text: str) -> list[float]:
    return await self._create_embeddings_with_dim_fallback(
        text,
        postprocess=lambda res: res.data[0].embedding,
    )

async def get_embeddings(self, text: list[str]) -> list[list[float]]:
    return await self._create_embeddings_with_dim_fallback(
        text,
        postprocess=lambda res: [item.embedding for item in res.data],
    )
```

This keeps the exact retry behavior and logging, but removes duplicated branches.

### 2. Centralize dimension parsing / inference

`_embedding_kwargs` and `get_dim` both parse `embedding_dimensions` and log. You can centralize that into one helper that returns an `int | None` and is reused:

```python
def _resolve_dimensions(self) -> int | None:
    provider_id = self.provider_config.get("id", "unknown")
    embedding_dim_config = self.provider_config.get("embedding_dimensions", "")

    if embedding_dim_config and embedding_dim_config != "":
        try:
            dim = int(embedding_dim_config)
            if dim > 0:
                logger.info(
                    f"[OpenAI Embedding] {provider_id}: Dimension from config: {dim}"
                )
                return dim
        except (ValueError, TypeError):
            logger.warning(
                f"[OpenAI Embedding] {provider_id}: embedding_dimensions is not a valid integer: '{embedding_dim_config}'"
            )
            # fall through to model inference

    model = self.provider_config.get("embedding_model", "").lower()
    model_dims = {
        "bge-m3": 1024,
        "bge-large-en-v1.5": 1024,
        "bge-large-zh-v1.5": 1024,
        "text-embedding-3-small": 1536,
        "text-embedding-3-large": 3072,
        "text-embedding-ada-002": 1536,
    }
    for model_key, dim in model_dims.items():
        if model_key in model:
            logger.info(
                f"[OpenAI Embedding] {provider_id}: Inferred dimension {dim} from model: {model}"
            )
            return dim

    logger.warning(
        f"[OpenAI Embedding] {provider_id}: Could not determine dimension (model: {model}, config: '{embedding_dim_config}')"
    )
    return None
```

Then:

```python
def _embedding_kwargs(self) -> dict:
    kwargs: dict[str, Any] = {}
    provider_id = self.provider_config.get("id", "unknown")

    if self._is_vllm():
        logger.info(
            f"[OpenAI Embedding] {provider_id}: Detected vLLM, skipping dimensions parameter"
        )
        return kwargs

    dim = self._resolve_dimensions()
    if dim is not None:
        kwargs["dimensions"] = dim
    else:
        logger.info(
            f"[OpenAI Embedding] {provider_id}: No explicit/inferred embedding_dimensions, API will use default"
        )
    return kwargs

def get_dim(self) -> int:
    dim = self._resolve_dimensions()
    return dim or 0
```

This removes duplicated parsing, try/except, and logging, while preserving the current behavior (config first, then model inference, then fallback).

### 3. Slightly simplify vLLM detection state

You already have `_is_vllm_detected` plus heuristics; wrapping both into `_is_vllm()` is fine, but you can keep it narrowly focused and avoid repeating `.lower()`:

```python
def _is_vllm(self) -> bool:
    if self._is_vllm_detected:
        return True

    provider_id = (self.provider_config.get("id", "") or "").lower()
    api_base = (self.api_base_normalized or "").lower()

    if "vllm" in provider_id:
        logger.info(f"[OpenAI Embedding] Detected vLLM by provider id: {provider_id}")
        return True

    if "vllm" in api_base:
        logger.info("[OpenAI Embedding] Detected vLLM by api_base keyword")
        return True

    if any(port in api_base for port in (":8000", ":8001", ":8002")):
        logger.info(
            f"[OpenAI Embedding] Detected vLLM by common port in api_base: {api_base}"
        )
        return True

    return False
```

This keeps all your current heuristics and logging, but makes the method more compact and easier to scan.
</issue_to_address>



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements vLLM compatibility for the OpenAI embedding provider by detecting vLLM environments and omitting the unsupported 'dimensions' parameter. It adds runtime error detection, model-based dimension inference, and dashboard updates. A critical syntax error was found in the Vue component where raw text was added without comment markers, and the static port-based vLLM detection was noted as potentially too broad.

if (response.data.status != "error" && response.data.data?.embedding_dimensions) {
console.log(response.data.data.embedding_dimensions)
providerConfig.embedding_dimensions = response.data.data.embedding_dimensions
[已禁用] 不再自动写入配置文件,仅显示提示

critical

This line is raw Chinese text without a comment marker (//), which produces a JavaScript syntax error and breaks the frontend dashboard. Please turn it into a comment or delete it.

      // [已禁用] 不再自动写入配置文件,仅显示提示

Comment on lines +69 to +71
if ":8000" in api_base or ":8001" in api_base or ":8002" in api_base:
logger.info(f"[OpenAI Embedding] Detected vLLM by common port in api_base: {api_base}")
return True

medium

Statically detecting vLLM by port number (8000, 8001, 8002) risks false positives, since these are common general-purpose ports. Given that the code already implements runtime detection via exception handling (the retry in get_embedding), the static check could be made more conservative, or you could rely entirely on runtime detection and keyword matching.

@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Apr 13, 2026
@@ -51,24 +51,25 @@ def __init__(self, provider_config: dict, provider_settings: dict) -> None:
self._is_vllm_detected = False

def _is_vllm(self) -> bool:
Author


I changed the _is_vllm logic: the dimensions parameter is now withheld from vLLM only when "vllm" is entered in the API key field of the OpenAI Embedding provider. This replaces the earlier static detection based on port numbers.
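Going by this description, the revised check presumably reduces to an explicit opt-in via the API key field plus the existing runtime error detection. A sketch with hypothetical names; the actual code is in _is_vllm:

```python
def is_vllm_provider(api_key: str, runtime_detected: bool = False) -> bool:
    """Skip the `dimensions` parameter only when the user explicitly enters
    "vllm" in the provider's API key field, or when a runtime error has
    already marked the backend as vLLM."""
    return runtime_detected or api_key.strip().lower() == "vllm"
```

Making the signal an explicit user action avoids the false positives of port- or URL-based guessing, at the cost of requiring the user to know about the convention.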

