Skip to content

fix: remove python-ripgrep dependency, use system rg/grep instead (fixes #7496)#7505

Open
Soulter wants to merge 1 commit intomasterfrom
fix/remove-python-ripgrep-3.13-support
Open

fix: remove python-ripgrep dependency, use system rg/grep instead (fixes #7496)#7505
Soulter wants to merge 1 commit intomasterfrom
fix/remove-python-ripgrep-3.13-support

Conversation

@Soulter
Copy link
Copy Markdown
Member

@Soulter Soulter commented Apr 13, 2026

Summary

does not support Python 3.13 (requires ), causing AstrBot to fail to start on Python 3.13 environments (see #7496).

This PR removes the dependency and replaces it with direct subprocess calls to system (rg) with automatic fallback to standard .

Changes

  • local.py: Remove import, rewrite method

    • Uses to detect or availability
    • Priority: > > error
    • Supports all original parameters: , , , ,
    • Properly handles exit codes (0=success, 1=no matches for grep)
    • 30-second timeout protection
  • requirements.txt: Remove

  • pyproject.toml: Remove

Compatibility

Python Version Before After
3.10-3.12
3.13 ❌ (ModuleNotFoundError)

Testing

Tested the search functionality logic with:

  • available (uses ripgrep)
  • not available, available (falls back to grep)
  • Neither available (returns proper error message)

Related

Fixes #7496


This PR is generated by AstrBot

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 13, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The new grep/rg command construction assumes GNU-style options (e.g., --include, --color=never), which may not be available or behave identically on all platforms (especially macOS/BSD), so it’s worth double‑checking and guarding for cross-platform compatibility of these flags.
  • The grep branch has a redundant shutil.which("grep") check inside the if glob block and uses --include without verifying support, which could be simplified and made safer by detecting grep flavor/capabilities once and adjusting options accordingly.
  • Consider aligning the behavior of the glob/path handling more closely with the previous python-ripgrep semantics (e.g., how directory globs and recursion are treated) to avoid subtle changes in which files get searched.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new grep/rg command construction assumes GNU-style options (e.g., `--include`, `--color=never`), which may not be available or behave identically on all platforms (especially macOS/BSD), so it’s worth double‑checking and guarding for cross-platform compatibility of these flags.
- The grep branch has a redundant `shutil.which("grep")` check inside the `if glob` block and uses `--include` without verifying support, which could be simplified and made safer by detecting grep flavor/capabilities once and adjusting options accordingly.
- Consider aligning the behavior of the glob/path handling more closely with the previous `python-ripgrep` semantics (e.g., how directory globs and recursion are treated) to avoid subtle changes in which files get searched.

## Individual Comments

### Comment 1
<location path="astrbot/core/computer/booters/local.py" line_range="229-231" />
<code_context>
+                cmd = ["rg", "--line-number", "--color=never"]
+                if glob:
+                    cmd.extend(["--glob", glob])
+                if after_context:
+                    cmd.extend(["--after-context", str(after_context)])
+                if before_context:
+                    cmd.extend(["--before-context", str(before_context)])
+                cmd.extend([pattern, search_path])
</code_context>
<issue_to_address>
**issue:** Use explicit `is not None` checks for `after_context`/`before_context` to avoid treating `0` as "unset".

These truthiness checks will drop `0`, so an explicit `0` (e.g., "no context") is ignored. Using `if after_context is not None:` / `if before_context is not None:` preserves `0` while still treating `None` as "unset."
</issue_to_address>

### Comment 2
<location path="astrbot/core/computer/booters/local.py" line_range="224" />
<code_context>
-            return {"success": True, "content": _truncate_long_lines("".join(results))}
+            search_path = path if path else "."
+            
+            # Try ripgrep first, fallback to grep
+            if shutil.which("rg"):
+                cmd = ["rg", "--line-number", "--color=never"]
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting command-building and execution logic into small helper functions so the `search_files` inner `_run` function remains a thin, linear orchestration layer.

You can keep the new behavior while making `search_files` closer to the old “thin orchestration” style by extracting the command construction and execution into small helpers.

For example, inside the same class/module:

```python
def _build_rg_cmd(
    pattern: str,
    search_path: str,
    glob: str | None,
    after_context: int | None,
    before_context: int | None,
) -> list[str]:
    cmd = ["rg", "--line-number", "--color=never"]
    if glob:
        cmd.extend(["--glob", glob])
    if after_context:
        cmd.extend(["--after-context", str(after_context)])
    if before_context:
        cmd.extend(["--before-context", str(before_context)])
    cmd.extend([pattern, search_path])
    return cmd


def _build_grep_cmd(
    pattern: str,
    search_path: str,
    glob: str | None,
    after_context: int | None,
    before_context: int | None,
) -> list[str]:
    cmd = ["grep", "-rn", "--color=never"]
    if after_context:
        cmd.extend(["-A", str(after_context)])
    if before_context:
        cmd.extend(["-B", str(before_context)])
    if glob:
        cmd.extend(["--include", glob])  # keep GNU grep assumption
    cmd.extend([pattern, search_path])
    return cmd


def _run_search_cmd(cmd: list[str]) -> dict[str, Any]:
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "error": "Search command timed out after 30 seconds"}
    except Exception as e:
        return {"success": False, "error": f"Search failed: {e}"}

    if result.returncode not in (0, 1):
        return {
            "success": False,
            "error": f"Search command failed with exit code {result.returncode}: {result.stderr}",
        }

    output = result.stdout or ""
    return {"success": True, "content": _truncate_long_lines(output)}
```

Then `search_files`’s inner function becomes linear and easier to scan:

```python
async def search_files(... ) -> dict[str, Any]:
    def _run() -> dict[str, Any]:
        search_path = path or "."
        cmd: list[str] | None = None

        if shutil.which("rg"):
            cmd = _build_rg_cmd(pattern, search_path, glob, after_context, before_context)
        elif shutil.which("grep"):
            cmd = _build_grep_cmd(pattern, search_path, glob, after_context, before_context)
        else:
            return {
                "success": False,
                "error": "Neither ripgrep (rg) nor grep is available on the system",
            }

        return _run_search_cmd(cmd)

    return await asyncio.to_thread(_run)
```

This keeps all current behavior (rg first, grep fallback, context flags, glob handling, timeout, and exit-code semantics) but:

- Removes duplicated `extend`-heavy logic from `_run`.
- Separates feature selection (rg vs grep) from subprocess/error-handling details.
- Makes it straightforward to add another backend later by adding another `_build_*_cmd` and a branch.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +229 to +231
if after_context:
cmd.extend(["--after-context", str(after_context)])
if before_context:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Use explicit is not None checks for after_context/before_context to avoid treating 0 as "unset".

These truthiness checks will drop 0, so an explicit 0 (e.g., "no context") is ignored. Using if after_context is not None: / if before_context is not None: preserves 0 while still treating None as "unset."

return {"success": True, "content": _truncate_long_lines("".join(results))}
search_path = path if path else "."

# Try ripgrep first, fallback to grep
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider extracting command-building and execution logic into small helper functions so the search_files inner _run function remains a thin, linear orchestration layer.

You can keep the new behavior while making search_files closer to the old “thin orchestration” style by extracting the command construction and execution into small helpers.

For example, inside the same class/module:

def _build_rg_cmd(
    pattern: str,
    search_path: str,
    glob: str | None,
    after_context: int | None,
    before_context: int | None,
) -> list[str]:
    cmd = ["rg", "--line-number", "--color=never"]
    if glob:
        cmd.extend(["--glob", glob])
    if after_context:
        cmd.extend(["--after-context", str(after_context)])
    if before_context:
        cmd.extend(["--before-context", str(before_context)])
    cmd.extend([pattern, search_path])
    return cmd


def _build_grep_cmd(
    pattern: str,
    search_path: str,
    glob: str | None,
    after_context: int | None,
    before_context: int | None,
) -> list[str]:
    cmd = ["grep", "-rn", "--color=never"]
    if after_context:
        cmd.extend(["-A", str(after_context)])
    if before_context:
        cmd.extend(["-B", str(before_context)])
    if glob:
        cmd.extend(["--include", glob])  # keep GNU grep assumption
    cmd.extend([pattern, search_path])
    return cmd


def _run_search_cmd(cmd: list[str]) -> dict[str, Any]:
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "error": "Search command timed out after 30 seconds"}
    except Exception as e:
        return {"success": False, "error": f"Search failed: {e}"}

    if result.returncode not in (0, 1):
        return {
            "success": False,
            "error": f"Search command failed with exit code {result.returncode}: {result.stderr}",
        }

    output = result.stdout or ""
    return {"success": True, "content": _truncate_long_lines(output)}

Then search_files’s inner function becomes linear and easier to scan:

async def search_files(... ) -> dict[str, Any]:
    def _run() -> dict[str, Any]:
        search_path = path or "."
        cmd: list[str] | None = None

        if shutil.which("rg"):
            cmd = _build_rg_cmd(pattern, search_path, glob, after_context, before_context)
        elif shutil.which("grep"):
            cmd = _build_grep_cmd(pattern, search_path, glob, after_context, before_context)
        else:
            return {
                "success": False,
                "error": "Neither ripgrep (rg) nor grep is available on the system",
            }

        return _run_search_cmd(cmd)

    return await asyncio.to_thread(_run)

This keeps all current behavior (rg first, grep fallback, context flags, glob handling, timeout, and exit-code semantics) but:

  • Removes duplicated extend-heavy logic from _run.
  • Separates feature selection (rg vs grep) from subprocess/error-handling details.
  • Makes it straightforward to add another backend later by adding another _build_*_cmd and a branch.

@dosubot dosubot bot added the area:core The bug / feature is about astrbot's core, backend label Apr 13, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the python-ripgrep library with direct system calls to rg or grep, reducing external dependencies. The feedback focuses on improving the robustness of the implementation by using the -e flag to safely handle patterns starting with hyphens, removing a redundant check for the grep binary, and utilizing a custom decoder for subprocess output to prevent potential UnicodeDecodeErrors.

cmd.extend(["--after-context", str(after_context)])
if before_context:
cmd.extend(["--before-context", str(before_context)])
cmd.extend([pattern, search_path])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It is safer to use the -e flag for the search pattern. This prevents the pattern from being interpreted as a command-line option if it starts with a hyphen (e.g., searching for a string like "-foo").

Suggested change
cmd.extend([pattern, search_path])
cmd.extend(["-e", pattern, search_path])

if before_context:
cmd.extend(["-B", str(before_context)])
# grep doesn't support glob directly, use include if available
if glob and shutil.which("grep"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check shutil.which("grep") is redundant here because this code is already inside an elif shutil.which("grep"): block (line 234).

Suggested change
if glob and shutil.which("grep"):
if glob:

if glob and shutil.which("grep"):
# Try to use --include if grep supports it (GNU grep)
cmd.extend(["--include", glob])
cmd.extend([pattern, search_path])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It is safer to use the -e flag for the search pattern. This prevents the pattern from being interpreted as a command-line option if it starts with a hyphen.

Suggested change
cmd.extend([pattern, search_path])
cmd.extend(["-e", pattern, search_path])

Comment on lines +252 to +265
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=30,
)
# grep returns exit code 1 when no matches found, which is not an error
if result.returncode not in (0, 1):
return {
"success": False,
"error": f"Search command failed with exit code {result.returncode}: {result.stderr}",
}
output = result.stdout if result.stdout else ""
return {"success": True, "content": _truncate_long_lines(output)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using text=True in subprocess.run relies on the system's default encoding, which can lead to UnicodeDecodeError if the search results contain snippets from files with different encodings. It is more robust to capture the output as bytes and use the existing _decode_shell_output utility, which handles various encodings and provides fallbacks.

Suggested change
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=30,
)
# grep returns exit code 1 when no matches found, which is not an error
if result.returncode not in (0, 1):
return {
"success": False,
"error": f"Search command failed with exit code {result.returncode}: {result.stderr}",
}
output = result.stdout if result.stdout else ""
return {"success": True, "content": _truncate_long_lines(output)}
result = subprocess.run(
cmd,
capture_output=True,
timeout=30,
)
# grep returns exit code 1 when no matches found, which is not an error
if result.returncode not in (0, 1):
return {
"success": False,
"error": f"Search command failed with exit code {result.returncode}: {_decode_shell_output(result.stderr)}",
}
output = _decode_shell_output(result.stdout)
return {"success": True, "content": _truncate_long_lines(output)}

python-ripgrep 0.0.9 does not support Python 3.13 (requires <3.13,>=3.10).
This change replaces the python-ripgrep library with direct subprocess calls
to system ripgrep (rg) with fallback to grep.

Changes:
- Remove python-ripgrep import from local.py
- Rewrite search_files() to use shutil.which() to detect rg/grep availability
- Support ripgrep first, fallback to grep if not available
- Handle proper exit codes (0=success, 1=no matches for grep)
- Remove python-ripgrep from requirements.txt and pyproject.toml

Fixes #7496
@Soulter Soulter force-pushed the fix/remove-python-ripgrep-3.13-support branch from 16a73cc to 4d00309 Compare April 13, 2026 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]python3.13环境更新v4.23.0无法启动

1 participant