fix: remove python-ripgrep dependency, use system rg/grep instead (fixes #7496)#7505
fix: remove python-ripgrep dependency, use system rg/grep instead (fixes #7496)#7505
Conversation
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The new grep/rg command construction assumes GNU-style options (e.g.,
--include,--color=never), which may not be available or behave identically on all platforms (especially macOS/BSD), so it’s worth double‑checking and guarding for cross-platform compatibility of these flags. - The grep branch has a redundant
shutil.which("grep")check inside theif globblock and uses--includewithout verifying support, which could be simplified and made safer by detecting grep flavor/capabilities once and adjusting options accordingly. - Consider aligning the behavior of the glob/path handling more closely with the previous
python-ripgrepsemantics (e.g., how directory globs and recursion are treated) to avoid subtle changes in which files get searched.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new grep/rg command construction assumes GNU-style options (e.g., `--include`, `--color=never`), which may not be available or behave identically on all platforms (especially macOS/BSD), so it’s worth double‑checking and guarding for cross-platform compatibility of these flags.
- The grep branch has a redundant `shutil.which("grep")` check inside the `if glob` block and uses `--include` without verifying support, which could be simplified and made safer by detecting grep flavor/capabilities once and adjusting options accordingly.
- Consider aligning the behavior of the glob/path handling more closely with the previous `python-ripgrep` semantics (e.g., how directory globs and recursion are treated) to avoid subtle changes in which files get searched.
## Individual Comments
### Comment 1
<location path="astrbot/core/computer/booters/local.py" line_range="229-231" />
<code_context>
+ cmd = ["rg", "--line-number", "--color=never"]
+ if glob:
+ cmd.extend(["--glob", glob])
+ if after_context:
+ cmd.extend(["--after-context", str(after_context)])
+ if before_context:
+ cmd.extend(["--before-context", str(before_context)])
+ cmd.extend([pattern, search_path])
</code_context>
<issue_to_address>
**issue:** Use explicit `is not None` checks for `after_context`/`before_context` to avoid treating `0` as "unset".
These truthiness checks will drop `0`, so an explicit `0` (e.g., "no context") is ignored. Using `if after_context is not None:` / `if before_context is not None:` preserves `0` while still treating `None` as "unset."
</issue_to_address>
### Comment 2
<location path="astrbot/core/computer/booters/local.py" line_range="224" />
<code_context>
- return {"success": True, "content": _truncate_long_lines("".join(results))}
+ search_path = path if path else "."
+
+ # Try ripgrep first, fallback to grep
+ if shutil.which("rg"):
+ cmd = ["rg", "--line-number", "--color=never"]
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting command-building and execution logic into small helper functions so the `search_files` inner `_run` function remains a thin, linear orchestration layer.
You can keep the new behavior while making `search_files` closer to the old “thin orchestration” style by extracting the command construction and execution into small helpers.
For example, inside the same class/module:
```python
def _build_rg_cmd(
pattern: str,
search_path: str,
glob: str | None,
after_context: int | None,
before_context: int | None,
) -> list[str]:
cmd = ["rg", "--line-number", "--color=never"]
if glob:
cmd.extend(["--glob", glob])
if after_context:
cmd.extend(["--after-context", str(after_context)])
if before_context:
cmd.extend(["--before-context", str(before_context)])
cmd.extend([pattern, search_path])
return cmd
def _build_grep_cmd(
pattern: str,
search_path: str,
glob: str | None,
after_context: int | None,
before_context: int | None,
) -> list[str]:
cmd = ["grep", "-rn", "--color=never"]
if after_context:
cmd.extend(["-A", str(after_context)])
if before_context:
cmd.extend(["-B", str(before_context)])
if glob:
cmd.extend(["--include", glob]) # keep GNU grep assumption
cmd.extend([pattern, search_path])
return cmd
def _run_search_cmd(cmd: list[str]) -> dict[str, Any]:
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=30,
)
except subprocess.TimeoutExpired:
return {"success": False, "error": "Search command timed out after 30 seconds"}
except Exception as e:
return {"success": False, "error": f"Search failed: {e}"}
if result.returncode not in (0, 1):
return {
"success": False,
"error": f"Search command failed with exit code {result.returncode}: {result.stderr}",
}
output = result.stdout or ""
return {"success": True, "content": _truncate_long_lines(output)}
```
Then `search_files`’s inner function becomes linear and easier to scan:
```python
async def search_files(... ) -> dict[str, Any]:
def _run() -> dict[str, Any]:
search_path = path or "."
cmd: list[str] | None = None
if shutil.which("rg"):
cmd = _build_rg_cmd(pattern, search_path, glob, after_context, before_context)
elif shutil.which("grep"):
cmd = _build_grep_cmd(pattern, search_path, glob, after_context, before_context)
else:
return {
"success": False,
"error": "Neither ripgrep (rg) nor grep is available on the system",
}
return _run_search_cmd(cmd)
return await asyncio.to_thread(_run)
```
This keeps all current behavior (rg first, grep fallback, context flags, glob handling, timeout, and exit-code semantics) but:
- Removes duplicated `extend`-heavy logic from `_run`.
- Separates feature selection (rg vs grep) from subprocess/error-handling details.
- Makes it straightforward to add another backend later by adding another `_build_*_cmd` and a branch.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if after_context: | ||
| cmd.extend(["--after-context", str(after_context)]) | ||
| if before_context: |
There was a problem hiding this comment.
issue: Use explicit is not None checks for after_context/before_context to avoid treating 0 as "unset".
These truthiness checks will drop 0, so an explicit 0 (e.g., "no context") is ignored. Using if after_context is not None: / if before_context is not None: preserves 0 while still treating None as "unset."
| return {"success": True, "content": _truncate_long_lines("".join(results))} | ||
| search_path = path if path else "." | ||
|
|
||
| # Try ripgrep first, fallback to grep |
There was a problem hiding this comment.
issue (complexity): Consider extracting command-building and execution logic into small helper functions so the search_files inner _run function remains a thin, linear orchestration layer.
You can keep the new behavior while making search_files closer to the old “thin orchestration” style by extracting the command construction and execution into small helpers.
For example, inside the same class/module:
def _build_rg_cmd(
pattern: str,
search_path: str,
glob: str | None,
after_context: int | None,
before_context: int | None,
) -> list[str]:
cmd = ["rg", "--line-number", "--color=never"]
if glob:
cmd.extend(["--glob", glob])
if after_context:
cmd.extend(["--after-context", str(after_context)])
if before_context:
cmd.extend(["--before-context", str(before_context)])
cmd.extend([pattern, search_path])
return cmd
def _build_grep_cmd(
pattern: str,
search_path: str,
glob: str | None,
after_context: int | None,
before_context: int | None,
) -> list[str]:
cmd = ["grep", "-rn", "--color=never"]
if after_context:
cmd.extend(["-A", str(after_context)])
if before_context:
cmd.extend(["-B", str(before_context)])
if glob:
cmd.extend(["--include", glob]) # keep GNU grep assumption
cmd.extend([pattern, search_path])
return cmd
def _run_search_cmd(cmd: list[str]) -> dict[str, Any]:
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=30,
)
except subprocess.TimeoutExpired:
return {"success": False, "error": "Search command timed out after 30 seconds"}
except Exception as e:
return {"success": False, "error": f"Search failed: {e}"}
if result.returncode not in (0, 1):
return {
"success": False,
"error": f"Search command failed with exit code {result.returncode}: {result.stderr}",
}
output = result.stdout or ""
return {"success": True, "content": _truncate_long_lines(output)}Then search_files’s inner function becomes linear and easier to scan:
async def search_files(... ) -> dict[str, Any]:
def _run() -> dict[str, Any]:
search_path = path or "."
cmd: list[str] | None = None
if shutil.which("rg"):
cmd = _build_rg_cmd(pattern, search_path, glob, after_context, before_context)
elif shutil.which("grep"):
cmd = _build_grep_cmd(pattern, search_path, glob, after_context, before_context)
else:
return {
"success": False,
"error": "Neither ripgrep (rg) nor grep is available on the system",
}
return _run_search_cmd(cmd)
return await asyncio.to_thread(_run)This keeps all current behavior (rg first, grep fallback, context flags, glob handling, timeout, and exit-code semantics) but:
- Removes duplicated
extend-heavy logic from_run. - Separates feature selection (rg vs grep) from subprocess/error-handling details.
- Makes it straightforward to add another backend later by adding another
_build_*_cmdand a branch.
There was a problem hiding this comment.
Code Review
This pull request replaces the python-ripgrep library with direct system calls to rg or grep, reducing external dependencies. The feedback focuses on improving the robustness of the implementation by using the -e flag to safely handle patterns starting with hyphens, removing a redundant check for the grep binary, and utilizing a custom decoder for subprocess output to prevent potential UnicodeDecodeErrors.
| cmd.extend(["--after-context", str(after_context)]) | ||
| if before_context: | ||
| cmd.extend(["--before-context", str(before_context)]) | ||
| cmd.extend([pattern, search_path]) |
There was a problem hiding this comment.
| if before_context: | ||
| cmd.extend(["-B", str(before_context)]) | ||
| # grep doesn't support glob directly, use include if available | ||
| if glob and shutil.which("grep"): |
| if glob and shutil.which("grep"): | ||
| # Try to use --include if grep supports it (GNU grep) | ||
| cmd.extend(["--include", glob]) | ||
| cmd.extend([pattern, search_path]) |
| result = subprocess.run( | ||
| cmd, | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=30, | ||
| ) | ||
| # grep returns exit code 1 when no matches found, which is not an error | ||
| if result.returncode not in (0, 1): | ||
| return { | ||
| "success": False, | ||
| "error": f"Search command failed with exit code {result.returncode}: {result.stderr}", | ||
| } | ||
| output = result.stdout if result.stdout else "" | ||
| return {"success": True, "content": _truncate_long_lines(output)} |
There was a problem hiding this comment.
Using text=True in subprocess.run relies on the system's default encoding, which can lead to UnicodeDecodeError if the search results contain snippets from files with different encodings. It is more robust to capture the output as bytes and use the existing _decode_shell_output utility, which handles various encodings and provides fallbacks.
| result = subprocess.run( | |
| cmd, | |
| capture_output=True, | |
| text=True, | |
| timeout=30, | |
| ) | |
| # grep returns exit code 1 when no matches found, which is not an error | |
| if result.returncode not in (0, 1): | |
| return { | |
| "success": False, | |
| "error": f"Search command failed with exit code {result.returncode}: {result.stderr}", | |
| } | |
| output = result.stdout if result.stdout else "" | |
| return {"success": True, "content": _truncate_long_lines(output)} | |
| result = subprocess.run( | |
| cmd, | |
| capture_output=True, | |
| timeout=30, | |
| ) | |
| # grep returns exit code 1 when no matches found, which is not an error | |
| if result.returncode not in (0, 1): | |
| return { | |
| "success": False, | |
| "error": f"Search command failed with exit code {result.returncode}: {_decode_shell_output(result.stderr)}", | |
| } | |
| output = _decode_shell_output(result.stdout) | |
| return {"success": True, "content": _truncate_long_lines(output)} |
python-ripgrep 0.0.9 does not support Python 3.13 (requires <3.13,>=3.10). This change replaces the python-ripgrep library with direct subprocess calls to system ripgrep (rg) with fallback to grep. Changes: - Remove python-ripgrep import from local.py - Rewrite search_files() to use shutil.which() to detect rg/grep availability - Support ripgrep first, fallback to grep if not available - Handle proper exit codes (0=success, 1=no matches for grep) - Remove python-ripgrep from requirements.txt and pyproject.toml Fixes #7496
16a73cc to
4d00309
Compare
Summary
does not support Python 3.13 (requires ), causing AstrBot to fail to start on Python 3.13 environments (see #7496).
This PR removes the dependency and replaces it with direct subprocess calls to system (rg) with automatic fallback to standard .
Changes
local.py: Remove import, rewrite method
requirements.txt: Remove
pyproject.toml: Remove
Compatibility
Testing
Tested the search functionality logic with:
Related
Fixes #7496
This PR is generated by AstrBot