14 changes: 13 additions & 1 deletion .github/workflows/paimon-python-checks.yml
@@ -71,6 +71,8 @@ jobs:
build-essential \
git \
curl \
pkg-config \
libssl-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

@@ -139,12 +141,22 @@ jobs:
if: matrix.python-version != '3.6.15'
shell: bash
run: |
pip install maturin
pip install maturin[patchelf]
git clone -b support_directory https://github.com/JingsongLi/tantivy-py.git /tmp/tantivy-py
cd /tmp/tantivy-py
maturin build --release
pip install target/wheels/tantivy-*.whl

- name: Build and install pypaimon-rust from source
if: matrix.python-version != '3.6.15'
shell: bash
run: |
git clone https://github.com/apache/paimon-rust.git /tmp/paimon-rust
cd /tmp/paimon-rust/bindings/python
maturin build --release -o dist
pip install dist/pypaimon_rust-*.whl
pip install 'datafusion>=52'

- name: Run lint-python.sh
shell: bash
run: |
102 changes: 102 additions & 0 deletions docs/content/pypaimon/cli.md
@@ -621,3 +621,105 @@ default
mydb
analytics
```

## SQL Command

Execute SQL queries on Paimon tables directly from the command line. This feature is powered by pypaimon-rust and DataFusion.

**Prerequisites:**

```shell
pip install 'pypaimon[sql]'
```

### One-Shot Query

Execute a single SQL query and display the result:

```shell
paimon sql "SELECT * FROM users LIMIT 10"
```

Output:
```
id name age city
1 Alice 25 Beijing
2 Bob 30 Shanghai
3 Charlie 35 Guangzhou
```

**Options:**

- `--format, -f`: Output format: `table` (default) or `json`

**Examples:**

```shell
# Direct table name (uses default catalog and database)
paimon sql "SELECT * FROM users"

# Two-part: database.table
paimon sql "SELECT * FROM mydb.users"

# Query with aggregation and ordering
paimon sql "SELECT city, COUNT(*) AS cnt FROM users GROUP BY city ORDER BY cnt DESC"

# Output as JSON
paimon sql "SELECT * FROM users LIMIT 5" --format json
```
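
The invocation shape documented above (an optional query argument plus `--format`/`-f`) can be sketched with `argparse`. This is an illustration under assumptions, not the actual parser in `pypaimon.cli`: the function name `build_sql_parser` and the argument `dest` names are hypothetical.

```python
import argparse


def build_sql_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `paimon sql [QUERY] [--format {table,json}]`."""
    parser = argparse.ArgumentParser(prog="paimon")
    subparsers = parser.add_subparsers(dest="command")

    sql = subparsers.add_parser("sql", help="Run a SQL query or start the REPL")
    # The query is optional: omitting it corresponds to starting the REPL.
    sql.add_argument("query", nargs="?", default=None)
    # `table` is the documented default output format.
    sql.add_argument("--format", "-f", choices=["table", "json"], default="table")
    return parser


one_shot = build_sql_parser().parse_args(
    ["sql", "SELECT * FROM users LIMIT 5", "--format", "json"]
)
repl = build_sql_parser().parse_args(["sql"])
print(one_shot.query, one_shot.format)
print(repl.query, repl.format)
```

A missing query (`repl.query is None`) is the signal this sketch would use to switch to interactive mode.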

### Interactive REPL

Start an interactive SQL session by running `paimon sql` without a query argument. The REPL supports arrow keys for line editing, and command history is persisted across sessions in `~/.paimon_history`.

```shell
paimon sql
```

Output:
```
    ____        _
   / __ \____ _(_)___ ___  ____  ____
  / /_/ / __ `/ / __ `__ \/ __ \/ __ \
 / ____/ /_/ / / / / / / / /_/ / / / /
/_/    \__,_/_/_/ /_/ /_/\____/_/ /_/

Powered by pypaimon-rust + DataFusion
Type 'help' for usage, 'exit' to quit.

paimon> SHOW DATABASES;
default
mydb

paimon> USE mydb;
Using database 'mydb'.

paimon> SHOW TABLES;
orders
users

paimon> SELECT count(*) AS cnt
> FROM users
> WHERE age > 18;
cnt
42
(1 row in 0.05s)

paimon> exit
Bye!
```

SQL statements end with `;` and can span multiple lines. The continuation prompt ` >` indicates that more input is expected.
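
The multi-line behavior described here can be sketched as a small buffering loop. This is a minimal sketch, not the REPL's actual implementation: `iter_statements` and `BARE_COMMANDS` are hypothetical names, and the sketch assumes that a `;` at the end of a line always terminates the statement (it does not handle a `;` inside a string literal).

```python
from typing import Iterable, Iterator

# Commands that the REPL table lists without a trailing ';'.
BARE_COMMANDS = {"help", "exit", "quit"}


def iter_statements(lines: Iterable[str]) -> Iterator[str]:
    """Yield complete statements, buffering input until a line ends with ';'."""
    buffer = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if not buffer and stripped.lower() in BARE_COMMANDS:
            yield stripped.lower()
            continue
        buffer.append(stripped)
        if stripped.endswith(";"):
            # Statement is complete: join buffered lines and drop the terminator.
            yield " ".join(buffer)[:-1].rstrip()
            buffer = []
    # Input left in the buffer without a ';' is discarded in this sketch.


session = ["SELECT count(*) AS cnt", "FROM users", "WHERE age > 18;", "exit"]
print(list(iter_statements(session)))
```

While `buffer` is non-empty, an interactive loop would show the continuation prompt instead of `paimon>`.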

**REPL Commands:**

| Command | Description |
|---|---|
| `USE <database>;` | Switch the default database |
| `SHOW DATABASES;` | List all databases |
| `SHOW TABLES;` | List tables in the current database |
| `SELECT ...;` | Execute a SQL query |
| `help` | Show usage information |
| `exit` / `quit` | Exit the REPL |
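
One way to picture how the commands in this table could be told apart from ordinary SQL is a pattern-based classifier. The sketch below is hypothetical (the names `classify` and `_PATTERNS` are not from pypaimon), and it assumes commands are matched case-insensitively, which the documentation does not state.

```python
import re

# (kind, pattern) pairs checked in order; case-insensitivity is an assumption.
_PATTERNS = [
    ("use", re.compile(r"^USE\s+(\w+)\s*;?$", re.IGNORECASE)),
    ("show_databases", re.compile(r"^SHOW\s+DATABASES\s*;?$", re.IGNORECASE)),
    ("show_tables", re.compile(r"^SHOW\s+TABLES\s*;?$", re.IGNORECASE)),
    ("help", re.compile(r"^help$", re.IGNORECASE)),
    ("exit", re.compile(r"^(exit|quit)$", re.IGNORECASE)),
]


def classify(command: str):
    """Return (kind, argument); anything unrecognized is treated as a SQL query."""
    text = command.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            # Only USE carries an argument (the database name).
            return kind, (match.group(1) if kind == "use" else None)
    return "sql", text


for example in ["USE mydb;", "SHOW TABLES;", "quit", "SELECT 1;"]:
    print(example, "->", classify(example))
```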

For more details on SQL syntax and the Python API, see [SQL Query]({{< ref "pypaimon/sql" >}}).
168 changes: 168 additions & 0 deletions docs/content/pypaimon/sql.md
@@ -0,0 +1,168 @@
---
title: "SQL Query"
weight: 8
type: docs
aliases:
- /pypaimon/sql.html
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# SQL Query

PyPaimon supports executing SQL queries on Paimon tables, powered by [pypaimon-rust](https://github.com/apache/paimon-rust/tree/main/bindings/python) and [DataFusion](https://datafusion.apache.org/python/).

## Installation

SQL query support requires additional dependencies. Install them with:

```shell
pip install 'pypaimon[sql]'
```

This installs the `pypaimon-rust` and `datafusion` packages.
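
To check whether the optional dependencies are importable before creating a `SQLContext`, something like the following may help. It is a sketch that uses only the standard library. The import names in `SQL_MODULES` are assumptions inferred from the package names (`pypaimon_rust` in particular is a guess based on the wheel file name), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names for the two packages installed by the `sql` extra.
SQL_MODULES = ("pypaimon_rust", "datafusion")


def missing_modules(names=SQL_MODULES):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing SQL dependencies:", ", ".join(missing))
else:
    print("SQL dependencies are available")
```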

## Usage

Create a `SQLContext`, register one or more catalogs with their options, and run SQL queries.

### Basic Query

```python
from pypaimon.sql import SQLContext

ctx = SQLContext()
ctx.register_catalog("paimon", {"warehouse": "/path/to/warehouse"})
ctx.set_current_catalog("paimon")
ctx.set_current_database("default")

# Execute SQL and get a PyArrow Table
table = ctx.sql("SELECT * FROM my_table")
print(table)

# Convert to Pandas DataFrame
df = table.to_pandas()
print(df)
```

### Table Reference Format

Set the default catalog and database with `set_current_catalog()` and `set_current_database()`. Tables in the current catalog can then be referenced in two ways:

```python
# Direct table name (uses default database)
ctx.sql("SELECT * FROM my_table")

# Two-part: database.table
ctx.sql("SELECT * FROM mydb.my_table")
```
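
Conceptually, a shorter name is completed from the current catalog and database. The sketch below models that rule in plain Python. It is not how pypaimon or DataFusion resolves names internally: `TableRef` and `resolve` are hypothetical, and quoted identifiers that contain dots are not handled. The three-part form is the one used under Multi-Catalog Query.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    database: str
    table: str


def resolve(name: str, current_catalog: str, current_database: str) -> TableRef:
    """Fill in missing catalog/database parts of a dotted name from the defaults."""
    parts = name.split(".")
    if len(parts) == 1:
        return TableRef(current_catalog, current_database, parts[0])
    if len(parts) == 2:
        return TableRef(current_catalog, parts[0], parts[1])
    if len(parts) == 3:
        return TableRef(*parts)
    raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}: {name!r}")


print(resolve("my_table", "paimon", "default"))
print(resolve("mydb.my_table", "paimon", "default"))
```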

### Filtering

```python
table = ctx.sql("""
SELECT id, name, age
FROM users
WHERE age > 18 AND city = 'Beijing'
""")
```

### Aggregation

```python
table = ctx.sql("""
SELECT city, COUNT(*) AS cnt, AVG(age) AS avg_age
FROM users
GROUP BY city
ORDER BY cnt DESC
""")
```

### Join

```python
table = ctx.sql("""
SELECT u.name, o.order_id, o.amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
""")
```

### Subquery

```python
table = ctx.sql("""
SELECT * FROM users
WHERE id IN (
SELECT user_id FROM orders
WHERE amount > 1000
)
""")
```

### Cross-Database Query

```python
# Query a table in another database using two-part syntax
table = ctx.sql("""
SELECT u.name, o.amount
FROM default.users u
JOIN analytics.orders o ON u.id = o.user_id
""")
```

### Multi-Catalog Query

`SQLContext` supports registering multiple catalogs for cross-catalog queries:

```python
from pypaimon.sql import SQLContext

ctx = SQLContext()
ctx.register_catalog("a", {"warehouse": "/path/to/warehouse_a"})
ctx.register_catalog("b", {
    "metastore": "rest",
    "uri": "http://localhost:8080",
    "warehouse": "warehouse_b",
})
ctx.set_current_catalog("a")
ctx.set_current_database("default")

# Cross-catalog join
table = ctx.sql("""
SELECT a_users.name, b_orders.amount
FROM a.default.users AS a_users
JOIN b.default.orders AS b_orders ON a_users.id = b_orders.user_id
""")
```
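
When several catalogs are involved, their options can be kept in one mapping and registered in a loop. The sketch below assumes only the `register_catalog(name, options)` call shown above. `register_all` is a hypothetical helper, and `RecordingContext` is a stand-in used so the sketch runs without a warehouse; with pypaimon installed, a real `SQLContext` would be passed instead.

```python
from typing import Dict

# Catalog options copied from the example above.
CATALOGS: Dict[str, Dict[str, str]] = {
    "a": {"warehouse": "/path/to/warehouse_a"},
    "b": {
        "metastore": "rest",
        "uri": "http://localhost:8080",
        "warehouse": "warehouse_b",
    },
}


def register_all(ctx, catalogs: Dict[str, Dict[str, str]]) -> None:
    """Call ctx.register_catalog(name, options) once per configured catalog."""
    for name, options in catalogs.items():
        # Pass a copy so the context cannot mutate the shared configuration.
        ctx.register_catalog(name, dict(options))


class RecordingContext:
    """Stand-in for this sketch only: records calls instead of opening catalogs."""

    def __init__(self):
        self.registered = {}

    def register_catalog(self, name, options):
        self.registered[name] = options


ctx = RecordingContext()
register_all(ctx, CATALOGS)
print(sorted(ctx.registered))
```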

## Supported SQL Syntax

The SQL engine is powered by Apache DataFusion, which supports a broad range of SQL syntax, including:

- `SELECT`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY`, `LIMIT`
- `JOIN` (INNER, LEFT, RIGHT, FULL, CROSS)
- Subqueries and CTEs (`WITH`)
- Aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, etc.)
- Window functions (`ROW_NUMBER`, `RANK`, `LAG`, `LEAD`, etc.)
- `UNION`, `INTERSECT`, `EXCEPT`

For the full SQL reference, see the [DataFusion SQL documentation](https://datafusion.apache.org/user-guide/sql/index.html).
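
As an illustration of two items from the list above, the query below combines a CTE with a window function to pick the oldest user per city. To keep the sketch runnable without a Paimon warehouse, it is executed against an in-memory SQLite database, which is a different engine swapped in purely as a stand-in. The sample rows are made up, and with a configured `SQLContext` the same query text would be passed to `ctx.sql(QUERY)` instead.

```python
import sqlite3

# CTE (`WITH`) plus a window function (`ROW_NUMBER`).
QUERY = """
WITH ranked AS (
    SELECT name, city, age,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY age DESC) AS rn
    FROM users
)
SELECT city, name, age
FROM ranked
WHERE rn = 1
ORDER BY city
"""

# SQLite stand-in with illustrative data; not a Paimon table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 25, "Beijing"),
        (2, "Bob", 30, "Shanghai"),
        (3, "Charlie", 35, "Guangzhou"),
        (4, "Dana", 41, "Beijing"),
    ],
)
oldest_per_city = conn.execute(QUERY).fetchall()
conn.close()
print(oldest_per_city)
```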
2 changes: 2 additions & 0 deletions paimon-python/pypaimon/__init__.py
@@ -28,11 +28,13 @@
from pypaimon.schema.schema import Schema
from pypaimon.tag.tag import Tag
from pypaimon.tag.tag_manager import TagManager
from pypaimon.sql.sql_context import SQLContext

__all__ = [
"PaimonVirtualFileSystem",
"CatalogFactory",
"Schema",
"Tag",
"TagManager",
"SQLContext",
]
4 changes: 4 additions & 0 deletions paimon-python/pypaimon/cli/cli.py
@@ -121,6 +121,10 @@ def main():
from pypaimon.cli.cli_catalog import add_catalog_subcommands
add_catalog_subcommands(catalog_parser)

# SQL command
from pypaimon.cli.cli_sql import add_sql_subcommand
add_sql_subcommand(subparsers)

args = parser.parse_args()

if args.command is None: