fix: harden NL2SQLTool — read-only default, query validation, parameterized queries (#5311)

* fix: harden NL2SQLTool — read-only by default, parameterized queries, query validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CI lint failures and remove unused import

- Remove unused `sessionmaker` import from test_nl2sql_security.py
- Use `Self` return type on `_apply_env_override` (fixes UP037/F821)
- Fix ruff errors auto-fixed in lib/crewai (UP007, etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: expand _WRITE_COMMANDS and block multi-statement semicolon injection

- Add missing write commands: UPSERT, LOAD, COPY, VACUUM, ANALYZE,
  ANALYSE, REINDEX, CLUSTER, REFRESH, COMMENT, SET, RESET
- _validate_query() now splits on ';' and validates each statement
  independently; multi-statement queries are rejected outright in
  read-only mode to prevent 'SELECT 1; DROP TABLE users' bypass
- Extract single-statement logic into _validate_statement() helper
- Add TestSemicolonInjection and TestExtendedWriteCommands test classes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: retrigger

* fix: use typing_extensions.Self for Python 3.10 compat

* chore: update tool specifications

* docs: document NL2SQLTool read-only default and DML configuration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: close three NL2SQLTool security gaps (writable CTEs, EXPLAIN ANALYZE, multi-stmt commit)

- Remove WITH from _READ_ONLY_COMMANDS; scan CTE body for write keywords so
  writable CTEs like `WITH d AS (DELETE …) SELECT …` are blocked in read-only mode.
- EXPLAIN ANALYZE/ANALYSE now resolves the underlying command; EXPLAIN ANALYZE DELETE
  is treated as a write and blocked in read-only mode.
- execute_sql commit decision now checks ALL semicolon-separated statements so
  a SELECT-first batch like `SELECT 1; DROP TABLE t` still triggers a commit
  when allow_dml=True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: handle parenthesized EXPLAIN options syntax; remove unused _seed_db

_validate_statement now strips parenthesized options from EXPLAIN (e.g.
EXPLAIN (ANALYZE) DELETE, EXPLAIN (ANALYZE, VERBOSE) DELETE) before
checking whether ANALYZE/ANALYSE is present — closing the bypass where
the options-list form was silently allowed in read-only mode.

Adds three new tests:
  - EXPLAIN (ANALYZE) DELETE  → blocked
  - EXPLAIN (ANALYZE, VERBOSE) DELETE  → blocked
  - EXPLAIN (VERBOSE) SELECT  → allowed

Also removes the unused _seed_db helper from test_nl2sql_security.py.
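
The EXPLAIN handling described above can be sketched roughly as follows. This is an illustration only, not the tool's actual code: `resolve_explain` is a hypothetical name, and it assumes the statement is already known to start with EXPLAIN. It treats any mention of ANALYZE/ANALYSE in an options list as "executes the statement", which is deliberately conservative.

```python
import re

# Options that may appear between EXPLAIN and the real command (per the commit text).
_EXPLAIN_OPTIONS = {"ANALYZE", "ANALYSE", "VERBOSE"}
_ANALYZE_WORDS = {"ANALYZE", "ANALYSE"}


def resolve_explain(statement: str) -> tuple[str, bool]:
    """Return (underlying_command, analyze) for a statement starting with EXPLAIN.

    Hypothetical helper: strips a parenthesized options list and any bare
    option keywords, then reports the first remaining token.
    """
    rest = statement.strip()[len("EXPLAIN"):].lstrip()
    analyze = False

    # Parenthesized form: EXPLAIN (ANALYZE, VERBOSE) DELETE ...
    match = re.match(r"\(([^)]*)\)\s*", rest)
    if match:
        options = {
            opt.strip().split()[0].upper()
            for opt in match.group(1).split(",")
            if opt.strip()
        }
        analyze = bool(options & _ANALYZE_WORDS)
        rest = rest[match.end():]

    # Bare form: EXPLAIN ANALYZE VERBOSE SELECT ...
    tokens = rest.split()
    while tokens and tokens[0].upper() in _EXPLAIN_OPTIONS:
        if tokens[0].upper() in _ANALYZE_WORDS:
            analyze = True
        tokens.pop(0)

    command = tokens[0].upper() if tokens else ""
    return command, analyze
```

A caller in read-only mode would then block the statement when `analyze` is true and the returned command is a write command, and allow plain `EXPLAIN` otherwise.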

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: update tool specifications

* fix: smarter CTE write detection, fix commit logic for writable CTEs

- Replace naive token-set matching with positional AS() body inspection
  to avoid false positives on column names like 'comment', 'set', 'reset'
- Fix execute_sql commit logic to detect writable CTEs (WITH + DELETE/INSERT)
  not just top-level write commands
- Add tests for false positive cases and writable CTE commit behavior
- Format nl2sql_tool.py to pass ruff format check

* fix: catch write commands in CTE main query + handle whitespace in AS()

- WITH cte AS (SELECT 1) DELETE FROM users now correctly blocked
- AS followed by newline/tab/multi-space before ( now detected
- execute_sql commit logic updated for both cases
- 4 new tests

* fix: EXPLAIN ANALYZE VERBOSE handling, string literal paren bypass, commit logic for EXPLAIN ANALYZE

- EXPLAIN handler now consumes all known options (ANALYZE, ANALYSE, VERBOSE) before
  extracting the real command, fixing 'EXPLAIN ANALYZE VERBOSE SELECT' being blocked
- Paren walker in _extract_main_query_after_cte now skips string literals, preventing
  `WITH cte AS (SELECT '(' FROM t) DELETE FROM users` from bypassing detection
- _is_write_stmt in execute_sql now resolves EXPLAIN ANALYZE to underlying command
  via _resolve_explain_command, ensuring session.commit() fires for write operations
- 10 new tests covering all three fixes
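
As a rough illustration of the string-literal-aware paren walking mentioned above, here is a minimal sketch. `find_matching_paren` is a hypothetical helper, not the function in nl2sql_tool.py, and it only understands single-quoted literals with `''` escapes (no dollar quoting, comments, or quoted identifiers).

```python
def find_matching_paren(sql: str, open_idx: int) -> int:
    """Return the index of the ')' that closes the '(' at open_idx.

    Parentheses inside single-quoted string literals are ignored.
    Returns -1 if no matching ')' is found.
    """
    depth = 0
    in_string = False
    i = open_idx
    while i < len(sql):
        ch = sql[i]
        if in_string:
            if ch == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    i += 1  # '' is an escaped quote; stay inside the literal
                else:
                    in_string = False
        elif ch == "'":
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


sql = "WITH cte AS (SELECT '(' FROM t) DELETE FROM users"
close_idx = find_matching_paren(sql, sql.index("("))
main_query = sql[close_idx + 1:].strip()  # text after the CTE body
```

With the literal skipped, the text after the CTE body is the `DELETE` statement, which a read-only check can then reject.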

* fix: deduplicate EXPLAIN parsing, fix AS( regex in strings, block unknown CTE commands, bump langchain-core

- Refactor _validate_statement to use _resolve_explain_command (single source of truth)
- _iter_as_paren_matches skips string literals so 'AS (' in data doesn't confuse CTE detection
- Unknown commands after CTE definitions now blocked in read-only mode
- Bump langchain-core override to >=1.2.28 (GHSA-926x-3r5x-gfhw)

* fix: add return type annotation to _iter_as_paren_matches

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
alex-clawd
2026-04-08 23:21:38 -07:00
committed by GitHub
parent 06fe163611
commit ce56472fc3
9 changed files with 1757 additions and 276 deletions


@@ -11,7 +11,75 @@ mode: "wide"
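This enables a range of workflows in which an agent accesses the database, retrieves information according to its goal, and uses that information to generate a response, report, or other output. It also lets the agent update the database in line with its goal.

**Caution**: Make sure the agent is connected to a read replica, or that it is acceptable for the agent to run insert/update queries against the database.

**Caution**: The tool is read-only by default (only SELECT/SHOW/DESCRIBE/EXPLAIN are allowed). Write operations require the `allow_dml=True` parameter or the `CREWAI_NL2SQL_ALLOW_DML=true` environment variable. When write access is enabled, use a database user with restricted privileges or a read replica where possible.

## Read-Only Mode and DML Configuration

`NL2SQLTool` runs in **read-only mode** by default. Without additional configuration, the following statement types are allowed:

- `SELECT`
- `SHOW`
- `DESCRIBE`
- `EXPLAIN`

Unless DML is explicitly enabled, attempting a write operation (`INSERT`, `UPDATE`, `DELETE`, `DROP`, `CREATE`, `ALTER`, `TRUNCATE`, and so on) raises an error.

In read-only mode, multi-statement queries containing semicolons (for example, `SELECT 1; DROP TABLE users`) are also blocked to prevent injection attacks.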
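
To make these rules concrete, here is a minimal sketch of a read-only check. It is not the tool's implementation: `validate_read_only` and `READ_ONLY_COMMANDS` are hypothetical names, the split on `;` is naive (it does not account for semicolons inside string literals), and it does not cover cases such as writable CTEs or `EXPLAIN ANALYZE`.

```python
# Statement types allowed without DML, as listed above.
READ_ONLY_COMMANDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def validate_read_only(query: str) -> None:
    """Raise ValueError if the query is not a single read-only statement."""
    # Naive split; a trailing ';' yields an empty piece that is dropped.
    statements = [part.strip() for part in query.split(";") if part.strip()]
    if not statements:
        raise ValueError("Empty query")
    if len(statements) > 1:
        raise ValueError("Multi-statement queries are not allowed in read-only mode")

    command = statements[0].split()[0].upper()
    if command not in READ_ONLY_COMMANDS:
        raise ValueError(f"{command} is not allowed in read-only mode")
```

Calling `validate_read_only("SELECT 1; DROP TABLE users")` raises `ValueError` because the query contains two statements, while `validate_read_only("SELECT 1")` returns without error.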
### Enabling Write Operations

There are two ways to enable DML (Data Manipulation Language):

**Option 1 (constructor parameter):**
```python
from crewai_tools import NL2SQLTool

nl2sql = NL2SQLTool(
    db_uri="postgresql://example@localhost:5432/test_db",
    allow_dml=True,
)
```
**Option 2 (environment variable):**
```bash
CREWAI_NL2SQL_ALLOW_DML=true
```
```python
from crewai_tools import NL2SQLTool

# DML is enabled through the environment variable
nl2sql = NL2SQLTool(db_uri="postgresql://example@localhost:5432/test_db")
```
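
How the two options might combine can be sketched as below. This is an assumption-laden illustration, not the tool's code: `resolve_allow_dml` is a hypothetical function, and the only environment value it recognizes is `true` (case-insensitive), since that is the only value documented here. Whether the real tool accepts other values is not stated.

```python
import os
from typing import Mapping, Optional


def resolve_allow_dml(
    allow_dml: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if DML is enabled by the parameter or the environment."""
    env = os.environ if environ is None else environ
    if allow_dml:
        # An explicit constructor argument enables DML on its own.
        return True
    # Assumption: only the literal value "true" enables DML via the environment.
    return env.get("CREWAI_NL2SQL_ALLOW_DML", "").strip().lower() == "true"
```

Passing a plain dict as `environ` keeps the sketch easy to test without touching the real process environment.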
### Usage Examples

**Read-only (default), safe for analytics and reporting workloads:**
```python
from crewai_tools import NL2SQLTool

# Only SELECT/SHOW/DESCRIBE/EXPLAIN are allowed
nl2sql = NL2SQLTool(db_uri="postgresql://example@localhost:5432/test_db")
```
**DML enabled, required for write workloads:**
```python
from crewai_tools import NL2SQLTool

# INSERT, UPDATE, DELETE, DROP, etc. are allowed
nl2sql = NL2SQLTool(
    db_uri="postgresql://example@localhost:5432/test_db",
    allow_dml=True,
)
```
<Warning>
Enabling DML lets the agent modify or delete data. Enable it only when your use case explicitly requires write access, and make sure the database credentials are restricted to the minimum privileges required.
</Warning>
## Requirements