CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

tariff-everywhere is a Harmonized Tariff Schedule (HTS) lookup service built in Python. It downloads tariff classification data from the US International Trade Commission's public API, stores it in SQLite, and exposes it through two interfaces:

  1. CLI (hts.py) — terminal-based lookups for developers
  2. MCP Server (mcp_server.py) — Model Context Protocol tools for AI agents

All development runs in Docker. No Python, pip, or virtualenv required on the host.

Before You Start

Before beginning any work, build the Docker image and verify the environment:

# 1. Build the image
docker build -t hts-local .

# 2. Run the test suite (all tests should pass)
docker run --rm hts-local -m pytest tests/ -v

# 3. Smoke test the CLI
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py --help

If data/hts.db does not exist yet, run the ingest first:

docker run --rm -v "$(pwd)/data:/app/data" hts-local scripts/ingest.py

Quick Smoke Test

After building, verify the entire system works end-to-end:

# Run all tests
docker run --rm hts-local -m pytest tests/ -v
# Expected: 114 passed, 5 skipped

# Ingest data (if data/hts.db doesn't exist)
docker run --rm -v "$(pwd)/data:/app/data" hts-local scripts/ingest.py
# Expected: Database created with ~134K entries across 99 chapters

# Verify CLI works
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py chapters
# Expected: All 99 chapters listed with descriptions and entry counts

# Check refresh (should be fast if data is current)
docker run --rm -v "$(pwd)/data:/app/data" hts-local scripts/refresh.py
# Expected: "Already up to date" or new chapters ingested if data changed

Note on ingest idempotency: The ingest script skips duplicate HTS codes. When re-running after data already exists, you'll see Loaded 0 entries but also Skipped 134019 duplicate HTS codes — this is normal and expected. The script won't duplicate data on subsequent runs.
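The skip-duplicates behavior can be sketched as follows. This is illustrative only — the real scripts/ingest.py has more columns and may implement the check differently — but it shows why re-runs report loaded/skipped counts instead of failing:

```python
import sqlite3

def insert_entries(db: sqlite3.Connection, rows: list[tuple]) -> tuple[int, int]:
    """Insert (hts_code, description) rows, skipping duplicate HTS codes.

    Returns (loaded, skipped). Sketch only; the real ingest script differs.
    """
    loaded = skipped = 0
    for row in rows:
        before = db.total_changes
        # INSERT OR IGNORE relies on a UNIQUE constraint on hts_code
        db.execute(
            "INSERT OR IGNORE INTO hts_entries (hts_code, description) VALUES (?, ?)",
            row,
        )
        if db.total_changes > before:
            loaded += 1
        else:
            skipped += 1
    db.commit()
    return loaded, skipped
```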

Architecture

Data Layer

  • Source: https://hts.usitc.gov/reststop/ (public government API, no auth required)
  • Schema: Three tables in data/hts.db:
    • chapters — HTS chapters (01-99), with descriptions, content hashes, and freshness timestamps (last_checked_at, last_changed_at)
    • hts_entries — ~134K tariff entries with rates, units, indent level, footnotes
    • data_freshness — records of each refresh run (timestamp, duration, chapters changed)
  • Indexes: hts_code (exact lookups), description (substring search)
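An illustrative sketch of the schema described above. Column names not listed in this document (e.g. the data_freshness columns) are assumptions; scripts/ingest.py:create_schema() is the authoritative version:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS chapters (
    id INTEGER PRIMARY KEY,
    description TEXT,
    content_hash TEXT,
    last_checked_at TEXT,
    last_changed_at TEXT
);
CREATE TABLE IF NOT EXISTS hts_entries (
    id INTEGER PRIMARY KEY,
    hts_code TEXT,
    description TEXT,
    indent INTEGER,
    unit TEXT,
    general_rate TEXT,
    special_rate TEXT,
    column2_rate TEXT,
    footnotes TEXT,
    chapter_id INTEGER REFERENCES chapters(id)
);
CREATE TABLE IF NOT EXISTS data_freshness (
    id INTEGER PRIMARY KEY,
    run_at TEXT,            -- assumed column name
    duration_seconds REAL,  -- assumed column name
    chapters_changed INTEGER
);
CREATE INDEX IF NOT EXISTS idx_hts_code ON hts_entries(hts_code);
CREATE INDEX IF NOT EXISTS idx_description ON hts_entries(description);
"""

def create_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```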

Application Layer

  • scripts/ingest.py — Download HTS data from the API, iterate chapters 01-99, parse JSON, insert into SQLite
  • scripts/refresh.py — Detect HTS data changes by hashing all 99 chapters in parallel, re-ingest if changed, track per-chapter freshness
  • hts.py — CLI entrypoint (typer) with search, code, chapter, chapters, info commands
  • mcp_server.py — Expose five tools over MCP stdio: search_hts, get_code, list_chapter, get_chapters, get_data_freshness
  • tariff_everywhere.py — Public Python API with connection-managing wrappers for programmatic access

Key Patterns

  • Database connections: Each command opens/closes a connection in a try-finally block. No connection pooling needed for CLI/MCP (low concurrency).
  • Formatting: Two helper functions in hts.py (format_entry_as_dict, format_entry_for_table) standardize output across CLI table views and JSON responses.
  • JSON output: CLI uses print() (not Rich console.print()) for all JSON output to avoid ANSI control character injection. Rich is only used for table display.
  • MCP tools: Return JSON strings (not objects), matching MCP SDK conventions. Tool docstrings are exposed as help text to Claude.
  • Revision detection: scripts/refresh.py hashes all 99 chapters in parallel (ThreadPoolExecutor) and compares against stored hashes in the chapters table. Since /reststop/releases returns 404, this content-hash approach is the alternative. Per-chapter last_checked_at and last_changed_at timestamps distinguish "we looked" from "it was different."
  • Docker entrypoint is python: The Dockerfile uses ENTRYPOINT ["python"], so all arguments passed to docker run ... hts-local <args> become arguments to python. Script paths (e.g., scripts/ingest.py) work directly, but installed CLI tools like datasette must be invoked with -m (e.g., -m datasette). This also means tools that shell out to external binaries (like datasette publish fly needing flyctl) won't work inside the container.

Running & Development

Docker Setup (one-time)

docker build -t hts-local .

Run Commands

Ingest data (if data/hts.db doesn't exist):

docker run --rm -v "$(pwd)/data:/app/data" hts-local scripts/ingest.py

CLI usage (after ingest):

docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py search "copper wire"
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py code 7408.11
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py chapter 74
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py info
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py info --chapter 74
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py --help

Refresh data (check for updates and re-ingest if changed):

docker run --rm -v "$(pwd)/data:/app/data" hts-local scripts/refresh.py

MCP server (stdio, for Claude Desktop integration):

docker run --rm -i -v "$(pwd)/data:/app/data" hts-local mcp_server.py

Running Tests

docker run --rm hts-local -m pytest tests/ -v

The test suite covers CLI commands, MCP server tools, and edge cases using an in-memory SQLite fixture. No real database or API access needed.
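The in-memory fixture pattern looks roughly like this (a sketch of the approach, written as a plain context manager so it runs standalone; the real fixture in tests/ wraps something similar with @pytest.fixture and loads richer seed data):

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def memory_db():
    """Seeded in-memory SQLite database for tests -- no real DB or API needed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hts_entries (hts_code TEXT UNIQUE, description TEXT)")
    conn.execute("INSERT INTO hts_entries VALUES ('7408.11.30', 'Refined copper wire')")
    yield conn
    conn.close()
```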

Testing a Command Locally (without Docker)

If you have Python 3.12+ installed:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Now test directly
python hts.py search "titanium"
python hts.py code 0101.21.00

# Or verify DB directly
python -c "import sqlite3; db = sqlite3.connect('data/hts.db'); print(db.execute('SELECT COUNT(*) FROM hts_entries').fetchone()[0])"

Common Development Tasks

Add a New CLI Command

  1. Add a @app.command() function in hts.py
  2. Use typer.Argument() for positional args, typer.Option() for flags
  3. Follow the pattern: connect to DB → execute query → format output (JSON or table) → close DB
  4. For --json output, use print() — never console.print() (Rich injects control characters)
  5. Update the table schema in both format_entry_for_table and format_entry_as_dict if querying new columns
  6. Add corresponding tests in tests/test_cli.py
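A sketch of steps 3-5 for a hypothetical `rate` command (the command name, columns, and signature are illustrative; the @app.command() decorator and typer parameter declarations are noted in a comment so the sketch runs without typer installed):

```python
import json
import sqlite3

# In hts.py this would carry @app.command(), with hts_code declared via
# typer.Argument() and as_json via typer.Option(). Omitted here for brevity.
def rate(hts_code: str, as_json: bool = True, db_path: str = "data/hts.db") -> dict:
    db = sqlite3.connect(db_path)  # connect to DB
    try:
        row = db.execute(
            "SELECT hts_code, general_rate FROM hts_entries WHERE hts_code = ?",
            (hts_code,),
        ).fetchone()  # execute query
        result = (
            {"hts_code": row[0], "general_rate": row[1]}
            if row else {"error": f"{hts_code} not found"}
        )
        if as_json:
            # Plain print(), never Rich's console.print(), for JSON output
            print(json.dumps(result))
        return result
    finally:
        db.close()  # close DB even if the query raises
```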

Add a New MCP Tool

  1. Add a @mcp.tool() function in mcp_server.py
  2. Docstring becomes the tool description (shown to Claude)
  3. Always return a JSON string (json.dumps())
  4. Follow the DB pattern: open → execute → format → close
  5. Handle errors gracefully (return JSON error object, don't raise)
  6. Add corresponding tests in tests/test_mcp.py
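A sketch of an MCP tool body following the rules above (the @mcp.tool() registration is noted in a comment so the sketch runs without the MCP SDK; the column list is illustrative and the real get_code tool may differ):

```python
import json
import sqlite3

# In mcp_server.py this function would be registered with @mcp.tool().
def get_code(hts_code: str, db_path: str = "data/hts.db") -> str:
    """Look up a single HTS code and return its entry.

    This docstring is what Claude sees as the tool description.
    """
    try:
        db = sqlite3.connect(db_path)
        try:
            row = db.execute(
                "SELECT hts_code, description, general_rate "
                "FROM hts_entries WHERE hts_code = ?",
                (hts_code,),
            ).fetchone()
            if row is None:
                return json.dumps({"error": f"HTS code '{hts_code}' not found"})
            # Always a JSON string, never a dict/object
            return json.dumps(
                {"hts_code": row[0], "description": row[1], "general_rate": row[2]}
            )
        finally:
            db.close()
    except Exception as exc:  # rule 5: return a JSON error object, don't raise
        return json.dumps({"error": str(exc)})
```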

Add or Update the Public Python API

  1. Edit tariff_everywhere.py to expose connection-managing functions for external callers
  2. Reuse hts_core.py for SQL queries and row-to-dict conversion instead of duplicating query logic
  3. Keep return values JSON-serializable dictionaries/lists so callers can export them directly
  4. Raise FileNotFoundError when the SQLite database is missing; return None or [] for not-found lookups
  5. Add/update tests in tests/test_python_api.py
  6. Update docs/PYTHON_API.md and the README link when the public surface changes
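A sketch of the error-handling conventions in points 3-4 for a hypothetical lookup wrapper (the function name and columns are illustrative; the real tariff_everywhere.py delegates query logic to hts_core.py):

```python
import os
import sqlite3

DB_PATH = "data/hts.db"

def lookup_code(hts_code: str, db_path: str = DB_PATH):
    """Return one entry as a JSON-serializable dict, or None if unknown."""
    if not os.path.exists(db_path):
        # Missing database is an environment error, so raise
        raise FileNotFoundError(f"SQLite database not found: {db_path}")
    db = sqlite3.connect(db_path)
    try:
        db.row_factory = sqlite3.Row
        row = db.execute(
            "SELECT hts_code, description FROM hts_entries WHERE hts_code = ?",
            (hts_code,),
        ).fetchone()
        # Not-found lookups return None rather than raising
        return dict(row) if row else None
    finally:
        db.close()
```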

Update the SQLite Schema

  1. Edit create_schema() in scripts/ingest.py
  2. Re-run ingest to rebuild: docker run --rm -v "$(pwd)/data:/app/data" hts-local scripts/ingest.py (will recreate tables)
  3. Update column references in formatting functions if needed

Verify Data Integrity

# Count entries by chapter
docker run --rm -v "$(pwd)/data:/app/data" hts-local -c "
import sqlite3
db = sqlite3.connect('data/hts.db')
result = db.execute('SELECT COUNT(*) FROM hts_entries').fetchone()[0]
print(f'Total entries: {result}')
"

# Quick smoke test
docker run --rm -v "$(pwd)/data:/app/data" hts-local hts.py code 0101.21.00

API & Data Notes

HTS API Endpoint

GET https://hts.usitc.gov/reststop/search?keyword=chapter%20XX&limit=5000
  • No authentication required (public endpoint)
  • Returns flat JSON array of entries
  • All 99 chapters can be fetched in parallel (~15-20s for full ingest)
  • The originally planned endpoint (/exportSections?format=JSON) is no longer operational
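Building the per-chapter request URL from the pattern above can be sketched as follows (a hypothetical helper; the real ingest code may construct the request differently):

```python
from urllib.parse import quote

BASE_URL = "https://hts.usitc.gov/reststop/search"

def chapter_url(chapter: int, limit: int = 5000) -> str:
    """URL for fetching one HTS chapter (01-99) via keyword search."""
    keyword = quote(f"chapter {chapter:02d}")  # e.g. "chapter%2007"
    return f"{BASE_URL}?keyword={keyword}&limit={limit}"
```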

Data Model

Each hts_entries row contains:

  • hts_code — tariff code (e.g., "7408.11.30")
  • description — product description
  • indent — hierarchy level (0 = chapter, 1 = heading, etc.)
  • unit — measurement unit (e.g., "kg", "liters")
  • general_rate, special_rate, column2_rate — duty rates (strings like "5%", "Free")
  • footnotes — JSON string of footnote objects (may be empty string)
  • chapter_id — foreign key to chapters table

Note: The CLI SELECT queries omit footnotes — the format_entry_as_dict column list has 9 columns. The MCP server queries 6 columns (no id, indent, footnotes, chapter_id).
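Since footnotes is stored as a JSON string that may be empty, code reading it needs a small decode step, for example (a hypothetical helper, not taken from the repo):

```python
import json

def parse_footnotes(raw: str) -> list:
    """Decode the footnotes column: a JSON array string, or '' for none."""
    return json.loads(raw) if raw else []
```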

MCP Server Integration (Claude Desktop)

The MCP server runs locally via Docker with stdio transport — it does not expose a network port and is not deployable to a remote service. This approach keeps the tariff database private and avoids cloud infrastructure.

To use the HTS tools in Claude Desktop:

{
  "mcpServers": {
    "hts": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-v", "/absolute/path/to/tariff-everywhere/data:/app/data",
        "hts-local",
        "mcp_server.py"
      ]
    }
  }
}

The server uses stdio transport (no port exposed). Claude Desktop spawns the container, communicates over stdin/stdout, and the container exits cleanly when the session ends.

Why local-only? Remote HTTP deployment was explored but not pursued. Stdio transport over Docker is simpler, more secure (data never leaves your machine), and eliminates infrastructure overhead.

Datasette Integration

Live at: https://tariff-everywhere.fly.dev/

The database is published as a public Datasette instance for browsable web access.

Key Files

  • metadata.json — Datasette configuration (titles, facets, label columns, descriptions)
  • scripts/build_fts.py — Builds FTS5 full-text search index using sqlite-utils (critical: must use sqlite-utils, not manual SQL, for Datasette to detect FTS)
  • scripts/chapter_titles.py — Enriches chapters.description with real HTS titles ("Live Animals" instead of "Chapter 01")
  • requirements.txt — Includes datasette, datasette-search-all, datasette-render-html, datasette-publish-fly, sqlite-utils

Critical Learnings

FTS5 Detection: Datasette only auto-detects FTS5 tables created by sqlite-utils. Manual SQL creation (with content_rowid= parameter) breaks Datasette's search. Always use:

import sqlite_utils
db = sqlite_utils.Database("data/hts.db")
db["hts_entries"].enable_fts(["description"], fts_version="fts5")

Typer Compatibility: Typer 0.15.x breaks with click 8.3+ (signature change in Parameter.make_metavar()). Pin typer~=0.24.0 for click 8.3+ compatibility.

Chapter UX: The chapters table now uses label_column: "description" (real titles like "Copper and Articles Thereof") instead of just chapter numbers. A browse_chapters SQL view shows entry counts per chapter for easier navigation.

HTML Rendering: 1,535 entries have <i> tags for scientific names. The datasette-render-html plugin renders these correctly; without it, raw <i> text appears.

Deploying to Fly.io

CI (recommended): The deploy-datasette.yml workflow handles the full pipeline — ingest, chapter title enrichment, FTS rebuild, and deploy. Trigger it manually via workflow_dispatch. Data prep steps run inside Docker (volume-mounted to the runner), but the deploy step runs directly on the runner because datasette publish fly requires flyctl, which isn't in the Docker image.

Manual deployment:

# 1. Update chapter titles and rebuild FTS
python3 scripts/chapter_titles.py data/hts.db
python3 -m sqlite_utils enable-fts data/hts.db hts_entries description --fts5 --replace

# 2. Deploy (requires flyctl auth login)
datasette publish fly data/hts.db \
  --app="tariff-everywhere" \
  --metadata metadata.json \
  --install=datasette-search-all \
  --install=datasette-render-html \
  --setting default_page_size 50

Deployment is fully automated from there: datasette publish fly builds a ~52 MB image and provisions two machines on the Fly.io free tier, at zero cost.

Known Limitations

  • MCP server is local-only — No remote HTTP deployment. The MCP server only runs locally via Docker (stdio transport). This is by design: keeps data private, reduces infrastructure, and simplifies setup. If remote MCP access is needed, run Claude on the same machine as the Docker container.
  • Single-threaded CLI — no parallel queries; acceptable for interactive lookups
  • No pagination in CLI search — hardcoded limit of 10 results; use --limit flag to increase
  • Revision detection is content-hash based — scripts/refresh.py hashes all 99 chapters to detect changes, but cannot distinguish USITC revision numbers (the API provides none)
  • format_entry_as_dict column mapping — uses positional zip against hardcoded column names; fragile if the SELECT changes. Consider using cursor.description or named tuples.
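The suggested cursor.description fix for the last limitation would look roughly like this (a sketch, not the repo's current code):

```python
import sqlite3

def rows_as_dicts(cursor: sqlite3.Cursor) -> list[dict]:
    """Map rows to dicts using cursor.description, so the mapping stays
    correct even when the SELECT column list changes -- unlike a positional
    zip against a hardcoded name list."""
    cols = [d[0] for d in cursor.description]
    return [dict(zip(cols, row)) for row in cursor.fetchall()]
```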

Debugging

Database locked error:

  • Likely a stale connection. Ensure all CLI commands close the DB in a finally block.
  • If Docker container hangs, docker ps to find the container ID, then docker kill <id>.

"hts.db not found" error:

  • Run the ingest script first to populate data/hts.db.

MCP server not starting:

  • Check that the data volume is mounted and readable: docker run --rm -v "$(pwd)/data:/app/data" hts-local ls -la /app/data/

Slow searches:

  • Add missing indexes if querying new columns; see scripts/ingest.py:create_schema().