# Wakatipu API User Documentation

Production API URL: `https://wakatipu-api.appcloud.cyou`

Version: `0.3.4`

Wakatipu API turns documents, PDFs, images, spreadsheets, email files, archives, web pages, and text payloads into structured JSON. It combines the original Wakatipu file extraction tools with the migrated Lucerne web, text, ML, vector, workflow, and retained-output endpoints.

## Simple User Path

1. Create or open your Wakatipu API account at `https://wakatipu-api.appcloud.cyou/account`.
2. Use the 5 free API credits added after signup, or buy more API credits.
3. Create an API key from the account page.
4. Send `X-API-Key: <your key>` with every `/api/...` request.
5. Start with `/api/documents/extract-text`, `/api/text/clean`, or `/api/web/extract-article`.

New accounts receive 5 free API credits. Each successful paid API call consumes one API credit; failed calls consume none. Monthly credits are used first, then one-time and free credits. Background job submit endpoints consume a credit only when the submit request succeeds; polling a job does not debit another credit.

## Ethical And Safety Use

Use Wakatipu for content you own, are allowed to process, or are permitted to access. Do not use it to bypass paywalls, access controls, rate limits, robots policies, private networks, authentication, or copyright restrictions.

Important guardrails:

- URL tools are intended for public web URLs and include private-network protections.
- Archive tools check entries for unsafe paths before any extraction-style workflow runs.
- PII helpers are best-effort detection aids, not a compliance guarantee.
- OCR, entity extraction, language detection, classification, and similarity results can be wrong; review important results before acting on them.
- Do not upload sensitive third-party data unless your own policy, consent, and retention rules allow it.
- Do not use extraction results for impersonation, credential theft, spam, surveillance, discriminatory decisions, or unauthorized scraping.
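
As a rough illustration of the public-URL guardrail, a caller can screen URLs before sending them to a URL tool. This is a client-side sketch only, not the service's own protection: the function name `looks_public` is hypothetical, and the check covers the scheme, `localhost`, and literal IP addresses without resolving DNS names.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_public(url: str) -> bool:
    """Return False for URLs that obviously point at non-public hosts.

    Hypothetical helper: it inspects only the scheme and the literal host,
    and it does not resolve DNS names.
    """
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"} or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # The host is a DNS name; this sketch lets it through unresolved.
        return True
    # Loopback, private, and link-local addresses are not global.
    return address.is_global
```

A check like this only filters obvious mistakes early; a hostname that resolves to a private address would still pass it.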

## Quick Links

- API home: `https://wakatipu-api.appcloud.cyou/`
- All links and endpoints: `https://wakatipu-api.appcloud.cyou/all-links`
- Swagger/OpenAPI: `https://wakatipu-api.appcloud.cyou/docs`
- Complete endpoint catalog: `https://wakatipu-api.appcloud.cyou/static/docs/expanded-api-catalog.html`
- Service status: `https://wakatipu-api.appcloud.cyou/status`
- Health readiness: `https://wakatipu-api.appcloud.cyou/health/ready`

## Authentication

Every machine endpoint under `/api/...` requires:

```http
X-API-Key: <your-api-key>
```

Example:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  https://wakatipu-api.appcloud.cyou/api/tools/catalog
```

API keys are managed from the account page. You can create keys, rename them for easier tracking, and revoke old keys. Revoked keys become inactive and are not shown again as raw secrets.

## Billing Product

Wakatipu API uses one paid product for machine requests:

| Product | Use it for |
| --- | --- |
| Wakatipu API | Machine API calls with customer-owned API keys. |

Plans are Standard `$2` for `50` uses, Pro `$5` for `200` uses, and Ultimate `$15` for `1200` uses. Each plan can be bought monthly or as a one-time pack.
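
To compare plans, it can help to work out the price per use. The sketch below only divides the listed prices by the listed uses; the names `PLANS` and `cost_per_use` are illustrative and not part of the API.

```python
# Plan prices (USD) and included uses, copied from the plan list above.
PLANS = {
    "Standard": (2, 50),
    "Pro": (5, 200),
    "Ultimate": (15, 1200),
}


def cost_per_use(plan: str) -> float:
    """Return the price in USD of a single use on the given plan."""
    price_usd, uses = PLANS[plan]
    return price_usd / uses


for name in PLANS:
    print(f"{name}: ${cost_per_use(name):.4f} per use")
```

This works out to $0.04 per use on Standard, $0.025 on Pro, and $0.0125 on Ultimate.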

New accounts receive 5 free API credits. Monthly credits are added by an active subscription. One-time packs stack with monthly and free credits. Successful requests consume monthly credits first, then one-time/free credits. Failed requests and blocked quota checks do not consume credits.
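
The debit order described above can be pictured with a small local model. This is a sketch for reasoning about balances, not the service's billing code: the `Balance` class is hypothetical, and the relative order of one-time and free credits is an assumption, since the text only says monthly credits go first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Balance:
    """Hypothetical local model of the three credit buckets."""

    monthly: int = 0
    one_time: int = 0
    free: int = 5  # New accounts start with 5 free credits.

    def charge(self, succeeded: bool) -> Optional[str]:
        """Debit one credit for a successful call.

        Returns the name of the bucket that was debited, or None when the
        call failed or no credits remain.
        """
        if not succeeded:
            return None  # Failed requests do not consume credits.
        # Monthly first; the one_time-before-free order is an assumption.
        for bucket in ("monthly", "one_time", "free"):
            remaining = getattr(self, bucket)
            if remaining > 0:
                setattr(self, bucket, remaining - 1)
                return bucket
        return None
```

For example, `Balance(monthly=1).charge(True)` debits the monthly bucket and leaves the 5 free credits untouched.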

Fetch checkout links for an account email:

```bash
curl "https://wakatipu-api.appcloud.cyou/api/billing/checkout-links?email=you@example.com"
```

## Response Shape

Most successful responses include:

- `ok`: completion flag.
- `tool`: package, backend, or fallback used.
- `request_id`: correlation ID.
- `input_size`: uploaded input size in bytes for file tools.
- `output_size`: approximate response size for file tools.
- Tool-specific fields such as `text`, `tables`, `metadata`, `entries`, `pages`, or `result`.

Errors are JSON and usually include `error`, `code`, or `message`. Quota exhaustion returns `402`.
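
A client can fold these conventions into one small helper. The sketch below is one possible approach, not an official client: `WakatipuError` and `unwrap` are hypothetical names, and treating `ok: false` inside a 2xx response as a failure is an assumption.

```python
from typing import Any, Dict


class WakatipuError(Exception):
    """Hypothetical exception carrying the HTTP status and error detail."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def unwrap(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload of a successful call or raise WakatipuError."""
    # Assumption: a 2xx response with ok=false is treated as a failure.
    if 200 <= status < 300 and payload.get("ok", True):
        return payload
    # Errors usually carry one of these fields; use the first one present.
    detail = next(
        (str(payload[key]) for key in ("message", "error", "code") if payload.get(key)),
        "unknown error",
    )
    if status == 402:
        detail = f"quota exhausted ({detail})"
    raise WakatipuError(status, detail)
```

With `requests`, this could be called as `unwrap(response.status_code, response.json())` so that callers handle a single exception type.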

## Ten High-Value Endpoints

| Method | Endpoint | Use |
| --- | --- | --- |
| `POST` | `/api/documents/extract-text` | General text extraction for text, PDF, JSON, XML, markdown, and office-style files. |
| `POST` | `/api/pdf/extract-tables` | Extract tables from PDFs. |
| `POST` | `/api/ocr/image` | OCR images and scans. |
| `POST` | `/api/spreadsheets/read` | Read CSV and Excel-style rows. |
| `POST` | `/api/email/parse` | Parse EML or MSG email files. |
| `POST` | `/api/archives/inspect` | Inspect archives and flag unsafe entries. |
| `POST` | `/api/web/extract-article` | Extract readable article text from a public URL. |
| `POST` | `/api/text/clean` | Normalize and clean text. |
| `POST` | `/api/text/pii-redact` | Redact common email and phone patterns. |
| `POST` | `/api/vectors/search` | Rank supplied documents against a query. |

## File Upload Examples

General document text:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -F "file=@sample.pdf" \
  https://wakatipu-api.appcloud.cyou/api/documents/extract-text
```

PDF tables:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -F "file=@report.pdf" \
  https://wakatipu-api.appcloud.cyou/api/pdf/extract-tables
```

OCR:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -F "file=@scan.png" \
  https://wakatipu-api.appcloud.cyou/api/ocr/image
```

Spreadsheet preview:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -F "file=@data.csv" \
  -F "max_rows=25" \
  https://wakatipu-api.appcloud.cyou/api/spreadsheets/read
```

## JSON Examples

Clean text:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Fran\u00e7ais   text\n\nwith odd spacing"}' \
  https://wakatipu-api.appcloud.cyou/api/text/clean
```

Extract article:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/story","max_chars":4000}' \
  https://wakatipu-api.appcloud.cyou/api/web/extract-article
```

Vector search:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"invoice total","documents":["invoice total is $42","meeting notes"],"limit":1}' \
  https://wakatipu-api.appcloud.cyou/api/vectors/search
```

PII redaction:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Email rahul@example.com or call +1 555 010 0200."}' \
  https://wakatipu-api.appcloud.cyou/api/text/pii-redact
```

## Python Examples

Upload a file:

```python
import os
import requests

base_url = "https://wakatipu-api.appcloud.cyou"
headers = {"X-API-Key": os.environ["WAKATIPU_API_KEY"]}

with open("sample.pdf", "rb") as handle:
    response = requests.post(
        f"{base_url}/api/documents/extract-text",
        headers=headers,
        files={"file": ("sample.pdf", handle, "application/pdf")},
        timeout=120,
    )

response.raise_for_status()
print(response.json()["text"][:500])
```

Call a JSON endpoint:

```python
import os
import requests

base_url = "https://wakatipu-api.appcloud.cyou"
headers = {
    "X-API-Key": os.environ["WAKATIPU_API_KEY"],
    "Content-Type": "application/json",
}
payload = {"text": "Messy   text\n\nthat needs cleanup"}

response = requests.post(f"{base_url}/api/text/clean", headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["result"])
```

## Node.js Examples

Upload a file:

```javascript
import fs from "node:fs";

const baseUrl = "https://wakatipu-api.appcloud.cyou";
const form = new FormData();
form.append("file", new Blob([fs.readFileSync("sample.pdf")]), "sample.pdf");

const response = await fetch(`${baseUrl}/api/documents/extract-text`, {
  method: "POST",
  headers: {"X-API-Key": process.env.WAKATIPU_API_KEY},
  body: form
});

if (!response.ok) throw new Error(await response.text());
console.log(await response.json());
```

Call a JSON endpoint:

```javascript
const response = await fetch("https://wakatipu-api.appcloud.cyou/api/text/clean", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.WAKATIPU_API_KEY,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({text: "Messy   text\n\nthat needs cleanup"})
});

if (!response.ok) throw new Error(await response.text());
console.log(await response.json());
```

## Background Jobs

Longer file workflows can be queued:

| Method | Endpoint | Use |
| --- | --- | --- |
| `POST` | `/api/jobs/pdf/render-pages` | Queue PDF rendering. |
| `POST` | `/api/jobs/ocr/image` | Queue image OCR. |
| `POST` | `/api/jobs/media/transcribe` | Queue media transcription. |
| `GET` | `/api/jobs/{job_id}` | Poll one background job. |
| `GET` | `/api/jobs` | List recent jobs. |

Example: submit a transcription job, then poll it using the `job_id` returned by the submit response:

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  -F "file=@sample.mp3" \
  https://wakatipu-api.appcloud.cyou/api/jobs/media/transcribe
```

```bash
curl -H "X-API-Key: $WAKATIPU_API_KEY" \
  https://wakatipu-api.appcloud.cyou/api/jobs/<job_id>
```

Python polling:

```python
import os
import time
import requests

base_url = "https://wakatipu-api.appcloud.cyou"
headers = {"X-API-Key": os.environ["WAKATIPU_API_KEY"]}

with open("sample.mp3", "rb") as handle:
    submit = requests.post(f"{base_url}/api/jobs/media/transcribe", headers=headers, files={"file": handle}, timeout=60)
submit.raise_for_status()
job_id = submit.json()["job_id"]

while True:
    status = requests.get(f"{base_url}/api/jobs/{job_id}", headers=headers, timeout=30)
    status.raise_for_status()
    payload = status.json()
    if payload["status"] in {"succeeded", "failed", "cancelled"}:
        print(payload)
        break
    time.sleep(2)
```
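
The loop above polls forever if a job never reaches a terminal status. A bounded variant with a deadline and a growing delay might look like the sketch below. The helper name `wait_for_job`, its parameters, and the default timings are assumptions; only the three terminal statuses come from the example above.

```python
import time
from typing import Any, Callable, Dict

# Terminal statuses taken from the polling example above.
TERMINAL = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the job is terminal or the deadline passes.

    fetch_status is any zero-argument callable returning the job payload,
    for example: lambda: requests.get(job_url, headers=headers).json()
    """
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL:
            return payload
        if time.monotonic() + delay > deadline:
            raise TimeoutError(
                f"job still {payload.get('status')!r} after {timeout_s}s"
            )
        sleep(delay)
        # Double the wait each round, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

Passing the fetch step as a callable keeps the helper independent of any HTTP library and makes it easy to exercise without network access.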

## Endpoint Groups

Core Wakatipu file endpoints:

- Documents: `/api/documents/extract-text`
- PDF: `/api/pdf/extract-text`, `/api/pdf/extract-tables`, `/api/pdf/metadata`, `/api/pdf/render-pages`
- OCR and images: `/api/ocr/image`, `/api/images/metadata`, `/api/images/barcodes`
- Office and spreadsheets: `/api/office/extract-text`, `/api/spreadsheets/read`
- Email and archives: `/api/email/parse`, `/api/archives/inspect`
- Media: `/api/media/transcribe`
- URL metadata: `/api/urls/fetch`

Expanded file API groups:

- PDF operations: split, rotate, delete pages, extract images, annotations, form fields, fill/flatten forms, redact, encryption, decrypt, compress, pages-to-images, OCR-searchable, invoice fields, table export, merge.
- File metadata: type, metadata, duplicate fingerprint.
- DOCX/PPTX/ODF/RTF: structure, comments and changes, markdown, speaker notes, images, thumbnails, text extraction.
- Spreadsheet and CSV: schema, formulas, comments, named ranges, normalized JSON, dialect detect, profile, compare.
- Images/OCR/barcodes: EXIF, palette, thumbnail, convert, orientation, perceptual hash, OCR boxes/language/confidence/layout, barcode generation.
- Archives/email/media: manifest, risk scan, safe extract plan, nested inspect, password detection, attachment lists, header analysis, thread summary, PII contacts, media metadata, waveform, thumbnail, trim, transcode, subtitles.
- Jobs and outputs: batch run, retained output download/signed link, job cancel/retry/webhook.

Migrated Lucerne JSON groups:

- Web: article extraction, readability, browser snapshot, links, CSS selectors, URL metadata, sitemap, robots, crawl, feed, screenshots, PDF export, structured data, social cards, broken links, readability score, HTML sanitization, diff, paywall heuristic, locale/language.
- Text: clean, language, entities, keywords, dates, embeddings, similarity, dedupe, topics, summarize, rewrite, sentiment, risk flags, PII redact, entity link, custom keywords.
- ML/vector/workflow: classify, cluster, taxonomy classify, multilabel, cluster summaries, vector search, vector index, qdrant-style search, semantic dedupe, workflow run/jobs/templates/schedule, retained JSON outputs.

For the exact complete list, use the expanded catalog:

`https://wakatipu-api.appcloud.cyou/static/docs/expanded-api-catalog.html`

## Operational Limits

- Default max upload size is `25 MB`.
- Default request timeout is `120 seconds`.
- Rate-limit responses return `429`.
- Quota exhaustion returns `402`.
- Optional native tools may return a fallback result or a controlled "unavailable" response when the runtime lacks a required binary or model.
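
Clients can respect these limits with two small guards. This is a sketch under stated assumptions: it reads `25 MB` as 25,000,000 bytes (the stricter interpretation), the function names are hypothetical, and the retry timings are illustrative rather than documented behavior.

```python
from typing import Optional

# Assumption: "25 MB" is read as decimal megabytes, the stricter reading.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def fits_upload_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when a file of size_bytes is within the upload limit."""
    return size_bytes <= limit


def retry_delay(status: int, attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None when a retry will not help.

    Only 429 (rate limit) is retried. 402 (quota exhausted) needs more
    credits, so waiting does not fix it.
    """
    if status != 429:
        return None
    # Exponential backoff: 1s, 2s, 4s, ... capped at 60s (illustrative values).
    return min(2.0 ** attempt, 60.0)
```

For example, calling `fits_upload_limit(os.path.getsize(path))` before opening the file avoids spending a request on an upload that would be rejected for size.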
