API Reference
The PDF Processor API is served at https://api.docsights.ai/v1. All requests require the X-Api-Key header.
Overview
| Item | Value |
|---|---|
| Base URL | https://api.docsights.ai/v1 |
| Auth | X-Api-Key: YOUR_API_KEY (header) |
| Content type | application/json |
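As a minimal sketch of the settings in the table above, the following Python helper builds an authenticated request using only the standard library. The helper name `build_request` and the placeholder key are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://api.docsights.ai/v1"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Every request carries the API key in the X-Api-Key header.
    req.add_header("X-Api-Key", api_key)
    if data is not None:
        # Bodies are sent as JSON.
        req.add_header("Content-Type", "application/json")
    return req


# Building a request does not touch the network.
req = build_request("GET", "/documents", "YOUR_API_KEY")
```

Passing the result to `urllib.request.urlopen(req)` sends it. Note that `urllib` normalizes header capitalization internally; HTTP header names are case-insensitive, so this does not change what the server sees.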
Documents and Jobs
Add a new document
POST /documents
Submit a document for extraction. The request body must include a document object with url and name. Optional fields: schema (an inference schema), documentWebhook, and jobWebhook.
Request body: DocumentRequest
- document (required): { url: string (uri), name: string }
- schema (optional): InferenceSchema; nested fields with dataType (string | number | boolean | object | array), optional request, fields (for object), items (for array)
- documentWebhook (optional): URI for document completion notification
- jobWebhook (optional): URI for job completion notification
Response: 202 Accepted — DocumentResponse
- document: { documentId (uuid), jobId? (uuid), status: "accepted" }
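A DocumentRequest body for POST /documents could be assembled as in the sketch below. The function name `build_document_request`, the example URLs, and the reading of `request` as a free-text hint for the field are assumptions made for illustration; only the key names come from the reference above.

```python
import json


def build_document_request(url, name, schema=None,
                           document_webhook=None, job_webhook=None):
    """Assemble a DocumentRequest body (hypothetical helper)."""
    payload = {"document": {"url": url, "name": name}}
    # Optional keys are left out entirely when not provided.
    if schema is not None:
        payload["schema"] = schema
    if document_webhook is not None:
        payload["documentWebhook"] = document_webhook
    if job_webhook is not None:
        payload["jobWebhook"] = job_webhook
    return payload


payload = build_document_request(
    "https://example.com/invoice.pdf",
    "invoice.pdf",
    # Assumption: "request" carries a short description of what to extract.
    schema={"total": {"dataType": "number", "request": "Invoice total"}},
    job_webhook="https://example.com/hooks/job",
)
body = json.dumps(payload)  # JSON string to send as the request body
```

On success the API answers 202 Accepted, and the documentId in the response is what the later endpoints take as the id path parameter.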
Get all documents
GET /documents
List documents with optional pagination and time range.
Query parameters:
- cursor (optional): base64 cursor for pagination
- from (optional): unix timestamp (start)
- to (optional): unix timestamp (end)
Response: 200 — DocumentListResponse
- documents: array of { name, id (uuid), status }; status is one of ACCEPTED, PROCESSING, ERROR, FINISHED, BILLED
- nextCursor: string or null
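Cursor pagination over GET /documents might look like the sketch below. It assumes that a null nextCursor marks the last page, which the reference implies but does not state. The names `iter_documents` and `make_fetch_page` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.docsights.ai/v1"


def make_fetch_page(api_key):
    """Return a function that fetches one page of GET /documents."""
    def fetch_page(params):
        url = f"{BASE_URL}/documents"
        if params:
            url += "?" + urlencode(params)
        req = urllib.request.Request(url, headers={"X-Api-Key": api_key})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


def iter_documents(fetch_page, from_ts=None, to_ts=None):
    """Yield every document, following nextCursor until it is null."""
    cursor = None
    while True:
        params = {}
        if cursor is not None:
            params["cursor"] = cursor
        if from_ts is not None:
            params["from"] = from_ts
        if to_ts is not None:
            params["to"] = to_ts
        page = fetch_page(params)
        yield from page["documents"]
        cursor = page.get("nextCursor")
        if not cursor:  # assumption: null means no further pages
            break
```

A caller would iterate with `for doc in iter_documents(make_fetch_page("YOUR_API_KEY")): ...`. Taking the fetch function as a parameter keeps the paging logic testable without network access.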
Get document by ID
GET /documents/{id}
Get details for a single document. id is a UUID path parameter.
Query parameters:
- include-markdown (optional, default false): include markdown in response
- include-doctags (optional, default false): include doctags in response
Response: 200 — DocumentDetailResponse
- documentId, status, billing (extractionCost, totalQueryCost), jobs (array of jobId + status), optional markdown, doctags
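The URL for this endpoint, including the two optional flags, can be built as in the following sketch. Encoding the flags as the string "true" is an assumption; the reference only says they default to false. The function name `document_detail_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.docsights.ai/v1"


def document_detail_url(document_id, include_markdown=False,
                        include_doctags=False):
    """Build the GET /documents/{id} URL with optional include flags."""
    query = {}
    # Flags are only added when enabled, since both default to false.
    if include_markdown:
        query["include-markdown"] = "true"  # assumed boolean encoding
    if include_doctags:
        query["include-doctags"] = "true"  # assumed boolean encoding
    url = f"{BASE_URL}/documents/{quote(document_id, safe='')}"
    if query:
        url += "?" + urlencode(query)
    return url


url = document_detail_url("123e4567-e89b-12d3-a456-426614174000",
                          include_markdown=True)
```

Requesting markdown or doctags only when needed keeps the default response small.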
Add a new job
POST /documents/{id}/jobs
Run an inference job on an existing document. id is the document UUID.
Request body: JobRequest
- schema (required): InferenceSchema
- jobWebhook (optional): URI for job completion
Response: 202 Accepted — JobAcceptedResponse
- documentId, jobId, status: "ACCEPTED"
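The sketch below prepares the path and JobRequest body for a new job. It assumes that `fields` maps field names to nested schema fields and that `items` holds a single schema field describing each array element; the reference does not spell out either shape. The helper name and the example field names are hypothetical.

```python
def build_job_request(document_id, schema, job_webhook=None):
    """Return (path, body) for POST /documents/{id}/jobs (hypothetical helper)."""
    if not schema:
        raise ValueError("schema is required for a job request")
    body = {"schema": schema}
    if job_webhook is not None:
        body["jobWebhook"] = job_webhook
    return f"/documents/{document_id}/jobs", body


# Example schema with a nested array of objects (assumed shape).
schema = {
    "vendor": {"dataType": "string", "request": "Name of the vendor"},
    "lineItems": {
        "dataType": "array",
        "items": {
            "dataType": "object",
            "fields": {
                "description": {"dataType": "string"},
                "amount": {"dataType": "number"},
            },
        },
    },
}

path, body = build_job_request(
    "123e4567-e89b-12d3-a456-426614174000",
    schema,
    job_webhook="https://example.com/hooks/job",
)
```

Because a job runs against a document that has already been submitted, several jobs with different schemas can reuse the same document id.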
Get job by ID
GET /documents/{id}/jobs/{jobId}
Get job status and extracted output. id and jobId are UUID path parameters.
Response: 200 — JobDetailResponse
- jobId, status, requestSchema, outputSchema (extracted data per requestSchema)
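When no jobWebhook is configured, a client can poll this endpoint until the job settles. The sketch below assumes that ERROR, FINISHED, and BILLED are terminal statuses, which the reference does not confirm. `wait_for_job` and its parameters are hypothetical names, and `fetch_job` stands for any callable that performs the GET and returns the parsed JobDetailResponse.

```python
import time

# Assumption: these statuses mean the job will not change any further.
TERMINAL_STATUSES = {"ERROR", "FINISHED", "BILLED"}


def wait_for_job(fetch_job, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll fetch_job() until the job reaches a terminal status."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Pause between polls, but not after the final attempt.
        if attempt + 1 < max_attempts:
            sleep(interval)
    raise TimeoutError(f"job did not settle after {max_attempts} polls")
```

On a FINISHED job, the extracted data is read from the outputSchema key of the returned dictionary. A fixed polling interval is the simplest choice; a backoff schedule would reduce request volume for long-running jobs.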
Schema reference
- InferenceSchemaField: dataType (required), optional request, fields (for object), items (for array). Recursive for nested structures.
- InferenceSchema: object whose values are InferenceSchemaField objects.
- Document and job status values: ACCEPTED, PROCESSING, ERROR, FINISHED, BILLED.
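The recursive structure described above can be checked on the client before a request is sent. The validator below is a sketch under two assumptions the reference does not confirm: `fields` is a mapping from field name to InferenceSchemaField, and `items` is a single InferenceSchemaField. The function names are hypothetical.

```python
DATA_TYPES = {"string", "number", "boolean", "object", "array"}


def validate_field(field, path):
    """Return a list of problems found in one InferenceSchemaField."""
    if not isinstance(field, dict):
        return [f"{path}: field must be an object"]
    errors = []
    data_type = field.get("dataType")
    if not isinstance(data_type, str) or data_type not in DATA_TYPES:
        errors.append(f"{path}: dataType must be one of {sorted(DATA_TYPES)}")
    if data_type == "object":
        fields = field.get("fields", {})
        if not isinstance(fields, dict):
            errors.append(f"{path}: fields must be an object")
        else:
            # Recurse into each nested field (assumed name -> field mapping).
            for name, child in fields.items():
                errors.extend(validate_field(child, f"{path}.{name}"))
    if data_type == "array" and "items" in field:
        # Recurse into the element schema (assumed single field).
        errors.extend(validate_field(field["items"], f"{path}[]"))
    return errors


def validate_schema(schema):
    """Validate an InferenceSchema: an object whose values are fields."""
    errors = []
    for name, field in schema.items():
        errors.extend(validate_field(field, name))
    return errors
```

An empty list means no problems were found. Each message is prefixed with the path of the offending field, such as `lineItems[].amount`.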
For a machine-readable spec, use the OpenAPI definition at api.docsights.ai or the api-spec.yaml in the repo to generate clients.