Categorization Ingestion JSON Format

Updated on January 26, 2026

1. Overview

This guide explains how to build ingestion JSON for categorization projects. It describes the structure, rules, and meaning of each part so that you can create valid ingestion payloads.

If you understand the most complex structure described here, you will be able to construct any simpler ingestion JSON by omission.

2. The Mental Model (How to Think About Ingestion JSON)

Ingestion JSON is not a mirror of the internal system. Instead, it is a declarative description of intent.

You describe:

What data annotators should see (context)
What should be classified or filled (annotations)

The platform is responsible for:

Creating internal objects
Assigning IDs
Flattening hierarchical structures
Applying taxonomy defaults and validation

You never create internal IDs or objects.

3. Overall Hierarchy and Levels

The ingestion JSON is hierarchical and always follows the same pattern:

Task
 └─ items[]                ← units of work
     ├─ context             ← reference data only
     └─ annotations         ← what is being categorized

Every ingestion payload:

Creates exactly one task
Contains one or more items
Each item has context and annotations

4. Top-Level Task Object

The top-level object describes the task itself.

{
  "primary_keys": ["SKU-123"],
  "metadata": { "source": "catalog" },
  "items": [ ... ]
}

Task Fields

primary_keys (optional)

primary_keys is an array of strings used to attach external identifiers to the task. These values are provided entirely by the client and are never modified by the platform.

Typical use cases include:

Product SKUs
Content IDs
Database record identifiers
Composite business keys

Behavior and constraints:

The platform does not validate or interpret these values
They are not required to be unique
They do not affect task routing, annotation logic, or taxonomy validation
They are returned unchanged in exports

Use primary_keys strictly for traceability and for joining exported results back to your own systems.
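
As a minimal sketch of the traceability use case, the Python below joins exported results back to client records by primary key. The shape of an exported task (a dict carrying a "primary_keys" list) and the names index_by_key, join_export, and the "sku" field are assumptions made for illustration; this guide does not specify the export format. Because primary keys are not required to be unique, the index maps each key to a list of records.

```python
from collections import defaultdict

def index_by_key(records, key_field):
    """Map each client key to every source record that carries it."""
    index = defaultdict(list)
    for record in records:
        index[record[key_field]].append(record)
    return index

def join_export(exported_tasks, source_index):
    """Pair each exported task with the source records named in its primary_keys.

    Assumption: an exported task is a dict with a "primary_keys" list that
    was returned unchanged from ingestion. The real export shape may differ.
    """
    joined = []
    for task in exported_tasks:
        for key in task.get("primary_keys", []):
            for record in source_index.get(key, []):
                joined.append((key, record, task))
    return joined

# Hypothetical client data and a hypothetical export.
catalog = [{"sku": "SKU-123", "title": "Wireless Headphones"}]
exported = [{"primary_keys": ["SKU-123"], "annotations": {}}]
pairs = join_export(exported, index_by_key(catalog, "sku"))
```

Keys that match no source record are skipped rather than treated as errors, which fits the rule that the platform neither validates nor interprets these values.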

metadata (optional)

metadata is a free-form object that allows clients to attach task-level contextual information.

This information is:

Stored and returned with the task
Not shown to annotators by default
Not validated against the taxonomy

Common examples include:

Data source identifiers
Campaign or batch names
Timestamps or version numbers
Internal processing flags

"metadata": {
  "source": "catalog_ingestion",
  "batch_id": "2025-01-07-A",
  "pipeline_version": "v3.2"
}

Important: Metadata does not influence how annotations are created, validated, or displayed. It exists solely for client-side organization and downstream processing.

items (required)

items is an array that defines the actual units of work for annotation. Each element in this array becomes a separate item presented to annotators.

An ingestion JSON must contain at least one item.

Each item:

Represents a single logical entity to be categorized (for example, one product or one piece of content)
Contains its own context and annotations
Is processed independently of other items in the same task

Multiple items should be used when:

You want to group related entities into a single task
Each entity shares common task-level metadata
Items can be annotated independently

Important constraints:

Items cannot reference each other
Context and annotations are scoped to a single item only
Failure in one item does not affect the structure of other items

Clients should treat each item as a fully self-contained unit of annotation work.
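
One way to picture grouping related entities into a single task is the sketch below, which turns a list of client records into one task with one self-contained item per record. The function name build_task and the record fields "sku" and "title" are hypothetical; only primary_keys, metadata, items, context, and annotations come from this guide.

```python
import json

def build_task(records, metadata=None):
    """Build one task with one self-contained item per record."""
    if not records:
        # The guide requires at least one item per ingestion JSON.
        raise ValueError("a task needs at least one item")
    task = {
        "primary_keys": [record["sku"] for record in records],
        "items": [
            {
                # Each item carries only its own context and annotations.
                "context": {"SKU": record["sku"], "Title": record["title"]},
                "annotations": {},
            }
            for record in records
        ],
    }
    if metadata is not None:
        task["metadata"] = metadata
    return task

records = [
    {"sku": "SKU-123", "title": "Wireless Headphones"},
    {"sku": "SKU-456", "title": "USB-C Cable"},
]
payload = build_task(records, metadata={"source": "catalog"})
print(json.dumps(payload, indent=2))
```

Nothing in one item refers to another, so any item could be moved to a different task without changes.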

5. Items (Units of Work)

Each object inside items[] represents a single unit of work for annotators.

{
  "context": { ... },
  "annotations": { ... }
}

Both context and annotations are required, even if empty.
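
The structural rules so far (at least one item, and a context and annotations object on every item) can be checked on the client before sending a payload. The sketch below is an illustrative pre-flight check, not the platform's validator; the function name find_structure_problems and the wording of its messages are assumptions.

```python
def find_structure_problems(payload):
    """Return a list of human-readable problems; an empty list means none found."""
    items = payload.get("items")
    if not isinstance(items, list) or not items:
        return ["items must be a non-empty array"]
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"items[{i}] must be an object")
            continue
        # Both fields are required, but an empty object is acceptable.
        for field in ("context", "annotations"):
            if not isinstance(item.get(field), dict):
                problems.append(f"items[{i}].{field} must be an object")
    return problems

ok = find_structure_problems({"items": [{"context": {}, "annotations": {}}]})
bad = find_structure_problems({"items": [{"context": {}}]})
```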

6. Context Object (What Annotators See)

The context object contains reference information.

Context:

Is not validated against the taxonomy
Can contain any field names
Supports text and media

Display Ordering

Use __order to control how fields appear in the UI.

"context": {
  "__order": ["SKU", "Title"],
  "SKU": "ABC123",
  "Title": "Wireless Headphones"
}
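
To keep __order and the actual fields from drifting apart, the order can be derived from the fields themselves. The sketch below is illustrative only: the helper names with_order and unknown_order_entries are hypothetical, and this guide does not say how the platform treats an __order entry that names no field, so the second helper only reports such entries.

```python
def with_order(fields):
    """Return a context whose __order lists every field in insertion order."""
    return {"__order": list(fields), **fields}

def unknown_order_entries(context):
    """List __order entries that name no field in the context."""
    return [name for name in context.get("__order", []) if name not in context]

context = with_order({"SKU": "ABC123", "Title": "Wireless Headphones"})
stale = unknown_order_entries({"__order": ["SKU", "Image"], "SKU": "ABC123"})
```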

Context Field Types

Plain Text

"SKU": "ABC123"

Formatted Text

"Description": {
  "type": "text",
  "format": "markdown",
  "text": "This is **bold** text"
}

Image

"Image": {
  "type": "image",
  "url": "https://example.com/image.jpg"
}

Video

"Video": {
  "type": "video",
  "url": "https://example.com/video.mp4"
}
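
The four field shapes above are regular enough to build with small helpers. The Python below is a sketch; text_field and media_field are hypothetical names, and "markdown" is the only format value shown in this guide, so other values are not assumed to be supported.

```python
def text_field(text, fmt="markdown"):
    """Build a formatted-text context field."""
    return {"type": "text", "format": fmt, "text": text}

def media_field(kind, url):
    """Build an image or video context field."""
    if kind not in ("image", "video"):
        raise ValueError(f"unsupported media type: {kind!r}")
    return {"type": kind, "url": url}

context = {
    "SKU": "ABC123",  # plain text is a bare string
    "Description": text_field("This is **bold** text"),
    "Image": media_field("image", "https://example.com/image.jpg"),
    "Video": media_field("video", "https://example.com/video.mp4"),
}
```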

7. Annotations Object (What Gets Classified)

The annotations object defines the categorization output.

The keys inside this object are meaningful. They determine how the platform interprets your data.

Annotation Strategies

Before creating an annotation, ask:

Question	Answer
Is the project dynamic?	Use name-based keys + `__type`
Is the project non-dynamic?	Use class or group names
Is this external data?	Use `__placeholder_*`
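
Read in reverse, the same table tells you which strategy an existing annotation entry follows. The sketch below classifies entries by key and value; the function name strategy_for and the "reserved" label for other double-underscore keys such as __order are assumptions for illustration, not platform terminology.

```python
PLACEHOLDER_PREFIX = "__placeholder_"

def strategy_for(key, value):
    """Guess which annotation strategy an entry follows."""
    if key.startswith(PLACEHOLDER_PREFIX):
        return "placeholder"
    if key.startswith("__"):
        return "reserved"  # assumption: control keys such as __order
    if isinstance(value, dict) and "__type" in value:
        return "dynamic"
    return "non-dynamic"

annotations = {
    "product_height": {"__type": "dimension_attribute", "value": "20", "unit": "cm"},
    "Content Classification": {"confidence": "high"},
    "__placeholder_external": {"legacy_score": 0.92},
}
strategies = {key: strategy_for(key, value) for key, value in annotations.items()}
```

A dynamic entry that has lost its __type would be labeled non-dynamic here, so this is a reading aid rather than a validation step.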

8. Dynamic Annotations (Most Complex Case)

Dynamic annotations are keyed by the taxonomy’s name attribute.

They must include __type.

"product_height": {
  "__type": "dimension_attribute",
  "value": "20",
  "unit": "cm"
}

The key (product_height) becomes the name attribute
__type identifies the taxonomy class
Missing __type causes ingestion to fail
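
Because a missing __type fails ingestion, it can help to build dynamic annotations through a helper that cannot omit it, and to scan existing payloads for the gap. The names dynamic_annotation and keys_missing_type are hypothetical, and the caller supplies the list of dynamic names, since this guide does not describe a way to discover them from the payload alone.

```python
def dynamic_annotation(taxonomy_class, **attributes):
    """Build a dynamic annotation value that always carries __type."""
    if not taxonomy_class:
        raise ValueError("__type is required for dynamic annotations")
    return {"__type": taxonomy_class, **attributes}

def keys_missing_type(annotations, dynamic_names):
    """Return the dynamic names present in annotations that lack __type."""
    missing = []
    for name in dynamic_names:
        if name not in annotations:
            continue
        value = annotations[name]
        if not (isinstance(value, dict) and "__type" in value):
            missing.append(name)
    return missing

annotations = {
    "product_height": dynamic_annotation("dimension_attribute", value="20", unit="cm"),
    "product_width": {"value": "15", "unit": "cm"},  # __type forgotten
}
missing = keys_missing_type(annotations, ["product_height", "product_width"])
```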

9. Non-Dynamic Annotations

Non-dynamic annotations use fixed taxonomy classes or group labels.

"Content Classification": {
  "confidence": "high"
}

These annotations:

Do not require __type
May include __mutable if allowed

10. Placeholder Annotations

Placeholders allow you to attach arbitrary data without taxonomy validation.

"__placeholder_external": {
  "legacy_score": 0.92
}

Use placeholders only when taxonomy validation is not desired.

11. Complete Complex Ingestion Example

This example combines the task fields, several context field types, and all three annotation strategies in a single payload.

{
  "primary_keys": ["SKU-98765"],
  "metadata": {
    "source": "product_catalog"
  },
  "items": [
    {
      "context": {
        "__order": ["SKU", "Title", "Image", "Video"],
        "SKU": "SKU-98765",
        "Title": "Wireless Headphones",
        "Image": {
          "type": "image",
          "url": "https://example.com/image.jpg"
        },
        "Video": {
          "type": "video",
          "url": "https://example.com/video.mp4"
        }
      },
      "annotations": {
        "__order": ["product_height", "Content Classification"],
        "product_height": {
          "__type": "dimension_attribute",
          "value": "20",
          "unit": "cm"
        },
        "Content Classification": {
          "confidence": "high"
        },
        "__placeholder_external": {
          "external_score": 0.87
        }
      }
    }
  ]
}
