Categorization Ingestion JSON Format
Updated on January 26, 2026
1. Overview
This guide explains how to build ingestion JSON for categorization projects. It describes the structure, rules, and meaning of each part so that you can create valid ingestion payloads.
If you understand the most complex structure described here, you will be able to construct any simpler ingestion JSON by omission.
2. The Mental Model (How to Think About Ingestion JSON)
Ingestion JSON is not a mirror of the internal system. Instead, it is a declarative description of intent.
You describe:
- What data annotators should see (context)
- What should be classified or filled (annotations)
The platform is responsible for:
- Creating internal objects
- Assigning IDs
- Flattening hierarchical structures
- Applying taxonomy defaults and validation
You never create internal IDs or objects.
3. Overall Hierarchy and Levels
The ingestion JSON is hierarchical and always follows the same pattern:
Task
└─ items[]           ← units of work
   ├─ context        ← reference data only
   └─ annotations    ← what is being categorized
Every ingestion payload:
- Creates exactly one task
- Contains one or more items
- Each item has context and annotations
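The hierarchy above can be sketched in Python as a plain dictionary. This is a minimal illustration rather than a platform client: `make_task` is a hypothetical helper name, and the only field names it uses are the ones documented in this guide.

```python
import json


def make_task(items, primary_keys=None, metadata=None):
    """Assemble the top-level task object (hypothetical helper, not a platform API)."""
    if not items:
        raise ValueError("a task needs at least one item")
    task = {}
    # Optional fields are simply left out when they are not provided.
    if primary_keys is not None:
        task["primary_keys"] = list(primary_keys)
    if metadata is not None:
        task["metadata"] = dict(metadata)
    task["items"] = list(items)
    return task


# Simplest payload: one item with empty context and annotations.
minimal_task = make_task([{"context": {}, "annotations": {}}])
print(json.dumps(minimal_task, indent=2))
```

Leaving out the optional arguments yields the simplest payload; passing them adds the corresponding task-level fields without changing the items.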
4. Top-Level Task Object
The top-level object describes the task itself.
{
  "primary_keys": ["SKU-123"],
  "metadata": { "source": "catalog" },
  "items": [ ... ]
}
Task Fields
primary_keys (optional)
primary_keys is an array of strings used to attach external identifiers to the task. These values are provided entirely by the client and are never modified by the platform.
Typical use cases include:
- Product SKUs
- Content IDs
- Database record identifiers
- Composite business keys
Behavior and constraints:
- The platform does not validate or interpret these values
- They are not required to be unique
- They do not affect task routing, annotation logic, or taxonomy validation
- They are returned unchanged in exports
Use primary_keys strictly for traceability and for joining exported results back to your own systems.
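Because primary_keys come back unchanged, they can serve as a join key on the client side. The sketch below assumes a hypothetical export shape (a list of task dictionaries that each carry `primary_keys`); the actual export format is not described in this guide, and `result` is an illustrative field name only.

```python
def index_by_primary_keys(exported_tasks):
    """Group exported tasks by their primary_keys.

    Assumption: each exported task is a dict with a "primary_keys" list.
    Keys are not required to be unique, so each key maps to a list of tasks.
    """
    index = {}
    for task in exported_tasks:
        key = tuple(task.get("primary_keys", []))
        index.setdefault(key, []).append(task)
    return index


# Hypothetical export records; "result" is not a documented field.
exported = [
    {"primary_keys": ["SKU-123"], "result": "a"},
    {"primary_keys": ["SKU-123"], "result": "b"},
    {"primary_keys": ["SKU-456"], "result": "c"},
]
by_key = index_by_primary_keys(exported)
matches = by_key[("SKU-123",)]
```

Mapping each key to a list, rather than to a single task, reflects the rule that primary keys are not required to be unique.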
metadata (optional)
metadata is a free-form object that allows clients to attach task-level contextual information.
This information is:
- Stored and returned with the task
- Not shown to annotators by default
- Not validated against the taxonomy
Common examples include:
- Data source identifiers
- Campaign or batch names
- Timestamps or version numbers
- Internal processing flags
"metadata": {
  "source": "catalog_ingestion",
  "batch_id": "2025-01-07-A",
  "pipeline_version": "v3.2"
}
Important: Metadata does not influence how annotations are created, validated, or displayed. It exists solely for client-side organization and downstream processing.
items (required)
items is an array that defines the actual units of work for annotation. Each element in this array becomes a separate item presented to annotators.
An ingestion JSON must contain at least one item.
Each item:
- Represents a single logical entity to be categorized (for example, one product or one piece of content)
- Contains its own context and annotations
- Is processed independently of other items in the same task
Multiple items should be used when:
- You want to group related entities into a single task
- Each entity shares common task-level metadata
- Items can be annotated independently
Important constraints:
- Items cannot reference each other
- Context and annotations are scoped to a single item only
- Failure in one item does not affect the structure of other items
Clients should treat each item as a fully self-contained unit of annotation work.
5. Items (Units of Work)
Each object inside items[] represents a single unit of work for annotators.
{
  "context": { ... },
  "annotations": { ... }
}
Both context and annotations are required, even if empty.
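A local pre-flight check can catch missing pieces before a payload is submitted. The following sketch covers only the structural rules stated in this guide (at least one item, each item carrying `context` and `annotations` objects); `check_task_structure` is a hypothetical helper, and the platform's own validation may be stricter.

```python
def check_task_structure(task):
    """Return a list of structural problems found in a task payload.

    Only the rules stated in this guide are checked.
    """
    items = task.get("items")
    if not isinstance(items, list) or not items:
        return ["'items' must be a non-empty array"]
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"items[{i}] must be an object")
            continue
        # Both fields are required, even when they are empty objects.
        for field in ("context", "annotations"):
            if not isinstance(item.get(field), dict):
                problems.append(f"items[{i}] is missing object '{field}'")
    return problems


ok = check_task_structure({"items": [{"context": {}, "annotations": {}}]})
bad = check_task_structure({"items": [{"context": {}}]})
```

An empty list means no structural problem was found; it does not guarantee that the taxonomy will accept the annotations.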
6. Context Object (What Annotators See)
The context object contains reference information.
Context:
- Is not validated against the taxonomy
- Can contain any field names
- Supports text and media
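As a rough illustration, a context object that mixes plain text and media could be assembled as follows. The helper names are hypothetical, and the field shapes follow the examples shown later in this section. Only the markdown format appears in this guide, so no other format value is assumed.

```python
def markdown_field(text):
    """Formatted text field, using the markdown shape shown in this guide."""
    return {"type": "text", "format": "markdown", "text": text}


def media_field(kind, url):
    """Image or video field; these are the only media types this guide shows."""
    if kind not in ("image", "video"):
        raise ValueError(f"unsupported media type: {kind}")
    return {"type": kind, "url": url}


def build_context(fields):
    """Build a context object from (name, value) pairs given in display order."""
    names = [name for name, _ in fields]
    if len(set(names)) != len(names):
        raise ValueError("context field names must be unique")
    context = {"__order": names}
    for name, value in fields:
        context[name] = value
    return context


context = build_context([
    ("SKU", "ABC123"),
    ("Description", markdown_field("This is **bold** text")),
    ("Image", media_field("image", "https://example.com/image.jpg")),
])
```

Deriving `__order` from the same list that supplies the fields keeps the display order and the field names from drifting apart.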
Display Ordering
Use __order to control how fields appear in the UI.
"context": {
  "__order": ["SKU", "Title", "Image"],
  "SKU": "ABC123",
  "Title": "Wireless Headphones"
}
Context Field Types
Plain Text
"SKU": "ABC123"
Formatted Text
"Description": {
  "type": "text",
  "format": "markdown",
  "text": "This is **bold** text"
}
Image
"Image": {
  "type": "image",
  "url": "https://example.com/image.jpg"
}
Video
"Video": {
  "type": "video",
  "url": "https://example.com/video.mp4"
}
7. Annotations Object (What Gets Classified)
The annotations object defines the categorization output.
The keys inside this object are meaningful. They determine how the platform interprets your data.
Annotation Strategies
Before creating an annotation, ask:
| Question | Answer |
| Is the project dynamic? | Use name-based keys + __type |
| Is the project non-dynamic? | Use class or group names |
| Is this external data? | Use __placeholder_* |
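The decision table can be read as a small classification rule over annotation keys. The sketch below is one way to express it; `annotation_strategy` is a hypothetical helper, and it infers the strategy from the payload alone, so a dynamic annotation that is missing `__type` would be reported as non-dynamic.

```python
PLACEHOLDER_PREFIX = "__placeholder_"


def annotation_strategy(key, value):
    """Guess which strategy an annotation entry follows (illustrative only)."""
    if key.startswith(PLACEHOLDER_PREFIX):
        return "placeholder"
    if isinstance(value, dict) and "__type" in value:
        return "dynamic"
    return "non-dynamic"


strategies = {
    key: annotation_strategy(key, value)
    for key, value in {
        "product_height": {"__type": "dimension_attribute", "value": "20", "unit": "cm"},
        "Content Classification": {"confidence": "high"},
        "__placeholder_external": {"legacy_score": 0.92},
    }.items()
}
```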
8. Dynamic Annotations (Most Complex Case)
Dynamic annotations are keyed by the taxonomy’s name attribute.
They must include __type.
"product_height": {
  "__type": "dimension_attribute",
  "value": "20",
  "unit": "cm"
}
- The key (product_height) becomes the name attribute
- __type identifies the taxonomy class
- Missing __type causes ingestion to fail
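Since a missing `__type` fails ingestion, it can be worth scanning a dynamic project's annotations before submitting them. A minimal sketch, assuming every key that does not start with `__` is meant to be a dynamic annotation; `missing_type_keys` and the `product_width` key are hypothetical.

```python
def missing_type_keys(annotations):
    """Return the keys of annotations that lack a "__type" value.

    Assumption: the project is dynamic, so every ordinary key needs "__type".
    Keys starting with "__" (such as placeholders) are skipped.
    """
    return [
        key
        for key, body in annotations.items()
        if not key.startswith("__")
        and not (isinstance(body, dict) and body.get("__type"))
    ]


annotations = {
    "product_height": {"__type": "dimension_attribute", "value": "20", "unit": "cm"},
    "product_width": {"value": "5", "unit": "cm"},  # hypothetical key, __type left out
    "__placeholder_external": {"legacy_score": 0.92},
}
missing = missing_type_keys(annotations)
```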
9. Non-Dynamic Annotations
Non-dynamic annotations use fixed taxonomy classes or group labels.
"Content Classification": {
  "confidence": "high"
}
These annotations:
- Do not require __type
- May include __mutable if allowed
10. Placeholder Annotations
Placeholders allow you to attach arbitrary data without taxonomy validation.
"__placeholder_external": {
  "legacy_score": 0.92
}
Use placeholders only when taxonomy validation is not desired.
11. Complete Complex Ingestion Example
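This example combines the structures described above in a single payload: task-level fields, an ordered context with media, and dynamic, non-dynamic, and placeholder annotations.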
{
  "primary_keys": ["SKU-98765"],
  "metadata": {
    "source": "product_catalog"
  },
  "items": [
    {
      "context": {
        "__order": ["SKU", "Title", "Image", "Video"],
        "SKU": "SKU-98765",
        "Title": "Wireless Headphones",
        "Image": {
          "type": "image",
          "url": "https://example.com/image.jpg"
        },
        "Video": {
          "type": "video",
          "url": "https://example.com/video.mp4"
        }
      },
      "annotations": {
        "__order": ["product_height", "Content Classification"],
        "product_height": {
          "__type": "dimension_attribute",
          "value": "20",
          "unit": "cm"
        },
        "Content Classification": {
          "confidence": "high"
        },
        "__placeholder_external": {
          "external_score": 0.87
        }
      }
    }
  ]
}