Categorization Ingestion JSON Format

Updated on January 26th, 2026

1. Overview

This guide explains how to build ingestion JSON for categorization projects. It describes the structure, rules, and meaning of each part so that you can create valid ingestion payloads.

If you understand the most complex structure described here, you will be able to construct any simpler ingestion JSON by omission.

2. The Mental Model (How to Think About Ingestion JSON)

Ingestion JSON is not a mirror of the internal system. Instead, it is a declarative description of intent.

You describe:

  • What data annotators should see (context)
  • What should be classified or filled (annotations)

The platform is responsible for:

  • Creating internal objects
  • Assigning IDs
  • Flattening hierarchical structures
  • Applying taxonomy defaults and validation

You never create internal IDs or objects.

3. Overall Hierarchy and Levels

The ingestion JSON is hierarchical and always follows the same pattern:

Task
 └─ items[]                ← units of work
     ├─ context             ← reference data only
     └─ annotations         ← what is being categorized

Every ingestion payload:

  • Creates exactly one task
  • Contains one or more items
  • Gives each item its own context and annotations
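
The structural rules above can be checked on the client before a payload is submitted. The following is a minimal Python sketch, not part of the platform: `check_task_shape` is a hypothetical helper name, and it tests only the shape described in this section (one task object, a non-empty items array, and a context and annotations object on every item).

```python
def check_task_shape(task: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the shape is valid."""
    problems = []
    items = task.get("items")
    # A task must carry at least one item.
    if not isinstance(items, list) or len(items) == 0:
        problems.append("task must contain a non-empty 'items' array")
        return problems
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"items[{index}] must be an object")
            continue
        # Both keys must be present as objects, even if they are empty.
        for key in ("context", "annotations"):
            if not isinstance(item.get(key), dict):
                problems.append(f"items[{index}] is missing a '{key}' object")
    return problems
```

An empty result means the payload follows the hierarchy. It does not mean the platform will accept it, since taxonomy validation happens on the platform side.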

4. Top-Level Task Object

The top-level object describes the task itself.

{
  "primary_keys": ["SKU-123"],
  "metadata": { "source": "catalog" },
  "items": [ ... ]
}

Task Fields

primary_keys (optional)

primary_keys is an array of strings used to attach external identifiers to the task. These values are provided entirely by the client and are never modified by the platform.

Typical use cases include:

  • Product SKUs
  • Content IDs
  • Database record identifiers
  • Composite business keys

Behavior and constraints:

  • The platform does not validate or interpret these values
  • They are not required to be unique
  • They do not affect task routing, annotation logic, or taxonomy validation
  • They are returned unchanged in exports

Use primary_keys strictly for traceability and for joining exported results back to your own systems.
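
As an illustration of joining exports back to your own records, here is a minimal Python sketch. It assumes only that each exported task carries the same primary_keys array it was ingested with. The rest of the export shape, including the `result` field in the usage note, is hypothetical, as is the helper name `index_by_primary_key`.

```python
def index_by_primary_key(exported_tasks: list[dict]) -> dict[str, list[dict]]:
    """Group exported tasks by each primary key they carry."""
    index: dict[str, list[dict]] = {}
    for task in exported_tasks:
        # primary_keys is optional, so a task may carry none at all.
        for key in task.get("primary_keys", []):
            # Keys are not required to be unique, so each key maps to a list.
            index.setdefault(key, []).append(task)
    return index
```

For example, `index_by_primary_key(exports)["SKU-123"]` would return every exported task tagged with `SKU-123`. Mapping each key to a list rather than a single task reflects the fact that the platform does not enforce uniqueness.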

metadata (optional)

metadata is a free-form object that allows clients to attach task-level contextual information.

This information is:

  • Stored and returned with the task
  • Not shown to annotators by default
  • Not validated against the taxonomy

Common examples include:

  • Data source identifiers
  • Campaign or batch names
  • Timestamps or version numbers
  • Internal processing flags

"metadata": {
  "source": "catalog_ingestion",
  "batch_id": "2025-01-07-A",
  "pipeline_version": "v3.2"
}

Important: Metadata does not influence how annotations are created, validated, or displayed. It exists solely for client-side organization and downstream processing.

items (required)

items is an array that defines the actual units of work for annotation. Each element in this array becomes a separate item presented to annotators.

An ingestion JSON must contain at least one item.

Each item:

  • Represents a single logical entity to be categorized (for example, one product or one piece of content)
  • Contains its own context and annotations
  • Is processed independently of other items in the same task

Multiple items should be used when:

  • You want to group related entities into a single task
  • Each entity shares common task-level metadata
  • Items can be annotated independently

Important constraints:

  • Items cannot reference each other
  • Context and annotations are scoped to a single item only
  • Failure in one item does not affect the structure of other items

Clients should treat each item as a fully self-contained unit of annotation work.
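
To make the grouping concrete, the following Python sketch builds one task with several self-contained items from a list of client records. The record fields `sku` and `title`, the context field names, and the function name `build_task` are assumptions made for illustration only. The annotations object is left empty here.

```python
import json


def build_task(records: list[dict], batch_id: str) -> dict:
    """Build one ingestion task with one item per client record."""
    items = []
    for record in records:
        # Each item carries its own context and annotations and
        # does not refer to any other item.
        items.append({
            "context": {"SKU": record["sku"], "Title": record["title"]},
            "annotations": {},
        })
    return {
        "primary_keys": [record["sku"] for record in records],
        "metadata": {"batch_id": batch_id},  # shared by every item in the task
        "items": items,
    }


if __name__ == "__main__":
    demo = build_task([{"sku": "SKU-1", "title": "Wireless Headphones"}], "2025-01-07-A")
    print(json.dumps(demo, indent=2))
```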

5. Items (Units of Work)

Each object inside items[] represents a single unit of work for annotators.

{
  "context": { ... },
  "annotations": { ... }
}

Both context and annotations are required, even if empty.

6. Context Object (What Annotators See)

The context object contains reference information.

Context:

  • Is not validated against the taxonomy
  • Can contain any field names
  • Supports text and media

Display Ordering

Use __order to control the order in which context fields appear in the UI.

"context": {
  "__order": ["SKU", "Title", "Image"],
  "SKU": "ABC123",
  "Title": "Wireless Headphones",
  "Image": {
    "type": "image",
    "url": "https://example.com/image.jpg"
  }
}
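
One way to keep __order in sync with the fields actually present is to derive both from a single ordered list. The sketch below is a client-side convenience under that assumption. The helper name `ordered_context` is hypothetical and not a platform API.

```python
def ordered_context(fields: list[tuple[str, object]]) -> dict:
    """Build a context object whose __order lists exactly the given fields."""
    context: dict = {"__order": [name for name, _ in fields]}
    for name, value in fields:
        # Reject duplicates, and reject a field that would overwrite __order.
        if name in context:
            raise ValueError(f"duplicate or reserved field name: {name}")
        context[name] = value
    return context
```

Calling `ordered_context([("SKU", "ABC123"), ("Title", "Wireless Headphones")])` yields a context whose `__order` is `["SKU", "Title"]`, so the ordering can never name a field that is missing.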

Context Field Types

Plain Text

"SKU": "ABC123"

Formatted Text

"Description": {
  "type": "text",
  "format": "markdown",
  "text": "This is **bold** text"
}

Image

"Image": {
  "type": "image",
  "url": "https://example.com/image.jpg"
}

Video

"Video": {
  "type": "video",
  "url": "https://example.com/video.mp4"
}
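
The four field types above can be produced with small constructor functions so that the type strings are written in one place. This is a sketch only. The function names are hypothetical, and "markdown" is the only format value shown in this guide, so other values are not assumed to work.

```python
def formatted_text(text: str, fmt: str = "markdown") -> dict:
    """Formatted text field; 'markdown' is the only format shown in this guide."""
    return {"type": "text", "format": fmt, "text": text}


def image(url: str) -> dict:
    """Image field referencing a URL."""
    return {"type": "image", "url": url}


def video(url: str) -> dict:
    """Video field referencing a URL."""
    return {"type": "video", "url": url}


# Plain text needs no wrapper: the value is just a string.
context = {
    "SKU": "ABC123",
    "Description": formatted_text("This is **bold** text"),
    "Image": image("https://example.com/image.jpg"),
    "Video": video("https://example.com/video.mp4"),
}
```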

7. Annotations Object (What Gets Classified)

The annotations object defines the categorization output.

The keys inside this object are meaningful. They determine how the platform interprets your data.

Annotation Strategies

Before creating an annotation, ask:

Question                       Answer
Is the project dynamic?        Use name-based keys + __type
Is the project non-dynamic?    Use class or group names
Is this external data?         Use __placeholder_*
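
The same decision can be read back from an existing annotations object by looking at each key. The Python sketch below is an interpretation of the rules in this guide, not platform code. `annotation_strategy` is a hypothetical name, and the `__order` key is treated as display ordering because it appears that way in the complete example.

```python
def annotation_strategy(key: str, value: object) -> str:
    """Classify one entry of an annotations object by the strategy it uses."""
    if key == "__order":
        return "ordering"
    if key.startswith("__placeholder_"):
        return "placeholder"
    # Dynamic annotations are recognized by the presence of __type.
    if isinstance(value, dict) and "__type" in value:
        return "dynamic"
    return "non-dynamic"
```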

8. Dynamic Annotations (Most Complex Case)

Dynamic annotations are keyed by the taxonomy’s name attribute.

They must include __type.

"product_height": {
  "__type": "dimension_attribute",
  "value": "20",
  "unit": "cm"
}

  • The key (product_height) becomes the name attribute
  • __type identifies the taxonomy class
  • Missing __type causes ingestion to fail
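
Because a missing __type fails the whole ingestion, it can be worth checking for it before submitting. The sketch below assumes the client already knows which annotation names are dynamic in its taxonomy. That set, and the helper name `missing_type`, are hypothetical inputs supplied by you rather than something the platform provides.

```python
def missing_type(annotations: dict, dynamic_names: set[str]) -> list[str]:
    """Return the dynamic annotation names that are present but lack __type."""
    missing = []
    for name in sorted(dynamic_names):
        if name not in annotations:
            continue  # nothing was supplied for this name
        entry = annotations[name]
        if not (isinstance(entry, dict) and "__type" in entry):
            missing.append(name)
    return missing
```

An empty list means every supplied dynamic annotation carries __type. It does not confirm that the __type value names a real taxonomy class.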

9. Non-Dynamic Annotations

Non-dynamic annotations use fixed taxonomy classes or group labels.

"Content Classification": {
  "confidence": "high"
}

These annotations:

  • Do not require __type
  • May include __mutable if allowed

10. Placeholder Annotations

Placeholders allow you to attach arbitrary data without taxonomy validation.

"__placeholder_external": {
  "legacy_score": 0.92
}

Use placeholders only when taxonomy validation is not desired.
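
A small helper can keep the __placeholder_ prefix consistent when attaching external data. This is a sketch under the naming pattern shown above. The function name `with_placeholder` and the suffix argument are illustrative choices, not platform requirements.

```python
def with_placeholder(annotations: dict, suffix: str, data: dict) -> dict:
    """Return a copy of annotations with data attached under __placeholder_<suffix>."""
    key = f"__placeholder_{suffix}"
    if key in annotations:
        raise ValueError(f"placeholder already present: {key}")
    result = dict(annotations)  # leave the caller's object untouched
    result[key] = dict(data)
    return result
```

For instance, `with_placeholder({}, "external", {"legacy_score": 0.92})` returns `{"__placeholder_external": {"legacy_score": 0.92}}`.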

11. Complete Complex Ingestion Example

This example combines the task-level fields, an ordered context with text and media, and dynamic, non-dynamic, and placeholder annotations in a single payload.

{
  "primary_keys": ["SKU-98765"],
  "metadata": {
    "source": "product_catalog"
  },
  "items": [
    {
      "context": {
        "__order": ["SKU", "Title", "Image", "Video"],
        "SKU": "SKU-98765",
        "Title": "Wireless Headphones",
        "Image": {
          "type": "image",
          "url": "https://example.com/image.jpg"
        },
        "Video": {
          "type": "video",
          "url": "https://example.com/video.mp4"
        }
      },
      "annotations": {
        "__order": ["product_height", "Content Classification"],
        "product_height": {
          "__type": "dimension_attribute",
          "value": "20",
          "unit": "cm"
        },
        "Content Classification": {
          "confidence": "high"
        },
        "__placeholder_external": {
          "external_score": 0.87
        }
      }
    }
  ]
}

Copyright © 2023 Samasource Impact Sourcing, Inc. All rights reserved.