Databricks Connector

Updated at January 24th, 2024

The Sama Databricks Connector enables you to create, monitor, and retrieve annotated tasks directly from Databricks.

Requirements

  • Databricks: Runtime 10.4 LTS or later
  • Sama Platform account

Installation

Install the Sama SDK and Databricks connector by running the following commands in your Databricks notebook.

%pip install sama
from sama.databricks import Client

Configure the SDK

You need to specify:

  1. Your API Key
  2. Your Sama Project ID

Your Sama project manager will provide the correct Project ID(s) and will have already configured the necessary Sama Project inputs and outputs.

# Set your Sama API key
API_KEY: str = ""
# Set your Sama project ID
PROJECT_ID: str = ""

if not API_KEY:
    raise ValueError("API_KEY not set")
if not PROJECT_ID:
    raise ValueError("PROJECT_ID not set")

client = Client(API_KEY)
# Verify the configuration by calling the Get Project Information endpoint;
# raises an exception if PROJECT_ID or API_KEY is invalid.
client.get_project_information(PROJECT_ID)

Usage with Databricks and Spark DataFrames

Once you are set up and properly configured, you can start using functions that accept or return Spark DataFrames:

  • create_task_batch_from_table() - create tasks in the Sama Platform from the rows of a DataFrame, to be picked up by the annotation and quality teams.
  • get_delivered_tasks_to_table() or get_delivered_tasks_since_last_call_to_table() - retrieve delivered tasks, annotated and reviewed by our quality team, into a DataFrame.
  • get_multi_task_status_to_table() or get_task_status_to_table() - retrieve the annotation status of tasks into a DataFrame.
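As a rough sketch of a round trip, the flow below creates a batch from a small DataFrame and later pulls delivered tasks back. The function names come from the list above, but the parameters, column names, and sample data are assumptions for illustration; consult the API reference on GitHub for the actual signatures.

```python
from pyspark.sql import SparkSession
from sama.databricks import Client

spark = SparkSession.builder.getOrCreate()

# Hypothetical input data: one row per task, with columns matching the
# inputs your Sama project manager configured (column names here are
# assumptions, not the connector's required schema).
tasks_df = spark.createDataFrame(
    [("task-001", "https://example.com/image-001.jpg")],
    ["client_task_id", "image_url"],
)

client = Client(API_KEY)

# Create a batch of tasks in the Sama Platform from the DataFrame
# (parameter order is an assumption).
client.create_task_batch_from_table(PROJECT_ID, tasks_df)

# Later, retrieve tasks that have been annotated and reviewed.
delivered_df = client.get_delivered_tasks_to_table(PROJECT_ID)
delivered_df.show()
```

This sketch assumes a live Sama project and valid credentials, so it is not runnable as-is.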

Tutorials

Please use our Jupyter Notebook tutorial to create sample tasks from a DataFrame and get delivered tasks back into a DataFrame.


Python SDK and Databricks Connector Reference

Other functions available in the Python SDK include:

  • Retrieving task and delivery schemas
  • Checking the status of and cancelling batch creation jobs
  • Updating task priorities
  • Rejecting and deleting tasks
  • Obtaining project stats and information

The API reference can be found on GitHub.


