Updated at September 21st, 2023
The Sama Databricks Connector lets you create tasks, monitor their status, and retrieve annotated tasks directly from Databricks.
Requirements
- Databricks: Runtime 10.4 LTS or later
- Sama Platform account
Install the Sama SDK and Databricks connector by running the following in your Databricks notebook, then import the client.
    %pip install sama
    from sama.databricks import Client
Configure the SDK
    # Set your Sama API key
    API_KEY: str = ""
    # Set your project ID
    PROJECT_ID: str = ""

    if not API_KEY:
        raise ValueError("API_KEY not set")
    if not PROJECT_ID:
        raise ValueError("PROJECT_ID not set")

    client = Client(API_KEY)
    # Verify the config by calling the Get Project Information endpoint.
    # Throws an exception if PROJECT_ID or API_KEY is not valid.
    client.get_project_information(PROJECT_ID)
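The configuration step keeps the API key and project ID as literals in the notebook. As an alternative, here is a minimal sketch that reads both values from environment variables and applies the same "not set" checks. The variable names `SAMA_API_KEY` and `SAMA_PROJECT_ID` and the helper `load_sama_config` are hypothetical and are not part of the Sama SDK.

```python
import os
from typing import Mapping, Tuple


def load_sama_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, project_id) read from a mapping of settings.

    The key names SAMA_API_KEY and SAMA_PROJECT_ID are assumptions made
    for this sketch; use whatever names fit your workspace.
    """
    api_key = env.get("SAMA_API_KEY", "")
    project_id = env.get("SAMA_PROJECT_ID", "")

    # Same validation as the configuration step: fail early on empty values.
    if not api_key:
        raise ValueError("SAMA_API_KEY not set")
    if not project_id:
        raise ValueError("SAMA_PROJECT_ID not set")
    return api_key, project_id
```

The returned pair can then be passed on as `Client(api_key)` and `client.get_project_information(project_id)`, exactly as in the configuration step.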
Usage with Databricks and Spark DataFrames
Once the client is configured, you can use the following functions, which accept or return Spark DataFrames:
- create_task_batch_from_table(): creates tasks in the Sama Platform from the rows of a DataFrame, to be picked up by the annotators and quality teams.
- get_delivered_tasks_since_last_call_to_table(): returns, as a DataFrame, the delivered tasks that have been annotated and reviewed by our quality team.
- get_task_status_to_table(): returns, as a DataFrame, the annotation statuses of tasks.
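To show how these functions might fit together, the following is a rough sketch of two small wrappers around a configured client. The wrapper names `submit_tasks` and `fetch_delivered` are hypothetical, and the argument lists passed to the Sama functions (project ID first, then the DataFrame) are assumptions; check the API reference for the exact signatures before relying on them.

```python
from typing import Any


def submit_tasks(client: Any, project_id: str, tasks_df: Any) -> Any:
    """Create tasks in the Sama Platform from a Spark DataFrame.

    `client` is expected to be a configured sama.databricks.Client.
    Assumed call shape: project ID, then the DataFrame of task inputs.
    """
    return client.create_task_batch_from_table(project_id, tasks_df)


def fetch_delivered(client: Any, project_id: str) -> Any:
    """Fetch delivered tasks as a Spark DataFrame.

    Assumed call shape: project ID only.
    """
    return client.get_delivered_tasks_since_last_call_to_table(project_id)
```

In a notebook, `submit_tasks(client, PROJECT_ID, df)` would typically run once per batch of input rows, while `fetch_delivered(client, PROJECT_ID)` would run later, for example from a scheduled job, once annotation and review have had time to complete.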
Python SDK and Databricks Connector Reference
Other functions available in the Python SDK include:
- Retrieving task and delivery schemas
- Checking the status of and cancelling batch creation jobs
- Updating task priorities
- Rejecting and deleting tasks
- Obtaining project stats and information
The API reference can be found on GitHub.