
Step 1: Install Required Libraries

pip install requests

Step 2: Set Up API Configuration

import os
import requests
import json

BASE_URL = "https://orch.zenbase.ai/api"
API_KEY = "YOUR API KEY"

def api_call(method, endpoint, data=None, files=None):
    """Send an authenticated request to the Zenbase API and return the response."""
    url = f"{BASE_URL}/{endpoint}"
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    return requests.request(method, url, headers=headers, data=data, files=files)
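Hardcoding API keys in source files is easy to leak. Since `os` is already imported, one option is to read the key from an environment variable instead; a minimal sketch (the variable name `ZENBASE_API_KEY` is just an example here, not an official convention):

```python
import os

# Fall back to the placeholder so the script still runs locally;
# export ZENBASE_API_KEY in your shell for real use (name is illustrative).
API_KEY = os.environ.get("ZENBASE_API_KEY", "YOUR API KEY")
```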

Step 3: Create a Dataset

First, create a dataset where you’ll add your items:
dataset_data = {
    "name": "My Bulk Dataset",
    "description": "Dataset for bulk creation example",
    "embedding_type": "OPENAI",  # Optional: if you want to use embeddings
    "embedding_api_key": "YOUR_OPENAI_API_KEY",  # Optional: if using embeddings
    "model_name": "text-embedding-3-small"  # Optional: if using embeddings
}

dataset = api_call("POST", "datasets/", dataset_data)
dataset.raise_for_status()  # fail fast if the dataset wasn't created
dataset_id = dataset.json()['id']
print(f"Created dataset with ID: {dataset_id}")

Step 4: Prepare Your Bulk Data

Create a JSON file containing an array of items you want to add to your dataset. Each item should include the dataset ID, inputs, and outputs:
bulk_data = [
    {
        "dataset": dataset_id,
        "inputs": {
            "text": "This movie was fantastic!",
            "rating": 5
        },
        "outputs": {
            "sentiment": "positive"
        }
    },
    {
        "dataset": dataset_id,
        "inputs": {
            "text": "I didn't enjoy this film at all.",
            "rating": 1
        },
        "outputs": {
            "sentiment": "negative"
        }
    },
    {
        "dataset": dataset_id,
        "inputs": {
            "text": "The movie was okay, nothing special.",
            "rating": 3
        },
        "outputs": {
            "sentiment": "neutral"
        }
    }
]

# Save bulk data to a file
bulk_file_path = "./bulk_items.json"
with open(bulk_file_path, 'w') as f:
    json.dump(bulk_data, f, indent=2)
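Before writing the file, a quick validation pass can catch malformed items before they ever reach the server. A minimal sketch — `validate_items` is a helper introduced here for illustration, not part of the Zenbase API:

```python
REQUIRED_KEYS = {"dataset", "inputs", "outputs"}

def validate_items(items):
    """Raise ValueError on any item missing a required key; return the item count."""
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item {i} is missing keys: {sorted(missing)}")
    return len(items)

# Example: an item shaped like the ones in Step 4 validates cleanly.
sample = [{"dataset": "abc123", "inputs": {"text": "ok"}, "outputs": {"sentiment": "neutral"}}]
count = validate_items(sample)
```

Running `validate_items(bulk_data)` before the `json.dump` call above surfaces schema problems locally instead of as a failed background task.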

Step 5: Submit the Bulk Create Request

Submit your bulk create request with the data file:
# Set the mode for bulk create (optional)
data = {"mode": "upsert"}  # Options: "insert", "upsert", or "replace"

# Open the data file in binary mode and submit the bulk create request;
# the context manager ensures the file handle is closed afterwards
with open(bulk_file_path, "rb") as f:
    bulk_create = api_call(
        "POST",
        f"datasets/{dataset_id}/bulk_create/",
        data=data,
        files={"file": f},
    )

# Get the bulk create task ID (raising first if the request failed)
bulk_create.raise_for_status()
bulk_create_task_id = bulk_create.json()['bulk_create_task_id']
print(f"Created bulk create task with ID: {bulk_create_task_id}")

Step 6: Monitor Bulk Create Status

You can check the status of your bulk create task in the admin panel. The task will process your items in the background and add them to your dataset. The bulk create feature supports three modes:
  • insert: Creates new items only
  • upsert: Creates new items or updates existing ones if they match
  • replace: Deletes existing items and creates new ones
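If you would rather poll from code than watch the admin panel, a generic helper like the one below works. Note that the task-status endpoint and the terminal status strings ("SUCCESS"/"FAILURE") are assumptions, not documented values, so `fetch_status` is left as a callable you supply — for example, a thin wrapper around `api_call` against whatever status endpoint the API reference gives you:

```python
import time

def wait_for_task(fetch_status, interval=5.0, timeout=300.0):
    """Poll fetch_status() until it returns a terminal status string.

    fetch_status: zero-argument callable returning e.g. "PENDING"/"SUCCESS"/"FAILURE".
    The terminal status names here are assumptions, not documented values.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("SUCCESS", "FAILURE"):
            return status
        time.sleep(interval)  # wait before polling again
    raise TimeoutError("bulk create task did not finish within the timeout")
```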
This feature is particularly useful when you need to:
  • Import large datasets from external sources
  • Update multiple dataset items simultaneously
  • Populate datasets with training data
Remember to structure your bulk data according to your dataset’s schema, ensuring that all required fields are included for each item.