
Step 1: Install Required Libraries

pip install requests

Step 2: Set Up API Configuration

import os
import requests
import json

BASE_URL = "https://orch.zenbase.ai/api"
API_KEY = "YOUR API KEY"

def api_call(method, endpoint, data=None, files=None):
    """Send an authenticated request to the Zenbase API and return the response."""
    url = f"{BASE_URL}/{endpoint}"
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    return requests.request(method, url, headers=headers, data=data, files=files)
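Hardcoding API keys in source files is easy to leak. Since `os` is already imported, one option is to read the key from an environment variable instead; a minimal sketch (the variable name `ZENBASE_API_KEY` is just an example here, not an official convention):

```python
import os

# Fall back to the placeholder so the script still runs locally;
# export ZENBASE_API_KEY in your shell for real use (name is illustrative).
API_KEY = os.environ.get("ZENBASE_API_KEY", "YOUR API KEY")
```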

Step 3: Create a Dataset

First, create a dataset where you’ll add your items:
dataset_data = {
    "name": "My Bulk Dataset",
    "description": "Dataset for bulk creation example",
    "embedding_type": "OPENAI",  # Optional: if you want to use embeddings
    "embedding_api_key": "YOUR_OPENAI_API_KEY",  # Optional: if using embeddings
    "model_name": "text-embedding-3-small"  # Optional: if using embeddings
}

dataset = api_call("POST", "datasets/", dataset_data)
dataset.raise_for_status()  # fail fast if the dataset wasn't created
dataset_id = dataset.json()['id']
print(f"Created dataset with ID: {dataset_id}")

Step 4: Prepare Your Bulk Data

Create a JSON file containing an array of items you want to add to your dataset. Each item should include the dataset ID, inputs, and outputs:
bulk_data = [
    {
        "dataset": dataset_id,
        "inputs": {
            "text": "This movie was fantastic!",
            "rating": 5
        },
        "outputs": {
            "sentiment": "positive"
        }
    },
    {
        "dataset": dataset_id,
        "inputs": {
            "text": "I didn't enjoy this film at all.",
            "rating": 1
        },
        "outputs": {
            "sentiment": "negative"
        }
    },
    {
        "dataset": dataset_id,
        "inputs": {
            "text": "The movie was okay, nothing special.",
            "rating": 3
        },
        "outputs": {
            "sentiment": "neutral"
        }
    }
]

# Save bulk data to a file
bulk_file_path = "./bulk_items.json"
with open(bulk_file_path, 'w') as f:
    json.dump(bulk_data, f, indent=2)
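Before writing the file, a quick validation pass can catch malformed items before they ever reach the server. A minimal sketch — `validate_items` is a helper introduced here for illustration, not part of the Zenbase API:

```python
REQUIRED_KEYS = {"dataset", "inputs", "outputs"}

def validate_items(items):
    """Raise ValueError on any item missing a required key; return the item count."""
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item {i} is missing keys: {sorted(missing)}")
    return len(items)

# Example: an item shaped like the ones in Step 4 validates cleanly.
sample = [{"dataset": "abc123", "inputs": {"text": "ok"}, "outputs": {"sentiment": "neutral"}}]
count = validate_items(sample)
```

Running `validate_items(bulk_data)` before the `json.dump` call above surfaces schema problems locally instead of as a failed background task.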

Step 5: Submit the Bulk Create Request

Submit your bulk create request with the data file:
# Set the mode for bulk create (optional)
data = {"mode": "upsert"}  # Options: "insert", "upsert", or "replace"

# Open the data file in binary mode and submit the bulk create request;
# the context manager ensures the file handle is closed afterwards
with open(bulk_file_path, "rb") as f:
    bulk_create = api_call(
        "POST",
        f"datasets/{dataset_id}/bulk_create/",
        data=data,
        files={"file": f},
    )

# Get the bulk create task ID (raising first if the request failed)
bulk_create.raise_for_status()
bulk_create_task_id = bulk_create.json()['bulk_create_task_id']
print(f"Created bulk create task with ID: {bulk_create_task_id}")

Step 6: Monitor Bulk Create Status

You can check the status of your bulk create task in the admin panel. The task will process your items in the background and add them to your dataset. The bulk create feature supports three modes:
  • insert: Creates new items only
  • upsert: Creates new items or updates existing ones if they match
  • replace: Deletes existing items and creates new ones
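If you would rather poll from code than watch the admin panel, a generic helper like the one below works. Note that the task-status endpoint and the terminal status strings ("SUCCESS"/"FAILURE") are assumptions, not documented values, so `fetch_status` is left as a callable you supply — for example, a thin wrapper around `api_call` against whatever status endpoint the API reference gives you:

```python
import time

def wait_for_task(fetch_status, interval=5.0, timeout=300.0):
    """Poll fetch_status() until it returns a terminal status string.

    fetch_status: zero-argument callable returning e.g. "PENDING"/"SUCCESS"/"FAILURE".
    The terminal status names here are assumptions, not documented values.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("SUCCESS", "FAILURE"):
            return status
        time.sleep(interval)  # wait before polling again
    raise TimeoutError("bulk create task did not finish within the timeout")
```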
This feature is particularly useful when you need to:
  • Import large datasets from external sources
  • Update multiple dataset items simultaneously
  • Populate datasets with training data
Remember to structure your bulk data according to your dataset’s schema, ensuring that all required fields are included for each item.