ModelPilot API Documentation

Generate images, video, audio, and text with simple API calls. Pay per request — no GPU to manage.

Getting Started

1

Create an account

Sign up free with email or Google/GitHub. No credit card required.

2

Add funds

Go to Billing and add credits (from $5). 50% bonus on your first purchase.

3

Create an API key

Go to Dashboard → API Keys and click "Create API Key." Copy the mp_live_... key — you won't see it again.

4

Generate your first image

Run this curl command with your API key:

bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a red apple on a white table"}'

On first request, the worker may need 30-45s to start (cold start). If you get a poll_url in the response, poll it every 10 seconds until status: "completed". Subsequent requests while the worker is warm return results in 1-2 seconds.
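The cold-start flow above can be sketched as a small Python loop. The helper below is a minimal example, not an official client: the function name and the injectable `fetch_status` callable are our own, while the `poll_url`/`status` fields match the response described above.

```python
import time

def poll_until_complete(fetch_status, interval_s=10, max_attempts=30):
    """Poll a generation status endpoint until it reports "completed".

    fetch_status: a callable returning the parsed JSON status payload,
    e.g. lambda: requests.get(poll_url, headers=auth_headers).json()
    """
    for _ in range(max_attempts):
        payload = fetch_status()
        if payload.get("status") == "completed":
            return payload
        time.sleep(interval_s)
    raise TimeoutError("generation did not complete in time")
```

Injecting the fetcher keeps the loop easy to test and lets you reuse it for image and audio cold starts alike.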

Per-Request Pricing

| Type  | Model        | Cost   | Speed           |
|-------|--------------|--------|-----------------|
| Image | flux-schnell | $0.008 | ~20s            |
| Image | sdxl         | $0.005 | ~15s            |
| Image | zimage       | $0.008 | ~10s            |
| Audio | kokoro       | $0.002 | ~5s             |
| Video | wan-t2v      | $0.30  | ~2min (async)   |
| Text  | qwen3-8b     | $0.01  | ~30s cold start |
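Because pricing is flat per request, budgeting a batch job is simple multiplication. A quick sketch (the `PRICES` dict mirrors the pricing above; the helper name is our own):

```python
# Per-request prices in USD, copied from the pricing table.
PRICES = {
    "flux-schnell": 0.008,
    "sdxl": 0.005,
    "zimage": 0.008,
    "kokoro": 0.002,
    "wan-t2v": 0.30,
    "qwen3-8b": 0.01,
}

def estimate_cost(model: str, num_requests: int) -> float:
    """Back-of-the-envelope spend for a batch of identical requests."""
    return PRICES[model] * num_requests
```

For example, 500 thumbnails on sdxl cost about $2.50, while three wan-t2v videos cost about $0.90.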

Authentication

API Keys

Create API keys in your dashboard to access ModelPilot endpoints programmatically. API keys must have proxy permission for OpenAI-compatible endpoints.

bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-8b", "messages": [{"role": "user", "content": "ping"}]}'

API Key Requirements

  • Requires read and proxy permissions
  • Session authentication (web UI) has full permissions automatically
  • API keys can be created and managed in your dashboard

OpenAI-Compatible Endpoints

Chat Completions

Create chat completions using the OpenAI-compatible format. Automatically routes to your deployed text models.

POST /api/v1/chat/completions

Request Example

JavaScript (fetch)

javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    temperature: 0.7,
    max_tokens: 100
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

cURL

bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

Python (openai SDK)

python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)

Python (requests)

python
import requests

response = requests.post(
    "https://modelpilot.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer mp_live_your_api_key"},
    json={
        "model": "qwen3-8b",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
print(response.json()["choices"][0]["message"]["content"])

Response Example

json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  },
  "system_fingerprint": "modelpilot-pod123",
  "x_modelpilot": {
    "deployment_id": "pod123",
    "model_identifier": "qwen3-8b:7b",
    "response_time_ms": 1250,
    "direct_endpoint": "https://pod123.proxy.runpod.net:11434"
  }
}

Supported Parameters

| Parameter   | Type          | Description                                           |
|-------------|---------------|-------------------------------------------------------|
| model       | string        | Your deployed model name (e.g., "qwen3-8b", "gemma3") |
| messages    | array         | Array of message objects with role and content        |
| temperature | number        | Sampling temperature (0.0 to 2.0)                     |
| max_tokens  | number        | Maximum tokens to generate                            |
| top_p       | number        | Nucleus sampling parameter                            |
| stop        | string\|array | Stop sequences                                        |
| stream      | boolean       | Stream response as Server-Sent Events                 |

Streaming

Set stream: true in your chat completions request to receive responses as Server-Sent Events (SSE). Each event contains a data: line with a JSON chunk, and the stream ends with data: [DONE].

Python (openai SDK)

python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()

JavaScript (fetch)

javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value);
  // Each line is "data: {...}" or "data: [DONE]"
  console.log(text);
}
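If you consume the SSE stream without an SDK, each line needs the same small amount of parsing. A minimal Python helper (the function is our own sketch; the `data:` framing and `[DONE]` sentinel are as described above):

```python
import json

def parse_sse_line(line: str):
    """Parse one line of an SSE chat-completions stream.

    Returns the decoded JSON chunk for "data: {...}" lines,
    the string "DONE" for the terminating "data: [DONE]" event,
    and None for blank or non-data lines.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)
```

In practice you would buffer the raw bytes, split on newlines, and feed each line through this helper, appending `chunk["choices"][0]["delta"].get("content", "")` as chunks arrive.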

Rate Limits

API requests are rate-limited to protect service stability. Limits are applied per IP address.

| Detail            | Value                                          |
|-------------------|------------------------------------------------|
| Default limit     | 100 requests per minute per IP                 |
| Exceeded response | 429 Too Many Requests with Retry-After header  |
| Note              | Limits may vary by endpoint and account type   |

If you receive a 429 response, wait for the duration specified in the Retry-After header before retrying. Implement exponential backoff for production integrations.
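A retry loop with exponential backoff can be sketched as follows. This is a minimal example, not an official client: the function name and the injectable `make_request` callable are ours, while the 429 status and Retry-After header behavior match the limits described above.

```python
import time

def with_backoff(make_request, max_retries=5, base_delay_s=1.0):
    """Retry a request on HTTP 429, honoring Retry-After when present.

    make_request: a callable returning an object with .status_code and
    .headers, e.g. a functools.partial around requests.post.
    """
    for attempt in range(max_retries):
        response = make_request()
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        # Prefer the server's hint; otherwise back off exponentially.
        delay = float(retry_after) if retry_after else base_delay_s * (2 ** attempt)
        time.sleep(delay)
    raise RuntimeError("still rate limited after all retries")
```

Adding a small random jitter to `delay` is a common refinement when many clients share one IP.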

Health Monitoring

Deployment Health

Check the health status of your deployments to ensure services are running properly.

GET /api/deployments/{podId}/health

cURL Example

bash
curl -X GET https://modelpilot.ai/api/deployments/pod123/health \
  -H "Authorization: Bearer mp_live_your_api_key"

Response Example

json
{
  "status": "healthy",
  "timestamp": "2023-12-01T10:30:00.000Z",
  "services": {
    "ollama": "running",
    "webui": "running"
  },
  "deployment_status": "running",
  "response_time_ms": 125,
  "last_checked": "2023-12-01T10:30:00.000Z"
}

Status Values

● healthy - All services running
● degraded - Some services have issues
● unhealthy - Services are down
● starting - Deployment is starting up
● unknown - Status could not be determined
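Before routing traffic to a deployment, it can help to reduce the health payload to a single go/no-go check. A small sketch (the helper and its strictness are our own choices; the `status` and `services` fields follow the response shown above):

```python
def is_usable(health: dict) -> bool:
    """True only when the deployment is fully healthy.

    `health` is the parsed JSON from the health endpoint. This is a
    conservative check: "degraded" deployments are treated as unusable.
    """
    return health.get("status") == "healthy" and all(
        state == "running" for state in health.get("services", {}).values()
    )
```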

Migration from OpenAI

Quick Migration Steps

1

Deploy Your Model

Use the ModelPilot dashboard to deploy your preferred model

2

Create API Key

Generate an API key with proxy permissions in your dashboard

3

Update Your Code

Change the base URL and API key in your existing OpenAI code

Code Changes

Before (OpenAI):
javascript
const openai = new OpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://api.openai.com/v1'
});

const response = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello' }]
});
After (ModelPilot):
javascript
const openai = new OpenAI({
  apiKey: 'mp_live_your_api_key',
  baseURL: 'https://modelpilot.ai/api/v1'
});

const response = await openai.chat.completions.create({
  model: 'qwen3-8b',  // Your deployed model
  messages: [{ role: 'user', content: 'Hello' }]
});

Error Handling

Common Errors

Model Not Found (404)

No active deployment found for the specified model. Deploy the model first via the dashboard.

json
{
  "error": {
    "message": "No active deployment found for model 'qwen3-8b'",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  },
  "available_models": ["gemma3:7b", "deepseek-r1"]
}

Model Not Running (503)

The deployment exists but is not currently running. Start it via the dashboard.

json
{
  "error": {
    "message": "Model 'qwen3-8b' deployment is not running (status: stopped)",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_available"
  }
}

Invalid Model Type (400)

The model is not a text model and cannot be used with chat completions.

json
{
  "error": {
    "message": "Model 'flux-dev' is not a text model and cannot be used with chat completions",
    "type": "invalid_request_error",
    "param": "model",
    "code": "invalid_model_type"
  }
}
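Since each error carries a machine-readable `code`, client code can map codes to remediation steps instead of string-matching messages. A minimal sketch (the helper name and the wording of the actions are ours; the codes are the ones documented above):

```python
def describe_error(payload: dict) -> str:
    """Map a documented error code to an actionable next step."""
    code = payload.get("error", {}).get("code")
    actions = {
        "model_not_found": "Deploy the model via the dashboard first.",
        "model_not_available": "Start the stopped deployment via the dashboard.",
        "invalid_model_type": "Use a text model for chat completions.",
    }
    return actions.get(code, "Unexpected error; inspect the full payload.")
```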

Best Practices

Performance Tips

  • Keep deployments running for faster response times
  • Use appropriate temperature values (0.1-0.9 for most use cases)
  • Set reasonable max_tokens to control costs
  • Monitor deployment health regularly
  • Consider using direct endpoints for better performance

Cost Optimization

  • Stop deployments when not in use
  • Use smaller models for simple tasks
  • Monitor your credit usage in the dashboard
  • Set up usage alerts and limits
  • Consider batch processing for efficiency

Image Generation

POST /api/v1/generate/image

Generate images from text prompts. Returns the image synchronously when the worker is warm (10-30s), or a poll_url on cold start.

cURL

bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-schnell",
    "prompt": "a red fox in a snowy forest, photorealistic",
    "width": 1024,
    "height": 1024
  }'

Parameters

| Parameter       | Required | Description                             |
|-----------------|----------|-----------------------------------------|
| model           | Yes      | flux-schnell, flux-dev, sdxl, or zimage |
| prompt          | Yes      | Text description of desired image       |
| width           | No       | Image width (default: 1024)             |
| height          | No       | Image height (default: 1024)            |
| negative_prompt | No       | What to avoid (SDXL and zimage only)    |
| steps           | No       | Inference steps (default varies)        |
| seed            | No       | Random seed for reproducibility         |

Response (warm)

json
{
  "id": "gen_abc123",
  "model": "flux-schnell",
  "images": [{ "base64": "<base64 PNG>", "filename": "output_00001_.png" }],
  "cost": 0.008,
  "execution_time_ms": 18500
}

Response (cold start)

json
{
  "id": "sync-abc123",
  "model": "flux-schnell",
  "status": "processing",
  "job_id": "sync-abc123",
  "endpoint_id": "ep-xxx",
  "poll_url": "/api/v1/generate/image/status/sync-abc123?endpoint_id=ep-xxx&model=flux-schnell",
  "message": "Cold start in progress. Poll the status URL every 10 seconds.",
  "estimated_time_ms": 30000
}

On cold start, poll the poll_url until status is "completed". Subsequent requests while the worker is warm return images directly.
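Since warm responses return images as base64 strings, the last step client-side is decoding them to files. A small sketch (the function name and `prefix` parameter are ours; the `images[].base64` field matches the warm response above):

```python
import base64

def save_images(response: dict, prefix: str = "out") -> list:
    """Decode base64 image payloads from a warm response and write PNGs."""
    paths = []
    for i, image in enumerate(response.get("images", [])):
        path = f"{prefix}_{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(image["base64"]))
        paths.append(path)
    return paths
```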

Audio Generation (TTS)

POST /api/v1/generate/audio

Text-to-speech generation. Returns base64 WAV audio when warm (2-5s), or a poll_url on cold start.

cURL

bash
curl -X POST https://modelpilot.ai/api/v1/generate/audio \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "text": "Hello, welcome to ModelPilot."}'

Parameters

| Parameter | Required | Description                              |
|-----------|----------|------------------------------------------|
| model     | Yes      | kokoro ($0.002) or chatterbox ($0.005)   |
| text      | Yes      | Text to synthesize (max 5000 chars)      |
| voice     | No       | Voice ID (default: af_heart for kokoro)  |
| speed     | No       | Speed multiplier 0.5-2.0 (default: 1.0)  |

Response (warm)

json
{
  "id": "gen_abc123",
  "model": "kokoro",
  "audio": "<base64 WAV>",
  "format": "wav",
  "sample_rate": 24000,
  "cost": 0.002,
  "execution_time_ms": 3200
}

On cold start, returns {status: "processing", poll_url: "..."} — poll until complete, same as image.

Video Generation

POST /api/v1/generate/video (async)

Video generation is asynchronous. Submit a job, then poll for results.

1. Submit job

bash
curl -X POST https://modelpilot.ai/api/v1/generate/video \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-t2v",
    "prompt": "a sunset over the ocean, cinematic, 4k"
  }'

Submit response

json
{
  "id": "vid_abc123",
  "model": "wan-t2v",
  "status": "processing",
  "job_id": "run-abc123",
  "endpoint_id": "ep-xxx",
  "poll_url": "/api/v1/generate/video/status/run-abc123?endpoint_id=ep-xxx&model=wan-t2v",
  "estimated_time_ms": 120000,
  "cost": 0.30
}

2. Poll for results

bash
curl https://modelpilot.ai/api/v1/generate/video/status/run-abc123 \
  -H "Authorization: Bearer mp_live_your_api_key"

Completed response

json
{
  "status": "COMPLETED",
  "videos": [{ "url": "https://..." }],
  "execution_time_ms": 95000
}
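The submit-then-poll workflow above can be wrapped in a single helper. This is a sketch, not an official client: the function name and the injectable `post`/`get` callables are ours, while the relative `poll_url` and the uppercase "COMPLETED" status follow the responses shown above.

```python
import time

def generate_video(post, get, prompt, model="wan-t2v", interval_s=10,
                   host="https://modelpilot.ai"):
    """Submit a video job and block until a result URL is available.

    post/get are injectable callables returning parsed JSON, e.g.
    post = lambda url, body: requests.post(url, headers=auth, json=body).json()
    get  = lambda url: requests.get(url, headers=auth).json()
    """
    submit = post(f"{host}/api/v1/generate/video",
                  {"model": model, "prompt": prompt})
    # poll_url in the submit response is relative to the API host.
    status_url = host + submit["poll_url"]
    while True:
        status = get(status_url)
        # Note: the video status endpoint reports uppercase "COMPLETED".
        if status.get("status") == "COMPLETED":
            return status["videos"][0]["url"]
        time.sleep(interval_s)
```

Injecting the HTTP callables keeps authentication in one place and makes the polling logic trivial to test; a production version would also add a timeout and handle failed job states.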