Generate images, video, audio, and text with simple API calls. Pay per request — no GPU to manage.
1. **Create an account.** Sign up free with email or Google/GitHub. No credit card required.
2. **Add funds.** Go to Billing and add credits (from $5). You get a 50% bonus on your first purchase.
3. **Create an API key.** Go to Dashboard → API Keys and click "Create API Key." Copy the `mp_live_...` key; you won't see it again.
4. **Generate your first image.** Run this curl command with your API key:
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a red apple on a white table"}'
```

On first request, the worker may need 30-45 seconds to start (cold start). If the response contains a `poll_url`, poll it every 10 seconds until `status` is `"completed"`. While the worker is warm, subsequent requests return results in 1-2 seconds.
| Type | Model | Cost (per request) | Speed |
|---|---|---|---|
| Image | flux-schnell | $0.008 | ~20s |
| Image | sdxl | $0.005 | ~15s |
| Image | zimage | $0.008 | ~10s |
| Audio | kokoro | $0.002 | ~5s |
| Video | wan-t2v | $0.30 | ~2min (async) |
| Text | qwen3-8b | $0.01 | ~30s cold start |
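As a back-of-envelope check against the price list above (treat the numbers as illustrative; your billing page is authoritative), total spend is just price times request count:

```python
# Per-request prices copied from the table above (illustrative).
PRICES = {
    "flux-schnell": 0.008,
    "sdxl": 0.005,
    "zimage": 0.008,
    "kokoro": 0.002,
    "wan-t2v": 0.30,
    "qwen3-8b": 0.01,
}

def estimate_cost(model: str, n_requests: int) -> float:
    """Rough spend for n requests against a single model."""
    return round(PRICES[model] * n_requests, 4)

print(estimate_cost("flux-schnell", 500))  # 4.0 -> 500 images for $4
```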
Create API keys in your dashboard to access ModelPilot endpoints programmatically. API keys must have the proxy permission to call the OpenAI-compatible endpoints.

```bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key_here" \
  -H "Content-Type: application/json"
```

Requires read and proxy permissions.

Create chat completions using the OpenAI-compatible format. Requests are automatically routed to your deployed text models.
POST /api/v1/chat/completions

JavaScript:

```javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    temperature: 0.7,
    max_tokens: 100
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

curl:

```bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```

Python (requests):

```python
import requests

response = requests.post(
    "https://modelpilot.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer mp_live_your_api_key"},
    json={
        "model": "qwen3-8b",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
print(response.json()["choices"][0]["message"]["content"])
```

Example response:

```json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  },
  "system_fingerprint": "modelpilot-pod123",
  "x_modelpilot": {
    "deployment_id": "pod123",
    "model_identifier": "qwen3-8b:7b",
    "response_time_ms": 1250,
    "direct_endpoint": "https://pod123.proxy.runpod.net:11434"
  }
}
```

| Parameter | Type | Description |
|---|---|---|
| model | string | Your deployed model name (e.g., "qwen3-8b", "gemma3") |
| messages | array | Array of message objects with role and content |
| temperature | number | Sampling temperature (0.0 to 2.0) |
| max_tokens | number | Maximum tokens to generate |
| top_p | number | Nucleus sampling parameter |
| stop | string \| array | Stop sequences |
| stream | boolean | Stream response as Server-Sent Events |
Set `stream: true` in your chat completions request to receive responses as Server-Sent Events (SSE). Each event contains a `data:` line with a JSON chunk, and the stream ends with `data: [DONE]`.
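If you are not using an SDK, the `data:` framing can be handled with a few lines. A minimal sketch, assuming each event arrives as a complete line in the format described above:

```python
import json

def parse_sse_lines(lines):
    """Yield JSON chunks from "data: {...}" lines, stopping at "data: [DONE]"."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)
```

In practice you would feed this the decoded lines of the HTTP response body; partial lines split across network chunks still need buffering, which the SDKs below handle for you.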
Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```

JavaScript:

```javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value);
  // Each line is "data: {...}" or "data: [DONE]"
  console.log(text);
}
```

API requests are rate-limited to protect service stability. Limits are applied per IP address.
| Detail | Value |
|---|---|
| Default limit | 100 requests per minute per IP |
| Exceeded response | 429 Too Many Requests with Retry-After header |
| Note | Limits may vary by endpoint and account type |
If you receive a 429 response, wait for the duration specified in the Retry-After header before retrying. Implement exponential backoff for production integrations.
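One way to implement that backoff, with the delay schedule separated out so it is easy to test. The jitter and cap are conventional choices, not ModelPilot requirements:

```python
import random

def backoff_delays(retry_after=None, max_retries=5, base_s=1.0, cap_s=60.0):
    """Yield seconds to wait before each retry after a 429.

    Honors a Retry-After header value (in seconds) for the first wait,
    then falls back to capped exponential backoff with jitter.
    """
    for attempt in range(max_retries):
        if attempt == 0 and retry_after is not None:
            yield float(retry_after)
        else:
            delay = min(cap_s, base_s * 2 ** attempt)
            yield delay * random.uniform(0.5, 1.0)
```

Wrap your request call in a loop over these delays, sleeping between attempts and giving up once the generator is exhausted.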
Check the health status of your deployments to ensure services are running properly.

GET /api/deployments/{podId}/health

```bash
curl -X GET https://modelpilot.ai/api/deployments/pod123/health \
  -H "Authorization: Bearer mp_live_your_api_key"
```

```json
{
  "status": "healthy",
  "timestamp": "2023-12-01T10:30:00.000Z",
  "services": {
    "ollama": "running",
    "webui": "running"
  },
  "deployment_status": "running",
  "response_time_ms": 125,
  "last_checked": "2023-12-01T10:30:00.000Z"
}
```

Migrating from OpenAI takes three steps:

1. Use the ModelPilot dashboard to deploy your preferred model.
2. Generate an API key with proxy permissions in your dashboard.
3. Change the base URL and API key in your existing OpenAI code.

Before:

```javascript
const openai = new OpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://api.openai.com/v1'
});

const response = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello' }]
});
```

After:

```javascript
const openai = new OpenAI({
  apiKey: 'mp_live_your_api_key',
  baseURL: 'https://modelpilot.ai/api/v1'
});

const response = await openai.chat.completions.create({
  model: 'qwen3-8b', // Your deployed model
  messages: [{ role: 'user', content: 'Hello' }]
});
```

No active deployment found for the specified model. Deploy the model first via the dashboard.
```json
{
  "error": {
    "message": "No active deployment found for model 'qwen3-8b'",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  },
  "available_models": ["gemma3:7b", "deepseek-r1"]
}
```

The deployment exists but is not currently running. Start it via the dashboard.

```json
{
  "error": {
    "message": "Model 'qwen3-8b' deployment is not running (status: stopped)",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_available"
  }
}
```

The model is not a text model and cannot be used with chat completions.

```json
{
  "error": {
    "message": "Model 'flux-dev' is not a text model and cannot be used with chat completions",
    "type": "invalid_request_error",
    "param": "model",
    "code": "invalid_model_type"
  }
}
```

POST /api/v1/generate/image

Generate images from text prompts. Returns the image synchronously when the worker is warm (10-30s), or a poll_url on cold start.
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-schnell",
    "prompt": "a red fox in a snowy forest, photorealistic",
    "width": 1024,
    "height": 1024
  }'
```

| Parameter | Required | Description |
|---|---|---|
| model | Yes | flux-schnell, flux-dev, sdxl, or zimage |
| prompt | Yes | Text description of desired image |
| width | No | Image width (default: 1024) |
| height | No | Image height (default: 1024) |
| negative_prompt | No | What to avoid (SDXL and zimage only) |
| steps | No | Inference steps (default varies) |
| seed | No | Random seed for reproducibility |
Warm response:

```json
{
  "id": "gen_abc123",
  "model": "flux-schnell",
  "images": [{ "base64": "<base64 PNG>", "filename": "output_00001_.png" }],
  "cost": 0.008,
  "execution_time_ms": 18500
}
```

Cold-start response:

```json
{
  "id": "sync-abc123",
  "model": "flux-schnell",
  "status": "processing",
  "job_id": "sync-abc123",
  "endpoint_id": "ep-xxx",
  "poll_url": "/api/v1/generate/image/status/sync-abc123?endpoint_id=ep-xxx&model=flux-schnell",
  "message": "Cold start in progress. Poll the status URL every 10 seconds.",
  "estimated_time_ms": 30000
}
```

On cold start, poll the `poll_url` until `status` is `"completed"`. Subsequent requests while the worker is warm return images directly.
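Since warm responses embed each image as base64, saving to disk is one decode away. A sketch, with field names following the warm response shown above; audio responses can be written the same way from their `audio` field:

```python
import base64
from pathlib import Path

def save_images(response: dict, out_dir: str = ".") -> list:
    """Decode each base64 image in a warm response and write it to disk."""
    paths = []
    for img in response.get("images", []):
        path = Path(out_dir) / img["filename"]
        path.write_bytes(base64.b64decode(img["base64"]))
        paths.append(path)
    return paths
```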
POST /api/v1/generate/audio

Text-to-speech generation. Returns base64 WAV audio when warm (2-5s), or a poll_url on cold start.

```bash
curl -X POST https://modelpilot.ai/api/v1/generate/audio \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "text": "Hello, welcome to ModelPilot."}'
```

| Parameter | Required | Description |
|---|---|---|
| model | Yes | kokoro ($0.002) or chatterbox ($0.005) |
| text | Yes | Text to synthesize (max 5000 chars) |
| voice | No | Voice ID (default: af_heart for kokoro) |
| speed | No | Speed multiplier 0.5-2.0 (default: 1.0) |
```json
{
  "id": "gen_abc123",
  "model": "kokoro",
  "audio": "<base64 WAV>",
  "format": "wav",
  "sample_rate": 24000,
  "cost": 0.002,
  "execution_time_ms": 3200
}
```

On cold start, the endpoint returns `{"status": "processing", "poll_url": "..."}`; poll until complete, exactly as for image generation.

POST /api/v1/generate/video (async)

Video generation is asynchronous. Submit a job, then poll for results.
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/video \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-t2v",
    "prompt": "a sunset over the ocean, cinematic, 4k"
  }'
```

```json
{
  "id": "vid_abc123",
  "model": "wan-t2v",
  "status": "processing",
  "job_id": "run-abc123",
  "endpoint_id": "ep-xxx",
  "poll_url": "/api/v1/generate/video/status/run-abc123?endpoint_id=ep-xxx&model=wan-t2v",
  "estimated_time_ms": 120000,
  "cost": 0.30
}
```

Poll the status URL until the job completes:

```bash
curl https://modelpilot.ai/api/v1/generate/video/status/run-abc123 \
  -H "Authorization: Bearer mp_live_your_api_key"
```

```json
{
  "status": "COMPLETED",
  "videos": [{ "url": "https://..." }],
  "execution_time_ms": 95000
}
```

Need help? Check out our full documentation or contact support.