Generate images, video, audio, and text with simple API calls. Pay per request — no GPU to manage.
Generate an image in one API call. Get your API key from Dashboard → API Keys.
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a red apple on a white table"}'
```

| Type | Model | Cost | Speed |
|---|---|---|---|
| Image | flux-schnell | $0.03 | ~20s |
| Image | sdxl | $0.008 | ~15s |
| Image | zimage | $0.03 | ~10s |
| Audio | kokoro | $0.002 | ~5s |
| Video | wan-t2v | $0.30 | ~2min (async) |
| Text | qwen3-8b | $0.01 | ~30s cold start |
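The quick-start call can also be made from Python. A minimal sketch using requests (the helper names and the environment-variable lookup are illustrative assumptions; the endpoint, header, and payload fields are as documented above):

```python
import os
import requests

BASE_URL = "https://modelpilot.ai/api/v1"

def build_image_request(prompt, model="flux-schnell", **params):
    """Assemble the JSON body for POST /api/v1/generate/image."""
    return {"model": model, "prompt": prompt, **params}

def generate_image(prompt, api_key=None, **params):
    """Call the image endpoint synchronously and return the parsed response."""
    api_key = api_key or os.environ["MODELPILOT_API_KEY"]
    resp = requests.post(
        f"{BASE_URL}/generate/image",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_image_request(prompt, **params),
        timeout=60,  # synchronous endpoint; image models typically take 10-30s
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"id": ..., "images": [{"url": ...}], "cost": ...}
```

Any extra keyword arguments (width, height, seed, ...) are passed through to the request body.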
Create API keys in your dashboard to access ModelPilot endpoints programmatically. API keys must have proxy permission for OpenAI-compatible endpoints.
```bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key_here" \
  -H "Content-Type: application/json"
```

Note: OpenAI-compatible endpoints require keys with read and proxy permissions.

Create chat completions using the OpenAI-compatible format. Requests are automatically routed to your deployed text models.
POST /api/v1/chat/completions

```javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    temperature: 0.7,
    max_tokens: 100
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

```bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```

```python
import requests

response = requests.post(
    "https://modelpilot.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer mp_live_your_api_key"},
    json={
        "model": "qwen3-8b",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
print(response.json()["choices"][0]["message"]["content"])
```

```json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  },
  "system_fingerprint": "modelpilot-pod123",
  "x_modelpilot": {
    "deployment_id": "pod123",
    "model_identifier": "qwen3-8b:7b",
    "response_time_ms": 1250,
    "direct_endpoint": "https://pod123.proxy.runpod.net:11434"
  }
}
```

| Parameter | Type | Description |
|---|---|---|
| model | string | Your deployed model name (e.g., "qwen3-8b", "gemma3") |
| messages | array | Array of message objects with role and content |
| temperature | number | Sampling temperature (0.0 to 2.0) |
| max_tokens | number | Maximum tokens to generate |
| top_p | number | Nucleus sampling parameter |
| stop | string \| array | Stop sequences |
| stream | boolean | Stream response as Server-Sent Events |
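A small payload builder can keep request construction honest against the table above. This is a sketch: the client-side validation is our own addition (the API performs its own checks server-side), but the field names and ranges follow the table.

```python
def chat_request(messages, model="qwen3-8b", *, temperature=0.7,
                 top_p=1.0, max_tokens=None, stop=None, stream=False):
    """Build a chat-completions payload, validating the documented ranges."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    body = {"model": model, "messages": messages,
            "temperature": temperature, "top_p": top_p, "stream": stream}
    # Optional fields are omitted entirely when unset.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stop is not None:
        body["stop"] = stop  # a single string or an array of stop sequences
    return body
```

The resulting dict can be posted directly as the JSON body of POST /api/v1/chat/completions.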
Set stream: true in your chat completions request to receive responses as Server-Sent Events (SSE). Each event contains a data: line with a JSON chunk, and the stream ends with data: [DONE].
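If you are not using a client library, the raw SSE stream can be parsed by hand. A sketch (the per-chunk shape with a choices[0].delta field follows the standard OpenAI streaming format, which this page implies but does not show in full):

```python
import json

def parse_sse_lines(lines):
    """Yield parsed JSON chunks from 'data: ...' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)

def stream_text(lines):
    """Concatenate the delta content from each chunk into the full reply."""
    parts = []
    for chunk in parse_sse_lines(lines):
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):  # the first chunk may carry only a role
            parts.append(delta["content"])
    return "".join(parts)
```

With requests, pass stream=True and feed resp.iter_lines(decode_unicode=True) to parse_sse_lines.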
```python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```

```javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value);
  // Each line is "data: {...}" or "data: [DONE]"
  console.log(text);
}
```

API requests are rate-limited to protect service stability. Limits are applied per IP address.
| Detail | Value |
|---|---|
| Default limit | 100 requests per minute per IP |
| Exceeded response | 429 Too Many Requests with Retry-After header |
| Note | Limits may vary by endpoint and account type |
If you receive a 429 response, wait for the duration specified in the Retry-After header before retrying. Implement exponential backoff for production integrations.
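The retry advice above can be sketched as a small wrapper (the helper names, retry count, and base delay are illustrative assumptions; the Retry-After handling follows the behavior documented in the table):

```python
import time
import requests

def retry_delay(retry_after, attempt, base_delay=1.0):
    """Seconds to wait: the Retry-After header if usable, else base * 2^attempt."""
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may be an HTTP-date; fall back to backoff
    return base_delay * (2 ** attempt)

def post_with_backoff(url, *, max_retries=5, base_delay=1.0, **kwargs):
    """POST with exponential backoff, honoring Retry-After on 429 responses."""
    for attempt in range(max_retries + 1):
        resp = requests.post(url, **kwargs)
        if resp.status_code != 429:
            return resp
        if attempt == max_retries:
            break  # give up and return the last 429
        time.sleep(retry_delay(resp.headers.get("Retry-After"), attempt, base_delay))
    return resp
```

Adding jitter to the computed delay is a common refinement when many clients retry at once.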
Check the health status of your deployments to ensure services are running properly.
GET /api/deployments/{podId}/health

```bash
curl -X GET https://modelpilot.ai/api/deployments/pod123/health \
  -H "Authorization: Bearer mp_live_your_api_key"
```

```json
{
  "status": "healthy",
  "timestamp": "2023-12-01T10:30:00.000Z",
  "services": {
    "ollama": "running",
    "webui": "running"
  },
  "deployment_status": "running",
  "response_time_ms": 125,
  "last_checked": "2023-12-01T10:30:00.000Z"
}
```

1. Use the ModelPilot dashboard to deploy your preferred model.
2. Generate an API key with proxy permissions in your dashboard.
3. Change the base URL and API key in your existing OpenAI code.
Before (OpenAI):

```javascript
const openai = new OpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://api.openai.com/v1'
});

const response = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello' }]
});
```

After (ModelPilot):

```javascript
const openai = new OpenAI({
  apiKey: 'mp_live_your_api_key',
  baseURL: 'https://modelpilot.ai/api/v1'
});

const response = await openai.chat.completions.create({
  model: 'qwen3-8b', // Your deployed model
  messages: [{ role: 'user', content: 'Hello' }]
});
```

No active deployment found for the specified model. Deploy the model first via the dashboard.
```json
{
  "error": {
    "message": "No active deployment found for model 'qwen3-8b'",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  },
  "available_models": ["gemma3:7b", "deepseek-r1"]
}
```

The deployment exists but is not currently running. Start it via the dashboard.
```json
{
  "error": {
    "message": "Model 'qwen3-8b' deployment is not running (status: stopped)",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_available"
  }
}
```

The model is not a text model and cannot be used with chat completions.
```json
{
  "error": {
    "message": "Model 'flux-dev' is not a text model and cannot be used with chat completions",
    "type": "invalid_request_error",
    "param": "model",
    "code": "invalid_model_type"
  }
}
```

POST /api/v1/generate/image

Generate images from text prompts. Returns synchronously (typically 10-30s).
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-schnell",
    "prompt": "a red fox in a snowy forest, photorealistic",
    "width": 1024,
    "height": 1024
  }'
```

| Parameter | Required | Description |
|---|---|---|
| model | Yes | flux-schnell, flux-dev, sdxl, or zimage |
| prompt | Yes | Text description of desired image |
| width | No | Image width (default: 1024) |
| height | No | Image height (default: 1024) |
| negative_prompt | No | What to avoid (SDXL and zimage only) |
| steps | No | Inference steps (default varies) |
| seed | No | Random seed for reproducibility |
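In Python, a builder that omits unset optional fields keeps requests clean, and a fixed seed makes results reproducible across runs (a sketch; the download helper and function names are assumptions, the parameters follow the table above):

```python
import requests

def image_payload(prompt, model="flux-schnell", *, width=1024, height=1024,
                  negative_prompt=None, steps=None, seed=None):
    """Build the /generate/image body, omitting unset optional fields."""
    body = {"model": model, "prompt": prompt, "width": width, "height": height}
    # negative_prompt is only honored by sdxl and zimage (see table above)
    for key, value in (("negative_prompt", negative_prompt),
                       ("steps", steps), ("seed", seed)):
        if value is not None:
            body[key] = value
    return body

def generate_and_save(prompt, api_key, path="out.png", **params):
    """Generate an image and download the first result to disk."""
    resp = requests.post(
        "https://modelpilot.ai/api/v1/generate/image",
        headers={"Authorization": f"Bearer {api_key}"},
        json=image_payload(prompt, **params),
        timeout=60,
    )
    resp.raise_for_status()
    url = resp.json()["images"][0]["url"]
    with open(path, "wb") as f:
        f.write(requests.get(url, timeout=60).content)
    return path
```

Passing the same prompt, parameters, and seed (e.g. seed=42) should reproduce the same image.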
```json
{
  "id": "img_abc123",
  "model": "flux-schnell",
  "images": [{ "url": "https://..." }],
  "cost": 0.03,
  "execution_time_ms": 18500
}
```

POST /api/v1/generate/audio

Text-to-speech generation. Returns base64-encoded WAV audio synchronously.
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/audio \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "text": "Hello, welcome to ModelPilot."}'
```

| Parameter | Required | Description |
|---|---|---|
| model | Yes | kokoro ($0.002) or chatterbox ($0.005) |
| text | Yes | Text to synthesize (max 5000 chars) |
| voice | No | Voice ID (default: af_heart for kokoro) |
| speed | No | Speed multiplier 0.5-2.0 (default: 1.0) |
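Because this endpoint returns the audio as base64 inside JSON, the bytes must be decoded before they can be played or saved. A sketch (the helper names are assumptions; the request and response fields follow this endpoint's documentation):

```python
import base64
import requests

def decode_audio(response_json):
    """Extract raw WAV bytes from a /generate/audio JSON response."""
    if response_json.get("format") != "wav":
        raise ValueError(f"unexpected audio format: {response_json.get('format')}")
    return base64.b64decode(response_json["audio"])

def synthesize(text, api_key, model="kokoro", voice=None, speed=1.0):
    """Call /generate/audio and return the decoded WAV bytes."""
    body = {"model": model, "text": text, "speed": speed}
    if voice is not None:
        body["voice"] = voice  # e.g. "af_heart", the kokoro default
    resp = requests.post(
        "https://modelpilot.ai/api/v1/generate/audio",
        headers={"Authorization": f"Bearer {api_key}"},
        json=body,
        timeout=60,
    )
    resp.raise_for_status()
    return decode_audio(resp.json())
```

Write the returned bytes to a .wav file (24 kHz sample rate, per the response) to play them.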
```json
{
  "id": "audio_abc123",
  "model": "kokoro",
  "audio": "<base64-encoded WAV>",
  "format": "wav",
  "sample_rate": 24000,
  "cost": 0.002,
  "execution_time_ms": 3200
}
```

POST /api/v1/generate/video (async)

Video generation is asynchronous. Submit a job, then poll for results.
```bash
curl -X POST https://modelpilot.ai/api/v1/generate/video \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-t2v",
    "prompt": "a sunset over the ocean, cinematic, 4k"
  }'
```

```json
{
  "id": "vid_abc123",
  "model": "wan-t2v",
  "status": "processing",
  "job_id": "run-abc123",
  "poll_url": "/api/v1/generate/video/status/run-abc123",
  "estimated_time_ms": 120000,
  "cost": 0.30
}
```

```bash
curl https://modelpilot.ai/api/v1/generate/video/status/run-abc123 \
  -H "Authorization: Bearer mp_live_your_api_key"
```

```json
{
  "status": "COMPLETED",
  "videos": [{ "url": "https://..." }],
  "execution_time_ms": 95000
}
```

Need help? Check out our full documentation or contact support.
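As a final sketch, the submit-and-poll flow for video can be combined into one Python helper. Assumptions are flagged: only COMPLETED appears in the sample poll response above, so treating FAILED as a terminal state, the poll interval, and the timeout are our own additions.

```python
import time
import requests

BASE = "https://modelpilot.ai"

def is_done(status):
    """True once a poll response reports a terminal state (FAILED is assumed)."""
    return status.get("status") in ("COMPLETED", "FAILED")

def generate_video(prompt, api_key, model="wan-t2v", poll_every=10, timeout=600):
    """Submit a video job, then poll its status URL until it finishes."""
    headers = {"Authorization": f"Bearer {api_key}"}
    job = requests.post(
        f"{BASE}/api/v1/generate/video", headers=headers,
        json={"model": model, "prompt": prompt}, timeout=60,
    ).json()
    deadline = time.time() + timeout
    while time.time() < deadline:
        # poll_url is an absolute path, e.g. /api/v1/generate/video/status/run-abc123
        status = requests.get(BASE + job["poll_url"], headers=headers, timeout=30).json()
        if is_done(status):
            return status  # contains videos[0].url on success
        time.sleep(poll_every)
    raise TimeoutError(f"video job {job['job_id']} did not finish in {timeout}s")
```

A 10-second poll interval is a reasonable default given the ~2-minute estimated generation time.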