ModelPilot API Documentation

Generate images, video, audio, and text with simple API calls. Pay per request — no GPU to manage.

Getting Started

1

Create an account

Sign up free with email or Google/GitHub. No credit card required.

2

Add funds

Go to Billing and add credits (from $5). 50% bonus on your first purchase.

3

Create an API key

Go to Dashboard → API Keys and click "Create API Key." Copy the mp_live_... key — you won't see it again.

4

Generate your first image

Run this curl command with your API key:

bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "flux-schnell", "prompt": "a red apple on a white table"}'

On first request, the worker may need 30-45s to start (cold start). If you get a poll_url in the response, poll it every 10 seconds until status: "completed". Subsequent requests while the worker is warm return results in 1-2 seconds.
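The cold-start flow above can be sketched as a small Python loop. The helper below is a minimal example, not an official client: the function name and the injectable `fetch_status` callable are our own, while the `poll_url`/`status` fields match the response described above.

```python
import time

def poll_until_complete(fetch_status, interval_s=10, max_attempts=30):
    """Poll a generation status endpoint until it reports "completed".

    fetch_status: a callable returning the parsed JSON status payload,
    e.g. lambda: requests.get(poll_url, headers=auth_headers).json()
    """
    for _ in range(max_attempts):
        payload = fetch_status()
        if payload.get("status") == "completed":
            return payload
        time.sleep(interval_s)
    raise TimeoutError("generation did not complete in time")
```

Injecting the fetcher keeps the loop easy to test and lets you reuse it for image and audio cold starts alike.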

Per-Request Pricing

| Type  | Model        | Cost   | Speed           |
|-------|--------------|--------|-----------------|
| Image | flux-schnell | $0.008 | ~20s            |
| Image | sdxl         | $0.005 | ~15s            |
| Image | zimage       | $0.008 | ~10s            |
| Audio | kokoro       | $0.002 | ~5s             |
| Video | wan-t2v      | $0.30  | ~2min (async)   |
| Text  | qwen3-8b     | $0.01  | ~30s cold start |
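Because pricing is flat per request, budgeting a batch job is simple multiplication. A quick sketch (the `PRICES` dict mirrors the pricing above; the helper name is our own):

```python
# Per-request prices in USD, copied from the pricing table.
PRICES = {
    "flux-schnell": 0.008,
    "sdxl": 0.005,
    "zimage": 0.008,
    "kokoro": 0.002,
    "wan-t2v": 0.30,
    "qwen3-8b": 0.01,
}

def estimate_cost(model: str, num_requests: int) -> float:
    """Back-of-the-envelope spend for a batch of identical requests."""
    return PRICES[model] * num_requests
```

For example, 500 thumbnails on sdxl cost about $2.50, while three wan-t2v videos cost about $0.90.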

Authentication

API Keys

Create API keys in your dashboard to access ModelPilot endpoints programmatically. API keys must have proxy permission for OpenAI-compatible endpoints.

bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-8b", "messages": [{"role": "user", "content": "ping"}]}'

API Key Requirements

  • Requires read and proxy permissions
  • Session authentication (web UI) has full permissions automatically
  • API keys can be created and managed in your dashboard

OpenAI-Compatible Endpoints

Chat Completions

Create chat completions using the OpenAI-compatible format. Automatically routes to your deployed text models.

POST /api/v1/chat/completions

Request Example

JavaScript (fetch)

javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    temperature: 0.7,
    max_tokens: 100
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

cURL

bash
curl -X POST https://modelpilot.ai/api/v1/chat/completions \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

Python (openai SDK)

python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

response = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)

Python (requests)

python
import requests

response = requests.post(
    "https://modelpilot.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer mp_live_your_api_key"},
    json={
        "model": "qwen3-8b",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
print(response.json()["choices"][0]["message"]["content"])

Response Example

json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  },
  "system_fingerprint": "modelpilot-pod123",
  "x_modelpilot": {
    "deployment_id": "pod123",
    "model_identifier": "qwen3-8b:7b",
    "response_time_ms": 1250,
    "direct_endpoint": "https://pod123.proxy.runpod.net:11434"
  }
}

Supported Parameters

| Parameter   | Type          | Description                                           |
|-------------|---------------|-------------------------------------------------------|
| model       | string        | Your deployed model name (e.g., "qwen3-8b", "gemma3") |
| messages    | array         | Array of message objects with role and content        |
| temperature | number        | Sampling temperature (0.0 to 2.0)                     |
| max_tokens  | number        | Maximum tokens to generate                            |
| top_p       | number        | Nucleus sampling parameter                            |
| stop        | string\|array | Stop sequences                                        |
| stream      | boolean       | Stream response as Server-Sent Events                 |

Streaming

Set stream: true in your chat completions request to receive responses as Server-Sent Events (SSE). Each event contains a data: line with a JSON chunk, and the stream ends with data: [DONE].

Python (openai SDK)

python
from openai import OpenAI

client = OpenAI(
    api_key="mp_live_your_api_key",
    base_url="https://modelpilot.ai/api/v1"
)

stream = client.chat.completions.create(
    model="qwen3-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()

JavaScript (fetch)

javascript
const response = await fetch('https://modelpilot.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mp_live_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'qwen3-8b',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value);
  // Each line is "data: {...}" or "data: [DONE]"
  console.log(text);
}
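If you consume the SSE stream without an SDK, each line needs the same small amount of parsing. A minimal Python helper (the function is our own sketch; the `data:` framing and `[DONE]` sentinel are as described above):

```python
import json

def parse_sse_line(line: str):
    """Parse one line of an SSE chat-completions stream.

    Returns the decoded JSON chunk for "data: {...}" lines,
    the string "DONE" for the terminating "data: [DONE]" event,
    and None for blank or non-data lines.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)
```

In practice you would buffer the raw bytes, split on newlines, and feed each line through this helper, appending `chunk["choices"][0]["delta"].get("content", "")` as chunks arrive.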

Rate Limits

API requests are rate-limited to protect service stability. Limits are applied per IP address.

| Detail            | Value                                          |
|-------------------|------------------------------------------------|
| Default limit     | 100 requests per minute per IP                 |
| Exceeded response | 429 Too Many Requests with Retry-After header  |
| Note              | Limits may vary by endpoint and account type   |

If you receive a 429 response, wait for the duration specified in the Retry-After header before retrying. Implement exponential backoff for production integrations.
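A retry loop with exponential backoff can be sketched as follows. This is a minimal example, not an official client: the function name and the injectable `make_request` callable are ours, while the 429 status and Retry-After header behavior match the limits described above.

```python
import time

def with_backoff(make_request, max_retries=5, base_delay_s=1.0):
    """Retry a request on HTTP 429, honoring Retry-After when present.

    make_request: a callable returning an object with .status_code and
    .headers, e.g. a functools.partial around requests.post.
    """
    for attempt in range(max_retries):
        response = make_request()
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        # Prefer the server's hint; otherwise back off exponentially.
        delay = float(retry_after) if retry_after else base_delay_s * (2 ** attempt)
        time.sleep(delay)
    raise RuntimeError("still rate limited after all retries")
```

Adding a small random jitter to `delay` is a common refinement when many clients share one IP.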

Health Monitoring

Deployment Health

Check the health status of your deployments to ensure services are running properly.

GET /api/deployments/{podId}/health

cURL Example

bash
curl -X GET https://modelpilot.ai/api/deployments/pod123/health \
  -H "Authorization: Bearer mp_live_your_api_key"

Response Example

json
{
  "status": "healthy",
  "timestamp": "2023-12-01T10:30:00.000Z",
  "services": {
    "ollama": "running",
    "webui": "running"
  },
  "deployment_status": "running",
  "response_time_ms": 125,
  "last_checked": "2023-12-01T10:30:00.000Z"
}

Status Values

● healthy - All services running
● degraded - Some services have issues
● unhealthy - Services are down
● starting - Deployment is starting up
● unknown - Status could not be determined
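Before routing traffic to a deployment, it can help to reduce the health payload to a single go/no-go check. A small sketch (the helper and its strictness are our own choices; the `status` and `services` fields follow the response shown above):

```python
def is_usable(health: dict) -> bool:
    """True only when the deployment is fully healthy.

    `health` is the parsed JSON from the health endpoint. This is a
    conservative check: "degraded" deployments are treated as unusable.
    """
    return health.get("status") == "healthy" and all(
        state == "running" for state in health.get("services", {}).values()
    )
```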

Migration from OpenAI

Quick Migration Steps

1

Deploy Your Model

Use the ModelPilot dashboard to deploy your preferred model

2

Create API Key

Generate an API key with proxy permissions in your dashboard

3

Update Your Code

Change the base URL and API key in your existing OpenAI code

Code Changes

Before (OpenAI):
javascript
const openai = new OpenAI({
  apiKey: 'sk-...',
  baseURL: 'https://api.openai.com/v1'
});

const response = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello' }]
});
After (ModelPilot):
javascript
const openai = new OpenAI({
  apiKey: 'mp_live_your_api_key',
  baseURL: 'https://modelpilot.ai/api/v1'
});

const response = await openai.chat.completions.create({
  model: 'qwen3-8b',  // Your deployed model
  messages: [{ role: 'user', content: 'Hello' }]
});

Error Handling

Common Errors

Model Not Found (404)

No active deployment found for the specified model. Deploy the model first via the dashboard.

json
{
  "error": {
    "message": "No active deployment found for model 'qwen3-8b'",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  },
  "available_models": ["gemma3:7b", "deepseek-r1"]
}

Model Not Running (503)

The deployment exists but is not currently running. Start it via the dashboard.

json
{
  "error": {
    "message": "Model 'qwen3-8b' deployment is not running (status: stopped)",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_available"
  }
}

Invalid Model Type (400)

The model is not a text model and cannot be used with chat completions.

json
{
  "error": {
    "message": "Model 'flux-dev' is not a text model and cannot be used with chat completions",
    "type": "invalid_request_error",
    "param": "model",
    "code": "invalid_model_type"
  }
}
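Since each error carries a machine-readable `code`, client code can map codes to remediation steps instead of string-matching messages. A minimal sketch (the helper name and the wording of the actions are ours; the codes are the ones documented above):

```python
def describe_error(payload: dict) -> str:
    """Map a documented error code to an actionable next step."""
    code = payload.get("error", {}).get("code")
    actions = {
        "model_not_found": "Deploy the model via the dashboard first.",
        "model_not_available": "Start the stopped deployment via the dashboard.",
        "invalid_model_type": "Use a text model for chat completions.",
    }
    return actions.get(code, "Unexpected error; inspect the full payload.")
```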

Best Practices

Performance Tips

  • Keep deployments running for faster response times
  • Use appropriate temperature values (0.1-0.9 for most use cases)
  • Set reasonable max_tokens to control costs
  • Monitor deployment health regularly
  • Consider using direct endpoints for better performance

Cost Optimization

  • Stop deployments when not in use
  • Use smaller models for simple tasks
  • Monitor your credit usage in the dashboard
  • Set up usage alerts and limits
  • Consider batch processing for efficiency

Image Generation

POST /api/v1/generate/image

Generate images from text prompts. Returns the image synchronously when the worker is warm (10-30s), or a poll_url on cold start.

cURL

bash
curl -X POST https://modelpilot.ai/api/v1/generate/image \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-schnell",
    "prompt": "a red fox in a snowy forest, photorealistic",
    "width": 1024,
    "height": 1024
  }'

Parameters

| Parameter       | Required | Description                             |
|-----------------|----------|-----------------------------------------|
| model           | Yes      | flux-schnell, flux-dev, sdxl, or zimage |
| prompt          | Yes      | Text description of desired image       |
| width           | No       | Image width (default: 1024)             |
| height          | No       | Image height (default: 1024)            |
| negative_prompt | No       | What to avoid (SDXL and zimage only)    |
| steps           | No       | Inference steps (default varies)        |
| seed            | No       | Random seed for reproducibility         |

Response (warm)

json
{
  "id": "gen_abc123",
  "model": "flux-schnell",
  "images": [{ "base64": "<base64 PNG>", "filename": "output_00001_.png" }],
  "cost": 0.008,
  "execution_time_ms": 18500
}

Response (cold start)

json
{
  "id": "sync-abc123",
  "model": "flux-schnell",
  "status": "processing",
  "job_id": "sync-abc123",
  "endpoint_id": "ep-xxx",
  "poll_url": "/api/v1/generate/image/status/sync-abc123?endpoint_id=ep-xxx&model=flux-schnell",
  "message": "Cold start in progress. Poll the status URL every 10 seconds.",
  "estimated_time_ms": 30000
}

On cold start, poll the poll_url until status is "completed". Subsequent requests while the worker is warm return images directly.
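Since warm responses return images as base64 strings, the last step client-side is decoding them to files. A small sketch (the function name and `prefix` parameter are ours; the `images[].base64` field matches the warm response above):

```python
import base64

def save_images(response: dict, prefix: str = "out") -> list:
    """Decode base64 image payloads from a warm response and write PNGs."""
    paths = []
    for i, image in enumerate(response.get("images", [])):
        path = f"{prefix}_{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(image["base64"]))
        paths.append(path)
    return paths
```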

Audio Generation (TTS)

POST /api/v1/generate/audio

Text-to-speech generation. Returns base64 WAV audio when warm (2-5s), or a poll_url on cold start.

cURL

bash
curl -X POST https://modelpilot.ai/api/v1/generate/audio \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "text": "Hello, welcome to ModelPilot."}'

Parameters

| Parameter | Required | Description                              |
|-----------|----------|------------------------------------------|
| model     | Yes      | kokoro ($0.002) or chatterbox ($0.005)   |
| text      | Yes      | Text to synthesize (max 5000 chars)      |
| voice     | No       | Voice ID (default: af_heart for kokoro)  |
| speed     | No       | Speed multiplier 0.5-2.0 (default: 1.0)  |

Response (warm)

json
{
  "id": "gen_abc123",
  "model": "kokoro",
  "audio": "<base64 WAV>",
  "format": "wav",
  "sample_rate": 24000,
  "cost": 0.002,
  "execution_time_ms": 3200
}

On cold start, returns {status: "processing", poll_url: "..."} — poll until complete, same as image.

Video Generation

POST /api/v1/generate/video (async)

Video generation is asynchronous. Submit a job, then poll for results.

1. Submit job

bash
curl -X POST https://modelpilot.ai/api/v1/generate/video \
  -H "Authorization: Bearer mp_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-t2v",
    "prompt": "a sunset over the ocean, cinematic, 4k"
  }'

Submit response

json
{
  "id": "vid_abc123",
  "model": "wan-t2v",
  "status": "processing",
  "job_id": "run-abc123",
  "endpoint_id": "ep-xxx",
  "poll_url": "/api/v1/generate/video/status/run-abc123?endpoint_id=ep-xxx&model=wan-t2v",
  "estimated_time_ms": 120000,
  "cost": 0.30
}

2. Poll for results

bash
curl https://modelpilot.ai/api/v1/generate/video/status/run-abc123 \
  -H "Authorization: Bearer mp_live_your_api_key"

Completed response

json
{
  "status": "COMPLETED",
  "videos": [{ "url": "https://..." }],
  "execution_time_ms": 95000
}
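The submit-then-poll workflow above can be wrapped in a single helper. This is a sketch, not an official client: the function name and the injectable `post`/`get` callables are ours, while the relative `poll_url` and the uppercase "COMPLETED" status follow the responses shown above.

```python
import time

def generate_video(post, get, prompt, model="wan-t2v", interval_s=10,
                   host="https://modelpilot.ai"):
    """Submit a video job and block until a result URL is available.

    post/get are injectable callables returning parsed JSON, e.g.
    post = lambda url, body: requests.post(url, headers=auth, json=body).json()
    get  = lambda url: requests.get(url, headers=auth).json()
    """
    submit = post(f"{host}/api/v1/generate/video",
                  {"model": model, "prompt": prompt})
    # poll_url in the submit response is relative to the API host.
    status_url = host + submit["poll_url"]
    while True:
        status = get(status_url)
        # Note: the video status endpoint reports uppercase "COMPLETED".
        if status.get("status") == "COMPLETED":
            return status["videos"][0]["url"]
        time.sleep(interval_s)
```

Injecting the HTTP callables keeps authentication in one place and makes the polling logic trivial to test; a production version would also add a timeout and handle failed job states.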