Complete reference for the Iris Flower Classification REST API.
Base URLs:
Development: http://localhost:8000
Production: https://your-domain.com
Currently, the API is open (no authentication required). For production deployments, configure API key authentication:
# Set in .env
API_KEY_ENABLED=true
API_KEY=your-secure-api-key
Include in requests:
curl -H "X-API-Key: your-secure-api-key" http://localhost:8000/predict
GET /
Root endpoint with API information.
Response:
{
"name": "Iris Flower Classification API",
"version": "2.0.0",
"documentation": "/docs",
"health": "/health",
"metrics": "/metrics"
}
GET /health
Health check endpoint.
Response:
{
"status": "healthy",
"version": "2.0.0",
"timestamp": "2024-01-15T10:30:00Z",
"models_loaded": 3,
"uptime_seconds": 3600.5
}
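Deployment scripts can poll this endpoint until the service is ready. A sketch; wait_until_healthy and its retry counts are illustrative, not part of the API:
import time
import requests

def wait_until_healthy(base_url="http://localhost:8000", attempts=30, delay=1.0):
    """Poll GET /health until the API reports status "healthy"."""
    for _ in range(attempts):
        try:
            body = requests.get(f"{base_url}/health", timeout=2).json()
            if body.get("status") == "healthy":
                return body
        except requests.exceptions.RequestException:
            pass  # API not reachable yet; retry after a short delay
        time.sleep(delay)
    raise RuntimeError("API did not become healthy in time")

print(wait_until_healthy())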
GET /metrics
Prometheus metrics endpoint.
Returns metrics in Prometheus format for monitoring.
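The body is plain-text Prometheus exposition format rather than JSON; a small sketch that prints only the iris_* series (metric names appear under Monitoring below):
import requests

# /metrics returns text, so read .text instead of .json()
text = requests.get("http://localhost:8000/metrics").text
for line in text.splitlines():
    if line.startswith("iris_"):
        print(line)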
GET /models
List all available models.
Response:
[
{
"name": "random_forest",
"description": "Random Forest Classifier",
"loaded": true,
"parameters": {
"description": "Random Forest Classifier",
"accuracy": 0.9778,
"loaded_at": "2024-01-15T10:00:00Z",
"train_samples": 105,
"test_samples": 45
}
},
...
]
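A short sketch that fetches this list and reports each model's load state:
import requests

models = requests.get("http://localhost:8000/models").json()
for model in models:
    state = "loaded" if model["loaded"] else "not loaded"
    accuracy = model["parameters"].get("accuracy")
    print(f'{model["name"]}: {state} (accuracy: {accuracy})')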
GET /models/{model_name}
Get information about a specific model.
Parameters:
model_name (path): Name of the model
Response:
{
"name": "random_forest",
"description": "Random Forest Classifier",
"loaded": true,
"parameters": {
"accuracy": 0.9778,
"loaded_at": "2024-01-15T10:00:00Z"
}
}
Error Response (404):
{
"detail": "Model random_forest_v2 not found"
}
POST /models/{model_name}/load
Load a model into memory.
Parameters:
model_name (path): Name of the model to load
Response:
{
"message": "Model random_forest loaded successfully",
"metadata": {
"description": "Random Forest Classifier",
"accuracy": 0.9778,
"loaded_at": "2024-01-15T10:30:00Z",
"train_samples": 105,
"test_samples": 45
}
}
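Combining the model info and load endpoints guards against the "Model not loaded" error described under Troubleshooting; a sketch (ensure_loaded is an illustrative helper, not part of the API):
import requests

def ensure_loaded(model_name, base_url="http://localhost:8000"):
    """Load the model via POST /models/{model_name}/load if it is not loaded yet."""
    info = requests.get(f"{base_url}/models/{model_name}")
    info.raise_for_status()  # raises on the 404 shown above if the model is unknown
    if not info.json()["loaded"]:
        requests.post(f"{base_url}/models/{model_name}/load").raise_for_status()

ensure_loaded("random_forest")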
POST /predict
Make a prediction for a single iris sample.
Request Body:
{
"sample": {
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
},
"model_name": "random_forest",
"include_probabilities": true
}
Field Validations:
sepal_length: 0.0 to 10.0 cm (must be positive)
sepal_width: 0.0 to 10.0 cm (must be positive)
petal_length: 0.0 to 10.0 cm (must be positive)
petal_width: 0.0 to 10.0 cm (must be positive)
model_name: One of the available models (default: "random_forest")
include_probabilities: boolean (default: true)
Response:
{
"prediction": "setosa",
"probabilities": {
"setosa": 1.0,
"versicolor": 0.0,
"virginica": 0.0
},
"model_name": "random_forest",
"timestamp": "2024-01-15T10:30:00Z",
"duration_ms": 2.5
}
cURL Example:
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"sample": {
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
},
"model_name": "random_forest",
"include_probabilities": true
}'
Python Example:
import requests
url = "http://localhost:8000/predict"
data = {
"sample": {
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
},
"model_name": "random_forest",
"include_probabilities": True
}
response = requests.post(url, json=data)
print(response.json())
POST /predict/batch
Make predictions for multiple samples.
Request Body:
{
"samples": [
{
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
},
{
"sepal_length": 6.7,
"sepal_width": 3.0,
"petal_length": 5.2,
"petal_width": 2.3
}
],
"model_name": "random_forest",
"include_probabilities": true
}
Constraints:
Response:
{
"predictions": [
{
"prediction": "setosa",
"probabilities": {
"setosa": 1.0,
"versicolor": 0.0,
"virginica": 0.0
},
"model_name": "random_forest",
"timestamp": "2024-01-15T10:30:00Z",
"duration_ms": 1.2
},
{
"prediction": "virginica",
"probabilities": {
"setosa": 0.0,
"versicolor": 0.1,
"virginica": 0.9
},
"model_name": "random_forest",
"timestamp": "2024-01-15T10:30:00Z",
"duration_ms": 1.1
}
],
"total_samples": 2,
"total_duration_ms": 5.3
}
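A sketch that pairs each submitted sample with its prediction from this response:
import requests

samples = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 6.7, "sepal_width": 3.0, "petal_length": 5.2, "petal_width": 2.3},
]
resp = requests.post(
    "http://localhost:8000/predict/batch",
    json={"samples": samples, "model_name": "random_forest"},
)
resp.raise_for_status()
body = resp.json()
# Predictions are returned in the same order as the submitted samples
for sample, pred in zip(samples, body["predictions"]):
    print(sample["petal_length"], "->", pred["prediction"])
print(f'{body["total_samples"]} samples in {body["total_duration_ms"]} ms')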
400 Bad Request
Invalid input parameters or validation errors.
{
"error": "Invalid Input",
"detail": "All measurements must be positive values",
"timestamp": "2024-01-15T10:30:00Z"
}
404 Not Found
Resource not found (e.g., model not found).
{
"error": "Model Not Found",
"detail": "Model invalid_model not loaded",
"timestamp": "2024-01-15T10:30:00Z"
}
500 Internal Server Error
Server error during processing.
{
"detail": "Prediction failed: <error message>"
}
The API implements rate limiting to prevent abuse; limits are configured through the RATE_LIMIT_CALLS and RATE_LIMIT_PERIOD settings.
Rate limit headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1234567890
When rate limited (429 status):
{
"error": "Too Many Requests",
"detail": "Rate limit exceeded. Please retry after 60 seconds"
}
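Clients can wait out a 429 by using the X-RateLimit-Reset header listed above; a sketch (post_with_retry and its wait logic are illustrative assumptions):
import time
import requests

def post_with_retry(url, payload, max_retries=3):
    """Retry POST requests that hit the rate limit."""
    for _ in range(max_retries):
        resp = requests.post(url, json=payload, timeout=5)
        if resp.status_code != 429:
            return resp
        # X-RateLimit-Reset is a Unix timestamp; fall back to a 1-second wait
        reset = float(resp.headers.get("X-RateLimit-Reset", time.time() + 1))
        time.sleep(max(reset - time.time(), 1.0))
    return resp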
All responses include:
X-Process-Time: Request processing time in seconds
Content-Type: application/json
Interactive documentation is available at /docs while the API is running (the root endpoint response above also lists /health and /metrics).
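The X-Process-Time header can be compared against client-measured latency; a quick sketch:
import requests

resp = requests.get("http://localhost:8000/health")
print("Server processing time:", resp.headers.get("X-Process-Time"), "s")
print("Total round trip:", resp.elapsed.total_seconds(), "s")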
The API runs 4 worker processes by default.
For multiple samples, use /predict/batch instead of multiple /predict calls:
❌ Inefficient:
for sample in samples:
response = requests.post(f"{url}/predict", json={"sample": sample})
✅ Efficient:
response = requests.post(
f"{url}/predict/batch",
json={"samples": samples}
)
Load models once and reuse:
# Load model once
requests.post(f"{url}/models/random_forest/load")
# Make multiple predictions
for sample in samples:
response = requests.post(f"{url}/predict", json={
"sample": sample,
"model_name": "random_forest"
})
Handle timeouts and HTTP errors explicitly:
try:
response = requests.post(url, json=data, timeout=5)
response.raise_for_status()
result = response.json()
except requests.exceptions.Timeout:
print("Request timed out")
except requests.exceptions.HTTPError as e:
print(f"HTTP error: {e}")
except Exception as e:
print(f"Error: {e}")
For high-throughput applications:
import requests
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
pool_connections=10,
pool_maxsize=20
)
session.mount('http://', adapter)
# Make requests with session
response = session.post(url, json=data)
import requests
from typing import Dict, List
class IrisClassifierClient:
def __init__(self, base_url: str = "http://localhost:8000"):
self.base_url = base_url
self.session = requests.Session()
def predict(self, sample: Dict, model_name: str = "random_forest") -> Dict:
"""Make a single prediction."""
response = self.session.post(
f"{self.base_url}/predict",
json={
"sample": sample,
"model_name": model_name,
"include_probabilities": True
}
)
response.raise_for_status()
return response.json()
def predict_batch(self, samples: List[Dict], model_name: str = "random_forest") -> Dict:
"""Make batch predictions."""
response = self.session.post(
f"{self.base_url}/predict/batch",
json={
"samples": samples,
"model_name": model_name,
"include_probabilities": True
}
)
response.raise_for_status()
return response.json()
def get_models(self) -> List[Dict]:
"""Get available models."""
response = self.session.get(f"{self.base_url}/models")
response.raise_for_status()
return response.json()
# Usage
client = IrisClassifierClient()
sample = {
"sepal_length": 5.1,
"sepal_width": 3.5,
"petal_length": 1.4,
"petal_width": 0.2
}
result = client.predict(sample)
print(f"Prediction: {result['prediction']}")
JavaScript (Node.js with axios):
const axios = require('axios');
const baseURL = 'http://localhost:8000';
async function predict(sample, modelName = 'random_forest') {
try {
const response = await axios.post(`${baseURL}/predict`, {
sample: sample,
model_name: modelName,
include_probabilities: true
});
return response.data;
} catch (error) {
console.error('Prediction error:', error.response ? error.response.data : error.message);
throw error;
}
}
// Usage
const sample = {
sepal_length: 5.1,
sepal_width: 3.5,
petal_length: 1.4,
petal_width: 0.2
};
predict(sample).then(result => {
console.log('Prediction:', result.prediction);
console.log('Probabilities:', result.probabilities);
});
Useful Prometheus queries for the metrics exposed at /metrics:
# Request rate
rate(iris_predictions_total[5m])
# Average prediction latency
rate(iris_prediction_duration_seconds_sum[5m]) / rate(iris_prediction_duration_seconds_count[5m])
# Error rate
rate(iris_errors_total[5m])
# 95th percentile latency
histogram_quantile(0.95, rate(iris_prediction_duration_seconds_bucket[5m]))
Shell health checks:
# Simple uptime check
curl -f http://localhost:8000/health || echo "API is down"
# Detailed health check
curl -s http://localhost:8000/health | jq '.status'
Issue: "Model not loaded" error
Solution: Load the model first using /models/{model_name}/load
Issue: Slow predictions
Solution: Use batch predictions, check model complexity, ensure adequate resources
Issue: Connection timeout
Solution: Increase timeout, check network, verify API is running
Issue: High error rate
Solution: Check logs, verify input validation, review resource usage
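The checks above combine into a quick diagnostic sketch that uses only endpoints documented here:
import requests

BASE = "http://localhost:8000"

try:
    health = requests.get(f"{BASE}/health", timeout=5).json()
    print("Status:", health["status"], "| models loaded:", health["models_loaded"])
    # Load any model that is not yet in memory
    for model in requests.get(f"{BASE}/models", timeout=5).json():
        if not model["loaded"]:
            print(f'Loading {model["name"]}...')
            requests.post(f'{BASE}/models/{model["name"]}/load', timeout=30).raise_for_status()
except requests.exceptions.RequestException as e:
    print("API unreachable or returned an error:", e)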