LLM API Reference

Serverless LLM Introduction

We provide compatibility with the OpenAI API standard, allowing for easier integration into existing applications.
Base URL
https://xxxxxxxx.serverless.databasemart.ai
The APIs we support are:

Chat Completion, both streaming and regular.

The models we support are:

You can find all available models at https://www.databasemart.ai/llm-api.

Example with Python Client
pip install 'openai>=1.0.0'

Chat Completions API

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.databasemart.ai/v1",  # include /v1 so requests hit the documented /v1/chat/completions path
    api_key="<YOUR_API_KEY>",  # your databasemart AI API Key
)

model = "meta-llama/llama-3.1-8b-instruct"
stream = True  # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "Act like you are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
Example with Curl Client

Chat Completions API

If you’re already using OpenAI’s chat completions endpoint, simply point your requests at https://api.llm.databasemart.ai/v1/chat/completions, obtain and set your API key, and update the model name to suit your needs. With these steps, you’re good to go.

# Set the databasemart AI API Key
export API_KEY="{YOUR databasemart AI API Key}"

curl "https://api.llm.databasemart.ai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [
        {
            "role": "system",
            "content": "Act like you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hi there!"
        }
    ],
    "max_tokens": 512
}'
Error codes

If the response status code is not 200, we will return the error code and message in JSON format in the response body. The format is as follows:

{
    "code": integer,
    "reason": "string",
    "message": "string"
}


Code | Reason              | Description
401  | INVALID_API_KEY     | The API key is invalid. You can check your API key here: Manage API Key
403  | NOT_ENOUGH_BALANCE  | Your credit is not enough. You can top up more credit here: Top Up Credit
404  | MODEL_NOT_FOUND     | The requested model is not found. You can find all the models we support here: https://databasemart.ai/llm-api or request the List models API to get all available models.
429  | RATE_LIMIT_EXCEEDED | You have exceeded the rate limit. Please refer to Rate limits for more information.
500  | MODEL_NOT_AVAILABLE | The requested model is not available now. This is usually due to the model being under maintenance. You can contact us online for more information.
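
The sketch below shows one way a client might handle these error responses. It assumes the requests library (not used elsewhere in this guide) and reuses the endpoint, headers, and error body format documented above.

import os
import requests

API_URL = "https://api.llm.databasemart.ai/v1/chat/completions"
API_KEY = os.environ["API_KEY"]  # your databasemart AI API Key

resp = requests.post(
    API_URL,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hi there!"}],
        "max_tokens": 512,
    },
)

if resp.status_code == 200:
    print(resp.json()["choices"][0]["message"]["content"])
else:
    # Non-200 responses carry {"code", "reason", "message"} as described above.
    err = resp.json()
    print(f"Error {err['code']} ({err['reason']}): {err['message']}")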

Supported API - Chat Completions API

Creates a model response for the given chat conversation.
POST /v1/chat/completions

The Chat Completions API endpoint generates a model response from a list of messages comprising a conversation.

Our Chat API is compatible with OpenAI’s Chat Completions API; you can use the official OpenAI Python client to interact with it.

Request Headers

Content-Type string required

Enum: application/json

Authorization string required

Bearer authentication format, for example: Bearer {YOUR_API_KEY}.

Request Body

model string required
The name of the model to use.

messages object[] required
A list of messages comprising the conversation so far.

content string | null required
The contents of the message. content is required for all messages, and may be null for assistant messages with function calls.

role string required
The role of the messages author. One of system, user, or assistant. Enum: system, user, assistant

name string
The name of the author of this message. May contain a-z, A-Z, 0-9, and underscores, with a maximum length of 64 characters.

max_tokens integer required
The maximum number of tokens to generate in the completion. If the token count of your prompt (previous messages) plus max_tokens exceeds the model’s context length, the behavior depends on context_length_exceeded_behavior. By default, max_tokens is lowered to fit in the context window instead of returning an error.

stream boolean | null default:false
Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events (SSE) as they become available, with the stream terminated by a data: [DONE] message.
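
If you are not using the OpenAI client, the stream can be consumed directly as server-sent events. A minimal sketch, assuming the requests library; each data: line carries a JSON chunk whose choices[0].delta.content holds the incremental text (field names follow the OpenAI-compatible format used in the Python example above), and data: [DONE] marks the end of the stream.

import json
import os
import requests

API_KEY = os.environ["API_KEY"]  # your databasemart AI API Key

resp = requests.post(
    "https://api.llm.databasemart.ai/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hi there!"}],
        "max_tokens": 512,
        "stream": True,
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    data = line[len(b"data: "):]
    if data == b"[DONE]":  # terminator sent at the end of the stream
        break
    chunk = json.loads(data)
    if chunk["choices"]:
        print(chunk["choices"][0]["delta"].get("content") or "", end="")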

n integer | null default:1
How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. Required range: 1 < x < 128

seed integer | null
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

frequency_penalty number | null default:0
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. A reasonable value is around 0.1 to 1 if the aim is just to reduce repetitive samples somewhat. To strongly suppress repetition, the coefficient can be increased up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition. See also presence_penalty, which penalizes tokens that have appeared at least once at a fixed rate. Required range: -2 < x < 2

presence_penalty number | null default:0
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
A reasonable value is around 0.1 to 1 if the aim is just to reduce repetitive samples somewhat. To strongly suppress repetition, the coefficient can be increased up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition.
See also frequency_penalty, which penalizes tokens at an increasing rate depending on how often they appear.
Required range: -2 < x < 2

repetition_penalty number | null
Applies a penalty to repeated tokens to discourage or encourage repetition. A value of 1.0 means no penalty, allowing free repetition. Values above 1.0 penalize repetition, reducing the likelihood of repeating tokens. Values between 0.0 and 1.0 reward repetition, increasing the chance of repeated tokens. For a good balance, a value of 1.2 is often recommended. Note that the penalty is applied to both the generated output and the prompt in decoder-only models.
Required range: 0 < x < 2

stop string | null
Up to 4 sequences where the API will stop generating further tokens. The returned text will contain the stop sequence.

temperature number | null default:1
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
Required range: 0 < x < 2

top_p number | null
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Required range: 0 < x <= 1

top_k integer | null
Top-k sampling is another sampling method where the k most probable next tokens are filtered and the probability mass is redistributed among only those k next tokens. The value of k controls the number of candidates for the next token at each step during text generation.
Required range: 1 < x < 128

min_p number | null
A float representing the minimum probability for a token to be considered, relative to the probability of the most likely token.
Required range: 0 <= x <= 1
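
Sampling parameters such as top_k, min_p, and repetition_penalty are not named arguments of the OpenAI Python client, but they can still be included in the request body through extra_body. A sketch under that assumption (the parameter values are illustrative only):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.databasemart.ai/v1",
    api_key="<YOUR_API_KEY>",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a short poem about the sea."}],
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    # extra_body is merged into the JSON request body as-is, so parameters
    # documented here but missing from the client signature can be passed through it.
    extra_body={
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.2,
    },
)
print(completion.choices[0].message.content)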

logit_bias map[integer, integer]
Defaults to null. Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens to an associated bias value from -100 to 100.

logprobs boolean | null default:false
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobs integer | null
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
Required range: 0 <= x <= 20
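
A sketch of requesting log probabilities with the OpenAI Python client; the response attribute layout (choices[0].logprobs.content) follows the OpenAI-compatible shape and is an assumption about this service:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.databasemart.ai/v1",
    api_key="<YOUR_API_KEY>",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Name one primary color."}],
    max_tokens=16,
    logprobs=True,   # return log probabilities of the output tokens
    top_logprobs=3,  # also return the 3 most likely alternatives per position
)

for token_info in completion.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)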

response_format object | null
Forces the model to produce a specific output format. Setting it to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs, which ensures the model will match your supplied JSON schema. Setting it to { "type": "json_object" } enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
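
For example, JSON mode can be requested as follows (a sketch with the OpenAI Python client; the prompt and model are illustrative):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.databasemart.ai/v1",
    api_key="<YOUR_API_KEY>",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "Reply only with valid JSON."},
        {"role": "user", "content": "Give me a JSON object with keys 'city' and 'country' for Paris."},
    ],
    max_tokens=128,
    # The older JSON mode; for models that support it,
    # {"type": "json_schema", "json_schema": {...}} enforces a full schema instead.
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)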

Response

choices object[] required
The list of chat completion choices.

created integer required
The Unix time in seconds when the response was generated.

id string required
A unique identifier of the response.

model string required
The model used for the chat completion.

object string required
The object type, which is always chat.completion.

usage object
Usage statistics. For streaming responses, the usage field is included in the very last response chunk returned.

completion_tokens integer required
The number of tokens in the generated completion.

prompt_tokens integer required
The number of tokens in the prompt.

total_tokens integer required
The total number of tokens used in the request (prompt + completion).
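
Since usage for a streamed response only arrives in the final chunk, a client can collect it while printing the content. A sketch with the OpenAI Python client; the chunk.usage attribute follows the OpenAI-compatible response shape and is an assumption about this service:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.databasemart.ai/v1",
    api_key="<YOUR_API_KEY>",
)

stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hi there!"}],
    max_tokens=64,
    stream=True,
)

usage = None
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:  # per the note above, only the last chunk carries usage
        usage = chunk.usage

print()
if usage:
    print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} total={usage.total_tokens}")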


Example request

curl https://api.llm.databasemart.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [
      {
        "role": "developer",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response Sample

{
    "id": "chatcmpl-e985eb6fd4c4dbef136ef9f110c7df66",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Hello! How can I assist you today?",
                "refusal": null,
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "stop_reason": null
        }
    ],
    "created": 1746411063,
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 10,
        "prompt_tokens": 24,
        "total_tokens": 34,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null
}