Serverless LLM: Deploy AI Models on GPU Cloud

DatabaseMart's Serverless LLM is the go-to inference platform for AI developers seeking a low-cost, reliable, and simple solution for shipping AI models. Deploy LLMs in One Click. No Servers, Just Speed.

Choose your Dedicated Endpoint Plan

Serverless LLM is DatabaseMart's first pay-as-you-go GPU Cloud product. It is currently in trial operation, and more GPU instances will be available soon.

3xV100 48GB VRAM

  • Entry-level plan supporting 14B, 8B, 7B, and smaller models, such as DeepSeek-R1-Distill-Qwen-14B, DeepSeek-R1-Distill-Llama-8B, and DeepSeek-R1-Distill-Qwen-7B on Hugging Face.
  • OS: Linux
  • GPU: Nvidia V100
  • Architecture: Volta
  • CUDA Cores: 5,120
  • GPU Memory: 16GB HBM2 per GPU (48GB total)
  • GPU Count: 3
0.83/Hour

LLM Model Library

Browse our supported open source Large Language models

DeepSeek-R1

DeepSeek-R1 is DeepSeek's first-generation reasoning model, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Qwen 2.5

Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset of up to 18 trillion tokens. The models support context windows of up to 128K tokens and offer multilingual support.

Llama 3.1

Llama 3.1 is Meta's state-of-the-art model family, available in 8B, 70B, and 405B parameter sizes. Meta's smaller models are competitive with closed and open models of a similar parameter count.

Gemma 2

Google's Gemma 2 model is available in three sizes (2B, 9B, and 27B) and features a brand-new architecture designed for class-leading performance and efficiency.

Mistral AI

Mistral AI specializes in developing LLMs and generative AI technologies, with a focus on open-weight and efficient AI models as alternatives to proprietary systems.

More Coming Soon

If you don't find the model you need, please feel free to contact us and we will prioritize adding it.

Serverless LLM Quickstart

In four steps you can have your own LLM API. This serverless quickstart guide helps you set up your account and get started.

Step 1: Sign up for a DatabaseMart account.
Step 2: Select the Serverless LLM product, create an API Key, and recharge Credits.
Step 3: Select a GPU instance and create a Dedicated Endpoint: name the endpoint and select a model.
Step 4: The model is deployed automatically, and your Dedicated LLM API is ready within a few minutes; a quick way to verify it is sketched below.
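Once the endpoint reports ready, a single request is enough to confirm it is serving. Below is a minimal sketch using Python's requests library against an OpenAI-style chat completions route; the endpoint URL, API key, and model name are hypothetical placeholders to swap for the values shown in your dashboard.

```python
# Minimal smoke test for a freshly deployed endpoint.
# ENDPOINT, API_KEY, and the model name are hypothetical placeholders;
# substitute the values from your DatabaseMart dashboard.
import requests

ENDPOINT = "https://your-endpoint.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "deepseek-r1-distill-qwen-7b",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()  # fail loudly if the endpoint is not up yet
print(resp.json()["choices"][0]["message"]["content"])
```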

Advantages of Serverless LLM

Deploying LLMs with Serverless LLM offers many advantages over traditional physical servers or shared API services.

Faster Deployment – Ready in Minutes

With Serverless LLM, you can deploy large language models in just a few minutes — no setup, no delays. Compared to traditional physical servers, this drastically reduces time-to-market and iteration cycles.

Cost-Efficient – Pay by the Hour

Only pay for what you use, down to the hour. This eliminates the overhead of idle GPU resources, making it more budget-friendly than maintaining dedicated infrastructure.
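As a rough illustration using the entry-level rate above: an endpoint that runs 8 hours a day for 22 working days costs about 0.83 × 8 × 22 ≈ 146 for the month, whereas a dedicated server bills for every hour whether or not it is serving traffic.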

No Server Management – Focus on What Matters

Forget about provisioning, scaling, and maintaining servers. Our platform handles the infrastructure so you can focus entirely on building and deploying your AI models.

OpenAI-Compatible API – Plug and Play

Access your models using an OpenAI-compatible API. This makes integration effortless, allowing you to switch from or extend beyond OpenAI with minimal code changes.
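Because the endpoint follows the OpenAI API schema, the official OpenAI Python SDK works once it is pointed at your dedicated endpoint. The sketch below assumes hypothetical values for the base URL, API key, and model name; replace them with your own.

```python
# A minimal sketch using the official OpenAI Python SDK (openai >= 1.0).
# base_url, api_key, and model are hypothetical placeholders; replace them
# with the values for your dedicated endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",  # your dedicated endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-7b",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Summarize serverless inference in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

Migrating existing OpenAI-based code is typically just a matter of changing base_url and api_key.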

Dedicated Endpoints – Guaranteed Performance

Each deployment gets its own exclusive endpoint, ensuring that performance isn't affected by other users. You get consistent, reliable GPU power without noisy neighbors.

Real-Time Metrics – Full Visibility into Performance

Monitor GPU usage and API response metrics with real-time dashboards. See performance curves instantly, helping you optimize usage and detect issues quickly.

Contact Us

If you can't find a suitable GPU plan or model, need a customized GPU instance, or have ideas for cooperation, please leave us a message.