Prerequisites
To deploy your own LLMs on SoraNova, begin by setting up the CLI. If you haven’t already, install it using:
curl -fsSL https://releases.s4c.ai/cli/install_cli.sh | sh
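Once the script completes, you can confirm the sora binary is on your PATH with a standard shell check (nothing SoraNova-specific):

command -v sora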
Deploying a Hugging Face LLM
SoraNova supports deploying any LLM available on Hugging Face via a generic Docker image. Below is a sample configuration for deploying Meta’s Llama 3.1 8B Instruct model. You can customize the hardware allocation, GPU memory, and sharing strategy via variables:
The deployment configuration DSL shown below is experimental and subject to change in future releases.
variable "gpu_name" {
type = string
description = "GPU name for the model task"
default = "nvidia/gpu/NVIDIA L4"
}
variable "gpu_memory_mibs" {
type = list(number)
description = "GPU memory allocation in MiB for the model task"
default = [20480, 20480]
}
variable "sharing_strategy" {
type = string
description = "GPU sharing strategy for the model task"
default = "mps"
}
model {
  name  = "llama-8b"
  count = 1

  config {
    # Generic SGLang image that can serve any Hugging Face model
    image = "sglang_hf_generic:latest"
    args = [
      "--model-path", "meta-llama/Llama-3.1-8B-Instruct",
      "--host", "0.0.0.0",
      "--port", "5000"
    ]
  }

  endpoint llm {
    port = 5000
    health {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }

  resources {
    cpu_mhz          = 10240
    memory_mib       = 14360
    sharing_strategy = "${var.sharing_strategy}"

    # GPU allocation; see the note on gpu_memory_mibs below
    device {
      name        = "${var.gpu_name}"
      memory_mibs = "${var.gpu_memory_mibs}"
    }
  }
}
Here, gpu_memory_mibs = [20480, 20480] requests a single node with two GPUs, each with 20,480 MiB of memory.
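Each entry in the list corresponds to one GPU on the node. As an illustration (the values here are chosen only for demonstration, not as a recommendation), a single-GPU allocation could be expressed by changing the variable’s default:

variable "gpu_memory_mibs" {
  type        = list(number)
  description = "GPU memory allocation in MiB for the model task"
  default     = [40960]  # one GPU with 40,960 MiB of memory
}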
To deploy the model, save the configuration above as model.hcl and run:
sora recipe seed model.hcl
sora recipe list
sora recipe deploy <recipe-slug> # replace with the slug from the previous command
Interacting with the Model
After deployment, list your models to find the slug of the one you just deployed. Then, to get an API endpoint for querying it, run:
sora model api <your-model-slug>
You’ll receive a curl command that looks like this:
curl -X POST 'https://llama-8b-43112745.demo.soranova.net/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <your-token>' \
-d '{
"max_tokens": 2048,
"messages": [
{
"content": "<YOUR MESSAGE>",
"role": "user"
}
],
"model": "<MODEL NAME>",
"stream": false,
"temperature": 0.6
}'
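The endpoint uses the familiar OpenAI-style chat-completions request format, so you can call it from any HTTP client rather than curl. Below is a minimal sketch in Python using the requests library; the URL, token, and model name are placeholders to replace with the values returned by sora model api, and the response is assumed to follow the standard chat-completions schema:

import requests

# Placeholders: use the URL and token returned by `sora model api <your-model-slug>`.
BASE_URL = "https://llama-8b-43112745.demo.soranova.net"
TOKEN = "<your-token>"

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {TOKEN}",
    },
    json={
        "model": "<MODEL NAME>",
        "messages": [{"role": "user", "content": "<YOUR MESSAGE>"}],
        "max_tokens": 2048,
        "temperature": 0.6,
        "stream": False,
    },
    timeout=60,
)
response.raise_for_status()

# Assuming a standard chat-completions response body, print the assistant's reply.
print(response.json()["choices"][0]["message"]["content"])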
🎉 That’s it — your model is now live and ready to serve requests. Happy building!