Variable Declaration

Variables define dynamic values that can be reused throughout the deployment.

variable "memory_mib" {
  type        = number
  description = "Memory allocation in MiB for the model task"
  default     = 14360
}

Keyword      Purpose
variable     Declares a reusable value block
type         Specifies the type (number, string, list, etc.)
description  Describes what this variable is used for
default      Provides a fallback value if none is given at deploy time

Reference variables with ${var.variable_name} in your task or resource definitions.
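As a minimal sketch of declaring and then referencing a variable, the fragment below declares the replica_count variable that the task group example in this guide refers to. The description text and the default of 1 are illustrative assumptions, not values prescribed by the platform.

```hcl
# Hypothetical declaration; the default of 1 is an assumed value.
variable "replica_count" {
  type        = number
  description = "Number of replicas of the model task to run"
  default     = 1
}

# The variable is then referenced with the ${var.<name>} syntax.
service "model" {
  name = "your-model"

  task_group "model" {
    count = "${var.replica_count}"
    # task definition omitted in this sketch
  }
}
```

If no value is supplied at deploy time, the default applies, so this sketch would run a single replica.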


Application Block

The application block is the top-level definition for a model deployment.

application {
  name        = "My Custom Model"
  summary     = "https://huggingface.co/your-company/model"
  description = "A custom model deployment"
}

Keyword      Purpose
name         Human-readable name for your model
summary      Link to model card or documentation (optional)
description  A short description of your deployment

Service and Task Group

Define a service to logically group model execution tasks.

service "model" {
  name = "your-model"

  task_group "model" {
    count = "${var.replica_count}"

    task {
      name   = "your-model-name"
      driver = "docker"
      config { ... }
    }
  }
}

Keyword     Purpose
service     Groups all tasks that serve a single model
task_group  Defines how many replicas of a task to run
count       Sets the number of replicas (use a variable if needed)
task        A single unit of execution (a container or script)
driver      The runtime environment, e.g., docker
config      The configuration of your task, see below

Task Config

Configure how your model container should be launched.

config {
  image    = "your-custom-model-image:latest"
  ipc_mode = "host"
  args     = ["--host", "0.0.0.0", "--port", "5000"]
}

Keyword   Purpose
image     Docker image to launch
ipc_mode  Set to host if your container requires GPU IPC
args      Command-line arguments passed to your container

Endpoint & Health Check

Specify the model’s serving port and health check mechanism.

endpoint llm {
  port = 5000

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}

Keyword   Purpose
endpoint  Exposes a port for requests and defines health checks
port      Port that the model will serve traffic on
health    Block that determines if the container is “healthy”
type      Health check type (http)
path      HTTP path to hit for health checking
interval  How often to check the health
timeout   Timeout for each health check attempt

Make sure your model server actually serves the health check path on the endpoint port. If the health check fails, the model is not marked as healthy.
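The endpoint port should match the port the container listens on (5000 in both the args and endpoint examples in this guide). One way to keep the two in sync, sketched here, is to route both through a single variable. The variable name serving_port is hypothetical, and the sketch assumes that interpolation is accepted inside args list elements and for port, which this guide does not confirm.

```hcl
# Hypothetical variable; the name serving_port is not defined by the platform.
variable "serving_port" {
  type        = number
  description = "Port the model server listens on"
  default     = 5000
}

# In the task config: pass the port to the container.
config {
  image    = "your-custom-model-image:latest"
  ipc_mode = "host"
  # Assumption: interpolation is allowed inside list elements.
  args     = ["--host", "0.0.0.0", "--port", "${var.serving_port}"]
}

# In the endpoint: expose the same port and probe it over HTTP.
endpoint llm {
  port = "${var.serving_port}"

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```

With this layout, changing the port means editing one default instead of two separate literals.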


Resources and GPU Allocation

Declare CPU, memory, and GPU resources for your task.

resources {
  cpu_mhz          = 10240
  memory_mib       = "${var.memory_mib}"
  sharing_strategy = "${var.sharing_strategy}"

  device {
    name        = "${var.gpu_name}"
    memory_mibs = "${var.gpu_memory_mibs}"
  }
}

Keyword           Purpose
cpu_mhz           How much CPU to allocate (in MHz)
memory_mib        RAM allocated to this task (in MiB)
sharing_strategy  GPU usage model (mps for CUDA Multi-Process Service)
device            GPU allocation block
name              GPU device name (e.g., nvidia/gpu)
memory_mibs       Memory to allocate per GPU (MiB)
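The resources example references several variables that must be declared elsewhere in the recipe. A minimal sketch of two of those declarations follows. The defaults reuse the example values from the keyword descriptions (mps and nvidia/gpu) and are assumptions about your setup; gpu_memory_mibs is left out because this guide does not state its expected type.

```hcl
# Assumed default: mps, the strategy named in the keyword descriptions.
variable "sharing_strategy" {
  type        = string
  description = "GPU sharing strategy for the model task"
  default     = "mps"
}

# Assumed default: the example device name from the keyword descriptions.
variable "gpu_name" {
  type        = string
  description = "GPU device name to request"
  default     = "nvidia/gpu"
}
```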

Sample Deployment Flow

# 1. Seed the model recipe
sora recipe seed model.hcl

# 2. View the available recipes
sora recipe list

# 3. Deploy the model using its slug
sora recipe deploy <recipe-slug>