Variable Declaration

Variables allow you to specify configurable values when deploying a service. If no value is provided during deployment, the variable will use its defined default value.

variable "memory_mib" {
  type        = number
  description = "Memory allocation in MiB for the model task"
  default     = 14360
}
Keyword      Purpose
variable     Declares a reusable value block
type         Specifies the type (number, string, list, etc.)
description  Describes what this variable is used for
default      Provides a fallback value if none is given at deploy time

Reference variables with ${var.variable_name} in your task or resource definitions.
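For example, the GPU device name used in later examples on this page could be declared as a variable and referenced from a resources block, next to the memory_mib variable declared above. This is a sketch: it assumes `string` is written as a bare type keyword in the same way as `number`, and the default values are illustrative.

```hcl
# A string variable for the GPU device name (default is illustrative)
variable "gpu_name" {
  type        = string
  description = "GPU device name to request"
  default     = "nvidia/gpu"
}

# Inside a task: reference both variables with ${var.<name>}
resources {
  cpu_mhz    = 10240
  memory_mib = "${var.memory_mib}"

  device {
    name        = "${var.gpu_name}"
    memory_mibs = [10240]
  }
}
```

If no value for gpu_name is supplied at deploy time, nvidia/gpu is used.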


Application Block

The application block is the top-level definition for an application deployment. It describes the overall app and contains one or more service blocks.

application {
  name        = "My Custom Model"
  summary     = "https://huggingface.co/your-company/model"
  description = "A custom model deployment"
  category    = "model"
  tags        = ["llm", "text2text"]

  service "model" {
    name = "my-model"
    // ...service definition...
  }

  service "ui" {
    name = "frontend"
    // ...service definition...
  }
}
Keyword      Purpose
name         Human-readable name for your application
summary      Link to model card or documentation (optional)
description  A short description of your deployment
category     Category for grouping (e.g., model, agent, tool)
tags         List of tags for search and filtering
service      Defines a component of your application (model, UI, etc.)

Each service block describes a deployable component (such as a model backend, UI, or internal service) and can contain nested task_group and task blocks for detailed configuration.


Service Block

A service block defines a single deployable component of your application. It contains one or more task_group blocks, which in turn contain task blocks. Alternatively, a service can reuse a service that is already deployed by referencing its slug with uses.

service "model" {
  name = "my-model"

  task_group "model" {
    name  = "my-model"
    count = 1

    task {
      name   = "my-model"
      driver = "docker"
      config {
        image = "my-model-image:latest"
      }
      endpoint api {
        port = 5000
      }
      resources {
        cpu_mhz    = 10240
        memory_mib = 4096
      }
    }
  }
}

# Alternatively, reuse an existing deployed service by its slug
service "model" {
  name = "my-model"
  uses = "${var.model_slug}"
}
Keyword     Purpose
service     Declares a service component (e.g., model, ui, internal)
name        Name of the service instance
uses        (Optional) Reference to another service by its distinct slug
task_group  Defines a group of tasks for this service
config      Service-level configuration (e.g., dependencies)

Each task_group block can contain one or more task blocks, which define how containers or processes are run. Services can also include dependencies and environment variables as needed.
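The two service forms can be combined in one application: reuse a model that is already deployed through uses, and define a UI service in full. In this sketch the model_slug variable is assumed to be declared elsewhere, and the frontend image, endpoint name, port, and resource figures are hypothetical.

```hcl
application {
  name        = "Chat UI"
  description = "A UI that reuses an already deployed model"
  category    = "tool"
  tags        = ["llm", "ui"]

  # Reuse the deployed model instead of redefining it
  service "model" {
    name = "my-model"
    uses = "${var.model_slug}"
  }

  # Define the UI service in full (image and port are hypothetical)
  service "ui" {
    name = "frontend"

    task_group "ui" {
      name  = "frontend"
      count = 1

      task {
        name   = "frontend"
        driver = "docker"
        config {
          image = "my-frontend-image:latest"
        }
        endpoint web {
          port = 3000
        }
        resources {
          cpu_mhz    = 1000
          memory_mib = 512
        }
      }
    }
  }
}
```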


Task Group Block

A task_group block represents a group of tasks (such as containers or commands) that are scheduled together on a single node. All tasks within a task_group share the same lifecycle and can communicate via localhost.

task_group "model" {
  name  = "my-model"
  count = 1

  task {
    name   = "my-model"
    driver = "docker"
    config {
      image = "my-model-image:latest"
    }
    resources {
      cpu_mhz    = 10240
      memory_mib = 4096
    }
  }
}
Keyword     Purpose
task_group  Declares a group of tasks to be scheduled on one node
name        Name of the task group
count       Number of instances of this group to run
task        Defines a unit of execution within the group
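Because tasks in a group are placed on the same node and can reach each other over localhost, a helper task can run next to the model. In this sketch the model-helper task, its image, and the MODEL_URL variable are hypothetical, and it assumes the helper can reach the model on its container port 5000. With count = 2, two instances of the whole group (both tasks) are scheduled.

```hcl
task_group "model" {
  name  = "my-model"
  count = 2  # run two instances of this group

  # Main model server
  task {
    name   = "my-model"
    driver = "docker"
    config {
      image = "my-model-image:latest"
    }
    endpoint api {
      port = 5000
    }
    resources {
      cpu_mhz    = 10240
      memory_mib = 4096
    }
  }

  # Hypothetical helper on the same node, calling the model via localhost
  task {
    name   = "model-helper"
    driver = "docker"
    config {
      image = "my-helper-image:latest"
    }
    env {
      MODEL_URL = "http://localhost:5000"
    }
    resources {
      cpu_mhz    = 500
      memory_mib = 256
    }
  }
}
```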

Task Block

A task block defines a single unit of execution, such as a container or a shell command. Tasks are the smallest deployable units; the driver determines how each one runs, for example docker for containers and exec for scripts or binaries.

# container task
task {
  name   = "my-model"
  driver = "docker"
  config {
    image = "my-model-image:latest"
  }
  env {
    EXAMPLE_ENV = "value"
  }
  endpoint api {
    port = 5000
  }
  resources {
    cpu_mhz    = 10240
    memory_mib = 4096
  }
}

# exec task
task {
  name   = "backup-task"
  driver = "exec"

  config {
    command = "/bin/bash"
    args    = ["backup.sh"]
  }

  resources {
    memory_mib = 128
  }
}
Keyword    Purpose
task       Declares a single unit of execution (container, command, etc.)
name       Name of the task
driver     Execution backend (e.g., docker, exec)
config     Task-specific configuration (image, command, args, etc.)
env        Environment variables for the task
endpoint   Network endpoint exposed by the task
resources  Resource allocation for the task

Model Block

The model block is syntactic sugar for service "model". The compiler automatically converts a model block into an equivalent service "model" block during processing.

The model block provides a simplified way to deploy a model.

model {
  name  = "llama-8b"
  
  config {
    image = "sglang_hf_generic:latest"
    args  = [
      "--model-path", "meta-llama/Llama-3.1-8B-Instruct",
      "--host", "0.0.0.0",
      "--port", "5000"
    ]
  }

  endpoint llm {
    port = 5000
    health {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }

  resources {
    cpu_mhz          = 10240
    memory_mib       = 14360
    sharing_strategy = "${var.sharing_strategy}"
    device {
      name        = "${var.gpu_name}"
      memory_mibs = "${var.gpu_memory_mibs}"
    }
  }
}
Keyword    Purpose
name       Unique identifier for your model deployment
count      Number of model replicas to run
config     Container configuration including image and startup arguments
  args     List of arguments passed to the container
endpoint   Network endpoint and health check configuration
resources  Hardware resource allocation for the model
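Since the compiler rewrites a model block into service "model", the two forms below should describe roughly the same deployment. This is a hedged sketch, not documented compiler output: the task_group and task names in the long-hand form and the use of the docker driver are assumptions. The short form also sets count, which the table lists but the example above omits. Use one form or the other in a recipe, not both.

```hcl
# Short form: two replicas of a containerized model
model {
  name  = "my-model"
  count = 2

  config {
    image = "my-model-image:latest"
  }

  endpoint api {
    port = 5000
  }

  resources {
    cpu_mhz    = 10240
    memory_mib = 4096
  }
}

# Assumed long-hand equivalent (generated names and driver are guesses)
service "model" {
  name = "my-model"

  task_group "model" {
    name  = "my-model"
    count = 2

    task {
      name   = "my-model"
      driver = "docker"
      config {
        image = "my-model-image:latest"
      }
      endpoint api {
        port = 5000
      }
      resources {
        cpu_mhz    = 10240
        memory_mib = 4096
      }
    }
  }
}
```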

Daemon Block

The daemon block is syntactic sugar for service "daemon". The compiler automatically converts a daemon block into an equivalent service "daemon" block during processing.

The daemon block provides a simplified way to define background services such as log shippers, sidecars, or monitoring agents. Daemon services are deployed on every node in the cluster and run continuously in the background.

daemon {
  name = "vector"

  config {
    image = "timberio/vector:latest-debian"

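    volumes = [
      "/var/log:/var/log:ro",
      # docker_logs reads container logs through the Docker daemon socket
      "/var/run/docker.sock:/var/run/docker.sock:ro"
    ]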

    args = [
      "--config-dir", "/local/etc/vector"
    ]
  }

  template {
    data = <<EOF
[sources.my_source]
type = "file"
include = ["/var/log/**/*.log"]
fingerprint.strategy = "device_and_inode"

[sources.docker_logs]
type = "docker_logs"

[sinks.my_sink]
type = "console"
inputs = ["my_source", "docker_logs"]
encoding.codec = "json"
EOF

    destination = "/local/etc/vector/vector.toml"
  }

  resources {
    cpu_mhz    = 100
    memory_mib = 1024
  }
}
Keyword        Purpose
name           Unique identifier for your daemon
config         Container configuration for the daemon
  image        Docker image to launch
  volumes      List of volume mounts for the container
  args         Command-line arguments passed to the container
template       File templating block for configuration files
  data         Contents of the template file
  destination  Path inside the container where the template will be written
resources      Resource allocation for the daemon

The daemon block is ideal for defining persistent background processes that support your main application, such as log collectors or monitoring agents. It’s also useful for running system jobs that need to execute on every node in the cluster.
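A monitoring agent can be much smaller than the log shipper above. This sketch runs the Prometheus node exporter as a daemon so that every node reports host metrics. The image and its --path.procfs and --path.sysfs flags belong to node exporter itself; the resource figures are guesses, and exposing its metrics port (9100 by default) is not covered by the fields documented here.

```hcl
daemon {
  name = "node-exporter"

  config {
    image = "prom/node-exporter:latest"

    # Mount host /proc and /sys read-only so host metrics are visible
    volumes = [
      "/proc:/host/proc:ro",
      "/sys:/host/sys:ro"
    ]

    # Point node exporter at the mounted host filesystems
    args = [
      "--path.procfs=/host/proc",
      "--path.sysfs=/host/sys"
    ]
  }

  resources {
    cpu_mhz    = 100
    memory_mib = 128
  }
}
```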


Config

Configure how your model container should be launched.

config {
  image = "your-custom-model-image:latest"
  args  = ["--host", "0.0.0.0", "--port", "5000"]
}
Keyword  Purpose
image    Docker image to launch
args     Command-line arguments passed to your container
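Arguments can also be built from variables, so one recipe can serve different checkpoints. In this sketch model_path is a hypothetical variable; the flags are the ones the model example above passes to the SGLang server.

```hcl
# Hypothetical variable holding the Hugging Face model ID
variable "model_path" {
  type        = string
  description = "Hugging Face model ID to serve"
  default     = "meta-llama/Llama-3.1-8B-Instruct"
}

config {
  image = "sglang_hf_generic:latest"
  args  = [
    "--model-path", "${var.model_path}",
    "--host", "0.0.0.0",  # listen on all interfaces in the container
    "--port", "5000"
  ]
}
```

Binding to 0.0.0.0 rather than 127.0.0.1 lets traffic from outside the container reach the server.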

Endpoint & Health Check

Specify the model’s serving port and health check mechanism.

endpoint llm {
  port = 5000

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
Keyword       Purpose
endpoint      Exposes a port for requests and defines health checks
  port        Port the model serves traffic on inside the container
  static      (Optional) Port on the host that traffic is served on. If not specified, a host port is allocated dynamically
  health      Block that determines whether the container is “healthy”
    type      Health check type (http)
    path      HTTP path to request for health checks
    interval  How often to run the health check
    timeout   Timeout for each health check attempt

Make sure your model actually serves the health check path. If the health check fails or times out, the model may not be marked as healthy.
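To pin the host port instead of letting one be allocated dynamically, add static. This sketch assumes static takes a plain port number, like port does; 8080 is an arbitrary choice and must be free on the node.

```hcl
endpoint llm {
  port   = 5000  # port inside the container
  static = 8080  # fixed port on the host

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```

Because a host port can be bound only once per node, replicas that share the same static port likely cannot be placed on the same node.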


Resources and GPU Allocation

Declare CPU, memory, and GPU resources for your task.

resources {
  cpu_mhz          = 10240
  memory_mib       = "${var.memory_mib}"
  sharing_strategy = "${var.sharing_strategy}"

  device {
    name        = "${var.gpu_name}"
    memory_mibs = "${var.gpu_memory_mibs}"
  }
}
Keyword           Purpose
cpu_mhz           How much CPU to allocate (in MHz)
memory_mib        RAM allocated to this task (in MiB)
sharing_strategy  GPU usage model (mps for CUDA Multi-Process Service)
device            GPU allocation block
  name            GPU device name (e.g., nvidia/gpu)
  memory_mibs     Memory to allocate per GPU (MiB). For example, [10240] requests a node with at least one GPU having 10240 MiB of memory
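Since memory_mibs is a list with one entry per GPU, requesting two GPUs presumably means listing two values. This sketch declares the list as a variable. The list(number) type syntax is an assumption borrowed from Terraform-style HCL, and the figures are illustrative.

```hcl
variable "gpu_memory_mibs" {
  type        = list(number)
  description = "Memory required on each requested GPU (MiB)"
  default     = [10240, 10240]  # assumed: two GPUs, 10240 MiB each
}

variable "sharing_strategy" {
  type    = string
  default = "mps"  # CUDA Multi-Process Service
}

resources {
  cpu_mhz          = 10240
  memory_mib       = 14360
  sharing_strategy = "${var.sharing_strategy}"

  device {
    name        = "nvidia/gpu"
    memory_mibs = "${var.gpu_memory_mibs}"
  }
}
```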

Sample Deployment Flow

# 1. Seed the model recipe
sora recipe seed model.hcl

# 2. View the available recipes
sora recipe list

# 3. Deploy the model using its slug
sora recipe deploy <recipe-slug>