Variable Declaration

Variables allow you to specify configurable values when deploying a service. If no value is provided during deployment, the variable will use its defined default value.
variable "memory_mib" {
  type        = number
  description = "Memory allocation in MiB for the model task"
  default     = 14360
}
Keyword      Purpose
variable     Declares a reusable value block
type         Specifies the type (number, string, list, etc.)
description  Describes what this variable is used for
default      Provides a fallback value if none is given at deploy time
Reference variables with ${var.variable_name} in your task or resource definitions.
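For example, the memory_mib variable declared above can be referenced inside a resources block:

```hcl
resources {
  cpu_mhz    = 10240
  # Resolves to the default of 14360 unless overridden at deploy time
  memory_mib = "${var.memory_mib}"
}
```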

Application Block

The application block is the top-level definition for an application deployment. It describes the overall app and contains one or more service blocks.
application {
  name        = "My Custom Model"
  summary     = "https://huggingface.co/your-company/model"
  description = "A custom model deployment"
  category    = "model"
  tags        = ["llm", "text2text"]

  service "model" {
    name = "my-model"
    // ...service definition...
  }

  service "ui" {
    name = "frontend"
    // ...service definition...
  }
}
Keyword      Purpose
name         Human-readable name for your application
summary      Link to model card or documentation (optional)
description  A short description of your deployment
category     Category for grouping (e.g., model, agent, tool)
tags         List of tags for search and filtering
service      Defines a component of your application (model, UI, etc.)
Each service block describes a deployable component (such as a model backend, UI, or internal service) and can contain nested task_group and task blocks for detailed configuration.

Service Block

A service block defines a deployable component of your application, such as a model backend, UI, or internal service. Each service can contain one or more task_group blocks, which in turn contain task blocks for detailed configuration.
service "model" {
  name = "my-model"

  task_group "model" {
    name  = "my-model"
    count = 1

    task {
      name   = "my-model"
      driver = "docker"
      config {
        image = "my-model-image:latest"
      }
      endpoint api {
        port = 5000
      }
      resources {
        cpu_mhz    = 10240
        memory_mib = 4096
      }
    }
  }
}

# or if you want to re-use an existing service that's deployed
service "model" {
  name = "my-model"
  uses = "${var.model_slug}"
}
Keyword     Purpose
service     Declares a service component (e.g., model, ui, internal)
name        Name of the service instance
uses        (Optional) Reference to an existing deployed service by its distinct slug
task_group  Defines a group of tasks for this service
config      Service-level configuration (e.g., dependencies)
Each task_group block can contain one or more task blocks, which define how containers or processes are run. Services can also include dependencies and environment variables as needed.

Task Group Block

A task_group block represents a group of tasks (such as containers or commands) that are scheduled together on a single node. All tasks within a task_group share the same lifecycle and can communicate via localhost.
task_group "model" {
  name  = "my-model"
  count = 1

  task {
    name   = "my-model"
    driver = "docker"
    config {
      image = "my-model-image:latest"
    }
    resources {
      cpu_mhz    = 10240
      memory_mib = 4096
    }
  }
}
Keyword     Purpose
task_group  Declares a group of tasks to be scheduled on one node
name        Name of the task group
count       Number of instances of this group to run
task        Defines a unit of execution within the group

Service Types

Use the service type that matches how the component should be operated and discovered.
  • system: Default job type for general-purpose services. Deployed to the cluster based on the requested resources.
  • batch: One-off or scheduled jobs that run to completion (ETL, migrations). Prefer driver = "exec" for short-lived containers.
  • daemon: Long-running background agents that should be present on ALL nodes (log shippers, monitors). Not user-facing.
  • model: Model backends (LLMs, embeddings, VLMs). Requires a task_group "model" with the primary model task. A proxy/API will be auto-generated.
  • public-api: Publicly exposed backend API (stable, internet-facing). These backends will have to implement their own auth layer, and the token-based auth will not be enforced.
  • ui: End-user facing web frontends. Typically receive a public domain and depend on backend services.
  • internal: Private backend components not exposed publicly (DBs, indexers, workers). Consumed by other services.
  • tool: Auxiliary tool services used by agents or backends (e.g., MCP tools). Usually internal-only.
Tips
  • Model services require a task_group labeled model; other labels are rejected.
  • Public APIs and UIs are the typical places where you rely on ${service.<name>.domain_name}.
  • Batch services commonly use task_group "internal" or task_group "tool" with tasks that exit on success.
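The domain interpolation mentioned above can be sketched as follows. This is a minimal, illustrative example: the image name and the MODEL_BASE_URL environment variable are hypothetical, and it assumes a backend service named "model" exists in the same application.

```hcl
service "ui" {
  name = "frontend"

  task_group "ui" {
    name  = "frontend"
    count = 1

    task {
      name   = "frontend"
      driver = "docker"
      config {
        image = "my-frontend:latest"
      }
      env {
        # Point the UI at the model service's generated domain
        MODEL_BASE_URL = "https://${service.model.domain_name}"
      }
    }
  }
}
```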

Task Group Types

Task group type communicates the intent and runtime characteristics of the tasks inside.
  • model: Model-serving workloads with one primary model task; a proxy/API may be auto-generated.
  • tool: Tooling sidecars or tool-specific services used by agents or backends.
  • ui: Frontend tasks serving static assets or web UIs for users.
  • internal: Private backends like DBs, workers, or internal HTTP services.
  • daemon: Background/agent processes; long-running and not user-facing.
  • public-api: Publicly reachable HTTP APIs with domain exposure and health checks.
When to choose which
  • Choose model for LLM/embedding services that expose inference endpoints.
  • Choose internal for non-public components (databases, indexers, workers) used by other services.
  • Choose public-api when clients (including UIs) should call this service directly from outside the cluster.
  • Choose ui for end-user web applications (frontends) or dashboards.
  • Choose tool for MCP servers.
  • Choose daemon for node-level or background tasks that run on all nodes and should not terminate.

Task Block

A task block defines a single unit of execution, such as a container or shell command. Tasks are the smallest deployable units and can be containers, scripts, or binaries.
# container task
task {
  name   = "my-model"
  driver = "docker"
  config {
    image = "my-model-image:latest"
  }
  env {
    EXAMPLE_ENV = "value"
  }
  endpoint api {
    port = 5000
  }
  resources {
    cpu_mhz    = 10240
    memory_mib = 4096
  }
}

# exec task
task {
  name   = "backup-task"
  driver = "exec"

  config {
    command = "/bin/bash"
    args    = ["backup.sh"]
  }

  resources {
    memory_mib = 128
  }
}
Keyword    Purpose
task       Declares a single unit of execution (container, command, etc.)
name       Name of the task
driver     Execution backend (e.g., docker, exec)
config     Task-specific configuration (image, command, args, etc.)
env        Environment variables for the task
endpoint   Network endpoint exposed by the task
resources  Resource allocation for the task

Model Block

The model block is syntactic sugar for service "model". The compiler automatically converts a model block into an equivalent service "model" block during processing.
The model block provides a simplified way to deploy a model.
model {
  name  = "llama-8b"
  
  config {
    image = "sglang_hf_generic:latest"
    args  = [
      "--model-path", "meta-llama/Llama-3.1-8B-Instruct",
      "--host", "0.0.0.0",
      "--port", "5000"
    ]
  }

  endpoint llm {
    port = 5000
    health {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }

  resources {
    cpu_mhz          = 10240
    memory_mib       = 14360
    sharing_strategy = "${var.sharing_strategy}"
    device {
      name        = "${var.gpu_name}"
      memory_mibs = "${var.gpu_memory_mibs}"
    }
  }
}
Keyword    Purpose
name       Unique identifier for your model deployment
count      Number of model replicas to run
config     Container configuration including image and startup arguments
args       List of arguments passed to the container
endpoint   Network endpoint and health check configuration
resources  Hardware resource allocation for the model

Daemon Block

The daemon block is syntactic sugar for service "daemon". The compiler automatically converts a daemon block into an equivalent service "daemon" block during processing.
The daemon block provides a simplified way to define background services such as log shippers, sidecars, or monitoring agents. daemon services are deployed on every node in the cluster, allowing them to run continuously in the background.
daemon {
  name = "vector"

  config {
    image = "timberio/vector:latest-debian"

    volumes = [
      "/var/log:/var/log:ro"
    ]

    args = [
      "--config-dir", "/local/etc/vector"
    ]
  }

  template {
    data = <<EOF
[sources.my_source]
type = "file"
include = ["/var/log/**/*.log"]
fingerprint.strategy = "device_and_inode"

[sources.docker_logs]
type = "docker_logs"

[sinks.my_sink]
type = "console"
inputs = ["my_source", "docker_logs"]
encoding.codec = "json"
EOF

    destination = "/local/etc/vector/vector.toml"
  }

  resources {
    cpu_mhz    = 100
    memory_mib = 1024
  }
}
Keyword      Purpose
name         Unique identifier for your daemon
config       Container configuration for the daemon
image        Docker image to launch
volumes      List of volume mounts for the container
args         Command-line arguments passed to the container
template     File templating block for configuration files
data         Contents of the template file
destination  Path inside the container where the template will be written
resources    Resource allocation for the daemon
The daemon block is ideal for defining persistent background processes that support your main application, such as log collectors or monitoring agents. It’s also useful for running system jobs that need to execute on every node in the cluster.

Config

Configure how your model container should be launched.
config {
  image    = "your-custom-model-image:latest"
  args     = ["--host", "0.0.0.0", "--port", "5000"]
}
Keyword  Purpose
image    Docker image to launch
args     Command-line arguments passed to your container

Endpoint & Health Check

Specify the model’s serving port and health check mechanism.
endpoint llm {
  port = 5000

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
Keyword   Purpose
endpoint  Exposes a port for requests and defines health checks
port      Port the model serves traffic on inside the container
static    (Optional) Port the model serves traffic on on the host side; if not specified, a host port is allocated dynamically
health    Block that determines whether the container is "healthy"
type      Health check type (http)
path      HTTP path to hit for health checking
interval  How often to run the health check
timeout   Timeout for each health check attempt
Ensure the health check path is correctly implemented in your model. If the health check fails, the model may not be marked as healthy.
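If you need a fixed host port rather than a dynamically allocated one, the optional static keyword can be added to the endpoint block. A minimal sketch (the port values are illustrative):

```hcl
endpoint llm {
  port   = 5000   # container port
  static = 5000   # (optional) pin the host port as well

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```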

Resources and GPU Allocation

Declare CPU, memory, and GPU resources for your task.
resources {
  cpu_mhz          = 10240
  memory_mib       = "${var.memory_mib}"
  sharing_strategy = "${var.sharing_strategy}"

  device {
    name        = "${var.gpu_name}"
    memory_mibs = "${var.gpu_memory_mibs}"
  }
}
Keyword           Purpose
cpu_mhz           How much CPU to allocate (in MHz)
memory_mib        RAM allocated to this task
sharing_strategy  GPU usage model (mps for CUDA Multi-Process Service)
device            GPU allocation block
name              GPU device name (e.g., nvidia/gpu)
memory_mibs       Memory to allocate per GPU (MiB); e.g., [10240] requests a node with at least 1 GPU having 10240 MiB of memory
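Following the list semantics of memory_mibs, requesting multiple GPUs would look like the sketch below. This assumes the list form extends to multiple entries, one per requested GPU; all values are illustrative.

```hcl
resources {
  cpu_mhz    = 10240
  memory_mib = 14360

  device {
    name = "nvidia/gpu"
    # Two entries: a node with at least 2 GPUs,
    # each having at least 10240 MiB of memory
    memory_mibs = [10240, 10240]
  }
}
```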

Sample Deployment Flow

# 1. Seed the model recipe
sora recipe seed model.hcl

# 2. View the available recipes
sora recipe list

# 3. Deploy the model using its slug
sora recipe deploy <recipe-slug>