DSL Syntax

Variable Declaration

Variables allow you to specify configurable values when deploying a service. If no value is provided during deployment, the variable will use its defined default value.

variable "memory_mib" {
  type        = number
  description = "Memory allocation in MiB for the model task"
  default     = 14360
}

Keyword	Purpose
`variable`	Declares a reusable value block
`type`	Specifies the type (`number`, `string`, `list`, etc.)
`description`	Describes what this variable is used for
`default`	Provides a fallback value if none is given at deploy time

Reference variables with ${var.variable_name} in your task or resource definitions.
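As a minimal sketch of how a declaration and a reference fit together, the fragment below reuses the memory_mib variable declared above inside a resources block. The placement inside resources follows the Resources and GPU Allocation section of this page; the values are illustrative only.

```hcl
# Declare the variable once, with a default used when no value is supplied.
variable "memory_mib" {
  type        = number
  description = "Memory allocation in MiB for the model task"
  default     = 14360
}

# Reference it wherever the value is needed.
resources {
  cpu_mhz    = 10240
  memory_mib = "${var.memory_mib}"
}
```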

Application Block

The application block is the top-level definition for an application deployment. It describes the overall app and contains one or more service blocks.

application {
  name        = "My Custom Model"
  summary     = "https://huggingface.co/your-company/model"
  description = "A custom model deployment"
  category    = "model"
  tags        = ["llm", "text2text"]

  service "model" {
    name = "my-model"
    // ...service definition...
  }

  service "ui" {
    name = "frontend"
    // ...service definition...
  }
}

Keyword	Purpose
`name`	Human-readable name for your application
`summary`	Link to model card or documentation (optional)
`description`	A short description of your deployment
`category`	Category for grouping (e.g., `model`, `agent`, `tool`)
`tags`	List of tags for search and filtering
`service`	Defines a component of your application (model, UI, etc.)

Each service block describes a deployable component (such as a model backend, UI, or internal service) and can contain nested task_group and task blocks for detailed configuration.

Service Block

A service block defines a deployable component of your application, such as a model backend, UI, or internal service. Each service can contain one or more task_group blocks, which in turn contain task blocks for detailed configuration.

service "model" {
  name = "my-model"

  task_group "model" {
    name  = "my-model"
    count = 1

    task {
      name   = "my-model"
      driver = "docker"
      config {
        image = "my-model-image:latest"
      }
      endpoint api {
        port = 5000
      }
      resources {
        cpu_mhz    = 10240
        memory_mib = 4096
      }
    }
  }
}

# Alternatively, reuse a service that is already deployed by referencing its slug
service "model" {
  name = "my-model"
  uses = "${var.model_slug}"
}

Keyword	Purpose
`service`	Declares a service component (e.g., model, ui, internal)
`name`	Name of the service instance
`uses`	(Optional) Reference to another service by its distinct slug
`task_group`	Defines a group of tasks for this service
`config`	Service-level configuration (e.g., dependencies)

Each task_group block can contain one or more task blocks, which define how containers or processes are run. Services can also include dependencies and environment variables as needed.
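The two forms of service shown above can sit side by side in one application. The sketch below is an illustration under assumptions: the model_slug variable, its placeholder default, the my-ui-image:latest image, and port 3000 are hypothetical and not taken from the DSL itself. It reuses an already deployed model through uses and defines only the UI service in full.

```hcl
variable "model_slug" {
  type        = string
  description = "Slug of an already deployed model service"
  default     = "my-model" # placeholder slug, replace with a real one
}

application {
  name        = "My Custom Model"
  description = "A UI in front of an existing model deployment"
  category    = "model"
  tags        = ["llm"]

  # Reuse a service that is already deployed, referenced by its slug.
  service "model" {
    name = "my-model"
    uses = "${var.model_slug}"
  }

  # Define the UI service in full.
  service "ui" {
    name = "frontend"

    task_group "ui" {
      name  = "frontend"
      count = 1

      task {
        name   = "frontend"
        driver = "docker"
        config {
          image = "my-ui-image:latest" # hypothetical image
        }
        endpoint http {
          port = 3000 # hypothetical port
        }
        resources {
          cpu_mhz    = 1024
          memory_mib = 512
        }
      }
    }
  }
}
```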

Task Group Block

A task_group block represents a group of tasks (such as containers or commands) that are scheduled together on a single node. All tasks within a task_group share the same lifecycle and can communicate via localhost.

task_group "model" {
  name  = "my-model"
  count = 1

  task {
    name   = "my-model"
    driver = "docker"
    config {
      image = "my-model-image:latest"
    }
    resources {
      cpu_mhz    = 10240
      memory_mib = 4096
    }
  }
}

Keyword	Purpose
`task_group`	Declares a group of tasks to be scheduled on one node
`name`	Name of the task group
`count`	Number of instances of this group to run
`task`	Defines a unit of execution within the group
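Because tasks in a group are scheduled on the same node and can communicate via localhost, a helper task can call the model directly. The sketch below is an illustration under assumptions: the my-proxy-image:latest image and the MODEL_URL environment variable are hypothetical names, and it assumes the model's container port is reachable on localhost from the second task.

```hcl
task_group "model" {
  name  = "my-model"
  count = 1

  # Main model container, serving on port 5000.
  task {
    name   = "my-model"
    driver = "docker"
    config {
      image = "my-model-image:latest"
    }
    endpoint api {
      port = 5000
    }
    resources {
      cpu_mhz    = 10240
      memory_mib = 4096
    }
  }

  # Helper task in the same group; it reaches the model via localhost.
  task {
    name   = "proxy"
    driver = "docker"
    config {
      image = "my-proxy-image:latest" # hypothetical image
    }
    env {
      MODEL_URL = "http://localhost:5000" # hypothetical variable name
    }
    resources {
      cpu_mhz    = 500
      memory_mib = 256
    }
  }
}
```

Since count applies to the whole group, raising it would replicate both tasks together.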

Task Block

A task block defines a single unit of execution, such as a container or shell command. Tasks are the smallest deployable units and can be containers, scripts, or binaries.

# container task
task {
  name   = "my-model"
  driver = "docker"
  config {
    image = "my-model-image:latest"
  }
  env {
    EXAMPLE_ENV = "value"
  }
  endpoint api {
    port = 5000
  }
  resources {
    cpu_mhz    = 10240
    memory_mib = 4096
  }
}

# exec task
task {
  name   = "backup-task"
  driver = "exec"

  config {
    command = "/bin/bash"
    args    = ["backup.sh"]
  }

  resources {
    memory_mib = 128
  }
}

Keyword	Purpose
`task`	Declares a single unit of execution (container, command, etc.)
`name`	Name of the task
`driver`	Execution backend (e.g., `docker`, `exec`)
`config`	Task-specific configuration (image, command, args, etc.)
`env`	Environment variables for the task
`endpoint`	Network endpoint exposed by the task
`resources`	Resource allocation for the task

Model Block

The model block is syntactic sugar for service "model". The compiler automatically converts a model block into an equivalent service "model" block during processing.

The model block provides a simplified way to deploy a model.

model {
  name  = "llama-8b"
  
  config {
    image = "sglang_hf_generic:latest"
    args  = [
      "--model-path", "meta-llama/Llama-3.1-8B-Instruct",
      "--host", "0.0.0.0",
      "--port", "5000"
    ]
  }

  endpoint llm {
    port = 5000
    health {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }

  resources {
    cpu_mhz          = 10240
    memory_mib       = 14360
    sharing_strategy = "${var.sharing_strategy}"
    device {
      name        = "${var.gpu_name}"
      memory_mibs = "${var.gpu_memory_mibs}"
    }
  }
}

Keyword	Purpose
`name`	Unique identifier for your model deployment
`count`	Number of model replicas to run
`config`	Container configuration including image and startup arguments
`args`	List of arguments passed to the container
`endpoint`	Network endpoint and health check configuration
`resources`	Hardware resource allocation for the model
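To make the syntactic-sugar relationship concrete, the sketch below shows what the expanded form of a small model block might look like, based on the service structure described earlier on this page. The exact output of the compiler is not documented here, so the shape of the long form, including the task_group label, the generated names, and the docker driver, is an assumption.

```hcl
# Shorthand form
model {
  name = "llama-8b"

  config {
    image = "sglang_hf_generic:latest"
  }

  endpoint llm {
    port = 5000
  }

  resources {
    cpu_mhz    = 10240
    memory_mib = 14360
  }
}

# Roughly equivalent long form (assumed shape, not verified compiler output)
service "model" {
  name = "llama-8b"

  task_group "model" {
    name  = "llama-8b"
    count = 1

    task {
      name   = "llama-8b"
      driver = "docker"
      config {
        image = "sglang_hf_generic:latest"
      }
      endpoint llm {
        port = 5000
      }
      resources {
        cpu_mhz    = 10240
        memory_mib = 14360
      }
    }
  }
}
```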

Daemon Block

The daemon block is syntactic sugar for service "daemon". The compiler automatically converts a daemon block into an equivalent service "daemon" block during processing.

The daemon block provides a simplified way to define background services such as log shippers, sidecars, or monitoring agents. Daemon services are deployed on every node in the cluster, where they run continuously in the background.

daemon {
  name = "vector"

  config {
    image = "timberio/vector:latest-debian"

    volumes = [
      "/var/log:/var/log:ro"
    ]

    args = [
      "--config-dir", "/local/etc/vector"
    ]
  }

  template {
    data = <<EOF
[sources.my_source]
type = "file"
include = ["/var/log/**/*.log"]
fingerprint.strategy = "device_and_inode"

[sources.docker_logs]
type = "docker_logs"

[sinks.my_sink]
type = "console"
inputs = ["my_source", "docker_logs"]
encoding.codec = "json"
EOF

    destination = "/local/etc/vector/vector.toml"
  }

  resources {
    cpu_mhz    = 100
    memory_mib = 1024
  }
}

Keyword	Purpose
`name`	Unique identifier for your daemon
`config`	Container configuration for the daemon
`image`	Docker image to launch
`volumes`	List of volume mounts for the container
`args`	Command-line arguments passed to the container
`template`	File templating block for configuration files
`data`	Contents of the template file
`destination`	Path inside the container where the template will be written
`resources`	Resource allocation for the daemon

The daemon block is ideal for defining persistent background processes that support your main application, such as log collectors or monitoring agents. It’s also useful for running system jobs that need to execute on every node in the cluster.

Config

Configure how your model container should be launched.

config {
  image = "your-custom-model-image:latest"
  args  = ["--host", "0.0.0.0", "--port", "5000"]
}

Keyword	Purpose
`image`	Docker image to launch
`args`	Command-line arguments passed to your container

Endpoint & Health Check

Specify the model’s serving port and health check mechanism.

endpoint llm {
  port = 5000

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}

Keyword	Purpose
`endpoint`	Exposes a port for requests and defines health checks
`port`	Container port on which the model serves traffic
`static`	(Optional) Host port on which the model serves traffic. If not specified, a host port is dynamically allocated
`health`	Block that determines if the container is “healthy”
`type`	Health check type (`http`)
`path`	HTTP path to hit for health checking
`interval`	How often to check the health
`timeout`	Timeout for each health check attempt

Ensure that your model actually serves the health check path. If the health check fails, the model will not be marked as healthy.
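The static keyword from the table does not appear in the example above. A minimal sketch of how it might be used, assuming it is an attribute set alongside port inside the endpoint block and that host port 8080 (a hypothetical choice) is free:

```hcl
endpoint llm {
  port   = 5000 # port inside the container
  static = 8080 # assumed placement: fixed host port instead of a dynamic one

  health {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```

Leaving static out keeps the documented default, where the host port is allocated dynamically.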

Resources and GPU Allocation

Declare CPU, memory, and GPU resources for your task.

resources {
  cpu_mhz          = 10240
  memory_mib       = "${var.memory_mib}"
  sharing_strategy = "${var.sharing_strategy}"

  device {
    name        = "${var.gpu_name}"
    memory_mibs = "${var.gpu_memory_mibs}"
  }
}

Keyword	Purpose
`cpu_mhz`	How much CPU to allocate (in MHz)
`memory_mib`	RAM allocated to this task (in MiB)
`sharing_strategy`	GPU usage model (`mps` for CUDA Multi-Process Service)
`device`	GPU allocation block
`name`	GPU device name (e.g., `nvidia/gpu`)
`memory_mibs`	Memory to allocate per GPU, in MiB. For example, `[10240]` requests a node with at least one GPU that has 10240 MiB of memory
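The resources example above references three variables that are not declared in this section. A sketch of matching declarations follows, assuming list is the appropriate type for gpu_memory_mibs and using the example values from the table as defaults; the description strings are illustrative.

```hcl
variable "sharing_strategy" {
  type        = string
  description = "GPU sharing strategy"
  default     = "mps"
}

variable "gpu_name" {
  type        = string
  description = "GPU device name"
  default     = "nvidia/gpu"
}

variable "gpu_memory_mibs" {
  type        = list
  description = "Memory to allocate per GPU, in MiB"
  default     = [10240] # one GPU with at least 10240 MiB
}
```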

Sample Deployment Flow

# 1. Seed the model recipe
sora recipe seed model.hcl

# 2. View the available recipes
sora recipe list

# 3. Deploy the model using its slug
sora recipe deploy <recipe-slug>
