Huggingface LLMs
Deploy your own LLMs from Huggingface to SoraNova
Prerequisites
To deploy your own LLMs on SoraNova, begin by setting up the CLI. If you haven’t already, install it using:
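(The exact installation command depends on how the SoraNova CLI is distributed; the sketch below assumes a pip-installable package named `soranova`, which is a placeholder.)

```bash
# Assumed installation via pip; the package name is a placeholder.
pip install soranova

# Verify the CLI is available (command name is assumed).
soranova --version
```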
Deploying a Huggingface LLM
SoraNova supports deploying any LLM available on Huggingface via a generic Docker image. Below is a sample configuration for deploying Meta’s Llama 3.1 8B Instruct model. You can customize the hardware allocation, GPU memory, and sharing strategy via variables:
The deployment configuration DSL shown below is experimental and subject to change in future releases.
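A sketch of what such a configuration might look like. Only `gpu_memory_mibs` is taken from this guide; the other keys (`hf_model_id`, `image`, `sharing_strategy`) and their values are illustrative placeholders, not a confirmed schema.

```
# Illustrative deployment configuration (experimental DSL).
# Only gpu_memory_mibs is documented in this guide; all other keys are assumed placeholders.
model "llama-3-1-8b-instruct" {
  hf_model_id      = "meta-llama/Llama-3.1-8B-Instruct"   # Huggingface model ID
  image            = "soranova/llm-server:latest"          # generic Docker image (placeholder tag)

  # Hardware allocation: one node, two GPUs, 20,480 MiB of GPU memory each
  gpu_memory_mibs  = [20480, 20480]
  sharing_strategy = "exclusive"                           # placeholder value
}
```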
For example, `gpu_memory_mibs = [20480, 20480]` means a single node with two GPUs, each with 20,480 MiB of memory.

To deploy the model:
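(The command below is a sketch assuming a hypothetical `deploy` subcommand that takes the configuration file as input; the flag and file names are placeholders.)

```bash
# Hypothetical deploy invocation; subcommand and flag names are assumptions.
soranova deploy -f llama-3-1-8b-instruct.conf
```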
Interacting with the Model
After deployment, list your models using:
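(A sketch assuming a hypothetical `models list` subcommand.)

```bash
# Hypothetical listing command; the subcommand name is an assumption.
soranova models list
```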
To get an API endpoint for querying the model:
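(A sketch assuming a hypothetical `models endpoint` subcommand that takes the model name.)

```bash
# Hypothetical endpoint lookup; subcommand and argument are assumptions.
soranova models endpoint llama-3-1-8b-instruct
```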
You’ll receive a `curl` command that looks like this:
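(Illustrative only: the host, path, model name, and auth header below are placeholders assuming an OpenAI-compatible chat completions endpoint; use the exact command returned by the CLI.)

```bash
# Placeholder example of an OpenAI-compatible request; substitute the URL,
# API key, and model name returned by the CLI for your deployment.
curl https://<your-deployment>.soranova.example/v1/chat/completions \
  -H "Authorization: Bearer $SORANOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
```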
🎉 That’s it — your model is now live and ready to serve requests. Happy building!