Overview
This guide covers the complete deployment procedure for a GPUStack v2.1.2 server node and one or more worker nodes, with Cloudflare Tunnel used to expose the inference API securely without opening inbound firewall ports. Intended for self-hosted LLM inference infrastructure on private or lab GPU machines.Tech Stack
- GPUStack v2.1.2: GPU cluster management and LLM inference scheduling
- Cloudflare Tunnel: Zero-inbound-port secure egress tunnel
- Shell (Bash): Deployment automation scripts
Key Features
- Full server node initialization and configuration walkthrough
- Worker node cluster-join procedure
- Cloudflare Tunnel setup for safe inference endpoint exposure
- No inbound ports required beyond what the tunnel manages
- Suitable for horizontal scaling across multiple GPU hosts
Quick Start
-
Deploy the GPUStack Server Node
-
Join a Worker Node
-
Create the Cloudflare Tunnel