Skip to main content

Overview

This guide covers the complete deployment procedure for a GPUStack v2.1.2 server node and one or more worker nodes, with Cloudflare Tunnel used to expose the inference API securely without opening inbound firewall ports. Intended for self-hosted LLM inference infrastructure on private or lab GPU machines.

Tech Stack

  • GPUStack v2.1.2: GPU cluster management and LLM inference scheduling
  • Cloudflare Tunnel: Zero-inbound-port secure egress tunnel
  • Shell (Bash): Deployment automation scripts

Key Features

  • Full server node initialization and configuration walkthrough
  • Worker node cluster-join procedure
  • Cloudflare Tunnel setup for safe inference endpoint exposure
  • No inbound ports required beyond what the tunnel manages
  • Suitable for horizontal scaling across multiple GPU hosts

Quick Start

  1. Deploy the GPUStack Server Node
    curl -sfL https://get.gpustack.ai | sh -s - --server-host 0.0.0.0
    
  2. Join a Worker Node
    curl -sfL https://get.gpustack.ai | sh -s - \
      --server-url http://<SERVER_IP>:80 \
      --token <JOIN_TOKEN>
    
  3. Create the Cloudflare Tunnel
    cloudflared tunnel create gpustack
    cloudflared tunnel route dns gpustack <your-domain>
    cloudflared tunnel run gpustack