amd-strix-halo-vllm-toolboxes/rdma_cluster/setup_guide.md at main · kyuz0/amd-strix-halo-vllm-toolboxes

Summary

This guide explains how to build a two-node AMD Strix Halo cluster for distributed vLLM inference. It walks through Fedora host setup, RDMA configuration, firewall and SSH preparation, and the toolbox container needed to expose GPU and network devices. It also shows how to start Ray and vLLM, tune tensor parallelism, and handle common issues such as gated model downloads and CUDA graph instability. For users without 100GbE RDMA cards, the document includes an alternative Thunderbolt-based setup. The overall focus is high-performance local AI inference with optimized networking and ROCm support.
