Jun 28

amd-strix-halo-vllm-toolboxes/rdma_cluster/setup_guide.md at main · kyuz0/amd-strix-halo-vllm-toolboxes

Summary

This guide explains how to build a two-node AMD Strix Halo cluster for distributed vLLM inference. It walks through Fedora host setup, RDMA configuration, firewall and SSH preparation, and the toolbox container needed to expose the GPU and network devices. It also shows how to start Ray and vLLM, tune tensor parallelism, and handle common issues such as gated model downloads and CUDA graph instability. For users without 100GbE RDMA cards, the document includes an alternate Thunderbolt-based setup. The focus throughout is high-performance local AI inference with optimized networking and ROCm support.