amd-strix-halo-vllm-toolboxes/rdma_cluster/setup_guide.md at main · kyuz0/amd-strix-halo-vllm-toolboxes
Summary
This guide explains how to build a two-node AMD Strix Halo cluster for distributed vLLM inference. It walks through Fedora host setup, RDMA configuration, firewall and SSH preparation, and the toolbox container that exposes the GPU and network devices. It then shows how to start Ray and vLLM, tune tensor parallelism, and work around common issues such as gated model downloads and CUDA graph instability. An alternate Thunderbolt-based setup covers users without 100GbE RDMA cards. Throughout, the focus is high-performance local AI inference with optimized networking and ROCm support.
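The Ray and vLLM startup sequence mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the guide's own script: the helper names, the head-node address `192.168.100.1`, and the model name are hypothetical placeholders. The CLI flags used (`ray start --head`, `ray start --address`, `vllm serve --tensor-parallel-size`, `--distributed-executor-backend ray`, `--enforce-eager`) are standard Ray and vLLM options, but the exact values the guide recommends may differ. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

# Hypothetical placeholders, not taken from the guide.
HEAD_IP = "192.168.100.1"
MODEL = "example-org/example-model"


def ray_head_cmd(port=6379):
    """Command for the first node: start the Ray head process."""
    return ["ray", "start", "--head", f"--port={port}"]


def ray_worker_cmd(head_ip, port=6379):
    """Command for the second node: join the existing Ray cluster."""
    return ["ray", "start", f"--address={head_ip}:{port}"]


def vllm_serve_cmd(model, tensor_parallel_size=2, enforce_eager=False):
    """Command for the head node: serve a model sharded across the GPUs.

    tensor_parallel_size=2 assumes one GPU per node in a two-node cluster.
    enforce_eager=True adds --enforce-eager, which disables CUDA graph
    capture; this is one possible workaround if graph capture is unstable.
    """
    cmd = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--distributed-executor-backend", "ray",
    ]
    if enforce_eager:
        cmd.append("--enforce-eager")
    return cmd


if __name__ == "__main__":
    # Print the commands in the order they would be run.
    for cmd in (
        ray_head_cmd(),
        ray_worker_cmd(HEAD_IP),
        vllm_serve_cmd(MODEL, enforce_eager=True),
    ):
        print(shlex.join(cmd))
```

For gated models, the Hugging Face token is typically supplied through the `HF_TOKEN` environment variable before launching `vllm serve`; whether the guide uses that mechanism or a prior login step is not stated in this summary.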
Labels
Semiconductor
Software Development
AI and HPC Solutions