Hardware Stack
FarmGPU’s infrastructure is designed as a cohesive system, optimized for AI workloads from silicon through storage and networking. We prioritize performance, efficiency, and forward compatibility, enabling us to deploy the latest hardware while maintaining operational stability.
GPUs: High-Performance AI Compute
FarmGPU supports a broad range of modern NVIDIA GPUs to address both frontier and applied AI workloads.
Supported GPU Platforms
- NVIDIA H100
- NVIDIA H200
- NVIDIA B200 (Blackwell)
- NVIDIA RTX Pro 6000 Blackwell
Target Workloads
- Large-scale model training
- High-throughput inference
- Fine-tuning and experimentation
- Enterprise and workstation-class AI workloads
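A quick way to confirm which of these GPUs a node actually exposes is to enumerate devices from application code. A minimal sketch using PyTorch (assuming a CUDA-enabled PyTorch build; not specific to any one platform above):

```python
import torch

def list_gpus() -> None:
    """Print name, memory, and compute capability for each visible GPU."""
    if not torch.cuda.is_available():
        print("No CUDA-capable GPU visible")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(
            f"GPU {i}: {props.name}, "
            f"{props.total_memory / 2**30:.0f} GiB, "
            f"compute capability {props.major}.{props.minor}"
        )

if __name__ == "__main__":
    list_gpus()
```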
CPUs: Balanced Host Compute
FarmGPU pairs GPUs with modern, high-performance CPU platforms to ensure that host-side compute never becomes a bottleneck.
Supported CPU Platforms
- AMD Turin
- Intel Granite Rapids
Key Characteristics
- High core counts and memory bandwidth
- Strong PCIe 5.0 connectivity
- Balanced performance for data preparation, orchestration, and I/O-heavy tasks
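One way those cores pay off in practice is parallel data preparation. A minimal sketch (Linux-only, assuming PyTorch; the dataset and worker-count heuristic are illustrative) that sizes DataLoader workers from the cores available to the process:

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

# Cores actually available to this process (respects affinity and
# cgroup limits, unlike os.cpu_count() on a shared host). Linux-only.
available_cores = len(os.sched_getaffinity(0))

# Placeholder dataset; in practice this is the preprocessing-heavy
# dataset that benefits from high host core counts.
dataset = TensorDataset(torch.randn(10_000, 128))

loader = DataLoader(
    dataset,
    batch_size=256,
    # Leave a few cores free for the training loop and I/O threads.
    num_workers=max(1, available_cores - 4),
    pin_memory=True,  # pinned buffers speed host-to-GPU copies over PCIe
)
```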
Storage Fabric: AI-Optimized by Design
FarmGPU treats storage as a core system component, not a peripheral service. Our storage architecture is designed to deliver predictable, high-throughput data paths that keep GPUs fed under real AI workloads.
Storage Software Platforms
We deploy and operate leading, production-grade storage platforms, including:
- MinIO
- VAST Data
- Weka
- Ceph
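These platforms expose standard access protocols; MinIO, for instance, speaks the S3 API. A minimal sketch reading a training shard with boto3 (the endpoint, credentials, bucket, and object key are hypothetical placeholders):

```python
import boto3

# Hypothetical endpoint and credentials; substitute your cluster's values.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Fetch one training shard; large objects can be range-read in
# parallel to better saturate the NVMe-backed storage fabric.
obj = s3.get_object(Bucket="datasets", Key="shards/train-00000.tar")
data = obj["Body"].read()
print(f"fetched {len(data) / 2**20:.1f} MiB")
```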
Storage Hardware
- Solidigm SSDs across all platforms
- PCIe 5.0 NVMe SSDs in all GPU compute nodes
- High-density 122TB QLC SSDs in storage nodes for cost-efficient scaling
DPU-Accelerated Data Path (NVIDIA BlueField-3)
A key differentiator in FarmGPU’s storage architecture is the use of NVIDIA BlueField-3 DPUs, which offload critical infrastructure functions from the host CPU, including:
- Storage protocol processing
- Network virtualization
- Encryption and security
- Data movement orchestration
Key Benefits
- GPUDirect Storage (GDS) with near bare-metal throughput
- Reduced CPU contention on GPU nodes
- More predictable latency under load
- Higher effective model FLOPs utilization (MFU)
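To show what GDS looks like from application code, a minimal sketch using the KvikIO bindings for cuFile (assuming kvikio and cupy are installed and GDS is enabled on the node; the file path is a hypothetical placeholder):

```python
import cupy
import kvikio

# Hypothetical path on a GDS-enabled filesystem.
PATH = "/mnt/nvme/embeddings.bin"

# Destination buffer in GPU memory (256 MiB of float32). With GDS the
# DMA goes NVMe -> GPU directly, bypassing a host bounce buffer.
buf = cupy.empty(256 * 2**20 // 4, dtype=cupy.float32)

with kvikio.CuFile(PATH, "r") as f:
    nbytes = f.read(buf)  # blocking read into device memory

print(f"read {nbytes / 2**20:.0f} MiB directly into GPU memory")
```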
Networking: High-Bandwidth, Low-Latency Fabrics
FarmGPU deploys a dual-fabric networking architecture to separate customer-facing traffic from backend AI communication.
Front-End Network
- NVIDIA SN5600
- North–south traffic
- Customer access
- Control plane and service connectivity
Back-End AI Fabric
- Celestica DS5000
- Broadcom Tomahawk 5 ASIC
- East–west GPU communication
- Storage traffic
- Collective operations (e.g., NCCL)
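The collectives crossing this fabric are typically issued through NCCL; a minimal all-reduce sketch via torch.distributed, launched with torchrun (which supplies the rank environment variables):

```python
import os
import torch
import torch.distributed as dist

# Launch with: torchrun --nproc_per_node=8 allreduce_demo.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank contributes its rank value; NCCL sums across the fabric.
t = torch.full((1024,), float(dist.get_rank()), device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)

if dist.get_rank() == 0:
    # For world size N, each element equals 0 + 1 + ... + (N - 1).
    print(f"all-reduce result (element 0): {t[0].item()}")
dist.destroy_process_group()
```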
Fabric Management
- Hedgehog Open Network Fabric
- Open, vendor-neutral fabric management
- Automated provisioning and monitoring
- High observability and control
RDMA Support
- RDMA over Converged Ethernet (RoCE)
- Tuned RoCE v2 configuration
- Predictable latency
- High throughput at scale
- Efficient GPU-to-GPU and GPU-to-storage communication
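NCCL selects and tunes its RoCE path through environment variables; a sketch of the kind of settings involved, applied before process-group initialization (the interface and HCA names are hypothetical and deployment-specific):

```python
import os

# Hypothetical, deployment-specific values; actual names come from
# the fabric configuration. Must be set before NCCL initializes.
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth1")    # bootstrap interface
os.environ.setdefault("NCCL_IB_HCA", "mlx5_0,mlx5_1")  # RDMA-capable NICs
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")        # GID entry for RoCE v2
```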
Cooling: Today and Tomorrow
Current Deployment
- Air-cooled GPU servers deployed in existing and legacy data centers
- Optimized airflow and rack density based on GPU thermal profiles
Future Roadmap
- Partnerships with two major immersion cooling providers
- Initial immersion deployments planned for 2026
- Designed to support:
  - Higher rack densities
  - Improved power efficiency
  - Next-generation GPU thermal envelopes