# Hardware Stack

> FarmGPU GPU, CPU, storage fabric, and networking infrastructure

FarmGPU's infrastructure is designed as a **cohesive system**, optimized for AI workloads from silicon through storage and networking. We prioritize **performance, efficiency, and forward compatibility**, enabling us to deploy the latest hardware while maintaining operational stability.

## GPUs: High-Performance AI Compute

FarmGPU supports a broad range of modern NVIDIA GPUs to address both frontier and applied AI workloads.

### Supported GPU Platforms

* **NVIDIA H100**
* **NVIDIA H200**
* **NVIDIA B200 (Blackwell)**
* **NVIDIA RTX PRO 6000 Blackwell**

This range allows us to support:

* Large-scale model training
* High-throughput inference
* Fine-tuning and experimentation
* Enterprise and workstation-class AI workloads

Our GPU nodes are configured for **maximum utilization and predictable performance**, with platform designs aligned to real-world AI workload characteristics rather than generic cloud abstractions.

## CPUs: Balanced Host Compute

FarmGPU pairs GPUs with modern, high-performance CPU platforms so that host-side compute does not become a bottleneck.

### Supported CPU Platforms

* **AMD EPYC 9005 series ("Turin")**
* **Intel Xeon 6 ("Granite Rapids")**

These CPUs provide:

* High core counts and memory bandwidth
* Strong PCIe 5.0 connectivity
* Balanced performance for data preparation, orchestration, and I/O-heavy tasks

By supporting both AMD and Intel platforms, FarmGPU maintains flexibility across supply chains and customer requirements.

## Storage Fabric: AI-Optimized by Design

FarmGPU treats storage as a **core system component**, not a peripheral service. Our storage architecture is designed to deliver **predictable, high-throughput data paths** that keep GPUs fed under real AI workloads.

### Storage Software Platforms

We deploy and operate leading, production-grade storage platforms, including:

* **MinIO**
* **VAST Data**
* **Weka**
* **Ceph**

All storage platforms are **benchmarked and validated** using FarmGPU's internal **Silo** benchmarking and evaluation suite to ensure consistent performance across training and inference workloads.

### Storage Hardware

* **Solidigm SSDs across all platforms**
* **PCIe 5.0 NVMe SSDs** in all GPU compute nodes
* **High-density 122TB QLC SSDs** in storage nodes for cost-efficient scaling

This combination enables both **high-performance hot data paths** and **economical cold and capacity tiers**, optimized for AI dataset growth.
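
As a rough illustration of how the capacity and hot tiers might be sized, the sketch below estimates the number of 122TB drives a dataset would occupy and the aggregate read bandwidth a group of GPUs would draw. The function names, the 1.25 protection overhead, and the workload numbers are illustrative assumptions, not FarmGPU figures.

```python
import math

# Drive capacity taken from the text above; usable capacity after
# formatting and filesystem overhead will be somewhat lower.
DRIVE_CAPACITY_TB = 122


def drives_needed(dataset_tb: float, protection_overhead: float = 1.25) -> int:
    """Number of drives needed to hold a dataset.

    protection_overhead is a hypothetical multiplier for erasure coding
    or replication (1.25 means 25% extra raw capacity).
    """
    raw_tb = dataset_tb * protection_overhead
    return math.ceil(raw_tb / DRIVE_CAPACITY_TB)


def required_read_gb_per_s(
    num_gpus: int, samples_per_sec_per_gpu: float, bytes_per_sample: float
) -> float:
    """Aggregate read bandwidth (GB/s) needed to keep every GPU supplied."""
    return num_gpus * samples_per_sec_per_gpu * bytes_per_sample / 1e9


# Hypothetical workload: a 1 PB dataset read by 64 GPUs, each consuming
# 2,000 samples per second at 150 KB per sample.
print(drives_needed(1000))
print(required_read_gb_per_s(64, 2000, 150_000))
```

Estimates like these only bound the problem. Measured throughput depends on the storage platform, the network, and the access pattern.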

### DPU-Accelerated Data Path (NVIDIA BlueField-3)

A key differentiator in FarmGPU's storage architecture is the use of **NVIDIA BlueField-3 DPUs**.

BlueField-3 DPUs offload critical infrastructure functions from the host CPU, including:

* Storage protocol processing
* Network virtualization
* Encryption and security
* Data movement orchestration

By offloading these tasks, FarmGPU enables:

* **GPUDirect Storage (GDS)** with near bare-metal throughput
* Reduced CPU contention on GPU nodes
* More predictable latency under load
* Higher effective GPU utilization, as measured by model FLOPs utilization (MFU)

In AI systems, performance is determined by the **end-to-end data path**. DPUs allow FarmGPU to optimize that path holistically, rather than relying on CPU-bound software layers.
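
MFU is the ratio of the FLOPs a training job actually sustains to the theoretical peak of the hardware it runs on. A minimal sketch of that calculation follows, using the common approximation of about 6 FLOPs per parameter per token for dense transformer training. The function name and all input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def model_flops_utilization(
    params: float,
    tokens_per_second: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Approximate MFU for dense transformer training.

    Uses the ~6 * params FLOPs-per-token rule of thumb (forward plus
    backward pass), which ignores attention FLOPs.
    """
    achieved_flops = 6.0 * params * tokens_per_second
    peak_flops = num_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical job: a 7B-parameter model processing 80,000 tokens/s on
# 8 GPUs, with an assumed round-number peak of 1e15 FLOP/s per GPU.
mfu = model_flops_utilization(
    params=7e9, tokens_per_second=8e4, num_gpus=8, peak_flops_per_gpu=1e15
)
print(f"MFU: {mfu:.1%}")
```

Because tokens per second falls whenever GPUs wait on data, any stall in the storage or network path shows up directly as lower MFU.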

## Networking: High-Bandwidth, Low-Latency Fabrics

FarmGPU deploys a **dual-fabric networking architecture** to separate customer-facing traffic from backend AI communication.

### Front-End Network

* **NVIDIA SN5600**

Optimized for:

* North–south traffic
* Customer access
* Control plane and service connectivity

### Back-End AI Fabric

* **Celestica DS5000** switches, built on the **Broadcom Tomahawk 5 ASIC**

Optimized for:

* East–west GPU communication
* Storage traffic
* Collective operations (e.g., NCCL)

### Fabric Management

* **Hedgehog Open Network Fabric**

Provides:

* Open, vendor-neutral fabric management
* Automated provisioning and monitoring
* High observability and control

### RDMA Support

* **RDMA over Converged Ethernet (RoCE)**
* Tuned for **RoCE v2**

This networking architecture ensures:

* Predictable latency
* High throughput at scale
* Efficient GPU-to-GPU and GPU-to-storage communication
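
To show why back-end bandwidth matters for collective operations, the sketch below estimates the bandwidth-bound time of a ring all-reduce, in which each GPU sends and receives roughly 2(N-1)/N times the message size. This is a simplified model under stated assumptions: it ignores latency, congestion, and the fact that NCCL may select other algorithms. The function name and the link speed are hypothetical.

```python
def ring_allreduce_seconds(
    message_bytes: float, num_gpus: int, link_bytes_per_s: float
) -> float:
    """Bandwidth-only estimate of ring all-reduce time.

    Each GPU transfers 2 * (N - 1) / N of the message: one pass for
    reduce-scatter and one for all-gather.
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    traffic_factor = 2.0 * (num_gpus - 1) / num_gpus
    return traffic_factor * message_bytes / link_bytes_per_s


# Hypothetical case: 14 GB of gradients (7B parameters in 16-bit precision)
# across 8 GPUs, each with an assumed 400 Gb/s (50 GB/s) link.
seconds = ring_allreduce_seconds(
    message_bytes=14e9, num_gpus=8, link_bytes_per_s=50e9
)
print(f"Estimated all-reduce time: {seconds:.2f} s")
```

Doubling the per-GPU link bandwidth halves this estimate, which is why the back-end fabric is provisioned separately from customer-facing traffic.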

## Cooling: Today and Tomorrow

### Current Deployment

* Air-cooled GPU servers deployed in existing and legacy data centers
* Optimized airflow and rack density based on GPU thermal profiles

### Future Roadmap

* Partnerships with **two major immersion cooling providers**
* Initial immersion deployments planned for **2026**
* Designed to support:
  * Higher rack densities
  * Improved power efficiency
  * Next-generation GPU thermal envelopes

This phased approach allows FarmGPU to scale efficiently today while preparing for the thermal and power demands of future AI hardware.
