WEKA has announced the commercial release of its Augmented Memory Grid technology on NeuralMesh, a memory extension platform designed to address GPU memory bottlenecks in data center AI inference workloads. Validated on Oracle Cloud Infrastructure (OCI) and other AI cloud platforms, Augmented Memory Grid extends effective GPU memory capacity from gigabytes to petabytes, with WEKA reporting up to 1,000 times more key-value (KV) cache capacity and up to a 20 times reduction in time-to-first-token for inference workloads.
According to WEKA, Augmented Memory Grid bridges GPU high-bandwidth memory and flash-based storage to deliver near-memory speeds. The approach uses remote direct memory access (RDMA) and NVIDIA Magnum IO GPUDirect Storage to move key-value cache data between GPU memory and WEKA's token warehouse. This allows large language and agentic AI models to access more context without recomputing previously cached tokens, supporting more concurrent users and improving inference efficiency at scale.
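The core idea of avoiding recomputation of previously cached tokens can be sketched in a few lines. The snippet below is a toy, illustrative model only (not WEKA's actual API): a store maps a hash of a token prefix to the KV-cache blocks computed for that prefix, so on a cache hit only the new tokens require prefill compute.

```python
import hashlib

class TokenWarehouse:
    """Toy stand-in for a tiered KV-cache store (illustrative only;
    not WEKA's actual API). Maps a hash of a token prefix to the
    KV-cache blocks previously computed for that prefix."""

    def __init__(self):
        self._store = {}  # prefix hash -> cached KV blocks

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens, kv_blocks):
        self._store[self._key(tokens)] = kv_blocks

    def longest_cached_prefix(self, tokens):
        # Search from the full sequence down to find the longest
        # prefix whose KV cache is already stored; only the tokens
        # past that point would need recomputation.
        for end in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:end]))
            if hit is not None:
                return end, hit
        return 0, None

warehouse = TokenWarehouse()
warehouse.put([1, 2, 3], "kv-for-[1,2,3]")
cached_len, kv = warehouse.longest_cached_prefix([1, 2, 3, 4, 5])
# Only tokens[cached_len:] (here [4, 5]) would need prefill compute.
```

In a production system the store would sit on flash and be reached over RDMA/GPUDirect Storage rather than a Python dict, and matching would typically operate on fixed-size token blocks rather than arbitrary prefixes.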
Independent OCI testing cited by WEKA demonstrates the following performance metrics: 1,000 times more key-value cache capacity at near-memory performance, 20 times faster time-to-first-token when processing 128,000 tokens (compared to recomputation), and 7.5 million read input/output operations per second (IOPS) plus 1.0 million write IOPS in an eight-node cluster. WEKA claims these improvements maximize GPU utilization and enable new business models based on persistent and stateful AI sessions for data center operators, model providers, and enterprises deploying AI in production.
The solution integrates with NVIDIA GPUDirect Storage, NVIDIA Dynamo, and NVIDIA NIXL, and includes an open-source plugin for the NVIDIA Inference Transfer Library (NIXL). OCI’s bare-metal GPU compute with RDMA networking and GPUDirect Storage is highlighted as providing the required performance foundation.
“WEKA’s Augmented Memory Grid directly confronts this challenge,” said Nathan Thomas, vice president, multicloud, Oracle Cloud Infrastructure. “The 20x improvement in time-to-first-token we observed in joint testing on OCI isn’t just a performance metric; it fundamentally reshapes the cost structure of running AI workloads. For our customers, this makes deploying the next generation of AI easier and cheaper.”
Augmented Memory Grid is now available as a feature for NeuralMesh deployments on Oracle Cloud Marketplace, with planned support for additional cloud platforms.
Source: WEKA