WEKA Augmented Memory Grid boosts GPU memory for data center AI workloads

WEKA has announced the commercial release of its Augmented Memory Grid technology on NeuralMesh, a memory extension platform designed to address GPU memory bottlenecks in data center AI inference workloads. Validated on Oracle Cloud Infrastructure (OCI) and other AI cloud platforms, Augmented Memory Grid extends effective GPU memory capacity from gigabytes to petabytes, with WEKA reporting up to 1,000 times more key-value (KV) cache capacity and up to a 20 times reduction in time-to-first-token for inference workloads.

According to WEKA, Augmented Memory Grid bridges GPU high-bandwidth memory and flash-based storage to deliver near-memory speeds. The approach uses remote direct memory access (RDMA) and NVIDIA Magnum IO GPUDirect Storage to move KV cache data between GPU memory and WEKA's token warehouse. This allows large language models and agentic AI models to access more context without recomputing previously cached tokens, supporting more concurrent users and improving inference efficiency at scale.
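The core idea, reusing previously computed KV cache entries instead of re-running prefill, can be sketched in a few lines. This is a minimal illustration only: the class and function names (`TokenWarehouse`, `prefill`) are hypothetical and are not WEKA or NVIDIA APIs, and the dictionary stands in for the RDMA/GPUDirect data path the article describes.

```python
# Hypothetical sketch of KV-cache offload and reuse, loosely modeled on the
# tiering WEKA describes (GPU HBM backed by a flash "token warehouse").
# All names here are illustrative, not actual WEKA or NVIDIA APIs.

class TokenWarehouse:
    """Flash-backed store keyed by a hash of the token prefix."""
    def __init__(self):
        self._store = {}

    def put(self, prefix_key, kv_blocks):
        # In the real system: RDMA / GPUDirect Storage write to flash.
        self._store[prefix_key] = kv_blocks

    def get(self, prefix_key):
        # In the real system: RDMA / GPUDirect Storage read back into HBM.
        return self._store.get(prefix_key)


def prefill(prompt_tokens, warehouse):
    """Return KV blocks for the prompt, reusing a cached prefix if present."""
    key = hash(tuple(prompt_tokens))
    cached = warehouse.get(key)
    if cached is not None:
        return cached, "cache-hit"   # skip recomputation entirely
    # Stand-in for the expensive attention prefill over the prompt.
    kv_blocks = [f"kv({t})" for t in prompt_tokens]
    warehouse.put(key, kv_blocks)
    return kv_blocks, "cache-miss"


warehouse = TokenWarehouse()
_, first = prefill([1, 2, 3], warehouse)   # first request: KV computed and stored
_, second = prefill([1, 2, 3], warehouse)  # repeat request: KV fetched, no recompute
```

The time-to-first-token gain in the article comes from the second path: a returning session or shared prompt prefix hits the warehouse and skips prefill.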

Independent OCI testing cited by WEKA reports the following: 1,000 times more KV cache capacity at near-memory performance; 20 times faster time-to-first-token when processing 128,000 tokens, compared with full recomputation; and 7.5 million read input/output operations per second (IOPS) plus 1.0 million write IOPS on an eight-node cluster. WEKA claims these improvements maximize GPU utilization and enable new business models built on persistent, stateful AI sessions for data center operators, model providers, and enterprises deploying AI in production.
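For scale, the cluster-level figures above can be broken down per node, assuming the load is spread evenly across the eight nodes. The baseline time-to-first-token value below is a hypothetical placeholder, not a figure from the testing; only the IOPS counts, node count, and 20x ratio come from the article.

```python
# Arithmetic on the figures cited above (illustrative breakdown only).
cluster_read_iops = 7_500_000   # cited: 7.5M read IOPS, eight-node cluster
cluster_write_iops = 1_000_000  # cited: 1.0M write IOPS
nodes = 8

read_iops_per_node = cluster_read_iops / nodes    # 937,500 per node
write_iops_per_node = cluster_write_iops / nodes  # 125,000 per node

# A 20x time-to-first-token reduction at a 128K-token context means a
# prefill that hypothetically took 60 s would start emitting tokens in 3 s.
baseline_ttft_s = 60.0               # hypothetical baseline, not a cited figure
cached_ttft_s = baseline_ttft_s / 20
```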

The solution integrates with NVIDIA GPUDirect Storage, NVIDIA Dynamo, and the NVIDIA Inference Transfer Library (NIXL), and includes an open-source NIXL plugin. OCI's bare-metal GPU compute with RDMA networking and GPUDirect Storage is highlighted as providing the required performance foundation.

“WEKA’s Augmented Memory Grid directly confronts this challenge,” said Nathan Thomas, vice president, multicloud, Oracle Cloud Infrastructure. “The 20x improvement in time-to-first-token we observed in joint testing on OCI isn’t just a performance metric; it fundamentally reshapes the cost structure of running AI workloads. For our customers, this makes deploying the next generation of AI easier and cheaper.”

Augmented Memory Grid is now available as a feature for NeuralMesh deployments on Oracle Cloud Marketplace, with planned support for additional cloud platforms.

Source: WEKA
