WEKA has announced the commercial release of its Augmented Memory Grid technology on NeuralMesh, a memory extension platform designed to address GPU memory bottlenecks in data center AI inference workloads. Validated on Oracle Cloud Infrastructure (OCI) and other AI cloud platforms, Augmented Memory Grid extends effective GPU memory capacity from gigabytes to petabytes, with WEKA reporting up to 1,000 times more key-value (KV) cache capacity and up to a 20 times reduction in time-to-first-token for inference workloads.
According to WEKA, Augmented Memory Grid bridges GPU high-bandwidth memory and flash-based storage to deliver near-memory speeds. The approach uses remote direct memory access (RDMA) and NVIDIA Magnum IO GPUDirect Storage to move key-value cache data between GPU memory and WEKA's token warehouse. This allows large language and agentic AI models to access more context without recomputing previously cached tokens, supporting more concurrent users and improving inference efficiency at scale.
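The core idea of avoiding recomputation of previously cached tokens can be sketched in a few lines. The snippet below is a toy, illustrative model only (not WEKA's actual API): a store maps a hash of a token prefix to the KV-cache blocks computed for that prefix, so on a cache hit only the new tokens require prefill compute.

```python
import hashlib

class TokenWarehouse:
    """Toy stand-in for a tiered KV-cache store (illustrative only;
    not WEKA's actual API). Maps a hash of a token prefix to the
    KV-cache blocks previously computed for that prefix."""

    def __init__(self):
        self._store = {}  # prefix hash -> cached KV blocks

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens, kv_blocks):
        self._store[self._key(tokens)] = kv_blocks

    def longest_cached_prefix(self, tokens):
        # Search from the full sequence down to find the longest
        # prefix whose KV cache is already stored; only the tokens
        # past that point would need recomputation.
        for end in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:end]))
            if hit is not None:
                return end, hit
        return 0, None

warehouse = TokenWarehouse()
warehouse.put([1, 2, 3], "kv-for-[1,2,3]")
cached_len, kv = warehouse.longest_cached_prefix([1, 2, 3, 4, 5])
# Only tokens[cached_len:] (here [4, 5]) would need prefill compute.
```

In a production system the store would sit on flash and be reached over RDMA/GPUDirect Storage rather than a Python dict, and matching would typically operate on fixed-size token blocks rather than arbitrary prefixes.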
Independent OCI testing cited by WEKA demonstrates the following performance metrics: 1,000 times more key-value cache capacity at near-memory performance, 20 times faster time-to-first-token when processing 128,000 tokens (compared to recomputation), and 7.5 million read input/output operations per second (IOPS) plus 1.0 million write IOPS in an eight-node cluster. WEKA claims these improvements maximize GPU utilization and enable new business models based on persistent and stateful AI sessions for data center operators, model providers, and enterprises deploying AI in production.
The solution integrates with NVIDIA GPUDirect Storage, NVIDIA Dynamo, and NVIDIA NIXL, and includes an open-source plugin for the NVIDIA Inference Transfer Library (NIXL). OCI’s bare-metal GPU compute with RDMA networking and GPUDirect Storage is highlighted as providing the required performance foundation.
“WEKA’s Augmented Memory Grid directly confronts this challenge,” said Nathan Thomas, vice president, multicloud, Oracle Cloud Infrastructure. “The 20x improvement in time-to-first-token we observed in joint testing on OCI isn’t just a performance metric; it fundamentally reshapes the cost structure of running AI workloads. For our customers, this makes deploying the next generation of AI easier and cheaper.”
Augmented Memory Grid is now available as a feature for NeuralMesh deployments on Oracle Cloud Marketplace, with planned support for additional cloud platforms.
Source: WEKA