DDN Preps for AI Wave with Speedy New Appliance, KV Cache Solution

To address the AI-fueled demands on storage that are anticipated to occur with the availability of Nvidia’s Vera Rubin platform, DDN has unveiled the AI400X3M, a new storage appliance that features significantly faster throughput. The company also launched a new KV cache solution that supports Nvidia middleware to serve AI inference workloads, as well as new security and observability capabilities for multi-tenant AI and HPC environments.

The throughput improvements on the AI400X3M, the latest release of DDN’s EXAScaler platform, are substantial. Write throughput has increased by 35%, from around 140 Gbps max to about 190 Gbps. Random read throughput, meanwhile, has gone up by 4x to 8 million IOPS, says DDN Senior Vice President for Products James Coomer.

“We increased the throughput by optimizing the data path,” Coomer said. “We reduced the amount of unnecessary data copies.”

In addition to optimizing the data path through software changes, the company increased storage density to the point where it can support 30PB of data in a single rack. It’s also supporting hybrid arrays that mix NVMe and traditional spinning disk to account for the supply chain issue with NAND. All these changes were made in anticipation of the AI wave crashing across customers, Coomer said.

“It’s a big jump, and it’s really to tackle these new challenges that are happening right now,” Coomer said. “Because we get to see a little bit further into the future through the relationship we have with some of our customers and partners. We can see this stuff’s getting really tough.”

One of the big current pain points is the KV cache that customers use for AI inference. The KV cache is necessary to store the AI artifacts that are generated in the initial prefill stage of AI inference, and it is initially held in HBM and DDR memory close to the processor. During the critical decode phase, the AI model relies on the KV cache to quickly fetch previously computed values, thereby eliminating the need to compute them from scratch and speeding responses to the user or the AI agent making the request. The problem is that the KV cache quickly fills up local and off-die memory, necessitating spillover to disk.
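The spillover behavior described above can be illustrated with a toy two-tier cache. This is only a minimal sketch under stated assumptions: the class name `TieredKVCache`, the least-recently-used eviction policy, and the use of pickled files as the "disk" tier are all hypothetical choices made for illustration, and none of them reflects how DDN, Dynamo, or NIXL actually implement KV cache offloading.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small in-memory LRU tier backed by files on disk.

    Hypothetical illustration only; not any vendor's implementation.
    """

    def __init__(self, memory_slots, spill_dir):
        self.memory_slots = memory_slots  # capacity of the fast tier
        self.spill_dir = spill_dir        # directory standing in for disk
        self.memory = OrderedDict()       # prefix_id -> value, oldest first

    def _path(self, prefix_id):
        return os.path.join(self.spill_dir, f"{prefix_id}.kv")

    def put(self, prefix_id, value):
        """Store a value in memory, spilling the oldest entries to disk."""
        self.memory[prefix_id] = value
        self.memory.move_to_end(prefix_id)
        while len(self.memory) > self.memory_slots:
            victim, victim_value = self.memory.popitem(last=False)
            with open(self._path(victim), "wb") as f:
                pickle.dump(victim_value, f)

    def get(self, prefix_id):
        """Return (value, tier), where tier is 'memory', 'disk', or 'miss'."""
        if prefix_id in self.memory:
            self.memory.move_to_end(prefix_id)
            return self.memory[prefix_id], "memory"
        path = self._path(prefix_id)
        if os.path.exists(path):
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(prefix_id, value)  # promote back into the fast tier
            return value, "disk"
        return None, "miss"  # caller would have to recompute via prefill


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        cache = TieredKVCache(memory_slots=2, spill_dir=spill_dir)
        for name in ("a", "b", "c"):      # the third put spills "a" to disk
            cache.put(name, [float(ord(name))])
        print(cache.get("a")[1])          # served from the disk tier
        print(cache.get("unknown")[1])    # nothing cached for this key
```

In this sketch a "miss" is the expensive case, since the values would have to be recomputed from scratch, while a "disk" hit only costs a read from the slower tier. How fast that slower tier can be read is the part storage vendors compete on.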

DDN is delivering support in both its EXAScaler and Infinia products for Nvidia’s KV cache software, including Dynamo and NIXL. The company says that its shared, distributed KV cache fabric is optimized for large-scale inference environments, and delivers “ultra-low latency data access for large-context inference and faster token generation.” By integrating with Dynamo, vLLM, and other frameworks, DDN says it can deliver up to 55x faster KV cache loading, minimizing idle GPUs and driving down token costs.

The company is working on another KV cache solution that’s based on Nvidia’s new DMX reference architecture, which leverages BlueField-4 DPUs and Spectrum-X SuperNICs. That offering is slated to be delivered later this year, when Nvidia begins shipping the new gear.

Applying enough context to AI requests is the big challenge at the moment. That is what’s driving the industry to rethink how it handles these distributed KV caches. But technologists are working constantly on all sorts of other clever ways to deliver more context, which is keeping Coomer and the folks at DDN on their toes.

“The context around it is always changing,” Coomer said. “The best way to put it is, the memory of the models is going to expand, in one way or another, to be huge. That doesn’t go away. It just changes.”

For instance, Anthropic has introduced another way to minimize the context through a concept dubbed dreams. Just as humans dream to order and contextualize the day’s inputs, Anthropic’s dreams concept helps to condense and crystallize the most important information when the AI model is not in use. It introduced the Claude Dreams API in May. There is also a new DeepSeek compression mechanism that can shrink the size of the KV cache by 10x to 100x. The Chinese company launched that offering in April.

It all helps, Coomer said. “But of course, what happens is the demand outstrips the optimizations, which always happens,” he said. “It’s pretty good, but it doesn’t matter. It still expands. It just fills the room…. I don’t know if there’s a consensus, but I think maybe there’s close to a consensus: the biggest limiting factor is attention, as in the volume of context, which a model can pay attention to.”
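To put the 10x to 100x compression figures in perspective, here is a back-of-the-envelope sizing sketch. It uses the standard estimate for a transformer KV cache (one key and one value tensor per layer). The model dimensions are hypothetical and do not describe DeepSeek’s or any other vendor’s model, and compression is treated as a simple divisor, which is an assumption rather than a description of how any real mechanism works.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Estimate the KV cache size for one sequence.

    The leading 2 accounts for storing both a key and a value per layer;
    bytes_per_value=2 assumes 16-bit values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dimension 128.
    GIB = 1024 ** 3
    for tokens in (131_072, 1_048_576):
        raw = kv_cache_bytes(32, 8, 128, tokens)
        for ratio in (1, 10, 100):
            print(f"{tokens:>9} tokens, {ratio:>3}x compression: "
                  f"{raw / ratio / GIB:.2f} GiB per sequence")
```

Under these assumptions the cache costs 128 KiB per token, so a single 131,072-token sequence needs 16 GiB uncompressed. Because the estimate grows linearly with context length and with the number of concurrent sequences, gains from compression can be absorbed by longer contexts and more users.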

DDN also used ISC 2026 to unveil new multi-tenancy capabilities across its EXAScaler and Infinia storage solutions, which are chiefly aimed at HPC Lustre and AI object storage customers, respectively (with plenty of overlap and crossover among the customers and their particular storage needs).

To that end, DDN has enhanced its offerings with support for bare-metal multi-tenancy, KMIP-based encryption and key management, VictoriaLogs integration for operational visibility, new multi-tenant APIs, intelligent file pinning capabilities, and support for NAND-accelerated “hot pools” to tier data from flash drives to lower-cost HDDs.

“Over time, all of our customers are going through the same gradual tightening of security screens, and the introduction of this new tier of neo clouds is leapfrogging in terms of the demands for security and isolation of these multiple tenants,” Coomer said.

The post DDN Preps for AI Wave with Speedy New Appliance, KV Cache Solution appeared first on AIwire.
