Penguin Solutions has announced the upcoming release of ICE ClusterWare software version 13.0, a management platform designed for high-performance computing (HPC) and artificial intelligence (AI) clusters. According to Penguin Solutions, the new version introduces two key features: patent-pending anomaly detection with automated remediation, and network-isolated multi-tenancy for secure resource separation.
The anomaly detection and auto-remediation feature in ICE ClusterWare 13.0 continuously monitors cluster operations to identify hidden performance degradation. If an underperforming node is found, the software isolates it and initiates automated remediation in real time, ensuring only validated high-performing nodes handle workloads. Penguin Solutions claims this reduces manual intervention, minimizes downtime, and accelerates model training by cutting down on restart events.
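Penguin Solutions has not published how its patent-pending detection works, so the following is only an illustrative sketch of the general idea described above: compare each node against its peers, set aside any node that is abnormally slow, and schedule work only on the rest. The function names, the per-node throughput input, and the median-absolute-deviation test with a 3.5 cutoff are all assumptions chosen for illustration, not part of ICE ClusterWare.

```python
from statistics import median


def find_underperformers(throughput, threshold=3.5):
    """Return nodes whose throughput is an abnormally LOW outlier.

    `throughput` maps a node name to a measured rate (hypothetical units).
    Uses a modified z-score based on the median absolute deviation (MAD),
    a generic outlier test swapped in here for illustration only.
    """
    values = list(throughput.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All nodes (or at least half of them) report the same value;
        # this test cannot rank them, so flag nothing.
        return []
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return sorted(
        node
        for node, v in throughput.items()
        if 0.6745 * (med - v) / mad > threshold
    )


def schedulable_nodes(throughput, threshold=3.5):
    """Return the nodes that remain eligible for work after isolation."""
    isolated = set(find_underperformers(throughput, threshold))
    return sorted(node for node in throughput if node not in isolated)


# Hypothetical measurements: n5 is far slower than its peers.
sample = {"n1": 100, "n2": 102, "n3": 98, "n4": 101, "n5": 60}
flagged = find_underperformers(sample)
eligible = schedulable_nodes(sample)
```

In this sketch a flagged node would simply be withheld from scheduling; a real system would also need to trigger remediation and revalidate the node before returning it to service.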
The network-isolated multi-tenancy feature lets organizations segment a single cluster into secure, dedicated subclusters. Each tenant (such as a department, a project, or an external GPU-as-a-Service customer) can operate in an isolated environment, choose its own workload manager, and manage its own users, with assurance that data and operations are securely segregated. The capability targets operators of large GPU clusters that serve diverse internal or external user groups, and aims to maximize infrastructure utilization while maintaining security and autonomy for each group.
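As a rough illustration of the tenancy model described above, the sketch below tracks which nodes belong to which tenant and refuses to place a node in two subclusters at once. It models only the bookkeeping, not the network isolation itself, and every name in it (`Tenant`, `assign_nodes`, the tenant names, and the workload manager labels) is hypothetical rather than taken from ICE ClusterWare.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """One subcluster: its own workload manager, nodes, and users."""
    name: str
    workload_manager: str  # illustrative label chosen by the tenant
    nodes: set = field(default_factory=set)
    users: set = field(default_factory=set)


def assign_nodes(tenants, tenant_name, nodes):
    """Give `nodes` to one tenant, rejecting any node another tenant holds."""
    requested = set(nodes)
    for other in tenants.values():
        overlap = other.nodes & requested
        if other.name != tenant_name and overlap:
            raise ValueError(
                f"nodes {sorted(overlap)} already belong to {other.name}"
            )
    tenants[tenant_name].nodes |= requested


# Hypothetical tenants sharing one physical cluster.
tenants = {
    "genomics": Tenant("genomics", workload_manager="slurm"),
    "vision": Tenant("vision", workload_manager="kubernetes"),
}
assign_nodes(tenants, "genomics", ["gpu01", "gpu02"])
assign_nodes(tenants, "vision", ["gpu03"])
```

Keeping node ownership exclusive is the property that lets each tenant run a different workload manager without contending for the same hardware.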
Penguin Solutions cites applications for ICE ClusterWare 13.0 in hyperscale and cloud service provider data centers, enterprises delivering AI computing to multiple business groups, research institutes, and government agencies requiring stringent resource isolation and security.
ICE ClusterWare 13.0 is scheduled for general availability on December 2, 2025.
Speaking to biomedical and life sciences research workloads, Shailesh Shenoy, Assistant Dean for Information Technology at Albert Einstein College of Medicine, stated, “The pace and quality of biomedical research are directly tied to the technology that supports it,” adding, “AI and HPC are crucial to providing the computational power that biometrics, life science, and medical research require, but we also had to ensure that it is optimized for our specific use cases. Having a trusted partner in Penguin Solutions has enabled us to not only build out this infrastructure, but also helped ensure we can manage and optimize it to keep it running smoothly and at capacity, freeing our faculty and student researchers to continue their groundbreaking work without interruption.”
Source: Penguin Solutions







