Designing the next community cluster
In response to the recommendation of the University of Delaware Research Computing Task Force, UD built two high-performance computing (HPC) community clusters for University researchers. The community cluster program gives researchers access to HPC resources without the ongoing financial liability of running and maintaining their own clusters. UD Information Technologies (IT) provides the infrastructure, absorbs 50% of the cost of each cluster, and consolidates the purchasing of individual “compute nodes” to reduce the investors’ cost.
The Mills HPC community cluster, deployed to faculty investors in 2012, reached the end of its nominal lifespan earlier this year. With the shutdown of Mills on the horizon and investment interest voiced by new and existing faculty, IT staff have been working with prominent HPC vendors to produce a design specification for the next University of Delaware community cluster.
Over the past year, IT has conducted interviews with faculty about the community cluster program and the direction of our next HPC cluster. The next cluster’s architecture must meet the goals gleaned from these conversations.
Designs submitted for the new cluster take advantage of recent hardware advancements, including compute nodes that are physically smaller and draw less power. Packing these smaller nodes into a chassis with consolidated power delivery should decrease the cost of the cluster, and a chassis that can be reused in the future, with newer nodes purchased to replace the old, would dramatically extend the cluster’s operating lifespan. Interest in coprocessor technologies like general-purpose GPUs is on the rise at the University, so the designs were required to include nodes with NVIDIA Pascal GPU coprocessors.
For the cluster’s infrastructure to achieve a lifespan up to twice that of the current Mills or Farber clusters, the resources critical to the machine’s performance must be designed appropriately. Intel’s Omni-Path high-speed network fabric was required in all designs: at 100 gigabits per second (Gbps), Omni-Path is nearly twice as fast as the InfiniBand network in Farber. Omni-Path’s technology roadmap includes enhancements in the next few years that should further increase its efficiency.
Storage hardware in each rack added to the cluster was also mandated: as the cluster grows, capacity is added to both the high-speed Lustre storage system and the long-term NFS storage system. This addition also increases the aggregate bandwidth of the storage systems, that is, the rate at which data can be read and written. As in previous clusters, the long-term NFS storage will be replicated off-site for resiliency.
Having the ability to grow the machine should increase the lifespan of our next cluster to at least eight years. Adding nodes and storage capacity as faculty invest in a community cluster keeps the investors’ workloads moving at the expected pace.
On both Mills and Farber, those workloads were managed using job scheduling software created by Sun (now Oracle) called Grid Engine. Another product has since moved to the forefront of job scheduling for HPC workloads and is used on many large-scale supercomputers, including the latest XSEDE systems. On the next community cluster, UD will follow that lead and switch to the Simple Linux Utility for Resource Management (Slurm) and a share-based scheduling algorithm that will allow researchers to make more efficient use of the cluster’s resources.
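As a rough illustration, share-based scheduling in Slurm is typically enabled through its multifactor priority plugin. The fragment below is a hypothetical configuration sketch, not the actual settings planned for the cluster; the weights and decay period shown are assumed placeholder values.

```
# Hypothetical slurm.conf fragment: enable fair-share scheduling
PriorityType=priority/multifactor     # weigh multiple factors when ordering jobs
PriorityWeightFairshare=100000        # emphasize each account's fair-share value
PriorityDecayHalfLife=14-0            # past usage decays with a 14-day half-life
```

Under this model, an investor who has used less than their share of the machine recently sees their queued jobs rise in priority, while heavy recent usage lowers it.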
Faculty have asked that more flexible investment options be offered. Competing for and winning grants to cover a large up-front purchase in a cluster is far more difficult than budgeting smaller amounts annually. An option is being explored for the next cluster whereby, after a minimal buy-in (e.g., the cost of one node), faculty could increase their investment in smaller amounts. Each investment would be for a fixed term, and at any time the investor’s scheduling priority would be proportional to their total investment. Expired investments would not drop to zero but would default to a minimum value for a period of time, giving the investor a grace period during which they could continue to access the cluster at the lowest priority relative to other users.
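The proposed model can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical `scheduling_share` helper, an arbitrary 90-day grace period, and a grace-period share of 1; none of these specifics come from the actual design.

```python
from datetime import date

GRACE_SHARE = 1   # assumed minimal share granted during the grace period
GRACE_DAYS = 90   # assumed length of the grace period

def scheduling_share(investments, today):
    """Return an investor's scheduling share from a list of
    (amount, expiry_date) investments.

    Priority is proportional to the total of active (unexpired)
    investments; once everything has expired, the investor keeps a
    minimum share for a limited grace period instead of dropping to zero.
    """
    active = sum(amount for amount, expiry in investments if expiry >= today)
    if active > 0:
        return active
    # All investments expired: keep the minimum share if any expired
    # within the grace window, otherwise no share at all.
    in_grace = any((today - expiry).days <= GRACE_DAYS
                   for _, expiry in investments)
    return GRACE_SHARE if in_grace else 0

today = date(2017, 9, 1)
print(scheduling_share([(5, date(2018, 1, 1))], today))  # active investment: 5
print(scheduling_share([(5, date(2017, 8, 1))], today))  # recently expired: 1
```

The point of the grace-period floor is visible in the second call: the investment lapsed a month ago, yet the investor still receives the minimum share rather than losing access outright.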
The satisfaction and research success of investors and their coworkers is of paramount importance to UD IT. Five months of discussions with HPC vendors have recently concluded, and IT staff are now engaged in presenting the proposed design to faculty and departmental staff. Barring any major issues, fall 2017 should see the third UD community cluster built and deployed.