Systems management has always been in a race to catch up with the innovation in systems, forever nipping at their heels. As systems have gotten more complex, first by expanding beyond a single chassis into clusters of machines operating in concert and then by adding progressive layers of abstraction (heavy virtualization and more ethereal containers are the two big ones to go mainstream in the past decade), managing that complexity has become a real chore.
If the hyperscalers, cloud builders, and HPC centers of the world have taught us anything, it is that as much effort has to be made on measuring, monitoring, and managing any new technology as was used to create and maintain that technology. Mere humans, working from libraries of handwritten shell scripts and fast finger command lines, cannot manage at the scale and at the speed that modern software stacks require.
There are many lessons to be kept from the HPC community, and many more new ones to learn from the hyperscalers. And few companies know this better than Univa, which took over managing the Grid Engine workload manager for HPC workloads many years ago and which more recently rolled out its Navops Launch product to bring Kubernetes container management to HPC centers as well as to enterprise customers who have to juggle traditional HPC and modern containerized, cloud-style workloads.
In an interview at the HPC Day event that we hosted prior to the SC19 supercomputing conference in Denver, we sat down with Gary Tyreman, chief executive officer at Univa, to talk about the issues facing HPC centers that want to move some or all of their workloads to the public clouds. We baited Tyreman a little by suggesting that systems management was the problem.