Recently we’ve seen data center vendors advocate for a new breed of on-premises infrastructures branded as open convergence. Datrium is one of these, and if they are not the herald bearing high the flag of open convergence waving gracefully in the wind of change, they can very likely claim to be the innovators and founders of this emerging infrastructure category – if it becomes one.
Let’s have a look below at how we got to open convergence and what it stands for exactly. We will partially cover Datrium DVX, but we will maintain our focus on the differences between converged, hyper-converged and open-converged infrastructures.
First there was convergence
When converged solutions first appeared on the data center market, it was a revolution. Gone was supposed to be the hassle of configuring best-of-breed hardware and going through tedious sizing and performance testing exercises, not to mention component interoperability. The problem of multi-tier support was also supposed to be resolved.
While it’s true that converged infrastructures (CI) delivered on some if not most of these promises, they also had challenges of their own: the necessary but cumbersome patch / firmware level matrices to be strictly adhered to, scalability limits on either the compute or the storage side, and the difficult exercise of « getting it right » in terms of sizing, not to mention procurement, build and deployment times. It may not be the perfect balance, but we still find customers today who decide to settle for CI refreshes.
The rise of hyper-convergence
The hyper-converged infrastructures (HCI) market started with a vision of linear scalability, modularity and fast provisioning times: the idea that growing compute and storage in a linear, predictable way would make infrastructure deployment and scaling a seamless process. It moved from being « a crazy idea » (and to some a risky approach) to what it is today: a living market of its own.
HCI has proved to be a great fit for many customers (perhaps for lack of other, better options?), either standalone or alongside CI for targeted deployments. Nevertheless, there are use cases where linear scalability can be detrimental from a cost and/or compute/capacity perspective, leading to imbalances. HCI vendors promoting linear scalability have addressed this by offering storage-only nodes, while a new category of HCI vendors made the bet of offering solutions that disaggregate the link between storage and compute. Finally, one aspect often raised by detractors of HCI is vendor lock-in (which also applies to CI, by the way). Consuming pre-packaged HCI appliances can indeed mean vendor lock-in; however, some vendors have partnerships with OEM vendors to allow flexibility in their HCI offerings.
Lock-in, by the way, is a very disputable aspect of technologies, not just from a vendor perspective but also from a technological one. If you adopt an open-source, bleeding-edge technology and convert all your processes to leverage it, did you lock yourself in from a vendor perspective? Certainly not, as you can probably use commodity x86 servers for the compute and whatever you like for the storage. And yet, if you later want to adopt another platform, you might find yourself entangled in lock-in issues of a technical nature. The debate over what constitutes lock-in is not in scope for this article, but I thought I’d at least scratch its surface for thought-provoking reasons.
Almost two years ago I wrote an article about hyper-convergence that turned out to be very popular; even then I was covering specific cases where HCI may not make the best sense.
Beyond hyper-convergence: open convergence
One of the aspects often criticised in HCI systems was the imbalance that could result from packaging compute and storage in a single form factor, where a subset of the resources would end up either over-dimensioned or under-dimensioned.
Some latecomers to the HCI market took good note of this and proposed solutions to scale storage and compute independently, while maintaining the attributes of HCI solutions, i.e. full nodes (storage + compute) and storage-only nodes participating in the same cluster, with data stored across all the nodes, therefore following the traditional concept of east-west communication and replication between nodes. Interestingly, I believe the main actors in the HCI world had also implemented the concept of storage-only nodes as early as mid-2015, if not earlier. That approach made it possible to add extra storage and correct imbalances where compute capacity would otherwise far exceed storage capacity, but it didn’t really address cases where a customer needs far more compute than storage.
Others decided to take a different approach and look again at the entire HCI architecture. They decided to completely separate compute functions from storage functions: compute nodes execute VM workloads and keep active data locally on flash, independently of other compute nodes; storage nodes constitute a distributed filesystem with global deduplication; compute nodes take care of all I/O operations towards the storage nodes. One vendor, Datrium, has taken this path – or rather has created this approach, which they label « Open Convergence » (I will shorten this to OCI, for open-converged infrastructures).
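To make the split more concrete, here is a minimal, purely conceptual sketch in Python – not Datrium’s implementation, and the names (DataPool, ComputeNode, write, read) are hypothetical. It models stateless compute nodes that keep hot data in a local flash cache while durability and global, content-addressed deduplication live entirely in a shared data tier:

```python
import hashlib


class DataPool:
    """Durable, shared data tier with global (content-addressed) deduplication."""
    def __init__(self):
        self.chunks = {}                       # fingerprint -> chunk bytes

    def write(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()  # fingerprint the chunk
        self.chunks.setdefault(fp, data)       # store only if unseen: global dedup
        return fp

    def read(self, fp: str) -> bytes:
        return self.chunks[fp]


class ComputeNode:
    """Stateless compute node: local flash acts only as a cache, so losing
    the node loses no data and triggers no cluster rebuild."""
    def __init__(self, pool: DataPool):
        self.pool = pool
        self.flash = {}                        # fingerprint -> chunk (local flash)

    def write(self, data: bytes) -> str:
        fp = self.pool.write(data)             # durability lives in the data tier
        self.flash[fp] = data                  # keep a hot copy locally
        return fp

    def read(self, fp: str) -> bytes:
        if fp not in self.flash:               # cache miss: fetch from the pool
            self.flash[fp] = self.pool.read(fp)
        return self.flash[fp]


# Two compute nodes sharing one data pool: identical blocks written from
# different nodes are stored only once in the durable tier.
pool = DataPool()
node_a, node_b = ComputeNode(pool), ComputeNode(pool)
fp_a = node_a.write(b"guest OS image block")
fp_b = node_b.write(b"guest OS image block")
assert fp_a == fp_b and len(pool.chunks) == 1
```

The property this toy model illustrates is that a lost compute node loses nothing but cache, which is precisely what removes the need for east-west rebuild traffic between compute nodes.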
I was recently briefed by their CTO, Hugo Patterson, on what they are doing in this space and I was positively impressed. While I won’t cover the technical aspects of their solution (friends have done that very well, see the « Further Reading » section at the end of this article), here are some of the attributes that we can see in OCI:
- Segregation between compute and storage functions
- Linear but decoupled scaling (linearly scale compute & I/O vs. linearly scale storage capacity)
- Compute node statelessness / no east-west traffic (no cluster rebuilds if one or more compute nodes are lost)
- Durable data tier leveraging must-have technologies (deduplication, compression, snapshots, erasure coding) – see the sketch after this list
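On that last point, erasure coding is what lets the data tier survive drive or node loss without keeping full mirror copies. The snippet below is a deliberately simplified, single-parity illustration (conceptually similar to RAID-5), not the scheme Datrium actually uses, and the function names are my own:

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized shards."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, k: int) -> list:
    """Split a block into k data shards and append one XOR parity shard."""
    shard_len = -(-len(block) // k)             # ceiling division
    padded = block.ljust(shard_len * k, b"\0")  # pad to a multiple of k
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    shards.append(reduce(xor, shards))          # parity tolerates one lost shard
    return shards


def rebuild(shards: list, lost: int) -> bytes:
    """Recover the shard at index `lost` by XOR-ing all surviving shards."""
    return reduce(xor, [s for i, s in enumerate(shards) if i != lost])


# Lose any single shard (data or parity) and rebuild it from the rest.
shards = encode(b"cold VM data destined for the durable tier", k=4)
assert rebuild(shards, lost=2) == shards[2]
```

Production schemes use more shards and tolerate multiple simultaneous failures, but the trade-off is the same: far less capacity overhead than full replication for comparable durability.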
I’m keeping the list short to highlight a minimum baseline of requirements for this would-be emerging infrastructure category, but I have to say that, keeping in mind Datrium’s expertise in snapshots and deduplication (a sizeable part of their team was at the spearhead of innovation at Data Domain in the pre- and post-EMC era, and some even developed SnapVault during their tenure at NetApp), the set of capabilities and technologies provided by Datrium goes well beyond these few bullet points.
I would love to talk more about what Datrium do, especially their blanket encryption as well as their protection and replication policies, but that would require a fully fledged review of its own. My dear friend Pietro Piutti has already produced great pieces of work on the matter (see the « Further Reading » section below), so you’ll have to excuse me for keeping it short.
Before returning to the main topic of this article, I’d like to emphasise a couple of things I loved about Datrium DVX. First, the encryption function is software-based: gone is the pain of handling SED drives and physical key-management appliances – yay for security! Second, I very much liked the management of policies applied to multiple VMs, a must-have in our era. Third, the data protection groups are easy to use and flexible in their implementation, and they have the much-needed auto-membership feature. Finally, the extensions towards the public cloud are promising, with AWS support planned for early 2018 and the ability to use cloud archiving already available today.
Open convergence: a return to the roots of CI?
One thing that comes to mind when talking about OCI is « here we are again: first we converged it all into a single appliance, and now we’re splitting it out again ». It’s hard not to draw a parallel with CI, although CI also incorporates the networking part (including converged ports for FC / FCoE).
Is OCI therefore a disguised return to CI? I wouldn’t say so, because one of the concepts behind CI is a very high level of standardisation, integration and interoperability between components, which requires extensive vendor validation (and, in frequent cases, customers purchasing vendor support to perform complex upgrades). In the case of OCI, it’s quite the opposite, with the ability to mix and match a variety of compute nodes – in the case of Datrium these are either Datrium compute nodes or off-the-shelf, built-to-specification x86 servers, and nodes can have as many disparate configurations as the customer operating the infrastructure needs.
It’s hard to say whether open convergence will stick only as a Datrium-related buzzword, or whether it is a step towards what HCI may look like a couple of years from now. The fact that other vendors have also taken this approach might eventually have an impact on how traditional HCI vendors evolve their products and take these changes into account – or not.
Max’s Opinion
If we look at OCI solutions, why do they make sense and why not just go back to the good old world with off-the-shelf x86 servers and a SAN or a NAS? Or why not just go for a nifty, prepackaged HCI or CI solution?
Open-converged solutions have inherited HCI attributes such as scalability, ease of deployment and the pay-as-you-grow consumption model. Convenient packaging (appliances) makes them look much easier to adopt compared to traditional « best-of-breed » or massive, rack/multi-rack scale converged solutions. In that sense, a solution that qualifies as HCI but still offers the flexibility of scaling storage and compute independently is likely to be received positively by customers when compared with a « build-your-own-rack » solution.
Technologists, whether analysts, nerds, or people on the front line of IT delivery, love to talk about features and technology architecture. We do, however, have a tendency to forget that IT and infrastructure components are not an end goal but merely a business enabler, albeit a critical one. From an architectural, cost and agility perspective, it makes sense for CIOs to look at solutions that provide the best flexibility and growth opportunities while keeping deployment, expansion and lifecycle costs under control.
I believe Datrium has an interesting proposition to offer, providing a new option to customers torn between CI and HCI. This is especially true now that their product can scale further and satisfy larger customers, with up to 128 compute nodes and up to 10 data nodes per DVX pod. The ability to leverage existing investments is also not to be overlooked and will be appreciated by those looking to repurpose existing assets.
Further Reading
- My friend Pietro Piutti has covered Datrium DVX extensively in an introduction post, a technical deep-dive and a technology update covering the August 2017 improvements.
- Additional articles on Datrium were published by Tech Field Day 14 delegates.