I’ve been curious about Rubrik and what they do in the backup space for a long time, but I never really had a chance to look at Rubrik more closely until recently. I had the opportunity to steal some time from Rubrik and get a deep-dive session at VMworld 2016 with Chris Gurley, who also blogs at thegurleyman.com. I’ve been asking Chris a ton of questions and I’d like to thank him here for his patience.
I took quite some time to try to structure my understanding and thoughts around Rubrik. Let’s go for another post that is as long as a speech from Fidel Castro!
Rubrik is a startup based in Silicon Valley which has emerged from the “Nutanix Generation”, i.e. from the mindset of various engineers and investors who are closely involved with Nutanix, Rubrik and Cohesity. I am neither a Silicon Valley expert nor a VC / startup pundit, so you may want to check the facts. In any case Rubrik is led by CEO Bipul Sinha, who is also known to have close ties to and investments in Nutanix.
According to Crunchbase, Rubrik has so far raised 112 million USD in three rounds from three investors. Their latest round (Series C) was in August 2016, with a total of 61 million USD raised. In a market estimated by Rubrik’s own website at 48 billion USD, there is quite an opportunity to grasp for whoever can effectively disrupt the existing solutions, most of which are archaic remnants of a past long gone.
Rubrik is often said to be the “next Nutanix”, and the “next billion dollar startup” if we listen to Forbes. Since there is so much hype around Rubrik, let’s try to understand what Rubrik does and what the potential of their solution is.
A novel approach
Traditional backups in large enterprise environments – let me emphasize that in bold and caps lock – ABSOLUTELY SUCK. They are a pain for users, a pain for virtualisation administrators and they are -no apologies- an effing pain in terms of infrastructure congestion/overhead. Whether snapshot-based or agent-based, those systems have issues operating at scale. And finally, let’s think about our friends the backup admins. Do you think they are happy at the end of the day? I think not.
Rubrik has taken a novel approach to backups, backed by a logical rationale. It is driven by the very pleasant premise that backups should be simple, transparent, efficient and easy to set up.
Rubrik is a solution delivered in an appliance form factor, named a “Rubrik brick”. Several models are available, the entry point being the r334. Scaling is easy: you can add nodes (for example upgrade your r334 to an r344) or just add as many bricks as needed, and everything follows seamlessly. A variant for high-security requirements, the r528, offers FIPS 140-2 Level 2 self-encrypting drives.
For the customer, Rubrik presents an interface very similar to Nutanix Prism. It is simple and straightforward and takes little experience to get used to. Rubrik works with “SLA Domains”. These SLA Domains can be understood as retention rules; they are fully customisable and easy to understand: customers select the required snapshot frequency and retention period by adjusting sliders. Rubrik is Long-Term Retention ready and can offload older backups to public cloud storage services such as Amazon S3, Azure Blob Storage or Amazon Glacier. This offloading frees up space on the appliances and stores data at very minimal cost.
It is possible to apply these at various granularity levels to a variety of objects: individual VMs, clusters, folders. Because potentially each object can have its own SLA, this allows for jobs to run disparately, all day long and at different intervals, as specified in the SLA Domains.
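To make the idea concrete, here is a minimal Python sketch of how SLA Domains attached at different levels could resolve to an effective policy per VM. This is not Rubrik's data model or API: the class names, the "inherit from the nearest parent" rule and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaDomain:
    # Hypothetical representation of the two "sliders" described above.
    name: str
    snapshot_every_hours: int   # snapshot frequency
    retention_days: int         # how long each snapshot is kept


@dataclass
class InventoryObject:
    # A VM, folder or cluster; sla=None means "inherit from the parent"
    # (the inheritance rule itself is an assumption of this sketch).
    name: str
    parent: Optional["InventoryObject"] = None
    sla: Optional[SlaDomain] = None


def effective_sla(obj: InventoryObject) -> Optional[SlaDomain]:
    """Walk up the hierarchy until an explicitly assigned SLA is found."""
    node = obj
    while node is not None:
        if node.sla is not None:
            return node.sla
        node = node.parent
    return None


def snapshots_retained(sla: SlaDomain) -> int:
    """Snapshots held per object once the retention window is full."""
    return (sla.retention_days * 24) // sla.snapshot_every_hours


gold = SlaDomain("Gold", snapshot_every_hours=4, retention_days=30)
bronze = SlaDomain("Bronze", snapshot_every_hours=24, retention_days=7)

cluster = InventoryObject("cluster-01", sla=bronze)
folder = InventoryObject("finance", parent=cluster)
vm_db = InventoryObject("sql-prod", parent=folder, sla=gold)   # explicit SLA
vm_web = InventoryObject("web-01", parent=folder)              # inherits Bronze

for vm in (vm_db, vm_web):
    sla = effective_sla(vm)
    print(vm.name, sla.name, snapshots_retained(sla))
```

Because every object ends up with its own frequency, the resulting jobs naturally spread across the day instead of piling into a single backup window.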
From an architectural perspective, Rubrik could be compared to a “hyper-converged solution for backups”. Running on top of a commodity hardware platform brick (a 2U rackmount chassis with four node slots, populated with at least three nodes per brick), the intelligent software fabric is a distributed software cluster in a shared-nothing architecture without any single point of failure, which is reminiscent of the Nutanix architecture from a high-level standpoint.
VMware VADP is leveraged to ingest backups into Rubrik through CBT (Changed Block Tracking). Fast SSD drives are used to accelerate the process, with up to 1.2 GiB/sec on a single 4-node brick according to founder Bipul Sinha. The ingested data hits the metadata layer in SSD where it is deduped before being stored as a background activity on spindles. Other background activities include the rebalancing of data between nodes when new nodes/bricks are added. This is however a low priority activity, as the priority is obviously given to backup activities.
The combined deduplication and compression ratios achieved by Rubrik systems typically sit around 7.5:1, with cases in the field going as high as 9:1. To achieve all of these goals, Rubrik created the Atlas File System.
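For readers who want to see where such a ratio comes from, below is a small Python sketch of block-level deduplication using content hashes. It is a generic illustration only: the fixed 4 KiB block size, the use of SHA-256 and the function name are assumptions, and nothing here describes how Rubrik or Atlas actually dedupes data.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


def dedupe_ratio(streams, block_size=BLOCK_SIZE):
    """Return logical bytes / physical bytes when identical blocks are stored once."""
    unique_blocks = {}   # content hash -> size of the stored block
    logical_bytes = 0
    for data in streams:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            logical_bytes += len(block)
            unique_blocks.setdefault(hashlib.sha256(block).digest(), len(block))
    physical_bytes = sum(unique_blocks.values())
    return logical_bytes / physical_bytes


# Two toy "VM disks" that share three of their four blocks.
vm_a = b"A" * 4096 + b"B" * 4096 + b"C" * 4096 + b"D" * 4096
vm_b = b"A" * 4096 + b"B" * 4096 + b"C" * 4096 + b"E" * 4096

print(dedupe_ratio([vm_a, vm_b]))  # 8 logical blocks over 5 stored blocks
```

Read the other way round, a 7.5:1 ratio means that 7.5 TB of protected data would occupy roughly 1 TB on disk, before any mirroring overhead.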
As can be seen in the graph above (screencap from Rubrik presentation at Tech Field Day 10), this file system uses triple data mirroring, intelligent striping across disks as well as a single global space for dedupe and compression, thus increasing the efficiency of these features as the cluster scales. In case of VM restore, the Atlas File System is presented to the ESXi host(s) as an NFS target.
Rubrik also have their own distributed scheduler, where nodes independently determine which tasks need to be executed and stagger them if needed to avoid bottlenecks or backup storms, in stark contrast with the “batch job” approach of many traditional systems.
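The general idea of nodes deciding on their own what to run, and spreading the work out, can be sketched in a few lines of Python. This is a toy model and not Rubrik's scheduler: the hash-based ownership rule, the `max_concurrent` limit and all names are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def owner(task_id, nodes):
    """Every node can compute this locally, so no central coordinator is needed."""
    digest = hashlib.sha256(task_id.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def plan_node(node, nodes, due_tasks, max_concurrent):
    """Return {slot: [tasks]} for the tasks this node owns.

    Slot 0 runs immediately; later slots are deferred, which staggers
    the work instead of starting everything at once.
    """
    mine = sorted(t for t in due_tasks if owner(t, nodes) == node)
    plan = defaultdict(list)
    for index, task in enumerate(mine):
        plan[index // max_concurrent].append(task)
    return dict(plan)


nodes = ["node-1", "node-2", "node-3", "node-4"]
due = [f"vm-{i}" for i in range(20)]

for n in nodes:
    print(n, plan_node(n, nodes, due, max_concurrent=2))
```

Each task is claimed by exactly one node, and no node ever starts more than `max_concurrent` jobs in the same slot, which is the opposite of a nightly batch window where everything fires together.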
So far my exposure to Rubrik from a technical standpoint has been limited and I hope to follow their live streamed session at Tech Field Day 12 to discover more about the innards and how data is effectively handled.
Beyond backups: Rubrik & DR
A core feature of Rubrik is the ability to operate in various replication topologies, which means that it is possible to perform DR of your backups to a distant site.
The hyper-converged architecture of Rubrik leverages SSD drives and makes it possible to live mount a datastore into ESXi to recover one or more VMs. The VMs can either be kept powered off and cold migrated, or be powered on almost instantly (within circa 60 seconds) and served to the live environment through the SSD tier, then moved to the primary storage through Storage vMotion when needed.
A customer may also have a use case where, for example, they have their production environment coupled with a Rubrik appliance at a primary site, and only one or a few hosts with Rubrik appliances (for DR purposes) at a remote site. In case of primary site loss or issues, it is possible to leverage the remote site for DR purposes during the outage. A Rubrik brick should be potent enough to support production workloads in DR scenarios, with approximately 100k IOPS per appliance (30k IOPS per node). Hosts are needed to mount the storage as NFS. In any case, it is recommended to engage Rubrik to determine use cases and scenarios.
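As a back-of-the-envelope aid, the per-node figure quoted above can be turned into a rough sizing calculation. The sketch below is plain arithmetic in Python; it assumes IOPS scale linearly with node count and that a brick holds up to four nodes, and the workload numbers are invented for the example. Real sizing should come from Rubrik.

```python
import math

IOPS_PER_NODE = 30_000   # figure quoted in this post
NODES_PER_BRICK = 4      # a brick has four node slots


def nodes_for_iops(required_iops, iops_per_node=IOPS_PER_NODE):
    """Nodes needed, assuming IOPS scale linearly with node count."""
    return math.ceil(required_iops / iops_per_node)


def bricks_for_nodes(node_count, nodes_per_brick=NODES_PER_BRICK):
    """Bricks needed to house the given number of nodes."""
    return math.ceil(node_count / nodes_per_brick)


for workload_iops in (50_000, 100_000, 250_000):   # hypothetical DR workloads
    n = nodes_for_iops(workload_iops)
    print(workload_iops, "IOPS ->", n, "nodes,", bricks_for_nodes(n), "brick(s)")
```

Note that at 30k IOPS per node, a three-node brick works out to 90k and a four-node brick to 120k, so the "approximately 100k per appliance" figure sits between the two.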
Finally, it’s worth mentioning here that Rubrik is fully REST API driven (making integration very easy) and also has a great search engine, which can search not only for files inside backups but also for content within the files. Which reminds me that VMDK files are indexed and that FLR (File Level Restore) is of course available, even for data stored on the cloud tiers. The latter is truly an advantage, as it’s way cheaper to recover one or more individual files from S3 or Glacier, for example, than full VMDKs.
Since simplicity is at the core of Rubrik’s motto & strategy, the selling model adopted by Rubrik is similar to Nutanix’s: sell commodity hardware appliances with embedded software intelligence. The customer pays for the physical Rubrik Brick, which comes with a defined number of nodes and a given raw storage capacity; capacity is quoted raw because the potential dedupe and compression savings depend on the type of data. This places Rubrik in a CAPEX model for customers, as capacity is purchased upfront. We also heard that Rubrik have a virtual appliance model (Rubrik Edge) intended for ROBO sites, which uses a license model with a yearly maintenance fee.
To me, Rubrik is clearly aiming at large, fully virtualized enterprise environments. That is where complexity (many sites, many VMs, many schedules) is suffocating customers, and it provides an entry point from which more & more bricks can be sold. The friction point versus software-only backup solutions is the entry cost of purchasing a bundled software + hardware solution in an appliance form factor, compared to a software-only license and the use of “cheap” entry-class commodity hardware as backup targets. Obviously, customers should compare on a feature basis.
At this point in time, only virtual environments running VMware ESXi are supported. Support for Hyper-V and KVM (presumably also AHV) is announced for “later this year”. The question remains how far Rubrik may go in backing up physical systems as well (currently, backup of physical Microsoft SQL Servers and physical Linux servers is possible according to this Tech Field Day 12 Primer from TFD peer Matt Crape).
On a side note, because of the commodity nature of the hardware, it wouldn’t be unlikely to see very large customers reach out to Rubrik & have them certify their own hardware, or eventually consider an ELA-like licensing model.
Rubrik is well-positioned today to support fully virtualised infrastructures in medium to large environments. Despite the competition in the field, they have a distinctive competitive advantage with a simple and easy to deploy product. Their most direct competitor is Cohesity, a platform which is however tackling the problem in a different way and is not “just” a backup solution. I hope to write soon about Cohesity, by the way. Other competitors are the regular backup vendors we are accustomed to hearing about, and software-based giants such as Veeam, who are likely to have it easier with smaller customers and SMBs.
Rubrik must reckon with one fierce enemy that will not lay down arms so easily. However, this unlikely enemy is not a competitor. It is the legions of backup administrators who, like their storage administrator peers, are being rendered completely irrelevant by disruptors like Rubrik. While it is certainly a sad thing and a tragedy in the years to come for these individuals, you may allow me to rejoice without any schadenfreude.
Count on legacy storage vendors that operate in large enterprise environments to team up with distraught backup administrators and entrench themselves in a long but hopeless resistance, as new contenders like Rubrik face the shrapnel of legacy vendors’ FUD. Count on the other hand on virtualization administrators to take things in hand and, like they did with storage, flip the bird to their backup counterparts.
On a final note here’s my friendly advice if you are a backup administrator: have a look at my post on storage administrators for career advice. You are also becoming an endangered species.
Rubrik provided me with one laptop sticker, a box of mints and three Lego minifigures at VMworld. I am not affiliated with Rubrik, nor did they ask me to write an article. Due to my frequent participation in Tech Field Day events, we could also loosely relate this post to Tech Field Day 12, although I am not able to participate this time.