Data management and secondary storage are all the rage these days. Both have been covered at length by the most prominent industry analysts and research organizations. There is a growing number of solutions in this market, and they sometimes overlap. Both topics are still widely debated in the industry, and we’re far from having solved the problem.
Data Management: Scope, or Die Trying
It’s hard to dissociate secondary storage from data management: the mess of data sprawl on primary storage is the hand that feeds secondary storage, and secondary storage lives on the promise of making data storage and management easier. Data management itself is a topic so broad that it has links not only with secondary storage, but with many other disciplines; it could be a topic on its own, and after a whole day of discussions we could still be at the very beginning of it. It is in fact such an insidious matter that I found myself struggling with it during most of the summer of 2018. I wanted to write about the secondary storage and data management markets and their differentiators. It turned into a major annoyance and what we French people call a « serpent de mer ».
What is a « serpent de mer » in plain good old Oxford English, or even in plain good new American English? You’d translate that as a sea snake, or sea serpent. The definition states: « a long-discussed subject, without substantial reality ». In the case of data management, you could argue that there is a substantial reality. There are even multiple realities, all different depending on context and scope. A multiverse of data management realities. And there was my sea-snake paper: stuck between the lines of data management and secondary storage. Needless to say, the draft is still on my document pile, and I decided only recently to split it into two separate topics (ah, the power of taking a shower! Getting water poured on the head sometimes helps).
One of the important discoveries I made during this shower (Eureka time – seriously, it shows how perfectible some of us humans are!) was that any discussion about data management without a clearly defined scope leads nowhere. It was also one of the thoughts I aired at Storage Field Day 17, when startup Komprise came forth to present their data management solution. What is data management in their context, and what are their scope, approach and intent?
Komprise: A Real-World Data Management Story
A broad majority of organizations generating data find themselves unable to understand which data is active and which data can be moved to cheaper storage tiers. This leads to increased costs and data sprawl. Komprise have been on a mission to put an end to this waste by using an intelligent approach to data sprawl on primary NAS storage.
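To make the "active versus cold" distinction concrete, here is a minimal sketch of one common way to draw that line: splitting files by the age of their last access. The post does not describe how Komprise actually decides this, so the `FileInfo` type, the `split_by_activity` function and the one-year cutoff are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata record for a file on primary NAS storage."""
    path: str
    size_bytes: int
    last_access: datetime


def split_by_activity(files, now, cold_after_days=365):
    """Partition files into (active, cold) using last-access age.

    The 365-day default is an assumption, not a Komprise setting.
    """
    cutoff = now - timedelta(days=cold_after_days)
    active = [f for f in files if f.last_access >= cutoff]
    cold = [f for f in files if f.last_access < cutoff]
    return active, cold


def reclaimable_bytes(cold_files):
    """Capacity that could leave the expensive tier if cold files moved."""
    return sum(f.size_bytes for f in cold_files)
```

Running something like this over a file listing gives a first estimate of how much capacity sits idle on the expensive tier, which is the kind of question the capacity planning use case is meant to answer.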
From an architecture perspective, Komprise is a hybrid SaaS (Software-as-a-Service) solution comprising (ha!) two components: a cloud-based management interface called the “director”, and one or more “observers”. An observer is a virtual machine appliance that scans a customer’s environment in an unobtrusive fashion (no agents are installed on any file servers, for example). The director analyzes a customer’s environment, holds the system configuration, and provides reporting. For sensitive organizations, Komprise can also deliver the director in an on-premises format.
Komprise use cases cover:
- Capacity planning
- Live Transparent Archive (in my opinion the most exciting aspect of the product)
- Replication & Data Recovery
- Data Migration
- File Access to Object Data
From my perspective, the most interesting part of the solution is the patent-pending Live Transparent Archive, from which all of the use cases derive. Live Transparent Archive (and Komprise in general) sees the storage media present in an organization as being either source storage or target storage. Source storage is the expensive storage tier where all of the data originally resides. Target storage is a cheaper storage tier to which the non-active data is moved once the Komprise solution has done its job. Target storage can be a local S3-compatible object store, a tape library (backed by a cache for faster file retrieval) or cloud storage (S3-compatible too, obviously).
Live Transparent Archive works on the basis of “Komprise Dynamic Links”, or “crumbs”. Let’s imagine a 10 GB file residing on the source storage. Once Komprise determines that this file is not actively used, it moves it to a storage target and leaves a “crumb” (usually around 4 KB in size) behind on the source storage. The crumb points to the actual 10 GB file, which now lives permanently on the target storage.
When accessing active data, the data path is the same as usual. When accessing cold data that has been moved by Komprise, the user opens the crumb (without knowing that it is a crumb and not the actual file) and the system transparently hits the Komprise observer, which then serves the data from its new location, i.e. the target storage.
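The crumb mechanism can be pictured with a toy in-memory model. This is only a sketch of the general stub-and-redirect idea under my own assumptions, not Komprise's implementation: the `ToyTiering` class, its dictionaries standing in for source and target storage, and the `"archive" + path` key scheme are all hypothetical. Only the roughly 4 KB crumb size comes from the presentation.

```python
from dataclasses import dataclass

CRUMB_SIZE = 4 * 1024  # ~4 KB, the typical crumb size mentioned above


@dataclass
class Crumb:
    """Small placeholder left on source storage, pointing at the moved file."""
    target_key: str
    size_bytes: int = CRUMB_SIZE


class ToyTiering:
    """In-memory stand-in for source storage, target storage and the observer."""

    def __init__(self):
        self.source = {}  # path -> bytes (real file) or Crumb
        self.target = {}  # key -> bytes

    def archive(self, path):
        """Move a cold file to the target and leave a crumb behind."""
        data = self.source[path]
        if isinstance(data, Crumb):
            return  # already archived, nothing to do
        key = "archive" + path  # hypothetical key scheme
        self.target[key] = data
        self.source[path] = Crumb(target_key=key)

    def read(self, path):
        """Same call for hot and cold data; crumbs are followed transparently."""
        entry = self.source[path]
        if isinstance(entry, Crumb):
            return self.target[entry.target_key]  # served from target storage
        return entry

    def source_usage(self):
        """Bytes consumed on the expensive tier, counting crumbs at crumb size."""
        return sum(
            e.size_bytes if isinstance(e, Crumb) else len(e)
            for e in self.source.values()
        )
```

The point of the model is that `read` has the same signature whether the file is hot or cold. With the figures from the example above, a 10 GB file shrinks to a footprint of about 4 KB on source storage while remaining readable through the same path.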
Data is not moved back to the source storage unless specific triggers or thresholds configured in the system policies are met. This avoids situations where heaps of data moved from source to target storage are suddenly moved back en masse to the source storage, eventually causing cascading failures. Even when data gets massively recalled (and on purpose), Komprise still analyzes current usage and will eventually move the data back to the storage target(s) if it is no longer actively accessed. What I find enjoyable is the recall policy limit, a useful feature for those who put their target storage in cloud-based object stores, where egress traffic may be charged when recalling files.
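A recall limit of this kind can be sketched as a simple budget check. The presentation did not detail the actual triggers or how the limit is expressed, so the `RecallPolicy` class, its access-count threshold and its byte budget are assumptions made purely for illustration.

```python
class RecallPolicy:
    """Hypothetical recall policy: an access threshold plus a byte budget."""

    def __init__(self, min_accesses, max_recall_bytes):
        self.min_accesses = min_accesses          # trigger: how "hot" a file must be
        self.max_recall_bytes = max_recall_bytes  # cap on data pulled back (egress)
        self.recalled_bytes = 0

    def should_recall(self, access_count, size_bytes):
        """Return True and consume budget only if both conditions hold."""
        if access_count < self.min_accesses:
            return False  # keep serving the file from target storage
        if self.recalled_bytes + size_bytes > self.max_recall_bytes:
            return False  # budget exhausted: avoid a mass recall
        self.recalled_bytes += size_bytes
        return True
```

When `should_recall` returns False, the file is not lost to the user: it simply keeps being served from target storage through its crumb, so the cap bounds egress cost and load on the source tier without blocking access.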
Because of the primarily hybrid-cloud SaaS character of Komprise, a lot of statistics, telemetry, and anonymized metadata make their way to Komprise. This opens the door to more analytics feedback for Komprise’s customers, including perhaps an overview of how they fare individually compared to aggregated baseline statistics and metrics across all of Komprise’s clients. This feature is not yet implemented (and I haven’t heard of any particular plans or timelines), but all of the prerequisites are already in place, so it may become an extra plus for their customers.
The Komprise folks gave us a deep dive into how the solution works; head over to 17:37 if you are specifically interested in the internal mechanisms of Komprise. I recommend you watch the video, because there is so much more that can be said about Komprise than this post can hope to cover.
From a pricing perspective, Komprise is licensed based on capacity, and is available under either a subscription or a perpetual license model.
Max’s Opinion
Data Management is almost like pizza (except that it isn’t edible, which makes me sad). Everybody in the pizza business claims not only to make pizza, but also to make the best pizza. It will vary in size, taste and target audience. A group of youngsters will want large pizza with lots of toppings and at a low price. Old farts like me will want the best uncompromising pizza experience. Data Management is similar in that the goal is similar (to manage data) but with very different implementations and outcomes based on what is expected by the customer. In that sense, many companies claim to be in the Data Management business, even if ultimately we are talking about vastly different intents and targets. With Komprise, we have seen a well-scoped use case and definition of what Data Management can be.
Komprise’s solution is very elegant not only in its intent and implementation, but also because it has minimal impact on a customer’s infrastructure from a configuration and dependency perspective. Nobody likes intrusive solutions that require changes at multiple levels and get into the data path. The versatility of supported target systems gives customers enough choice to make the best decision about where to store their data based on cost and / or internal compliance policies.
With Komprise, organizations are given new possibilities and better outcomes for managing vast amounts of data. Managing data efficiently and making sure it sits on the right storage tiers means greater efficiency, a lower TCO and much greater control over storage spending, with more time to breathe between capacity expansion cycles. As usual, customers have to assess a product’s viability in their own environment, but it seems that Komprise have a pretty awesome solution for those with hundreds of terabytes of files and up.
I enjoyed learning about Komprise and hope to hear from them again very soon! By the way, we will soon record a TECHunplugged Podcast episode with the Komprise team, so watch that space and make sure you subscribe to the podcast RSS (Apple Podcast | Other options)! We’re also on Spotify!
Disclosure
This post is part of my Storage Field Day 17 post series. I have been invited to the event by Gestalt IT. Gestalt IT will cover travel, accommodation and food expenses for the duration of the event. I will not receive any compensation for participating in this event, and I am not obliged to blog or produce any kind of content. Any tweets, blog articles or other content I may produce are the exclusive product of my interest in technology and my will to share information with my peers. I commit to sharing only my own point of view and analysis of the products and technologies I will be seeing and hearing about during this event.