This article has been inspired by my upcoming participation in the Commvault GO Conference in Washington D.C. on the 6th and 7th of November 2017, as part of the Tech Field Day Exclusive at Commvault GO event. Commvault prides itself on being a Data Management company, so I’ve been thinking about what data management means to me. I might have got it completely wrong, or out of context, but here are the topics I think about and struggle with when discussing data management. I hope to get some replies to my questions and concerns while at Commvault GO!
Data management is a discipline that might sound complex, if not boring or extremely tedious. When approaching the storage world from a vendor solution perspective, it’s always very easy to say that product ACME operates in primary storage, or in secondary storage, and so on. However, from an organisation’s perspective, the idyllic view of one solution per stack tends to disappear very fast in the face of real-world constraints, especially in global companies.
I recall one exchange with a colleague where we were considering using a certain kind of product for a specific use case. As we compared the scope of that product against our expectations, it turned out that we either had to split our requirements into many chunks, each managed by a specific technology stack, or to look at an all-encompassing approach that would cover everything, even if not in the exactly “mind-blowing tech” way we had first envisioned.
We often talk about how technology can help solve problems, and we often tend to look at how a product can provide a solution, but what really is the problem with data management, and can we solve it at all?
Data Archipelagos and the Global Data Ocean
A decision often taken for the sake of easier manageability is to isolate sites, locations or projects from each other. This leads to the creation of many little data islands that I like to call data archipelagos: scattered but similar little islands with a lot in common but no direct linkage. The isolation, while beneficial from an operational perspective, has its limits, not only in the management sprawl and overhead it creates, but also in the cumulative inefficiencies it causes from a global data management perspective. At the opposite end, what I like to call the Global Data Ocean (another foolishness from my twisted mind, to be put in parallel with data lakes) would be a single, continuous and global data space where all of the data is shared and where maximal efficiencies can be attained.
The idyllic vision of a single global space must be tempered by potential showstoppers such as regulatory requirements, cross-border data transfer agreements and data sovereignty treaties, privacy requirements, and even internal requirements set by organisations based on data classification imperatives and data retention requirements. These often relate to regulatory compliance, for example Sarbanes-Oxley for document retention. One last point: certain types of data not only have to satisfy retention requirements, but also require auditing of access and protection against tampering.
Further down the line, data management doesn’t stop at files and regulatory documentation. It may well incorporate any data element that doesn’t qualify as primary data (the storage of virtual or physical production workloads whose goal is data processing). That encompasses nearly all secondary data use cases: file storage, object storage, data copies (copies of primary data used for dev or test purposes) and finally backup storage, with all the implications related to retention policies, backup methods, etc.
Data Management: a necessary evil or a business enabler?
From what we’ve covered so far, it seems that a lot of the drivers behind proper data management are external or internal compliance requirements. But beyond those, is there business value to be gained from data management? Data has mass (we could also add that it has gravity, but that is not today’s topic), and while it is immaterial (or rather, has no physically perceivable substance), it does have very real requirements with a financial impact on organisations.
The cost of implementing storage systems, with the corollary of data protection and data management solutions, can be very high if the business gives free rein and lets everybody do what is deemed best based on the requirements of the day and the preferences of teams or LOBs. There is therefore a very strong rationale behind putting in place a proper data management strategy that looks across current practice within an organisation and sets out to address the needs in an efficient fashion.
The outcome of this strategy can be a higher level of standardisation across the organisation, via the implementation of a common architecture that leverages one or more solutions. This can lead to better pricing and cost savings, thanks to dealing with a handful of vendors instead of a plethora, and to better service via a one-stop support function that can handle most if not all of the architecture, and do so at a global scale.
This doesn’t mean that organisations must cast the processes or technologies they use in stone forever. On the contrary, they must remain vigilant about emerging technologies and build a roadmap and projects around this strategy, with clear deliverables. They should also look beyond the usual IT lifecycle and try to foresee the next trends, while incorporating into their current strategy whether data should be stored on-premises, off-premises or in a hybrid model, taking into consideration not only regulatory imperatives but also cost aspects, especially with colder tiers of data at public cloud providers.
With that said, is data management a real business enabler, or is it more a kind of life insurance for businesses? Data is key for organisations but might not always be the quintessential ingredient that leads to the ultimate goal of “delivering maximal value to shareholders™”. The value of data must be put in context: is it essential to the business or not? In the most conservative cases, having proper data management in place can be seen as life insurance: against unplanned issues, against disasters, and against the so often unexpected yet relentless requirements of regulatory bodies.
Addressing the needs of Data Management
The approach should be similar to many other challenges or business cases we see across our work duties. The problem needs to be stated; the compliance requirements and scope inherent to the industries in which the organisation operates, as well as the constraints arising from the company’s legal structure, must be well established. To achieve scale efficiencies, it probably makes better sense to look at the issue globally, then see where exceptions or grey areas may arise. Locally-led initiatives have the merit of speed, but may result in different solutions being adopted, making it even more difficult in the future to take a coherent approach, especially in large enterprises.
Our reliance on technology makes us believe that technology can solve everything, but it cannot yet ask us what we want to do (or tell us what we want to do), nor how we want to handle the process part of how data is managed by an organisation: is there a defined standard for the document lifecycle? How is the process enforced? Is there a central repository where those documents need to be placed? Should certain kinds of data be segregated from others? Also, how do we handle copies of data? What do we do with backup copies?
While it’s hard to develop a solution that covers all the requirements, it makes sense to identify patterns and use cases that can lead to the implementation of policies. Those policies should cover and automate the data lifecycle in a transparent fashion (or at least as much as possible). Think, for example, about how objects are stored in Amazon S3 buckets and how data gets progressively moved to colder tiers. It also makes sense to look at solutions that allow for scale efficiencies, for example by delivering true global deduplication of objects, files and data.
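To make the policy idea a bit more concrete, here is a minimal sketch of that S3-style tiering using Python and boto3. The bucket name, prefix and day thresholds are purely hypothetical, and a real policy would obviously be driven by your own retention and classification requirements rather than the arbitrary values below.

```python
import boto3

# Create an S3 client (assumes AWS credentials are already configured)
s3 = boto3.client("s3")

# Hypothetical lifecycle rule: objects under "archives/" drift to colder
# storage classes over time and eventually expire after ~7 years.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-down-archives",            # hypothetical rule name
            "Filter": {"Prefix": "archives/"},      # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm tier after 30 days
                {"Days": 90, "StorageClass": "GLACIER"},      # cold tier after 90 days
            ],
            "Expiration": {"Days": 2555},           # ~7 years, e.g. a retention requirement
        }
    ]
}

# Apply the lifecycle policy to a (hypothetical) bucket
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-management-bucket",
    LifecycleConfiguration=lifecycle_configuration,
)
```

The point is less the specific API than the model: once the rule is declared, the lifecycle runs automatically and transparently, which is exactly the behaviour we should expect from data management policies in general.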
Finally, some solutions may offer a convenient packaging form factor and ease of deployment, but may prove rigid at scale when faced with distributed companies and a broad range of usages, while others may be cumbersome to set up but offer better results in the long run. Vendors whose solutions can operate at both the software and hardware level may be able to provide higher versatility when faced with different situations within the same company: site size, local architecture, availability (or not) of resources to run a hardware appliance versus a software-based solution, offloading to a centralised storage location based on link speed and connectivity robustness, etc.
Max’s Opinion
The topic of data management is a broad one and, as you’ve read above, it’s difficult to focus on just a single part of it. It touches on technical and non-technical aspects, covers secondary data use cases, document management, backups and secondary data copies, and is heavily influenced by regulatory and compliance requirements. Is this topic relevant only for Enterprise IT, or is it a broader, cross-divisional topic?
Reducing data management to an IT discussion would tend to shift the focus from organisational needs into a technological discussion about why this or that implementation makes sense, a mistake that we often make, and one that I can also be tempted to follow. In my view, while retaining very concrete technological aspects and requirements, the discussion should be led by a dedicated team or function that has a very clear understanding of data compliance requirements while also having a good understanding of IT constraints. Ideally, Enterprise IT and that dedicated function should work hand-in-hand to develop a data consumption model that is sustainable, can operate at scale and delivers value while keeping costs under control.
It may not be possible or even desirable to put all our data eggs in one storage basket, but at least we should look at how we want to sort them out.
Disclosure
This post is a part of my SFD14 post series. I am invited to the Storage Field Day 14 event and to Commvault GO by Gestalt IT. Gestalt IT will cover expenses related to the events (travel, accommodation and food) for the duration of Commvault GO and SFD14. I will cover my own accommodation costs outside of the days when the events take place. I will not receive any compensation for participation in these events, and I am not obliged to blog or produce any kind of content. Any tweets, blog articles or any other form of content I may produce are the exclusive product of my interest in technology and my willingness to share information with my peers. I commit to sharing only my own point of view and analysis of the products and technologies I will be seeing and hearing about during these events.