I recently had the immense (and undue) honour of participating as a panelist at the Intel Storage Builders event in Barcelona, on the theme of “Evolving Storage Platforms Transforming the Cloud and Enterprise”. The discussion was moderated by Tech Field Day organiser and self-proclaimed storage nerd Stephen Foskett. The panel consisted of Alastair Cooke, Pietro Piutti and, last but not least, Intel’s own Michael Mesnier. I had had the opportunity to listen to Michael Mesnier present less than two weeks earlier (early October 2016) at Tech Field Day from Intel headquarters in Santa Clara, so I certainly didn’t expect to sit next to him and discuss such a wide topic.
Note that while the reflection below is mine, it incorporates ideas and concepts expressed by other panelists at Intel Storage Builders.
Next Generation Storage Characteristics
Based on the discussion and my own opinions, new storage solutions should offer:
- Automated Data Classification & Placement
- Unified Management Protocol
- Storage Capability Detection
- Automated Array Management
Automated Data Classification
A new-generation storage system ingesting data should theoretically be able to identify the type of data being received, and ideally should be able to classify it and direct it to the proper pool and type of storage. This is a simplification, but think of a storage system that would recognise pictures being sent to it, process their metadata and qualify the data to be stored on the object storage tier. Structured data (e.g. a database being created), on the other hand, would be stored on the performance tier.
The idea behind this is that, from the perspective of a storage capacity consumer, the only thing visible would be one or more contiguous namespaces where data is stored seamlessly, while on the backend the software intelligence built into the storage solution would work out in real time (or near-real time) where the data needs to be placed based on its attributes.
At first this would imply a policy-based system where the configuration is done manually, but nothing precludes yet another generation with embedded machine learning, where the software would learn how data is written and how it is read, and classify it automatically based on its size and importance as well as attributes such as frequency of access, type of data and the applications using it, along with other factors such as availability, reliability and durability requirements.
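The first, policy-based stage described above can be sketched in a few lines. The policy table, tier names and the extension-based classifier below are illustrative assumptions of mine, not any vendor's actual implementation; a later generation could swap the naive classifier for a learned model fed by access telemetry.

```python
# Hypothetical policy: map a detected data class to a storage tier.
PLACEMENT_POLICY = {
    "image": "object",          # pictures/media -> object storage tier
    "database": "performance",  # structured data -> performance tier
    "archive": "capacity",      # cold data -> capacity tier
}

# Naive classifier based on file extension (an assumption for the sketch;
# a machine-learning generation would infer the class from access patterns).
EXTENSION_CLASSES = {
    ".jpg": "image", ".png": "image",
    ".db": "database", ".ibd": "database",
    ".tar": "archive", ".bak": "archive",
}

def classify(name: str) -> str:
    """Return the data class inferred for an incoming object."""
    for ext, data_class in EXTENSION_CLASSES.items():
        if name.endswith(ext):
            return data_class
    return "archive"  # default bucket for unrecognised data

def place(name: str) -> str:
    """Return the tier an incoming object should be directed to."""
    return PLACEMENT_POLICY[classify(name)]
```

With this sketch, `place("holiday.jpg")` lands on the object tier while `place("orders.db")` goes to the performance tier, all transparently to the consumer writing into the namespace.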
Unified Management Protocol
This is the utopia of a universal language between all storage systems that would allow all of them to be managed seamlessly from a true single pane of glass. That utopia may soon become real (to a certain extent) with SNIA’s Swordfish. My friend Chris M Evans goes to great lengths to analyse Swordfish, as well as giving a brief history of previous attempts to arrive at a usable standard. The inability of IT to manage storage systems centrally, under a unified view, has been a long-standing pain point. It’s hard to say whether Swordfish will succeed where other specifications failed, yet we have no choice but to give it the benefit of the doubt, since there are currently no alternatives.
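Swordfish extends DMTF's Redfish model, so management boils down to walking RESTful JSON resources. The payload below is a simplified, hand-written document in the spirit of a Swordfish storage-pool collection, not an actual response from a real array, and the aggregation function is my own sketch of what a single-pane-of-glass view could build on:

```python
import json

# Simplified, hand-written JSON loosely following Swordfish's
# Redfish-style resource layout (NOT a real array's response).
SAMPLE = json.loads("""
{
  "@odata.id": "/redfish/v1/StorageServices/1/StoragePools",
  "Members": [
    {"Name": "FlashPool",
     "Capacity": {"Data": {"AllocatedBytes": 1099511627776}}},
    {"Name": "CapacityPool",
     "Capacity": {"Data": {"AllocatedBytes": 10995116277760}}}
  ]
}
""")

def pool_sizes(collection: dict) -> dict:
    """Aggregate a pool collection into a name -> allocated-bytes view,
    the kind of summary a unified management console would display."""
    return {
        member["Name"]: member["Capacity"]["Data"]["AllocatedBytes"]
        for member in collection["Members"]
    }
```

Because every vendor would expose the same schema, the same few lines could summarise pools from any compliant array, which is precisely the promise of a unified management protocol.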
Storage Capability Detection
This attribute would determine how a storage array or solution presents its capabilities to the upper layer that will leverage the storage. Beyond the essential configuration of the communication interfaces (nowadays mainly iSCSI, NFS or FC), there should be a way for the storage array or solution to present its capabilities to an upper layer. That upper layer remains to be defined. Nowadays we can consider an x86 hypervisor (such as VMware vSphere) to be this upper layer, and the hypervisor is able to detect whether it is working with an SSD tier or a capacity, spindle-based tier.
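One way to picture capability detection is each array advertising a small capability record, with the upper layer (a hypervisor or orchestrator) matching workload requirements against it. The field names, array names and matching rules below are purely illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Array:
    """Hypothetical capability record an array could advertise upward."""
    name: str
    media: str        # e.g. "ssd" or "hdd"
    protocols: tuple  # e.g. ("iscsi", "fc")
    dedup: bool

# Two imaginary arrays in the environment.
ARRAYS = [
    Array("alpha", "ssd", ("iscsi", "fc"), dedup=True),
    Array("bravo", "hdd", ("nfs",), dedup=False),
]

def pick(requirements: dict):
    """Return the first array whose advertised capabilities satisfy the
    workload's requirements, or None if nothing matches."""
    for array in ARRAYS:
        if requirements.get("media", array.media) != array.media:
            continue
        if requirements.get("protocol") and \
                requirements["protocol"] not in array.protocols:
            continue
        if requirements.get("dedup") and not array.dedup:
            continue
        return array
    return None
```

Here a workload asking for SSD media over iSCSI would be matched to "alpha", just as vSphere today detects whether it sits on a flash or spindle tier; a standardised capability schema would generalise that detection beyond media type.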
In the future, storage management AIs could become the interface between the storage layer and the compute layer (virtualisation, containers, etc.).
We’ve had analytics built into storage solutions for some time already, but these have only been providing recommendations upon which users can make decisions. Michael Mesnier implied that analysis that would take a human one or several weeks could be achieved by machine-based analytics in hours or days, with similar outcomes. Leveraging neural networks would make such analysis even easier; however, it would then be difficult for a human to understand the reasoning performed by the neural network.
Beyond that point, Stephen Foskett pointed out the moral aspects of fully handing control to an AI without letting a human jump in and take back control. The analogy he made with self-driving cars was striking. We do have self-driving cars, but these have manual controls allowing for fallback in certain situations. I certainly can’t imagine the terror of sitting in a fully automated car. Yet we may be there in a few years. Maybe not in 5, but what about 10 or 20 years? It’s hard for us as humans to just let go and trust a machine to do “our job”.
There is also the argument as to whether a decision taken by an AI is the same as one taken by a human. But does it matter whether the AI uses the same reasoning as a human, as long as it reaches the optimal placement state? And what would such a “storage management” AI look like?
We could envision an AI that connects to and controls all of the storage components in an environment. At first, that AI would only recognise the characteristics of every given storage device/solution. It would then use these as parameters to make the appropriate decisions, based on data classification imperatives and on any policies or inputs given to it as instructions. The load on each system, usage peaks and so on would be permanently analysed by the AI in order to make placement and rebalancing decisions and to move data between tiers (taking into consideration cloud-based tiers, long-term storage etc.). The ultimate goal of the AI would be to reach a sort of nirvāṇa state where the entire environment is optimally balanced.
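The rebalancing loop described above can be caricatured in a few lines. The "AI" here is just a greedy heuristic of my own devising, with an assumed utilisation threshold; a real system would learn these decisions from telemetry rather than hard-code them:

```python
THRESHOLD = 0.8  # assumed utilisation ceiling per tier

def rebalance(tiers: dict) -> list:
    """tiers maps tier name -> {"capacity": int, "volumes": {name: size}}.
    Greedily move the largest volumes off overloaded tiers onto the
    least-loaded tier, and return the list of (volume, src, dst) moves."""
    moves = []

    def util(tier):
        return sum(tier["volumes"].values()) / tier["capacity"]

    for src_name, src in tiers.items():
        while util(src) > THRESHOLD and src["volumes"]:
            # Evict the largest volume first (greedy choice).
            vol = max(src["volumes"], key=src["volumes"].get)
            size = src["volumes"][vol]
            # Candidate destination: the least-loaded other tier.
            dst_name = min(
                (n for n in tiers if n != src_name),
                key=lambda n: util(tiers[n]),
                default=None,
            )
            if dst_name is None or \
                    util(tiers[dst_name]) + size / tiers[dst_name]["capacity"] > THRESHOLD:
                break  # nowhere to place it without overloading the target
            del src["volumes"][vol]
            tiers[dst_name]["volumes"][vol] = size
            moves.append((vol, src_name, dst_name))
    return moves
```

Run against a performance tier at 90% utilisation and an empty capacity tier, the loop migrates the largest volume down and stops once the source drops under the threshold. The nirvāṇa state, in this toy model, is simply every tier sitting below its ceiling.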
We are living in fantastic times, and we can hardly fathom what innovation in software and AI might bring to the enterprise and storage world. Looking back at the article I wrote 6 months ago about whether storage administrators have a future, I am more and more convinced that this profession (as it currently exists) is being rendered irrelevant. If not now, then a few years down the line.
In my own humble opinion, at some point in time there should be a decoupling between the storage system (the system that delivers specific capabilities in terms of performance, deduplication, compression, availability, reliability and durability) and the AI-based management system, a higher layer that sits above and manages the storage system stack.
Will storage administrators and systems administrators become AI herders? It is difficult to look into the palantír and see a clear vision of the future; however, we may eventually reach a point where humanity is doomed, yet our pagers will stop beeping at 3 AM because someone, somewhere, for some reason, disabled DRS on a given cluster or enabled some stupid affinity/anti-affinity rules.
I paid my own travel & accommodation costs to attend Intel Storage Builders as well as VMworld Europe 2016. I was provided a VMworld blogger pass by VMware; however, the Intel Storage Builders event is unrelated to VMworld 2016. I have not been compensated in any way, and my participation was purely of a gracious nature.