
kamshin


Hedvig Deep Dive – Is software-defined the future of storage?

June 2, 2016

This post is part of the blog series related to Storage Field Day 10. Find all the SFD10 content, presentations, articles, presenting companies and delegates here.
Hedvig presented on 27th May 2016 during Storage Field Day 10 from their headquarters in Silicon Valley. The discussion revolved mainly around Hedvig’s proposition and a technical deep dive into the product, its architecture and capabilities. Hedvig is a distributed Software-defined Storage (SDS) platform rooted in the experience of its founder and CEO, Avinash Lakshman, who co-created Cassandra while at Facebook and is also a co-inventor of Amazon Dynamo.
If you want to read a more general presentation about Hedvig, please refer to my previous blog post.
Rob Whiteley gave us a high-level overview of how Hedvig works (see the picture below) and then we had a whiteboard session with Bharat Naik (video link at the end of this page). I wanted to offer readers a view of the Hedvig architecture, because I’m very interested in how they implement software-defined storage. Note that my coverage of this session is not all-encompassing: I focused on what interested me and I may have missed some parts. You are therefore encouraged to form your own opinion by watching the videos (those prefixed with SFD10).
High-level view of how Hedvig works before the deep dive session

Architecture Deep Dive

Supported Hardware
Hedvig doesn’t have a Hardware Compatibility List. They do, however, have a set of “recommended configurations” available on demand, plus a reference architecture based on Cisco hardware. This article is also interesting and provides sizing recommendations and information about building a Hedvig cluster on commodity x86 hardware.

« The quality of servers determines IOPS, the quantity of servers determines availability » Avinash Lakshman, Hedvig CEO – on Hedvig’s availability & performance

Hedvig Storage Proxy
A component that resides on the compute plane. The Hedvig Proxy masquerades as an NFS share or iSCSI target and presents storage to the VM. The Proxy talks to the backend through programmable APIs. It is aware of where data is placed and can track nodes based on latency and previous requests. It is also stateless: data is cached on SSD, and after a restart the Proxy starts with an empty cache and begins caching data again. Proxies work in an HA pair: if one proxy goes down, the surviving proxy begins servicing the requests that were originally sent to the failed proxy.
Hedvig Storage Services
A distributed service that runs on storage nodes. Two processes run on each storage node:
  • HBLOCK Process (data): takes care of replication and of storing data locally
  • Pages Process (metadata): knows which nodes are up or down and what state they are in (usage spikes etc.), and also knows which blocks were successfully replicated to which nodes
Caching
  • Metacache (metadata) – needed at all times.
  • Block cache (data) – can be turned off. Lives either in DRAM or on SSD, at the proxy level. Customers with NVMe at the proxy level have seen 90,000 IOPS from cache.
  • DedupCache: a global deduplication cache. It records that a block with the same hash already exists, so that only the map page needs to be updated.
  • Hashing is done at the proxy level. The hash is sent to the Pages Process, and if it matches a hash already present in the metadata, only the pointer to the hash is updated.
  • Hash IDs are unique to avoid potential hash collision/reuse
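The deduplication flow described in the list above can be sketched roughly as follows. This is a toy model and not Hedvig's implementation: the `DedupIndex` class, its field names and the choice of SHA-256 are all assumptions made for illustration, since the session did not detail the hash function or the metadata layout.

```python
import hashlib


class DedupIndex:
    """Toy stand-in for the metadata side (hypothetical, not Hedvig's API)."""

    def __init__(self):
        self.hash_to_block = {}  # content hash -> block ID
        self.block_map = {}      # (vdisk, offset) -> block ID, the "map page"
        self.stored_blocks = 0   # number of unique blocks actually written
        self._next_id = 0

    def write(self, vdisk, offset, data):
        """Record a write; return True if the block was deduplicated."""
        # Assumption: SHA-256 stands in for whatever hash the proxy computes.
        digest = hashlib.sha256(data).hexdigest()
        block_id = self.hash_to_block.get(digest)
        deduplicated = block_id is not None
        if not deduplicated:
            # New content: allocate a block ID and count one stored block.
            block_id = self._next_id
            self._next_id += 1
            self.hash_to_block[digest] = block_id
            self.stored_blocks += 1
        # In both cases the map entry points at the block holding the data.
        self.block_map[(vdisk, offset)] = block_id
        return deduplicated


index = DedupIndex()
first = index.write("vdisk1", 0, b"A" * 4096)      # new content
second = index.write("vdisk1", 4096, b"A" * 4096)  # same content, new offset
third = index.write("vdisk1", 8192, b"B" * 4096)   # different content
```

Here the second write stores nothing new: only the map entry for its offset is updated to point at the block created by the first write.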
Virtual Disk
The primary abstraction provided by the backend (storage) is the virtual disk. A Hedvig virtual disk (vDisk) has the following attributes:
  • Replication Factor (RF, 1 to 6, default 3). The quorum for replication is (RF/2)+1, using integer division (2 for RF=3). Writes are performed synchronously to a quorum of replicas, then the remaining writes are done asynchronously. The RF is fixed and cannot be changed (unless you do a clone or snapshot).
  • Replication Policies (Agnostic, Rack Aware, DC Aware)
  • Deduplication, Compression, Client caching (caches data blocks at the proxy level – read cache only)
  • Block granularity is set at the vDisk level.
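To make the quorum rule from the list above concrete, here is a small sketch that evaluates (RF/2)+1 for every allowed replication factor. It assumes the division is an integer division, which is consistent with a quorum of 2 for the default RF of 3; the function name `write_quorum` is made up for this example.

```python
def write_quorum(replication_factor):
    """Number of synchronous confirmations needed: (RF/2)+1, integer division."""
    if not 1 <= replication_factor <= 6:
        raise ValueError("replication factor must be between 1 and 6")
    return replication_factor // 2 + 1


# Quorum for each allowed replication factor.
quorums = {rf: write_quorum(rf) for rf in range(1, 7)}

# Replicas that are written asynchronously, after the acknowledgement.
async_replicas = {rf: rf - q for rf, q in quorums.items()}
```

With the default RF of 3, two replicas are written synchronously and one asynchronously.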
Virtual disks are chunked into containers (each container is 16 GB in size) and these containers are then spread across multiple nodes.
For example, one 1 TB virtual disk with RF=3 and an agnostic replication policy, on a 10-node cluster:
  • Container 1 = nodes 1, 2, 3
  • Container 2 = nodes 4, 5, 6
  • Container 3 = nodes 7, 8, 9
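The example above only shows the first three containers. The sketch below extends it under two explicit assumptions that were not stated in the session: 1 TB is taken as 1024 GB, and containers are placed in a simple round-robin that wraps around the cluster. Hedvig's real placement logic is certainly more elaborate (it takes the replication policy and node state into account), so treat this as an illustration of the chunking arithmetic only.

```python
import math

CONTAINER_SIZE_GB = 16  # container size given in the session


def container_count(vdisk_size_gb):
    """Number of 16 GB containers needed to hold the virtual disk."""
    return math.ceil(vdisk_size_gb / CONTAINER_SIZE_GB)


def place_containers(vdisk_size_gb, replication_factor, node_count):
    """Assumed round-robin placement: container number -> list of node numbers."""
    if replication_factor > node_count:
        raise ValueError("not enough nodes for the requested replication factor")
    placement = {}
    next_node = 0
    for container in range(1, container_count(vdisk_size_gb) + 1):
        # Take the next RF nodes, wrapping around at the end of the cluster.
        nodes = [(next_node + i) % node_count + 1
                 for i in range(replication_factor)]
        placement[container] = nodes
        next_node = (next_node + replication_factor) % node_count
    return placement


# 1 TB vDisk (assumed to be 1024 GB), RF=3, 10-node cluster.
placement = place_containers(1024, 3, 10)
```

Under these assumptions the 1 TB disk is split into 64 containers, and the first three land on the same nodes as in the example.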
Another construct is the storage pool: a logical grouping of 3 disks per node that holds containers.
Write sequence
The Proxy sends a request to the Storage Services component. The HBLOCK process takes care of sending the data write requests according to the policies above. The write is acknowledged back to the proxy once the write quorum has been reached (2 nodes with the default RF of 3). While this adds a potential delay, it ensures the data is effectively written, and it is one of the tradeoffs to reckon with when using a distributed platform. Hedvig recommends that with a DC Aware replication policy, at least two of the nodes be local to each other so that the write operation isn’t affected by additional delay. Also, Hedvig doesn’t claim to achieve sub-millisecond operations.
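A simplified model of this write sequence is sketched below. It is deliberately sequential and deterministic: a real system would issue the replica writes in parallel and acknowledge as soon as any quorum of them confirms. The `Replica` class and the `quorum_write` and `complete_async` functions are hypothetical names used only for this illustration.

```python
class Replica:
    """Hypothetical storage node that keeps blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data
        return True  # confirmation of the write


def quorum_write(replicas, block_id, data):
    """Write synchronously to a quorum, return (acknowledged, pending replicas)."""
    quorum = len(replicas) // 2 + 1  # (RF/2)+1 with integer division
    confirmed = 0
    for replica in replicas[:quorum]:
        if replica.store(block_id, data):
            confirmed += 1
    acknowledged = confirmed >= quorum
    # The remaining replicas are written after the acknowledgement.
    return acknowledged, replicas[quorum:]


def complete_async(pending, block_id, data):
    """Stand-in for the asynchronous writes to the remaining replicas."""
    for replica in pending:
        replica.store(block_id, data)


replicas = [Replica(f"node{i}") for i in range(1, 4)]  # RF=3
acked, pending = quorum_write(replicas, 42, b"payload")
copies_at_ack = sum(42 in r.blocks for r in replicas)
complete_async(pending, 42, b"payload")
copies_after = sum(42 in r.blocks for r in replicas)
```

At the moment of the acknowledgement two copies exist; the third one appears once the asynchronous write completes.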
The Architecture Whiteboard session video is available here:

Hedvig Architecture Whiteboard with Bharat Naik from Stephen Foskett on Vimeo.

 

Statistics pot-pourri

Hedvig stated that they currently have a customer base of 100 users. They seem to already have a large deployment in place, and although the platform can work in hyper-scale or hyper-converged mode, hyper-converged seems to be preferred. Currently, 70% of their installed base runs on top of VMware. Hedvig is not an officially certified VMware solution, and one may ponder the advantages and drawbacks of this, although they claim they will support any deployment on top of VMware. Certification programs aren’t free, and it is understandable that Hedvig prefers to spend money on product development. Licensing is available on a perpetual basis or as a yearly subscription. The licensing unit is per TB (or was that TiB?) of nominal capacity, if I remember correctly.

Positioning and future?

Considering that Hedvig came out of stealth about one year ago, they seem to have a pretty solid SDS offering in place. There are however some additional challenges that will need to be addressed. Currently in version 1.0, the graphical user interface could use some improvement; priority seems to have been given to APIs. In Hedvig’s defense, one can understand that they focused their efforts heavily on having a robust backend. There were concerns during the session about security around the Proxy component, and Hedvig remained, in my opinion, vague on this topic. While it’s true that it’s hard to make sense of the data, the potential for harm from an ill-intentioned individual shouldn’t be underestimated. I wonder if Hedvig has mitigations in place, such as using separate VLANs for the Proxy-Storage Services communication and restricting traffic through either firewall rules or ACLs.

While Primary Data’s goal is to rein in existing storage technologies and enterprise chaos by applying a divide & conquer approach (through overlay and policy), Hedvig takes a totally different approach. By leveraging commodity x86 hardware (and commodity doesn’t necessarily mean cheap, but readily available), providing a fully fledged software approach to the storage problem, and supporting any kind of deployment (hyper-scale or hyper-converged) on any available platform (Hyper-V, VMware, KVM, containers and even bare metal), Hedvig probably aims to become the de facto standard in Software-defined Storage. Nothing comes for free (you have to pay for a license) but certain customers can expect cost savings by avoiding custom-tailored appliances, for which vendors generally charge a premium.

Final Thoughts

The Hedvig presentation and deep dive matched my high expectations and it was, IMHO, one of the best sessions of SFD10. Although not everyone will “give a damn”, if I may say so, about the innards and architecture of a storage solution, we got “bang for the buck” with a detailed technical session and a lot of delegate involvement as well.

Several veteran delegates shared that this session reminded them of the Nutanix and PernixData sessions in 2012. It was certainly an exciting session, and while there is room for improvement in certain areas, Hedvig has huge potential to become a successful company with a great product.

 

SFD10 Disclosure: this post is a part of my SFD10 post series. I am invited to the Storage Field Day 10 event by Gestalt IT. Gestalt IT will cover travel, accommodation and food for the duration of the SFD10 event. I will not receive any compensation for participation in this event, and I am also not obliged to blog or produce any kind of content. Any tweets, blog articles or any other form of content I may produce are the exclusive product of my interest in technology and my will to share information with my peers. I commit to sharing only my own point of view and analysis of the products and technologies I will be seeing and hearing about during this event.

Hedvig Disclosure: I received a Hedvig laptop sticker + a Rogue bluetooth speaker with the Hedvig logo. These gifts did not influence this post.




Copyright ©2016 kamshin

 
