My personal view on Storage Spaces and why “Geiz is not Geil”
Software-defined has been around for a while and is the new buzzword for every solution to problems in the modern app world. For storage, too, software-defined storage (SDS) aims to be the problem solver.
But the term SDS is used in many ways. Some sources on the Internet try to describe software-defined storage, but most of them clearly lack a definition. While SNIA tries to standardize most of the buzzwords and acronyms, a good description can be found on Wikipedia:
Software-defined storage (SDS) is a term for computer data storage technologies which separate storage hardware from the software that manages the storage infrastructure. The software enabling a software-defined storage environment provides policy management for feature options such as deduplication, replication, thin provisioning, snapshots and backup.
By definition,[1] SDS software is separate from hardware it is managing. That hardware may or may not have abstraction, pooling, or automation software embedded. This philosophical span has made it difficult to categorize. If it can be used as software on commodity servers with disks, it suggests software such as a file system. If it is software layered over sophisticated large storage arrays, it suggests software such as storage virtualization or storage resource management, categories of products that are very differently positioned.
Here at TechEd in Houston, Microsoft claims that its software-defined storage (definition) is Storage Spaces and Scale-Out File Server.
Well, to a certain degree this may or may not be true. But clearly it is only the hardware piece that Microsoft is trying to streamline with its software.
And it is also software that makes up the intellectual property of the storage vendors in the industry. The storage industry tries to optimize the hardware for its needs with its software. So, in short, nothing new here. However, things like testing, support and OPEX are most likely not covered in Microsoft’s messaging.
Their claim of using commodity hardware and low-cost components very soon becomes a fairy tale, as they restrict the software to run only on certified hardware components of a specific type. The “freedom of choice” commodity stack requires a lot of testing upfront and has plenty of pitfalls. And it is not the stuff you can buy at Fry’s.
Needless to say, the lowered CAPEX comes along with additional OPEX, both upfront and then continuously during the lifecycle. If I were to put critical data on that, I would definitely run a test box for certifying patches, upgrades and new firmware against my environment.
And here comes the next pitfall: before installing, Microsoft recommends loading the latest firmware onto all networking components, SAS controllers and disks, as well as the latest drivers.
Wrong firmware could lead to performance degradation and data loss. Do yourself a favor and read through the documentation on how to upgrade SAS components from common vendors like Dell, FTS or HP. You will find lots of boot-from-CD, offline or DOS stuff there… not the elegant way.
Imagine what happens when you do an upgrade of the environment.
Having spent 25 years in IT and 13 years in storage, I know one thing for sure: there is no free lunch.
To overcome the problems of self-engineering, Microsoft recommends using pre-assembled components from OEMs. In doing so, they move some of the required testing of the “cheap” components from the customer to the OEM, and so OPEX turns into CAPEX.
During TechEd I learned about the obstacles of managing Storage Spaces and Scale-Out File Server (SOFS). In short, there is no single point of management.
There is no single pane of management for Storage Spaces and SOFS. Management and monitoring need to be done everywhere in the stack. There is no detailed insight into your “storage” from a central perspective.
This is what we normally refer to as a control or management plane. It is not there, or it exists only in pieces. Some of it is in System Center Virtual Machine Manager (SCVMM). It allows you to define policies and associate metadata with the pools and volumes. SCVMM is also there to provision storage for VMs and Hyper-V. As of now, SCVMM is not a self-service portal for storage, which is part of the idea of SDS. Also, APIs that allow applications to request storage according to their needs have to talk to yet another piece, SMA or Azure Pack, which is the Cloud OS portal / automation.
Other than SCVMM and its web services, there is no self-service portal for storage. However, a Storage Management API is available against which you can write your own software.
Funnily enough, Microsoft’s StorSimple approach, which I will not dive into now, is at no point connected to that infrastructure. So how do I tier from a storage space to Azure?
How can I support an object store with Spaces on-prem? Is there something like HDFS support?
Microsoft has not been shy about preaching the one-million-IOPS story at TechEd. In essence, that was simply maxing out all components, from cache to flash. The game was just maxing out the PCI buses on the servers.
Performance and Impact on Design / Design and Impact on Performance
In order to get the performance your application demands, you simply have to know your app and build your “Space” with the number of hard disks that gives you the required IOPS. So, in essence, the more IOPS required, the more drives you put in a space. Those drives are then organized in columns, making sure that, depending on the protection scheme, they sit in different enclosures. If you go wide because your app demands high I/O (ask a DBA if he needs low I/O), your column size might become pretty large. So if you start with 16 drives, once you want to upgrade your capacity, you need to add 16 drives to that space. Needless to say, this might result in unutilized capacity.
So the math is simple: the more disks you have in a space, the more performance you get. It’s a linear function on a linear workload.
But this also has a downside for flexibility and expansion. Because of the column scheme, a space can only be expanded by the column size. And remember: same firmware here.
Also, the performance characteristics of an application may change during runtime, so you must be elastic and able to absorb those challenges.
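The sizing and expansion arithmetic described above can be sketched in a few lines of Python. This is only a rough model under stated assumptions: the per-drive IOPS figure, the drive size and all function names are hypothetical illustrations, not vendor numbers or part of any Microsoft tooling, and resiliency overhead (mirror or parity copies) is ignored.

```python
import math


def drives_for_iops(required_iops: float, iops_per_drive: float) -> int:
    """Smallest drive count whose combined IOPS meets the requirement.

    Assumes the linear model from the text: total IOPS = drives * IOPS per drive.
    """
    return math.ceil(required_iops / iops_per_drive)


def expansion_drives(extra_capacity_tb: float, drive_tb: float, column_count: int) -> int:
    """Drives to buy when a space can only grow in whole multiples of its column count."""
    needed = math.ceil(extra_capacity_tb / drive_tb)
    return math.ceil(needed / column_count) * column_count


def unused_capacity_tb(extra_capacity_tb: float, drive_tb: float, column_count: int) -> float:
    """Raw capacity bought beyond what was actually asked for."""
    return expansion_drives(extra_capacity_tb, drive_tb, column_count) * drive_tb - extra_capacity_tb


# Hypothetical example: 2400 IOPS needed, 150 IOPS per spindle (assumed figure).
columns = drives_for_iops(2400, 150)
# Later we need just 8 TB more, using 4 TB drives (assumed size).
print(columns, expansion_drives(8, 4, columns), unused_capacity_tb(8, 4, columns))
```

With these made-up inputs, a request for two drives' worth of capacity turns into a full column set, which is exactly the unutilized capacity mentioned above.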
It gets even harder when it comes to component management. There is no such thing as a SCOM management pack. Sure, you may get your enclosure health state, but the enclosure might not be aware of the disk drive state.
You have to go down to the cluster node that serves the actual virtual / physical disks and look at their status.
Is that the future of a modern datacenter?
If you do your math comparing Storage Spaces with traditional arrays, do not underestimate the costs of support that are on your OPEX side. Storage vendors will take care of that.
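To make “do your math” concrete, here is a minimal Python sketch of a CAPEX plus OPEX comparison over a lifecycle. All figures and function names are hypothetical placeholders; plug in your own quotes and your own estimate for testing, patching and support effort.

```python
from typing import Optional


def total_cost(capex: float, opex_per_year: float, years: int) -> float:
    """Simple lifecycle cost: purchase price plus yearly running cost."""
    return capex + opex_per_year * years


def break_even_years(capex_a: float, opex_a: float,
                     capex_b: float, opex_b: float) -> Optional[float]:
    """Years after which option A (lower CAPEX, higher OPEX) costs more than option B.

    Returns None if A's yearly OPEX is not higher than B's, so A never overtakes B.
    A non-positive result means A is more expensive from day one.
    """
    if opex_a <= opex_b:
        return None
    return (capex_b - capex_a) / (opex_a - opex_b)


# Hypothetical numbers: self-built commodity stack (A) vs. supported array (B).
diy = total_cost(100_000, 40_000, 5)
array = total_cost(200_000, 15_000, 5)
print(diy, array, break_even_years(100_000, 40_000, 200_000, 15_000))
```

The point is not the numbers themselves but that the cheaper purchase can lose its advantage within the lifecycle once the recurring effort is priced in.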
Design Principles of Low-cost Storage
When doing a design with low-cost commodity hardware, it is still a good idea to use a hardware platform that is purpose-built or purpose-assembled. Remove all the components that consume power unnecessarily. Try to get as dense as possible. Rely on low-cost interconnects rather than complex and possibly expensive infrastructures. EMC’s ViPR Block option, for example, uses single-ported SAS drives. That lowers the power consumption of the drives: a small effect for a single drive, but when it comes to hundreds or thousands of drives, it matters.
On large drives, conventional RAID methods may not work because of the rebuild time. Take this into consideration and maybe replicate your storage transparently and synchronously. Don’t let your application be responsible for data redundancy. Consume the storage as a black box with attributes. Let your storage have an administration cluster that can be managed by a single individual. Even though software can be independent of hardware, it is good if the software understands the underlying hardware and is able to manage it.
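The rebuild-time concern for large drives is easy to estimate with back-of-the-envelope arithmetic. The Python sketch below assumes a single drive is rewritten end to end at a constant sustained rate; the 50 MB/s rate and the function name are hypothetical, and real rebuilds under production load are usually slower.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one full drive at a sustained rebuild rate."""
    drive_mb = drive_tb * 1_000_000  # decimal TB to MB
    return drive_mb / rebuild_mb_per_s / 3600


# Assumed sustained rebuild rate of 50 MB/s while the system keeps serving I/O.
for size_tb in (1, 4, 8):
    print(f"{size_tb} TB drive: {rebuild_hours(size_tb, 50):.1f} h")
```

Rebuild time grows linearly with drive size, so the window in which a second failure can hurt you grows with every drive generation.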
At this point we have not even spoken about the design principles of flash arrays and what happens if they are utilized at 60 to 80 percent. Housekeeping might kill your performance over time. I will get to this in Part 2.
To sum it up: software-defined storage is not only the data plane. The control / management plane is the most important part. So I only partly agree with the TechEd messaging that SOFS on Spaces is SDS.