Dremio Reveals How Metadata Operations Help Organizations Achieve a True Data-as-Code Paradigm
Developers are a crucial element in the production and maintenance of digital products. When they can easily access, manipulate and get creative with the resources they need, innovation happens.
Data as code is a paradigm that seeks to push the boundaries for developers, as well as the tech industry as a whole. Companies have reached a DevOps stage where developers pull datasets from production, treat them as code, and then test them for effectiveness and compatibility. In essence, the data itself is versioned and programmable, which is the ideal of data as code, according to Mark Lyons (pictured), vice president of product management at Dremio Corp.
“You have to do this through metadata operations so you can control the data version that the individual is working with and what version of the data the production systems see, because these datasets are too large,” he said.
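The branching model Lyons describes, where a developer works against one version of a dataset while production systems see another, can be sketched as a toy metadata catalog. This is a conceptual illustration only (all names here are hypothetical), not Dremio's implementation; the key point is that branching copies a small metadata pointer, never the large underlying data files.

```python
# Toy metadata catalog: each branch points to a list of immutable data files.
# Creating a branch copies only the metadata (the file list), not the files.

class Catalog:
    def __init__(self):
        self.branches = {"main": []}  # branch name -> list of file names

    def create_branch(self, name, source="main"):
        # Copy the metadata pointer list, not the (large) data files.
        self.branches[name] = list(self.branches[source])

    def append_file(self, branch, filename):
        self.branches[branch].append(filename)

    def merge(self, source, target="main"):
        # Fast-forward merge: production only now sees the change.
        self.branches[target] = list(self.branches[source])

catalog = Catalog()
catalog.append_file("main", "part-000.parquet")

# A developer experiments on a branch; production ("main") is untouched.
catalog.create_branch("dev-experiment")
catalog.append_file("dev-experiment", "part-001.parquet")
assert catalog.branches["main"] == ["part-000.parquet"]

# Once validated, the change is promoted to production.
catalog.merge("dev-experiment")
assert catalog.branches["main"] == ["part-000.parquet", "part-001.parquet"]
```

The isolation comes entirely from cheap metadata copies, which is why this approach scales to datasets far too large to duplicate.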
Lyons spoke with theCUBE industry analyst John Furrier during the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event, an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how metadata operations and platforms like Dremio help companies manage huge datasets. (*Disclosure below.)
Expand and accelerate the use of data
With the data playing field larger and easier to access, data engineers, scientists, analysts, and even end consumers have a lot to gain. Along with increased productivity, there is now room for more groundbreaking experimentation and the proliferation of use cases.
By eliminating the need to create entirely new data pipelines, define new schemas, add columns and data types, and so on, engineers, developers and organizations looking to be data-driven can do much more with data at a much faster rate, according to Lyons. Reduced risk is another benefit of manipulating data as code, he added.
“You’re not afraid to mess up the production system, mess up this data or show it to the end user,” Lyons said. “For some companies, data is their business … all the way to the end consumer, a third party.”
Over time and with the increasing complexity of computing, many once-routine tasks, such as iterating on AI and machine learning algorithms, have become downright laborious.
“I think it’s going to change the world, because this stuff was so painful to do. The datasets had gotten so much bigger, as you know, but we still did it the old way, which was generally moving data for everyone,” Lyons explained. “It involved copying data, sampling data, and moving data.”
The old paradigm is proving more and more inefficient by the day, and data as code is meant to fix it.
Data lakes complement, rather than oppose, the change
When cloud data lakes emerged, platforms like Hadoop and Snowflake were the mainstays. Few industry analysts predicted that these data lakes would reach their current level of popularity. The technology has proven invaluable, and now data as code is poised to add even more value to cloud data lakes by overcoming some of their pressing shortcomings, according to Lyons.
“Data lakes this time around, with the Apache Iceberg table format and what Dremio is working on around metadata: these things are not going to become data swamps anymore,” he explained. “They will in fact be functional systems that handle inserts and updates. You can see all the commits. You can time travel them. And all the files are effectively managed and optimized, as is how the data is partitioned.”
With a good grasp of manifest files, the changes that occur in files, query engines and other validations, developers are in a better position to create a working system rather than just a “data swamp,” according to Lyons.
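Lyons’ description of Iceberg-style tables, where every change is a commit and past states remain queryable, can be illustrated with a minimal snapshot log. This is a hedged toy sketch of the snapshot and time-travel idea only, not Apache Iceberg’s actual metadata format, which tracks manifests of data files rather than rows.

```python
# Minimal snapshot log: each commit records an immutable table state,
# so any past version can be read back ("time travel").

class SnapshotTable:
    def __init__(self):
        self.snapshots = []  # snapshot id = index into this append-only log

    def commit(self, rows):
        # Store an immutable copy of the table state; return its snapshot id.
        self.snapshots.append(tuple(rows))
        return len(self.snapshots) - 1

    def read(self, snapshot_id=None):
        # Default read sees the latest snapshot; passing an older id
        # time-travels to that committed state.
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1
        return list(self.snapshots[snapshot_id])

table = SnapshotTable()
s0 = table.commit(["alice"])
s1 = table.commit(["alice", "bob"])

latest = table.read()        # current state of the table
historical = table.read(s0)  # time travel to the first commit
```

Because snapshots are immutable and every commit is recorded, readers can always reconstruct what the table looked like at any point in its history, which is the property Lyons highlights.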
There’s a general demand in the industry for business intelligence tools capable of handling the tsunami of data sources businesses face, Lyons added.
“On the data source side, Dremio is very capable with Parquet files in an object store, as we just said, but it can also access data from other relational systems,” he said.
Stay tuned for the full video interview, part of SiliconANGLE and theCUBE’s coverage of the AWS Startup Showcase: Event “Data as Code — The Future of Enterprise Data and Analytics”.
(*Disclosure: TheCUBE is a paid media partner for the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” Event. Neither Dremio Corp., the sponsor of theCUBE’s event coverage, nor the other sponsors have editorial control over the content of theCUBE or SiliconANGLE.)