Databricks streamlines its data platform by going serverless.

by Austin

One of the criticisms leveled against Databricks over the years, that it is complicated to set up and sometimes difficult to use, will need to be revisited now that the company is moving its entire data platform to a serverless architecture.

Databricks already offers a serverless option for certain functions, which relieves customers of having to spin clusters up and down as needed. Most of the platform, however, still depends on underlying compute clusters, which customers pay for whether or not they use them.

That is changing. Ali Ghodsi, the CEO and co-founder of Databricks, announced during his keynote address at the company’s Data + AI Summit on Wednesday that the full Databricks platform will be available as serverless starting July 1.

“You only pay for what you use with serverless,” Ghodsi said. “In fact, there is no cluster to configure, idle or not. We will handle everything under the hood for you.”

Databricks runs on the three main cloud platforms, AWS, Azure, and Google Cloud, and depends on them for networking, compute, and storage. Databricks expects customer data to be stored in cloud object storage accounts, such as S3 (Simple Storage Service) on AWS, ADLS (Azure Data Lake Storage) on Azure, or GCS (Google Cloud Storage) on GCP. This makes cloud storage relatively simple.
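As a rough illustration of why the storage side is comparatively simple, the three object stores are each addressed through a standard URI scheme. The sketch below is not taken from Databricks; the helper name `storage_uri` and the bucket, container, and account names are hypothetical placeholders.

```python
# URI schemes commonly used to address each cloud's object storage.
# All bucket, container, and account names passed in are hypothetical.

def storage_uri(cloud, bucket, path, account=None):
    """Build an object-storage URI for the given cloud provider."""
    path = path.lstrip("/")
    if cloud == "aws":
        return f"s3://{bucket}/{path}"
    if cloud == "azure":
        if account is None:
            raise ValueError("ADLS URIs need a storage account name")
        # ADLS Gen2 addresses a container within a storage account.
        return f"abfss://{bucket}@{account}.dfs.core.windows.net/{path}"
    if cloud == "gcp":
        return f"gs://{bucket}/{path}"
    raise ValueError(f"unknown cloud: {cloud}")


print(storage_uri("aws", "sales-data", "raw/orders"))
print(storage_uri("azure", "sales-data", "raw/orders", account="examplelake"))
print(storage_uri("gcp", "sales-data", "raw/orders"))
```

In each case the customer owns the storage account, and only the scheme and naming differ by cloud.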

Configuring compute, however, is more difficult. Customers can use Databricks to provision compute for tasks such as ETL, streaming data, SQL analytics, or ML/AI training, but they are charged for that compute through their cloud platform account. A serverless approach changes the compute equation.

“All of these knobs that we had before are gone,” Ghodsi said. “Cluster tuning: people setting up clusters. Which kind of machines should they use? Spot instances?... Should we autoscale? All of that is gone. It simply vanished. There is no such page. You cannot do that.”

According to Ghodsi, going serverless benefits customers by eliminating the need to analyze historical usage and apply it to capacity planning. (Databricks’ serverless documentation notes that the company does not yet charge for networking costs incurred by serverless workloads, but it reserves the right to do so in the future.)
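The billing difference can be made concrete with a small back-of-the-envelope sketch. Every number below is hypothetical and is not Databricks or cloud-provider pricing; the sketch also assumes the same unit rate under both models, which may not hold in practice.

```python
# Hypothetical figures for illustration only; not real pricing.
hourly_demand = [2, 2, 8, 10, 3, 0, 0, 1]  # compute units actually used each hour
rate = 0.50  # assumed price per compute unit per hour, identical in both models


def provisioned_cost(demand, rate):
    """Cluster sized for peak demand and billed every hour, busy or idle."""
    return max(demand) * len(demand) * rate


def pay_per_use_cost(demand, rate):
    """Billed only for the units actually consumed in each hour."""
    return sum(demand) * rate


print(provisioned_cost(hourly_demand, rate))  # 10 units * 8 hours * 0.50 = 40.0
print(pay_per_use_cost(hourly_demand, rate))  # 26 units * 0.50 = 13.0
```

Under these assumptions, the cluster sized for the peak hour costs about three times as much as paying for actual usage, and sizing it correctly is exactly the capacity-planning exercise that requires studying historical demand.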

Serverless computing also has advantages for security and data architecture, according to Ghodsi.

“Because we own every system and can lock it down in a different way, we are also able to handle security in a new way. When it is not serverless, that is not feasible,” he said. “The data layout: how exactly will your data sets be arranged? How will you optimize your data sets? That is also gone. We just optimize in the background. Because it is serverless, we use machine learning to optimize your data sets in the background, making them incredibly quick and efficient. That is fantastic as well.”

Databricks itself will benefit from no longer versioning its software releases. Because the company will update the software automatically, all users will get the same fixes and features at the same time.

According to Ghodsi, Databricks engineers have been working on the serverless version of the platform for the last three years. The approach was a matter of debate within the company, and it took that long because the engineers essentially had to rebuild all of its offerings.

“Two to three years ago, Matei Zaharia, the CTO of Databricks, and I told the company that we needed to create a basic, lift-and-shift version of serverless. Our engineers actually pushed back, saying, ‘Hey, you people are mistaken. It has to be completely redesigned for the serverless era.’ And we said, ‘Nope. We make the decisions for the company.’ And as it turned out, we were mistaken. The technical leads were correct. And over the past two years, they have been putting in a lot of effort to essentially reinvent a lot of the products, including notebooks, jobs, everything, as if we had founded a brand-new business,” Ghodsi said.

The transition to serverless computing will not happen overnight on June 30, which is just as well, since that is a Sunday. Moving all 12,000 Databricks customers from Spark clusters, Structured Streaming, notebooks, and Mosaic AI to the serverless versions of those products will take time.

Databricks is investing globally to ensure that serverless versions of its products are available in every cloud data center where it operates. The company will strongly encourage customers to switch to serverless as soon as possible.

“Please start adopting serverless,” Ghodsi said. “Any new products we release in the future will most likely only be available in serverless form. So if your company is not using serverless, please start.”