Very powerful yet easy-to-use distributed computing and data warehousing platform
What do you like best?
Databricks has very powerful distributed computing built in, with easy-to-deploy, optimized clusters for Spark computations. The notebooks with MLflow integration make it easy to use for Analytics and Data Science teams, yet the underlying APIs and CI/CD integrations make it very customizable for Data Engineers to create complex automated data pipelines. The ability to store, query, and manipulate massive Spark SQL tables with ACID guarantees in Delta Lake makes big data easily accessible to everyone in the organization.
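To give a sense of what that looks like day to day, here is a minimal sketch of the Delta Lake workflow, assuming a Databricks notebook where `spark` is pre-defined; the table name `sales_events` is just an example:

```python
from pyspark.sql import functions as F

# Write a DataFrame as a managed Delta table (Delta is the default table format on Databricks)
df = spark.range(0, 1000).withColumn("amount", F.rand() * 100)
df.write.format("delta").mode("overwrite").saveAsTable("sales_events")

# Query it with plain Spark SQL
spark.sql("""
    SELECT COUNT(*) AS n_rows, ROUND(SUM(amount), 2) AS total_amount
    FROM sales_events
""").show()

# ACID DELETE/UPDATE statements work directly on the Delta table
spark.sql("DELETE FROM sales_events WHERE amount < 1")
```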
What do you dislike?
It lacks built-in data backup features and the ability to restrict data access to specific users. So if anyone accidentally deletes data from a Delta table or DBFS, the lost data cannot be retrieved unless we set up our own customized backup solution.
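For context, one way to build such a custom backup is to periodically deep clone important Delta tables into a separate schema so an accidental delete can be recovered from the clone. This is only a sketch of that workaround, not a built-in feature; the schema and table names below are hypothetical:

```python
# Hypothetical scheduled backup cell in a Databricks notebook (`spark` is pre-defined there)
spark.sql("CREATE SCHEMA IF NOT EXISTS backup")
spark.sql("""
    CREATE OR REPLACE TABLE backup.sales_events_snapshot
    DEEP CLONE sales_events
""")
```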
What problems are you solving with the product? What benefits have you realized?
I have worked with big data, hundreds of millions of rows, using Databricks. We do most of the ELT, data cleaning, and prep work on Databricks. The ease and speed of querying big data using Databricks Spark SQL is very useful. It is also very easy to prototype code against full-sized data using the available Python and R notebooks.
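As an illustration of that ELT and prototyping pattern, a typical notebook cell looks roughly like this. It is only a sketch, assuming a Databricks notebook with `spark` pre-defined and hypothetical table names (`raw_events`, `events_clean`):

```python
# Basic cleaning/prep step written against the full-size table
cleaned = (
    spark.table("raw_events")
         .dropDuplicates(["event_id"])
         .filter("event_ts IS NOT NULL")
)
cleaned.write.format("delta").mode("overwrite").saveAsTable("events_clean")

# Prototype against a sample of the real data directly in the notebook
sample_df = spark.table("events_clean").sample(fraction=0.01, seed=42)
display(sample_df.limit(10))  # display() is the Databricks notebook helper
```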