In the realm of big data analytics, the battle for supremacy rages on. At MandelBulb Technologies, we embarked on a journey to benchmark two titans: Microsoft Fabric and Azure Databricks. Our mission? To uncover which platform reigns supreme in delivering lightning-fast analytics straight out of the box. Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to Data Science, Real-Time Analytics, and Business Intelligence. Azure Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale.
We’ve been diving deep into these platforms, fueled by our curiosity and years of experience in the world of data. We were eager to see how they stack up against each other.
Our methodology remained steadfastly unbiased, prioritizing practical performance over tailor-made optimizations. We used a 300GB TPC-H dataset and set up the same hardware for both platforms. We didn’t want fancy tricks to sway our tests. We wanted to keep things real, just like how you’d use them every day.
For any benchmark to be valid, it has to be performed on the same underlying hardware, so matching the hardware is paramount. Below is the table that depicts the hardware on both platforms.
We did not use Photon acceleration in Azure Databricks as similar capabilities are not yet ready in Fabric Spark. This way we ensured that the comparison is fair.
Our testing was divided into two critical phases: converting the 300GB TPC-H data to Delta format, and executing the 22 TPC-H queries on the resulting Delta tables. This process, repeated across three distinct trials for each platform, was designed to capture a genuine reflection of real-world usage, including cluster startup times to mirror actual deployment scenarios.
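The two-phase, three-trial procedure described above can be sketched as a small timing harness. This is a minimal sketch under stated assumptions, not the harness used for the benchmark: the names `time_phase`, `run_trials`, and `summarize` are hypothetical, and each phase is passed in as a plain callable so the same code could wrap either the Delta conversion or the query run.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def time_phase(phase: Callable[[], None]) -> float:
    """Return the wall-clock seconds taken by one run of `phase`."""
    start = time.perf_counter()
    phase()
    return time.perf_counter() - start


def run_trials(phase: Callable[[], None], trials: int = 3) -> List[float]:
    """Run `phase` `trials` times and collect the elapsed seconds of each run."""
    return [time_phase(phase) for _ in range(trials)]


def summarize(timings: List[float]) -> Dict[str, float]:
    """Condense a list of per-trial timings into a few summary numbers."""
    return {
        "runs": len(timings),
        "mean_s": mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Placeholder phase (hypothetical): stands in for the real conversion
    # or query workload, which would be supplied by the notebook.
    def placeholder_phase() -> None:
        sum(range(100_000))

    print(summarize(run_trials(placeholder_phase, trials=3)))
```

In a Spark notebook, the conversion phase would typically be a function that reads each raw TPC-H table and writes it back with `df.write.format("delta")`. Note that an in-notebook timer like this one only starts after the cluster is up, so it would not capture the cluster startup time that the reported figures include; that part would have to be measured from the job or session start.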
The results were illuminating. Microsoft Fabric consistently outperformed Azure Databricks in both conversion and query execution times, affirming its superior efficiency and speed. Furthermore, when considering the operational costs associated with each platform, Microsoft Fabric presents an indisputably more cost-effective solution without compromising on computational power. This, combined with Fabric’s suite of tools including Fabric KQL, Warehouses, Semantic Modeling, and OneLake, establishes Fabric as a go-to solution.
From starting the cluster to the conversion of all tables, Databricks took 1137 seconds. In contrast, Microsoft Fabric completed this step in 1059 seconds, making Fabric 78 seconds faster than Databricks.
Moving on to executing the 22 queries, Databricks took 1138 seconds for end-to-end execution, while Microsoft Fabric took only 1056 seconds, a difference of 82 seconds. Microsoft Fabric outperformed Databricks across multiple runs, and the same pattern held when we tested on a dataset internal to MandelBulb Technologies.
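To put the reported timings in relative terms, the arithmetic can be worked through as below. The four timing values are taken from the article; the function name `compare` and the choice to express the gap as a percentage of the Databricks time are illustrative assumptions.

```python
def compare(databricks_s: float, fabric_s: float) -> dict:
    """Seconds saved by Fabric and the saving as a percentage of the Databricks time."""
    saved = databricks_s - fabric_s
    return {"saved_s": saved, "pct_faster": round(100 * saved / databricks_s, 1)}


# Reported end-to-end timings in seconds (from the article).
conversion = compare(databricks_s=1137, fabric_s=1059)
queries = compare(databricks_s=1138, fabric_s=1056)

print(conversion)  # {'saved_s': 78, 'pct_faster': 6.9}
print(queries)     # {'saved_s': 82, 'pct_faster': 7.2}
```

On these figures, Fabric finishes each phase roughly 7% sooner than Databricks.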
As previously mentioned, our focus was not just on performance but also on associated costs. Assuming jobs run 8 hours a day, 5 days a week, totalling 160 hours a month, Databricks in Central US would cost $2672. Even if you pick the cost-friendly Job Compute, the total would be $1872. This is just the Azure Databricks cost. For all practical purposes, you will also need tools such as Azure Data Factory for data ingestion and orchestration, storage accounts, and other services. In comparison, the Fabric F-32 SKU costs $921. It is important to note that Fabric F-32 provides significantly more compute power and services. For instance, MandelBulb Technologies helped a customer revolutionize their data platform with Microsoft Fabric by utilizing Fabric Spark, Fabric KQL, Data Pipelines, and Power BI. That platform runs in production on F-32 capacity, serving hundreds of customers in parallel and querying billions of records with an average response time under a second. Even if we double the capacity to Fabric F-64 (available as a free trial for all users), the cost would only be $1843, which is approximately 30% less than the $2672 Databricks figure.
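The cost comparison above reduces to a few lines of arithmetic. The sketch below uses only the monthly figures quoted in the article; the dictionary keys, the helper name `pct_cheaper`, and the four-week month used to reach 160 hours are assumptions made for illustration.

```python
# Usage assumption from the article: 8 hours a day, 5 days a week.
# Four weeks per month is our assumption to arrive at the stated 160 hours.
HOURS_PER_MONTH = 8 * 5 * 4

# Monthly costs in USD as quoted in the article (keys are illustrative labels).
MONTHLY_COST = {
    "databricks": 2672,
    "databricks_job_compute": 1872,
    "fabric_f32": 921,
    "fabric_f64": 1843,
}


def pct_cheaper(option: str, baseline: str) -> float:
    """How much cheaper `option` is than `baseline`, as a percentage of the baseline."""
    base = MONTHLY_COST[baseline]
    return round(100 * (base - MONTHLY_COST[option]) / base, 1)


print(HOURS_PER_MONTH)                                      # 160
print(pct_cheaper("fabric_f64", "databricks"))              # 31.0
print(pct_cheaper("fabric_f64", "databricks_job_compute"))  # 1.5
print(pct_cheaper("fabric_f32", "databricks"))              # 65.5
print(pct_cheaper("fabric_f32", "databricks_job_compute"))  # 50.8
```

The "approximately 30% less" figure corresponds to F-64 against the $2672 Databricks cost. Against the $1872 Job Compute cost, F-64 is about 1.5% cheaper, while F-32 is roughly half the price.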
In practical terms, the Microsoft Fabric platform surpasses Databricks in overall value. Fabric Spark’s performance is comparable to Databricks Spark even when the difference in cluster startup time is discarded; startup averaged 195 seconds for Databricks compared to 5 seconds for Fabric Spark.
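To see what discarding startup time does to the numbers, the average startup figures can be subtracted from the reported totals. This is a rough sketch only: applying the startup averages to the conversion-phase totals (1137 and 1059 seconds) is our assumption, since the article does not break the totals down, and the helper `without_startup` is hypothetical.

```python
# Figures from the article. Pairing the startup averages with the
# conversion-phase totals is an assumption made for illustration.
TOTAL_S = {"databricks": 1137, "fabric": 1059}
STARTUP_S = {"databricks": 195, "fabric": 5}


def without_startup(total_s: dict, startup_s: dict) -> dict:
    """Subtract each platform's average startup time from its end-to-end total."""
    return {platform: total_s[platform] - startup_s[platform] for platform in total_s}


adjusted = without_startup(TOTAL_S, STARTUP_S)
startup_gap_s = STARTUP_S["databricks"] - STARTUP_S["fabric"]

print(adjusted)       # {'databricks': 942, 'fabric': 1054}
print(startup_gap_s)  # 190
```

Under this assumption, startup alone accounts for a 190-second difference between the platforms, so whether cluster startup is counted has a visible effect on the comparison.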
Disclaimer
It’s important to note that the benchmarking and testing in this analysis were carried out with the intention of mirroring real-world scenarios, akin to how any data engineer might operate. While we’ve strived to maintain objectivity and practicality throughout our methodology, these results are not official benchmarks endorsed by Microsoft Fabric or Databricks. Instead, they represent our practical findings derived from our own testing and benchmarking procedures. As such, readers are encouraged to treat these insights as indicative of potential performance outcomes rather than as definitive assessments. For further inquiries or detailed information about our testing process or platform details, feel free to reach out to us at sales@mandelbulbtech.com.