Big Data VS Traditional RDBMS
Introduction
With all the buzzwords around the internet, often we feel overwhelmed and forced to question if big data systems will serve as the replacement for traditional RDBMS systems. In this article, I will tackle this question and explore both sides of the picture.
About The Author
My name is Muhammad Osama. I am a data analyst and have been associated with the FMCG sector for the last 2 years. Nowadays, I provide consultancy to companies worldwide as a freelancer. I enjoy teaching and learning about data. If you have any questions, feel free to reach out at Muhammad.Osama@CyberCode.ca
What is an RDBMS?
RDBMS stands for Relational Database Management System. These databases have been around for a long time, with innovations spanning a period of 40 years. The data is stored in tables, which are similar to spreadsheets and therefore easy to understand. These tables are linked together with relationships.
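To make the idea of linked tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and values are made up for illustration and do not come from any real system.

```python
import sqlite3

# In-memory database; every name and value below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # the relationship
    " amount REAL NOT NULL)"
)

conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Ali"), (2, "Sara")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 250.0), (1, 100.0), (2, 75.5)])

# The relationship lets one query combine rows from both tables.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c"
    " JOIN orders o ON o.customer_id = c.id"
    " GROUP BY c.id ORDER BY c.id"
).fetchall()
print(totals)
```

Each order points at a customer through customer_id, which is what allows the final query to total the orders per customer.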
What is Big Data?
By Big Data, we mean the following:
- Unstructured Data
- Data Variety
- Rapidly Generated Data
- Low Cost
Let’s explore the concepts one by one.
Unstructured Data
Unstructured data lacks a fixed structure. In an RDBMS, every column of a table has a known data type, and all data in that column is guaranteed to be of that type. This is not the case with a CSV file, where nothing stops you from entering any kind of data in any column.
The CSV is an example of a semi-structured data type.
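The lack of type restrictions in a CSV can be seen in a small sketch with Python's csv module. The file contents here are invented for illustration.

```python
import csv
import io

# A tiny CSV held in memory; the "age" column mixes numbers and free text.
raw = "name,age\nAli,31\nSara,unknown\nOmar,27\n"

# The csv module returns every field as a string; nothing checks the column type.
rows = list(csv.DictReader(io.StringIO(raw)))

valid_ages = []
bad_rows = []
for row in rows:
    try:
        valid_ages.append(int(row["age"]))
    except ValueError:
        # Only at analysis time do we discover that a value is not a number.
        bad_rows.append(row["name"])

print(valid_ages)
print(bad_rows)
```

The file loads without complaint, and the problem with Sara's row only shows up when the values are converted.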
Data Variety
Data comes in many forms. Take the example of video recorded by a CCTV camera. It is not practical to analyze the raw binary data directly, so specialized databases are used for this purpose.
They work by extracting features from the images and storing those features in a database, from which the data is later pulled for analysis.
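The extract, store, and analyze flow described above might look roughly like the sketch below. It is a toy illustration, not how any particular video database works: the frames are tiny made-up grids of grayscale values, and extract_features, the two features, and the table name are all assumptions.

```python
import sqlite3

def extract_features(frame):
    """Reduce a grayscale frame (rows of 0-255 integers) to two simple numbers."""
    pixels = [p for row in frame for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    bright_fraction = sum(1 for p in pixels if p > 200) / len(pixels)
    return mean_brightness, bright_fraction

# Two made-up 2x2 frames standing in for CCTV images.
frames = {
    "frame_001": [[10, 20], [30, 40]],
    "frame_002": [[250, 240], [230, 0]],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frame_features ("
    " frame_id TEXT PRIMARY KEY, mean_brightness REAL, bright_fraction REAL)"
)

# Step 1 and 2: extract features from each frame and store them.
for frame_id, frame in frames.items():
    conn.execute("INSERT INTO frame_features VALUES (?, ?, ?)",
                 (frame_id, *extract_features(frame)))

# Step 3: the analysis runs against the stored features, not the raw pixels.
bright_frames = [row[0] for row in conn.execute(
    "SELECT frame_id FROM frame_features"
    " WHERE bright_fraction > 0.5 ORDER BY frame_id")]
print(bright_frames)
```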
Rapidly Generated Data
Unlike data in traditional systems such as an ERP, sensor data is generated rapidly, often gigabytes within mere minutes. Traditional databases are often not equipped to handle this volume of data.
Cost
The cost of storage per GB is much lower in big data systems than in a traditional RDBMS.
Comparison between RDBMS and Big Data
By now, you will have developed some idea of how complex big data databases are. However, big data databases typically lack the ACID properties of a traditional RDBMS.
So what exactly is ACID, why does it make traditional RDBMS relevant for the times to come and why are big data databases not a replacement for RDBMS?
ACID stands for the following:
Atomicity
Each transaction (a unit of work in the database) is indivisible, meaning that it is not possible to split a transaction into parts. It will either execute completely or not execute at all.
For example
Transaction 1 comprises the following:
Statement 1
Statement 2
Statement 3
Statement 4
Suppose that one of these statements fails. The remaining statements will not take effect, and the database will roll back all the changes made by the transaction.
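The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical. The point is that the first statement succeeds, the second fails, and the first is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Crediting B works, debiting A violates the CHECK, so both changes are undone.
failed = transfer(conn, "A", "B", 500)
after_failed = balances(conn)

succeeded = transfer(conn, "A", "B", 30)
after_ok = balances(conn)
print(failed, after_failed, succeeded, after_ok)
```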
Consistency
The data will remain consistent, meaning that all the constraints (rules on the database columns that allow certain values into the tables and keep others out) and triggers (actions designed to run automatically when certain events occur) will remain intact.
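As a rough illustration of constraints and triggers, the sketch below uses Python's sqlite3 module with a hypothetical products table. A CHECK constraint rejects an invalid price, and a trigger records every accepted price change in a log table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price REAL NOT NULL CHECK (price > 0))"  # constraint: price must be positive
)
conn.execute("CREATE TABLE price_log (product_id INTEGER, old_price REAL, new_price REAL)")

# Trigger: runs automatically after every price update.
conn.execute("""
    CREATE TRIGGER log_price_change AFTER UPDATE OF price ON products
    BEGIN
        INSERT INTO price_log VALUES (OLD.id, OLD.price, NEW.price);
    END
""")
conn.execute("INSERT INTO products VALUES (1, 'Soap', 2.5)")

# An update that breaks the constraint is rejected and changes nothing.
try:
    conn.execute("UPDATE products SET price = -1 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A valid update goes through and the trigger logs it.
conn.execute("UPDATE products SET price = 3.0 WHERE id = 1")
log = conn.execute("SELECT * FROM price_log").fetchall()
price = conn.execute("SELECT price FROM products WHERE id = 1").fetchone()[0]
print(rejected, log, price)
```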
Isolation
Transactions can be thought of as individual processes that are put into a pipeline, so one transaction cannot read the changes of another transaction until that one is complete.
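A small sketch of this idea, assuming SQLite through Python's sqlite3 module and a made-up events table: two connections share one database file, and the reader does not see the writer's row until the writer commits. Other databases offer several isolation levels, so the exact behaviour varies by product and configuration.

```python
import os
import sqlite3
import tempfile

def count_rows(conn):
    # fetchall() finishes the query so no read lock lingers on the file.
    return conn.execute("SELECT COUNT(*) FROM events").fetchall()[0][0]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")  # hypothetical file name
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
    writer.commit()

    # The INSERT opens a transaction on the writer that is not yet committed.
    writer.execute("INSERT INTO events (note) VALUES ('pending')")
    seen_before_commit = count_rows(reader)

    writer.commit()
    seen_after_commit = count_rows(reader)

    writer.close()
    reader.close()

print(seen_before_commit, seen_after_commit)
```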
Durability
These databases are fault tolerant. Suppose the power were to go out: the database has the ability to recover from the point at which the failure occurred, without losing committed changes.
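A real power failure is hard to reproduce in a few lines, so the sketch below only approximates it by closing the connection in the middle of a transaction. It assumes SQLite through Python's sqlite3 module and a hypothetical payments table. The committed row is still there when the file is reopened, while the uncommitted one is gone.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "durable.db")  # hypothetical file name

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("INSERT INTO payments (amount) VALUES (99.0)")
    conn.commit()  # committed: this row must survive

    conn.execute("INSERT INTO payments (amount) VALUES (1.0)")  # never committed
    conn.close()   # stand-in for the process stopping unexpectedly

    # "Restart": open the same file again and see what is left.
    conn = sqlite3.connect(path)
    recovered = conn.execute("SELECT amount FROM payments").fetchall()
    conn.close()

print(recovered)
```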
Which one to choose?
It all depends upon your needs and scenario. If you don't need ACID-based transactions, then Hadoop, MongoDB, or BigQuery can serve as very good alternatives.
I hope you now have a good understanding of the differences between the two.
If you are interested in exploring the world of RDBMS with SQL Server then check out our Data analytics course with T-SQL.