In my previous post, we saw some fundamental differences between relational and non-relational databases. In this post, let’s talk about the scalability of the two.
Scalability is the ability of a system to accommodate rapidly growing incoming data without significant performance problems. It is a key requirement for any system that has to keep up with a growing workload. There are two types of scaling methods, known as vertical and horizontal scaling.
All relational database tools support vertical scaling, also known as scale-up. This is the method of increasing the power of a single system by adding CPU, memory and disk space. To handle rapidly incoming data, the single production server is optimised to scale up. With this technique there is always a single production server to which all applications and users connect. A cluster can also be created by adding nodes and replicating the data across them. Because of the ACID properties, all nodes must hold the same set of data, and data synchronization becomes complicated if there are several nodes in the cluster. Such a replicated setup is well suited to read scaling.
The benefit of this scaling methodology is the tight integration of the data and its consistency across the nodes in a cluster. All nodes hold the same set of data, and if there is a problem with the production server, the applications are automatically connected to another node. Such a cluster is known as a fail-over cluster.
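To make the fail-over idea concrete, here is a minimal sketch in Python. It is only a toy model and not how any particular database product implements clustering: the class name `FailoverCluster`, the node names and the synchronous copy-to-every-node write are all my own assumptions for illustration.

```python
class FailoverCluster:
    """Toy fail-over cluster: every healthy node holds a full copy of the data."""

    def __init__(self, node_names):
        self.order = list(node_names)                 # fail-over priority order
        self.nodes = {name: {} for name in self.order}
        self.healthy = set(self.order)

    @property
    def primary(self):
        # The first healthy node in priority order serves the applications.
        for name in self.order:
            if name in self.healthy:
                return name
        raise RuntimeError("no healthy node left in the cluster")

    def write(self, key, value):
        _ = self.primary  # raises if the whole cluster is down
        # Assumption: replication is synchronous, so all healthy nodes stay identical.
        for name in self.healthy:
            self.nodes[name][key] = value

    def read(self, key):
        return self.nodes[self.primary][key]

    def fail(self, name):
        # Simulate a node going down; the next healthy node becomes primary.
        self.healthy.discard(name)


cluster = FailoverCluster(["prod", "standby-1", "standby-2"])
cluster.write("order:1", "paid")
cluster.fail("prod")
print(cluster.primary, cluster.read("order:1"))
```

Because every write is copied to all healthy nodes, the read after the failure still succeeds. The cost is also visible: each extra node adds another copy that must be kept in sync on every write.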
Most non-relational database tools are designed for horizontal scaling, also known as scale-out. This is the method of adding more computers to the network to handle rapidly incoming data. It is easy to add more nodes to the cluster as the data grows. Data are split automatically and processed across the nodes in the cluster, which makes this a distributed data environment. The Hadoop Distributed File System (HDFS) is a classic example of this.
The benefit of this scaling technique is that, since data are split and replicated across nodes, the application can still get the data from other nodes if any one node goes offline, which keeps the data available. This method is very useful in cases where no JOINs are required between data held on different nodes. It is also helpful for separating data and keeping them in different geographical locations.
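The split-and-replicate behaviour described above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the placement algorithm of HDFS or of any specific NoSQL product: `ShardedStore`, the hash-modulo placement and the node names are hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy scale-out store: each key is stored on `replicas` different nodes."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("replicas cannot exceed the number of nodes")
        self.node_names = list(node_names)
        self.replicas = replicas
        self.data = {name: {} for name in self.node_names}
        self.online = set(self.node_names)

    def _owners(self, key):
        # A stable hash picks the first owner; the following nodes hold the copies.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.node_names)
        count = len(self.node_names)
        return [self.node_names[(start + i) % count] for i in range(self.replicas)]

    def put(self, key, value):
        # Simplification: writes are assumed to happen while all owners are online.
        for name in self._owners(key):
            self.data[name][key] = value

    def get(self, key):
        # Read from the first owner that is still online.
        for name in self._owners(key):
            if name in self.online:
                return self.data[name][key]
        raise RuntimeError("all replicas are offline for key " + key)

    def take_offline(self, name):
        self.online.discard(name)


store = ShardedStore(["node-a", "node-b", "node-c"], replicas=2)
for i in range(20):
    store.put(f"user:{i}", i)
store.take_offline("node-b")
print(all(store.get(f"user:{i}") == i for i in range(20)))
```

With two copies of every key, losing any single node still leaves one online owner per key. Note that no node holds all the data, which is why a JOIN across keys on different nodes is awkward in this model.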
While both scaling techniques have advantages and disadvantages, a good environment can mix the two to get the benefits of both scale-up and scale-out. For example, we can keep a read/write database that requires ACID properties on a single scale-up server, and distribute historical data across several scale-out nodes for data mining purposes.
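As a rough sketch of such a mixed setup, the Python below uses an in-memory SQLite database as a stand-in for the single ACID server and a plain dictionary of lists as a stand-in for the scale-out archive nodes. The `orders` table, the `archive_before` function and the `archive-N` node names are hypothetical and only there to show the idea.

```python
import sqlite3
from collections import defaultdict

# Scale-up side: one transactional server (in-memory SQLite as a stand-in).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, year INTEGER, amount REAL)")

# Scale-out side: historical rows spread over several archive nodes.
NODE_COUNT = 3
archive_nodes = defaultdict(list)  # node name -> list of archived rows


def archive_node_for(order_id):
    # Assumption: a simple modulo on the order id decides the archive node.
    return f"archive-{order_id % NODE_COUNT}"


def archive_before(year):
    """Move orders older than `year` from the single server to the archive nodes."""
    rows = oltp.execute(
        "SELECT id, year, amount FROM orders WHERE year < ?", (year,)
    ).fetchall()
    for row in rows:
        archive_nodes[archive_node_for(row[0])].append(row)
    with oltp:  # the delete commits as a whole, or rolls back on error
        oltp.execute("DELETE FROM orders WHERE year < ?", (year,))
    return len(rows)


# Sample data: six orders, one per year from 2019 to 2024.
with oltp:
    oltp.executemany(
        "INSERT INTO orders (id, year, amount) VALUES (?, ?, ?)",
        [(i + 1, 2019 + i, 100.0 * (i + 1)) for i in range(6)],
    )
```

Calling `archive_before(2022)` on this sample data moves the three oldest orders to the archive nodes and leaves the three recent ones on the transactional server, where they keep the benefit of ACID transactions.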