Apache Cassandra – Making a community believe in NoSQL
The name Cassandra comes from Greek mythology: a figure cursed to speak the truth that nobody would believe. When introduced in 2008, Cassandra had a similar image, promising a scalable database that few took seriously. The journey has been a bumpy ride, from being widely dismissed to recent times when no list of the top 10 DB engines is complete without it. Let us take a look at the events that unfolded over the last 10 years.
Facebook engineers Avinash Lakshman and Prashant Malik came up with the initial idea for Cassandra while solving the inbox search problem. They were looking for a database-level solution to fetch reverse indexes of all messages sent or received by any user of Facebook across the world, and the design had to meet the strict SLAs Facebook enforces across its applications. Traditional RDBMS solutions delivered numbers far below what an application like Facebook required, which led to a new database design that could be scaled dynamically as needed. Even after its invention, Cassandra drew criticism for failing to find a user base. Many technical founders and CEOs assumed it could never fit a consumer environment, since no prior workload seemed to demand it. But few had any idea of what cloud technology would make possible. As formulated in 2008, Cassandra was built to operate on a clustered environment, something hardly used in that era even by the most competent companies; it appeared a far-sighted approach. With the advent of cloud technology, however, many companies started moving both storage and computation to the cloud, mainly to get rid of private servers at the organization level. Cost and time effectiveness were the keys to adoption.
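The reverse-index idea behind inbox search can be illustrated in a few lines. This is a plain-Python sketch of the concept only; the message data and function name are invented for illustration, and it is not Cassandra's actual implementation.

```python
from collections import defaultdict

def build_reverse_index(messages):
    """Map each search term to the set of message ids containing it,
    so a term lookup is a single keyed read instead of a full scan.
    messages: dict of message_id -> text."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for term in text.lower().split():
            index[term].add(msg_id)
    return index

# Hypothetical inbox contents for illustration.
messages = {
    1: "lunch tomorrow at noon",
    2: "project deadline moved to friday",
    3: "lunch cancelled see you friday",
}
index = build_reverse_index(messages)
print(sorted(index["lunch"]))   # -> [1, 3]
print(sorted(index["friday"]))  # -> [2, 3]
```

At Facebook's scale the interesting part is not the indexing itself but keeping such an index available and writable across many machines, which is the problem Cassandra was designed to solve.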
Traditional RDBMS failed to serve these needs, and this is where NoSQL took the limelight. RDBMS could not handle the huge volume of data entering the system every day without compromising on cost; the solution needed to scale incrementally and cost-effectively. Cassandra offers exactly that, with no single point of failure and high availability, along with robust support for clusters spanning multiple data centers and low latency for many clients. In 2011, the Cassandra Query Language (CQL) was introduced, reducing the dependency on traditional structured query language: tables, views, and the like are still created in familiar ways, but querying the database now goes through CQL, a much easier and more practical approach. In various releases since 2011, Cassandra has introduced features such as integrated caching, Apache Hadoop MapReduce support, secondary indexes, zero-downtime upgrades, clustering across virtual nodes, inter-node communication, atomic batches, request tracing, and denormalisation. With denormalisation, the focus has shifted away from complex queries for data. Earlier, a lot of effort went into finding the perfect queries to return exact results, which hampered the performance of the entire software. With NoSQL, the focus is now on architecture rather than the complexities of coding in different languages. All of this was possible because of the farsightedness of people who thought about it at a time when distributed networks were considered out of reach for normal small and mid-level companies.
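The shift that denormalisation brings can be sketched with a toy example: instead of one normalized table joined at query time, each write is fanned out to one table per access pattern, so every read is a single keyed lookup. This is a plain-Python illustration of the modeling idea (the table and function names are invented), not Cassandra driver code.

```python
# Query-first modeling: one "table" (here a dict) per access pattern,
# populated at write time instead of joined at read time.
messages_by_sender = {}     # sender -> list of message bodies
messages_by_recipient = {}  # recipient -> list of message bodies

def send_message(sender, recipient, body):
    # Denormalized write: the same message lands in both tables,
    # trading extra writes for joins-free, single-lookup reads.
    messages_by_sender.setdefault(sender, []).append(body)
    messages_by_recipient.setdefault(recipient, []).append(body)

send_message("alice", "bob", "hi bob")
send_message("alice", "carol", "meeting at 3")
send_message("bob", "alice", "hi alice")

print(messages_by_sender["alice"])     # -> ['hi bob', 'meeting at 3']
print(messages_by_recipient["alice"])  # -> ['hi alice']
```

In Cassandra the same idea shows up as designing one table per query, with the partition key chosen to match how the data will be read.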
What lies ahead:
From companies like Uber and BlackRock to giants like Apple and Cisco, no one could do without adopting NoSQL. This free, distributed database management solution has proved its importance, especially over the last 5 years. But an even more challenging road lies ahead because of constant innovation elsewhere. From DataStax to the ASF, Cassandra has weathered a lot of tough phases, especially in terms of market share, and the new challenge comes from ScyllaDB, which claims to be even more powerful and faster in a distributed environment. The innovation that drove its very existence is what can keep it alive in the market in the years to come.
About the Author
DataFactZ is a professional services company that provides consulting and implementation expertise to solve the complex data issues facing many organizations in the modern business environment. As a highly specialized system and data integration company, we are uniquely focused on solving complex data issues in the data warehousing and business intelligence markets.