A Primer on Open-Source NoSQL Databases

The idea of this article is to understand NoSQL databases: their properties, their various types, their data models, and how they differ from a standard RDBMS.

1. Introduction

RDBMS databases have been around for decades. But in the era of social media, smartphones, and the cloud, we generate large volumes of data at high velocity. The data also varies from simple text messages to high-resolution video files. Traditional RDBMSs have struggled to cope with the velocity, volume, and variety of data in this new era. In addition, many RDBMS products are commercially licensed and call for enterprise-class, proprietary hardware. This has clearly made way for open-source NoSQL databases, whose basic properties are a dynamic schema and a distributed design that scales horizontally on commodity hardware.

2. Properties of NoSQL

NoSQL is usually expanded as "Not Only SQL". The basic qualities of NoSQL databases are that they are schema-less, distributed, and horizontally scalable on commodity hardware. NoSQL databases offer a variety of functions for solving various problems with a variety of data types, whereas the "blob" used to be the only data type available in an RDBMS for storing unstructured data.

2.1 Dynamic Schema

NoSQL databases allow the schema to be flexible. New columns can be added at any time. Rows may or may not have values for those columns, and data types for columns are not strictly enforced. This flexibility is handy for developers, especially when they expect frequent changes during the product life cycle.
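
As a rough illustration of a dynamic schema, the sketch below models rows as plain Python dictionaries. The names users and columns_in_use are made up for this example and do not correspond to any particular database's API.

```python
# Hypothetical in-memory "table": each row is a dict, so rows need not share columns.
users = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ben", "email": "ben@example.com"},  # new column, no migration
    {"id": 3, "name": "Chen", "age": "unknown"},            # no type enforcement
]


def columns_in_use(rows):
    """Return the sorted union of all column names that appear in any row."""
    return sorted({column for row in rows for column in row})


# Rows without a value for a column simply omit it; readers supply a default.
emails = [row.get("email") for row in users]
```

The price of this flexibility is that the application, not the database, has to cope with missing or oddly typed values.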

2.2 Variety of Data

NoSQL databases support many types of data: structured, semi-structured, and unstructured. Logs, image files such as JPEGs, videos, graphs, JSON, and XML can be stored and operated on as they are, without pre-processing. This reduces the need for ETL (short for Extract, Transform, Load).

2.3 High Availability Cluster

NoSQL databases support distributed storage on commodity hardware. They also support high availability through horizontal scaling. These features let NoSQL databases benefit from the elastic nature of cloud infrastructure services.

2.4 Open Source

NoSQL databases are typically open-source software. The software is free, and most of it is free to use in commercial products. The open-source codebases can be modified to meet business needs. Open-source licenses differ in minor ways, so users must be aware of the license terms.

2.5 NoSQL: Not Only SQL

NoSQL databases do not depend only on SQL to retrieve data. They provide rich APIs for performing DML and CRUD operations. These APIs are more developer-friendly and are supported in a variety of programming languages.

3. Types of NoSQL

There are four types of NoSQL databases: key-value databases, column-oriented databases, document-oriented databases, and graph databases. At a very high level, most of these databases follow a structure similar to that of an RDBMS. The database server might contain many databases. Each database might contain one or more tables. Each table in turn has rows and columns that store the actual data. This hierarchy is common across NoSQL databases, although the terminology might vary.

3.1 Key-Value Database

Many key-value databases are based on ideas from the Dynamo whitepaper published by Amazon. A key-value database lets the user store data in a simple <key> : <value> format, where the key is used to retrieve the value from the table.

3.1.1 Data Model

The table contains many key spaces, and each key space can have many identifiers that store key-value pairs. The key space is similar to a column in a typical RDBMS, and the group of identifiers under the key space can be considered rows. This model is suitable for building simple, highly available applications. Since most key-value databases support in-memory storage, they can also be used to build caching mechanisms.
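
The key space and key-value layout described above can be sketched with nested dictionaries. KeyValueStore and its put, get, and delete methods are hypothetical names chosen for illustration; real products such as Redis or DynamoDB expose their own, different APIs.

```python
class KeyValueStore:
    """Toy in-memory key-value store; key spaces act as namespaces for keys."""

    def __init__(self):
        self._spaces = {}  # key space name -> {key: value}

    def put(self, space, key, value):
        self._spaces.setdefault(space, {})[key] = value

    def get(self, space, key, default=None):
        return self._spaces.get(space, {}).get(key, default)

    def delete(self, space, key):
        self._spaces.get(space, {}).pop(key, None)


store = KeyValueStore()
store.put("sessions", "user:42", {"cart": ["book", "pen"]})
store.put("settings", "user:42", {"theme": "dark"})
cart = store.get("sessions", "user:42")["cart"]
```

Every access goes through the key, which is why this model suits caches and session stores better than ad hoc queries over values.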

3.1.2 Example

  • DynamoDB

  • Redis

3.2 Column-Oriented Database

Column-oriented databases are based on the Bigtable whitepaper published by Google. They take a different approach from a traditional RDBMS: more and more columns can be added, producing a wider table. Since the table is going to be very broad, columns can be grouped under a family name, called a "Column Family" or "Super Column". The column family is optional in some column databases. In keeping with the common philosophy of NoSQL databases, the column values can be sparsely distributed.

3.2.1 Data Model

The table contains column families (optional). Each column family contains many columns. The values for columns might be sparsely distributed as key-value pairs. Column-oriented databases are an alternative to typical data warehousing databases (e.g. Teradata), and they are suitable for OLAP-style applications.
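
A minimal sketch of this layout, assuming a table modelled as row key, then column family, then column. The table contents and the read_column helper are invented for illustration and are not the API of Cassandra or HBase.

```python
# Hypothetical wide table: row key -> column family -> {column: value}.
table = {
    "row-001": {
        "profile": {"name": "Asha", "city": "Pune"},
        "metrics": {"logins": 12},
    },
    "row-002": {
        "profile": {"name": "Ben"},  # no "city" value: the column is sparse
        # this row has no "metrics" family at all
    },
}


def read_column(table, family, column):
    """Scan one column across all rows, skipping rows that have no value for it."""
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }


cities = read_column(table, "profile", "city")
logins = read_column(table, "metrics", "logins")
```

Reading a single column across many rows, as read_column does, is the access pattern that makes this model attractive for analytical workloads.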

3.2.2 Example

  • Apache Cassandra

  • HBase

3.3 Document-Oriented Database

Document-oriented databases store semi-structured data. A document can be JSON, XML, YAML, or even a Word document. The unit of data is called a document (similar to a row in an RDBMS). The table that contains a group of documents is called a "Collection".

3.3.1 Data Model

The database contains many collections. A collection contains many documents. Each document might be JSON, XML, YAML, or even a Word document. Document databases are suitable for web-based applications and applications exposing RESTful services.
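
The collection and document hierarchy can be sketched as a list of JSON-like dictionaries. The articles collection and the find helper are assumptions made for this example; document databases such as MongoDB provide much richer query APIs of their own.

```python
import json

# Hypothetical collection: a list of JSON-like documents with differing shapes.
articles = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["nosql", "primer"]},
    {"_id": 2, "title": "Graph basics", "author": {"name": "Asha"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(articles, title="Graph basics")
# A document maps directly onto a JSON payload for a RESTful response.
payload = json.dumps(matches[0], sort_keys=True)
```

Because a stored document already has the shape of the response body, little translation is needed between storage and a REST endpoint.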

3.3.2 Example

  • MongoDB

  • Couchbase

3.4 Graph Database

A real-world graph contains vertices and edges. In a graph database they are called nodes and relations. Graph databases let us store and manipulate nodes, relations, and the attributes of nodes and relations. Graph databases work best when the graphs are directed, i.e. when the relations between nodes have a direction.

3.4.1 Data Model

A graph database is a two-dimensional representation of a graph. The graph is similar to a table. Each graph contains Node, Node Properties, Relation, and Relation Properties as columns, and each row has values for these columns. The values in the properties columns can be key-value pairs. Graph databases are suitable for social media and network problems that involve complex queries with many joins.
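
The node, relation, and property structure can be sketched in a few lines. The nodes and relations data and the reachable function are hypothetical; graph databases such as Neo4j use their own storage formats and query languages.

```python
from collections import deque

# Hypothetical graph: nodes and directed relations, each carrying properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "carol": {"label": "Person", "age": 41},
}
relations = [
    {"start": "alice", "end": "bob", "type": "FOLLOWS", "since": 2019},
    {"start": "bob", "end": "carol", "type": "FOLLOWS", "since": 2021},
]


def reachable(start, rel_type):
    """Breadth-first traversal along outgoing relations of one type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for rel in relations:
            if (
                rel["start"] == current
                and rel["type"] == rel_type
                and rel["end"] not in seen
            ):
                seen.add(rel["end"])
                order.append(rel["end"])
                queue.append(rel["end"])
    return order


followed = reachable("alice", "FOLLOWS")
```

A query like "everyone Alice follows, directly or indirectly" is a traversal here, whereas in a relational schema it would take one self-join per hop.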

3.4.2 Example

  • Neo4j

  • OrientDB

  • HyperGraphDB

  • GraphBase

  • InfiniteGraph

4. Possible Problem Areas

The following are important areas to consider when choosing a NoSQL database for a given problem statement.

4.1 ACID Transactions

Most NoSQL databases have traditionally not supported full ACID transactions, e.g. MongoDB, Couchbase, and Cassandra, so check what the current version of your chosen database offers. [Note: To learn more about ACID transactions, refer to the appendix below.]

4.2 Proprietary APIs / SQL Support

Some NoSQL databases do not support Structured Query Language; they only expose an API. There is no common standard for these APIs. Every database implements its APIs in its own way, so there is an overhead in learning them and in developing a separate adaptor layer for each database. Some NoSQL databases that do offer SQL do not support all standard SQL features.

4.3 No JOIN Operator

Due to the nature of the schema or data model, not all NoSQL databases support JOIN operations by default, whereas in an RDBMS the JOIN operation is a core feature. The query language in Couchbase supports join operations. In HBase, joins can be achieved by integrating with Hive. MongoDB did not support joins when this article was written; later releases added the $lookup aggregation stage, which performs a left outer join.
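
When the database offers no JOIN, the application often has to combine the data itself. The sketch below shows one common way to do that, a hash join in application code. The customers and orders data and the inner_join helper are made up for illustration.

```python
# Hypothetical collections held by a store that has no JOIN operator.
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 3, "total": 15.0},  # no matching customer
]


def inner_join(left, right, key):
    """Hash join: index the left side by key, then probe it with each right row."""
    index = {row[key]: row for row in left}
    return [
        {**index[row[key]], **row}
        for row in right
        if row[key] in index
    ]


joined = inner_join(customers, orders, "customer_id")
```

This moves work and memory use from the database to the application, which is part of the overhead to weigh when the data model is heavily relational.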

4.4 Leeway of the CAP Theorem

Most NoSQL databases take the leeway suggested by the CAP theorem and support only two of the three properties of Consistency, Availability, and Partition tolerance. They do not support all three. [Note: Please refer to the appendix to learn more about the CAP theorem.]

5. Summary

NoSQL databases address problems, in both functional and non-functional areas, where an RDBMS falls short. In this article we have seen the basic properties, generic data models, various types, and features of NoSQL databases. To proceed further, pick any one NoSQL database and get hands-on.

Appendix A Theories behind Databases

A.1 ACID Transactions

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. These four properties describe the guarantees a database gives for its transactions:

A.1.1 Atomicity

Atomicity means that database transactions must be atomic in nature. It is also called the all-or-nothing rule. Databases must ensure that a single failure results in a rollback of the entire transaction, undoing everything done before the commit point. Only if all operations succeed is the transaction committed.
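
The all-or-nothing rule can be sketched by applying changes to a private copy and swapping it in only on success. This is a toy illustration, not how any particular database implements rollback. The run_transaction and transfer names are hypothetical.

```python
import copy


def run_transaction(accounts, operations):
    """Apply all operations or none: work on a copy and swap it in on success."""
    working = copy.deepcopy(accounts)
    try:
        for operation in operations:
            operation(working)
    except Exception:
        return accounts, False  # rollback: the original state is untouched
    return working, True  # commit: the new state replaces the old one


def transfer(source, target, amount):
    """Build an operation that moves `amount` between two accounts."""
    def operation(state):
        if state[source] < amount:
            raise ValueError("insufficient funds")
        state[source] -= amount
        state[target] += amount
    return operation


accounts = {"a": 100, "b": 0}
accounts, ok_first = run_transaction(accounts, [transfer("a", "b", 60)])
# The second transaction fails on its second step, so its first step is undone too.
accounts, ok_second = run_transaction(
    accounts, [transfer("a", "b", 30), transfer("a", "b", 30)]
)
```

After the failed second transaction the balances are exactly what the first transaction left behind, with no partial transfer visible.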

A.1.2 Consistency

Databases must ensure that only valid data is stored. In an RDBMS, this is all about enforcing the schema. In NoSQL, consistency varies depending on the type of database. For example, in a graph database such as Neo4j, consistency ensures that a relationship must have a start node and an end node. MongoDB automatically creates a unique identifier for each document (the _id field), using a 12-byte ObjectId that is displayed as 24 hexadecimal characters.

A.1.3 Isolation

Databases allow multiple transactions to run in parallel. For example, when read and write operations happen in parallel, the read will not know about the write until the write transaction is committed. The read operation sees only the previous data until the write transaction is fully committed.
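
The read-sees-only-committed-data behaviour can be sketched with a private write buffer per transaction. SnapshotStore and its methods are hypothetical names, and the sketch ignores concurrency control between competing writers.

```python
class SnapshotStore:
    """Toy store in which readers only ever see committed data."""

    def __init__(self):
        self._committed = {}

    def read(self, key):
        return self._committed.get(key)

    def begin(self):
        return {}  # private write buffer for one transaction

    def write(self, txn, key, value):
        txn[key] = value  # buffered, invisible to readers

    def commit(self, txn):
        self._committed.update(txn)  # publish all buffered writes together


db = SnapshotStore()
txn = db.begin()
db.write(txn, "stock", 5)
before_commit = db.read("stock")
db.commit(txn)
after_commit = db.read("stock")
```

Here before_commit is None because the write is still buffered, and after_commit is 5 once the transaction has been committed.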

A.1.4 Durability

Databases must ensure that committed transactions are persisted to storage. Appropriate transaction and commit logs must be available to ensure that the data is written to disk.
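
One simple way to picture a commit log is an append-only file that is flushed to disk before a commit is acknowledged and replayed after a restart. The sketch below assumes a made-up JSON-lines log format; append_commit and recover are hypothetical helpers, not part of any database.

```python
import json
import os
import tempfile


def append_commit(log_path, record):
    """Append one committed record and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())


def recover(log_path):
    """Rebuild the in-memory state by replaying every logged record in order."""
    state = {}
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                state[record["key"]] = record["value"]
    return state


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
append_commit(log_path, {"key": "stock", "value": 5})
append_commit(log_path, {"key": "stock", "value": 4})
state_after_restart = recover(log_path)
```

Calling os.fsync on every commit is the costly part, which is why real systems often batch commits or make the sync policy configurable.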

A.2 Brewer's CAP Theorem

The CAP theorem describes three properties of any shared-data system: Consistency, Availability, and Partition tolerance. It states that a shared-data system can guarantee at most two of these properties at the same time.

A.2.1 Consistency

In a distributed database system, all the nodes must see the same data at the same time.

A.2.2 Availability

The database system must be available to service every request it receives. Basically, the DBMS must be a highly available system.

A.2.3 Partition Tolerance

The database system must continue to operate despite arbitrary partitioning due to network failures.
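
The trade-off between the three properties can be illustrated with a toy two-replica cluster. During a network partition it must either refuse writes (keeping consistency) or accept them on one side (keeping availability). The Replica and Cluster classes and the "CP" and "AP" mode labels are assumptions made for this sketch and greatly simplify what real systems do.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas; mode is "CP" (refuse writes when partitioned) or "AP" (accept them)."""

    def __init__(self, mode):
        self.mode = mode
        self.partitioned = False
        self.left, self.right = Replica(), Replica()

    def write(self, replica, key, value):
        if self.partitioned and self.mode == "CP":
            return False  # give up availability to keep replicas identical
        replica.data[key] = value
        if not self.partitioned:
            other = self.right if replica is self.left else self.left
            other.data[key] = value  # replicate while the network is healthy
        return True


def simulate(mode):
    """Write once normally, then once during a partition; report the outcome."""
    cluster = Cluster(mode)
    cluster.write(cluster.left, "x", 1)
    cluster.partitioned = True
    accepted = cluster.write(cluster.left, "x", 2)
    consistent = cluster.left.data == cluster.right.data
    return accepted, consistent


cp_result = simulate("CP")
ap_result = simulate("AP")
```

In CP mode the write during the partition is rejected and the replicas stay identical; in AP mode the write is accepted and the replicas diverge until they can be reconciled.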


From: https://dzone.com/articles/a-primer-on-open-source-nosql-databases
