Non-Relational Database – Michael Durkan

It’s Day 64 of my 100 Days of Cloud journey, and today I’m looking at Azure Cosmos DB.

In the last post, we looked at Azure SQL and the different options we have available for hosting SQL Databases in Azure. SQL is an example of a Relational Database Management System (RDBMS), which follows a traditional model of storing data using 2-dimensional tables where data is stored in columns and rows in a pre-defined schema.

The alternative to this is a non-relational database, which uses a storage model that is optimized for the specific requirements of the type of data being stored. Non-relational databases can have the following structures:

Document Data Stores, which store data in JSON, XML, YAML or plain text format.
Columnar Data Stores, which store data in column families that are logically related and manipulated as a unit.
Key/value Data Stores, which hold each data value against a corresponding key.
Graph Databases, which are made up of nodes and edges and suit use cases such as organization charts and fraud detection.

All of the above options can be achieved by using Azure Cosmos DB.

Overview

Let’s start with an overview – Azure Cosmos DB is a fully managed NoSQL database that provides high availability and globally distributed access to data with very low latency.

If we log on to the Azure Portal and go to create an Azure Cosmos DB account, we are asked to choose an API. The different APIs available are:

Core (SQL) API: Provides the flexibility of a NoSQL document store combined with the power of SQL for querying.
MongoDB API: Supports the MongoDB wire protocol so that existing MongoDB clients continue to work with Azure Cosmos DB as if they are running against an actual MongoDB database.
Cassandra API: Supports the Cassandra wire protocol so that existing Apache drivers compliant with CQLv4 continue to work with Azure Cosmos DB as if they are running against an actual Cassandra database.
Gremlin API: Supports graph data with Apache TinkerPop (a graph computing framework) and the Gremlin query language.
Table API: Provides premium capabilities for applications written for Azure Table storage.

The key to picking an API is to select the one that best meets the needs of your database, but be warned: once you pick an API for an account, you cannot change it afterwards. Each API has its own set of database operations. These operations range from simple point reads and writes to complex queries. Each database operation consumes system resources based on the complexity of the operation.

Once your API is selected, you get into the usual screens for creating resources in Azure.

Pricing

Now this is where we need to talk about pricing – in SQL, we are familiar with licensing using Cores. This works the same way in Azure with the concept of vCores, but we also have the concept of Database Transaction Units (DTUs), which are based on a bundled measure of compute, storage, and I/O resources.

In Azure Cosmos DB, usage is priced based on Request Units (RUs). You can think of RUs per second as the currency for throughput. As shown in the screenshot above, there are 2 pricing models available:

Provisioned throughput mode: In this mode, you provision the number of RUs for your application on a per-second basis in increments of 100 RUs per second. You are billed on an hourly basis for the number of RUs per second you have provisioned.
Serverless mode: In this mode, you don’t have to provision any throughput when creating resources in your Azure Cosmos account. At the end of your billing period, you get billed for the number of Request Units that have been consumed by your database operations.

We also have a 3rd option, which is a variant of provisioned throughput:

Autoscale mode: In this mode, you can automatically and instantly scale the throughput (RU/s) of your database or container based on its usage, without impacting the availability, latency, throughput, or performance of the workload.
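
To make the difference between the billing models a bit more concrete, here is a minimal sketch of the arithmetic. The function names and both price parameters are my own placeholders rather than real Azure rates, and the sketch ignores any minimum throughput the service may enforce, so check the pricing page for actual figures.

```python
import math

RU_INCREMENT = 100  # provisioned throughput is set in steps of 100 RU/s


def provisioned_rus(peak_rus_per_second: float) -> int:
    """Round an expected peak up to the next 100 RU/s step."""
    steps = max(1, math.ceil(peak_rus_per_second / RU_INCREMENT))
    return steps * RU_INCREMENT


def provisioned_cost(rus_per_second: int, hours: float,
                     price_per_100_rus_hour: float) -> float:
    """Hourly billing: you pay for what is provisioned, used or not.

    price_per_100_rus_hour is a hypothetical rate supplied by the caller.
    """
    return (rus_per_second / RU_INCREMENT) * hours * price_per_100_rus_hour


def serverless_cost(total_rus_consumed: float,
                    price_per_million_rus: float) -> float:
    """Consumption billing: you pay only for the RUs actually consumed.

    price_per_million_rus is a hypothetical rate supplied by the caller.
    """
    return (total_rus_consumed / 1_000_000) * price_per_million_rus
```

For example, `provisioned_rus(450)` rounds up to 500 RU/s, which is then billed for every hour it stays provisioned, while the serverless calculation only depends on the total consumed.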

Each request to Azure Cosmos DB returns the number of RUs it consumed, so you can decide whether to stop your requests or increase the RU limit in the Azure portal.
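
As a rough illustration of how you might keep an eye on that, the sketch below adds up the charge reported with each response. It assumes the charge arrives in the `x-ms-request-charge` response header and that you already have the headers as a dictionary; the `RuBudget` class and its limit are hypothetical helpers of mine, not part of any SDK.

```python
from dataclasses import dataclass

CHARGE_HEADER = "x-ms-request-charge"  # assumed header carrying the RU charge


@dataclass
class RuBudget:
    """Hypothetical helper that totals RU charges against a self-imposed limit."""
    limit: float
    consumed: float = 0.0

    def record(self, response_headers: dict) -> float:
        # Headers arrive as strings, so convert before adding to the total.
        charge = float(response_headers.get(CHARGE_HEADER, 0.0))
        self.consumed += charge
        return charge

    @property
    def exhausted(self) -> bool:
        return self.consumed >= self.limit


budget = RuBudget(limit=10.0)
budget.record({CHARGE_HEADER: "2.83"})   # e.g. a small query
budget.record({CHARGE_HEADER: "7.50"})   # e.g. a larger query
if budget.exhausted:
    print("RU budget used up: pause requests or raise the limit")
```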

Consistency Levels

The other important thing to note about Cosmos DB is Consistency Levels. Because Cosmos DB is a globally distributed database, you can set the level of consistency for replication across your global data centers. There are 5 levels to choose from:

Strong consistency is the strictest type of consistency available in Cosmos DB. The data is synchronously replicated to all the replicas in real-time, so reads always return the most recent committed write. This mode of consistency is useful for applications that cannot tolerate any data loss in case of downtime.
In the Bounded Staleness level, data is replicated asynchronously with a predetermined staleness window defined either by a number of writes or a period of time. Read queries may lag behind by either a certain number of writes or by a pre-defined time period. However, the reads are guaranteed to honor the sequence of the data.
Session consistency is the default consistency that you get while configuring the Cosmos DB account. This level of consistency honors the client session. It ensures strong consistency within an application session that uses the same session token.
The Consistent Prefix model is similar to bounded staleness, except that it has no guarantee on the operational or time lag. The replicas guarantee the consistency and order of the writes; however, the data is not always current. This model ensures that the user never sees an out-of-order write.
Eventual consistency is the weakest consistency level of all. The first thing to consider in this model is that there is no guarantee on the order of the data and also no guarantee of how long the data can take to replicate. As the name suggests, the reads are consistent, but eventually.
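
The session token idea is easier to see with a toy model. The sketch below is my own simplification and not how Cosmos DB is implemented: each write gets a sequence number, the session remembers the highest one it has written, and a read is only accepted from a replica that has caught up to that point. The `Replica` and `Session` classes are hypothetical.

```python
class Replica:
    """A toy replica that applies writes in order and may lag behind."""

    def __init__(self):
        self.applied = 0   # sequence number of the last write applied
        self.data = {}

    def apply(self, seq, key, value):
        self.data[key] = value
        self.applied = seq


class Session:
    """Remembers the highest write sequence number this client has produced."""

    def __init__(self):
        self.token = 0

    def write(self, replica, seq, key, value):
        replica.apply(seq, key, value)
        self.token = max(self.token, seq)

    def read(self, replica, key):
        # Refuse replicas that have not yet seen this session's own writes.
        if replica.applied < self.token:
            raise RuntimeError("replica is behind this session")
        return replica.data.get(key)


primary, lagging = Replica(), Replica()
mine = Session()
mine.write(primary, 1, "cart", ["book"])

mine.read(primary, "cart")        # the session sees its own write
Session().read(lagging, "cart")   # another session may still get stale data
```

Reading from `lagging` with the `mine` session raises an error in this model, which is the "read your own writes" guarantee in miniature; a session with no token has no such protection.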

Use Cases

Any web, mobile, gaming, and IoT application that needs to handle massive amounts of data, reads, and writes at a global scale with near-real-time response times for a variety of data will benefit from Cosmos DB’s guaranteed high availability, high throughput, low latency, and tunable consistency. The Microsoft Docs article here describes the common use cases for Azure Cosmos DB.

Conclusion

And that’s a look at the different options available in Azure Cosmos DB. Hope you enjoyed this post, until next time!

It’s Day 62 of my 100 Days of Cloud journey, and today I’m starting to look at the different Database Solutions available in Azure. “The Dude” to me to …..

The next 2 posts are going to cover the 2 main offerings – Azure SQL and Azure Cosmos. But first, we need to understand the different types of database that are available to us, how they store their data and the use cases where we would utilize the different database types.

Relational Databases

Let’s kick off with Relational Database Management Systems, or RDBMS. This is the traditional model of storing data, which organises the data into 2-dimensional tables made up of rows and columns.

RDBMS databases follow a schema-based model, where the data structure of the schema needs to be defined before any data is written. Any subsequent read or write operations must use the defined schema.
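
Here is a small sketch of what "schema first" means in practice. I’m using SQLite from Python’s standard library purely as a stand-in for a relational engine (it is not one of the Azure services), and the `orders` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is declared before any data is written.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
)

# A write that matches the schema succeeds.
conn.execute(
    "INSERT INTO orders (customer, total) VALUES (?, ?)", ("Walter", 42.50)
)

# A write that uses a column outside the schema is rejected.
try:
    conn.execute(
        "INSERT INTO orders (customer, total, colour) VALUES (?, ?, ?)",
        ("Donny", 10.0, "red"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

rows = conn.execute("SELECT id, customer, total FROM orders").fetchall()
```

Only the first insert lands in `rows`; the second never gets past the schema check.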

Vendors who use this model provide a version of Structured Query Language (SQL) for retrieving and managing the data. The most common examples of these would be Microsoft SQL Server, Oracle Database or PostgreSQL.

RDBMS is useful when data consistency is required; however, the downside is that RDBMS cannot easily scale out horizontally.

In Azure, the following RDBMS services are available:

Azure SQL Database – this is the fully hosted version of SQL Server.
Azure Database for MySQL – an open source relational database management system. MySQL uses standard SQL commands such as INSERT, DROP, ADD, and UPDATE. MySQL is mainly used for e-commerce, data warehouse, and logging applications, and many database-driven websites use it.
Azure Database for PostgreSQL – this is a highly scalable RDBMS which is cross-platform and can run on Linux, Windows and macOS. PostgreSQL supports complex queries, foreign keys, triggers, updatable views, and transactional integrity.
Azure Database for MariaDB – a high performance open source relational database based on MySQL. Dynamic columns allow a single DBMS to provide both SQL and NoSQL data handling for different needs. Supports encrypted tables, LDAP authentication and Kerberos.

The main use cases for RDBMS are:

Inventory management
Order management
Reporting database
Accounting

Non-Relational Databases

The opposite of a relational database is a non-relational database, which does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.

Because of the varying ways that data can be stored, there are LOADS of different types of non-relational databases.

Let’s take a look at the different types of non-relational or NoSQL databases.

Document Data Stores

A document data store manages a set of named string fields and object data values in an entity that’s referred to as a document. These are typically stored in JSON format, but can also be stored as XML, YAML, BSON, or even plain text. The fields within these documents are exposed to the storage management system, enabling an application to query and filter data by using the values in these fields. Typically, a document contains the entire data for an entity, and documents are not required to all have the same structure.

The application can retrieve documents by using the document key, which is hashed and is a unique identifier for the document.

From a service perspective, this would be delivered in Azure Cosmos DB.

Examples of use cases would be Product catalogs, Content management or Inventory management.
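
A quick sketch of the document idea, using plain Python dictionaries in place of a real document store. The product catalog entries and the `get_by_key` and `find` helpers are invented for illustration; the point is that the two documents have different fields and can still be looked up by key or filtered by field value.

```python
# Each document holds the entire data for one entity, keyed by a unique id.
documents = {
    "p-100": {"id": "p-100", "type": "book", "title": "Cloud Basics", "pages": 320},
    "p-200": {"id": "p-200", "type": "shirt", "title": "Logo Tee", "sizes": ["M", "L"]},
}


def get_by_key(key):
    """Retrieve a single document by its document key."""
    return documents.get(key)


def find(field, value):
    """Filter documents on a field; documents lacking the field are skipped."""
    return [doc for doc in documents.values() if doc.get(field) == value]


books = find("type", "book")        # matches only the first document
with_pages = find("pages", 320)     # "pages" exists on books but not shirts
```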

Columnar Data Stores

A columnar or column-family data store organizes data into columns and rows, which is very similar to a relational database. However, while a column-family database stores the data in a tabular form with rows and columns, the columns are divided into groups known as column families. Each column family holds a set of columns that are logically related and are typically retrieved or manipulated as a unit. New columns can be added dynamically, and rows can be empty.

From a service perspective, this would be delivered in Azure Cosmos DB Cassandra API, which is used to store data for apps written for Apache Cassandra.

Examples of use cases would be Sensor data, Messaging, Social media and Web analytics, Activity monitoring, or Weather and other time-series data.
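
To picture column families, here is a toy in-memory layout: row key, then column family, then column. This is only a sketch of the data shape, not the Cassandra API, and the sensor rows and family names are made up.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    """Write one column; new columns can be added at any time."""
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Read a whole column family back as a unit."""
    return dict(table[row_key][family])


put("sensor-1", "identity", "location", "roof")
put("sensor-1", "readings", "temp_c", 18.5)
put("sensor-1", "readings", "humidity", 0.61)
put("sensor-2", "readings", "temp_c", 21.0)  # this row has no "identity" columns

readings = get_family("sensor-1", "readings")
```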

Key/value Data Stores

A key/value store associates each data value with a unique key. Most key/value stores only support simple query, insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the existing data for the entire value. Key/value stores are highly optimized for applications performing simple lookups, but are less suitable if you need to query data across different key/value stores. Key/value stores are also not optimized for querying by value.

From a service perspective, this would be delivered in Azure Cosmos DB Table API or SQL API, Azure Cache for Redis, or Azure Table Storage.

Examples of use cases would be Data caching, Session management, or Product recommendations and ad serving.
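
The "overwrite the whole value" behaviour is worth a quick sketch. The `KeyValueStore` class below is a hypothetical in-memory stand-in, not Redis or Table Storage, and the session data is invented.

```python
class KeyValueStore:
    """A minimal key/value store: lookup, insert and delete by key only."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The store treats the value as opaque and replaces it wholesale.
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "mike", "cart": ["book"]})

# To change part of a value, read it, modify a copy, and write it all back.
session = dict(store.get("session:42"))
session["cart"] = session["cart"] + ["pen"]
store.put("session:42", session)
```

Notice there is no way to ask the store "which sessions have a pen in the cart" without reading every value, which is why these stores are not suited to querying by value.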

Graph Databases

A graph database stores two types of information, nodes and edges. Edges specify relationships between nodes. Nodes and edges can have properties that provide information about that node or edge, similar to columns in a table. Edges can also have a direction indicating the nature of the relationship.

Graph databases can efficiently perform queries across the network of nodes and edges and analyze the relationships between entities.

From a service perspective, this would be delivered in Azure Cosmos DB Gremlin API.

Examples of use cases would be Organization charts, Social graphs, and Fraud detection.
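
Here is a small organization chart as nodes and directed edges, with a breadth-first walk to find everyone under a manager. It is plain Python rather than Gremlin, and the people, titles and `reports_of` function are all made up for the example.

```python
from collections import deque

# Nodes carry properties, much like columns in a table.
nodes = {
    "ana": {"title": "CEO"},
    "ben": {"title": "CTO"},
    "cat": {"title": "Engineer"},
    "dan": {"title": "CFO"},
}

# Directed edges: (source, label, target, properties)
edges = [
    ("ana", "manages", "ben", {"since": 2019}),
    ("ana", "manages", "dan", {"since": 2020}),
    ("ben", "manages", "cat", {"since": 2021}),
]


def reports_of(manager):
    """Breadth-first walk over 'manages' edges, direct and indirect."""
    found, seen, queue = [], {manager}, deque([manager])
    while queue:
        current = queue.popleft()
        for source, label, target, _props in edges:
            if source == current and label == "manages" and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


everyone_under_ana = reports_of("ana")
```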

Conclusion

And that’s a whistle-stop tour of the different types of databases available in Azure. There are other options such as Data Lake and Time Series, but I’ll leave those for future posts as they are bigger topics that deserve more attention.

Hope you enjoyed this post, until next time – I feel like going bowling now!

100 Days of Cloud – Day 64: Azure Cosmos DB

100 Days of Cloud – Day 62: Azure Database Solutions