Pachehra: Data Store Models

What are Data Store Models?

In simple terms, a data store model is a way data is structured, saved, and accessed in a database. Think of it like different types of shelves you'd use to organize items at home: some shelves are good for books, others for clothes, and some for tools. The type of shelf you choose depends on what you're storing. Similarly, the type of data store model you choose depends on the kind of data you're handling. Here are some common data store models:

Modern business systems manage increasingly large volumes of heterogeneous data. This heterogeneity means that a single data store is usually not the best approach. Instead, it's often better to store different types of data in different data stores, each focused toward a specific workload or usage pattern. The term polyglot persistence is used to describe solutions that use a mix of data store technologies.

The term "polyglot persistence" is like speaking multiple languages (polyglot means "many tongues" in Greek). It means that, instead of using one type of database for everything, you use different databases for different tasks, depending on what's best suited for each task.

Lets take an analogy to understand it better :)

Imagine you're building a house. You wouldn't use just a hammer for every job, right? Sometimes you'd use a saw, sometimes a drill, and other times a wrench. Each tool is best for a particular job. Similarly, in the digital world, you might use a relational database to store customer information, a graph database to understand their preferences and relationships, and a key-value store for fast session storage.

In essence, heterogeneous data or polyglot persistence is about using the right data storage tool (or database) for the right job, sometimes even combining multiple databases in a single solution to harness the strengths of each. This approach helps in optimizing performance, scalability, and functionality based on specific use cases and needs.

Lets take an example to cement what we understood as of now:

Imagine a large online shopping platform Amazon. Such platforms handle a multitude of tasks, from managing user profiles and product listings to tracking orders and analyzing user behavior and there approach to store data for the efficiency & management is heterogeneous stores. Below are the DB of choice for various aspects of an e-commerce.

User Profiles: A document store like MongoDB is great because each user's data can be stored as a flexible "document" that can evolve over time without rigid schema constraints.
Products can be organized in a structured way: title, price, description, SKU, etc. This fits well into a relational database, where relationships, like which products fall under which categories, are essential.
When users add products to their shopping carts, quick access and high availability are crucial. A key-value store like Redis provides lightning-fast access where a user's ID (key) can quickly retrieve their cart items (value).
To suggest products to users based on their behavior and the behavior of similar users, a graph database is invaluable.
To analyze trends in user behavior, like which products are most viewed or what search terms are trending, a columnar store like Cassandra can efficiently query and analyze large datasets by columns.
When users leave reviews or comments on products, a system needs to provide quick searches, filter, and analyze large amounts of text. Elasticsearch, a search engine backed by a document store structure, excels in this, providing near real-time search results.

Benefits of Polyglot Persistence in this Scenario:

Flexibility: Each part of the e-commerce system can use a database best suited to its needs.
Scalability: As the platform grows, each database can be scaled independently based on its workload.
Performance: Each task, from cart management to product recommendations, can be optimized for maximum performance by leveraging the strengths of different databases.

Selecting the right data store for your requirements is a key design decision. There are literally hundreds of implementations to choose from among SQL and NoSQL databases. Data stores are often categorized by how they structure data and the types of operations they support. Lets explore most used databases one by one:

1) Relational database management systems

Relational databases organize data as a series of two-dimensional tables with rows and columns.

Think of them as organized spreadsheets where you store data in rows and columns, making it easy to relate data from different sheets.

An RDBMS typically implements a transitionally consistent mechanism that conforms to the ACID (Atomic, Consistent, Isolated, Durable) model for updating information.

An RDBMS typically supports a schema-on-write model, where the data structure is defined ahead of time, and all read or write operations must use the schema.

This model is very useful when strong consistency guarantees are important — where all changes are atomic, and transactions always leave the data in a consistent state. However, an RDBMS generally can't scale out horizontally without sharding the data in some way. Also, the data in an RDBMS must be normalized, which isn't appropriate for every data set.

Azure services

· Azure SQL Database | (Security Baseline)

· Azure Database for MySQL | (Security Baseline)

· Azure Database for PostgreSQL | (Security Baseline)

· Azure Database for MariaDB | (Security Baseline)

Workload

· Records are frequently created and updated.

· Multiple operations have to be completed in a single transaction.

· Relationships are enforced using database constraints.

· Indexes are used to optimize query performance

Data Type

· Database schemas are required and enforced.

· Many-to-many relationships between data entities in the database.

· Constraints are defined in the schema and imposed on any data in the database.

· Data requires high integrity. Indexes and relationships need to be maintained accurately.

· Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.

Examples

· Inventory management

· Order management

· Reporting database

· Accounting

2) Document Databases

Imagine a digital filing cabinet where each file can have different types of information without a fixed format.

A document database stores a collection of documents, where each document consists of named fields and data. The data can be simple values or complex elements such as lists and child collections. Documents are retrieved by unique keys.

· Typically, a document contains the data for single entity, such as a customer or an order.

Azure Service: Azure Cosmos DB (with document data model)
Workload Type: Flexible and schema-less data operations

Data Type: JSON-like documents,

· Size of individual document data is relatively small.

· Each document type can use its own schema.

· Document data is semi-structured, meaning that data types of each field are not strictly defined

Example Use Case: Content management systems (as each content piece might have different attributes), Product catalogue

3) Key/value stores

A key/value store associates each data value with a unique key. Most key/value stores only support simple query, insert, and delete operations.

Picture a giant locker where each compartment has a unique label, making it quick to find what's inside by its label.

Key/value stores are highly optimized for applications performing simple lookups, but are less suitable if you need to query data across different key/value stores. Key/value stores are also not optimized for querying by value.

A single key/value store can be extremely scalable, as the data store can easily distribute data across multiple nodes on separate machines.

Azure services

· Azure Cosmos DB for Table and Azure Cosmos DB for NoSQL | (Azure Cosmos DB Security Baseline)

· Azure Cache for Redis | (Security Baseline)

· Azure Table Storage | (Security Baseline)

Workload - Fast access, simple lookups

· Data is accessed using a single key, like a dictionary.

· No joins, lock, or unions are required.

· No aggregation mechanisms are used.

· Secondary indexes are generally not used.

Data type

· Each key is associated with a single value.

· There is no schema enforcement.

· No relationships between entities.

Examples

· Data caching

· Session management

· User preference and profile management

· Product recommendation and ad serving.

4) Graph Databases:

Simple Explanation: Think of a web or network showing how different points are connected, like a family tree.

A graph database stores two types of information, nodes and edges. Edges specify relationships between nodes. Nodes and edges can have properties that provide information about that node or edge.

Graph databases can efficiently perform queries across the network of nodes and edges and analyze the relationships between entities.

The following diagram shows an organization's personnel database structured as a graph. The entities are employees and departments, and the edges indicate reporting relationships and the departments in which employees work.

This structure makes it straightforward to perform queries such as "Find all employees who report directly or indirectly to Sarah" or "Who works in the same department as John.

Azure Service: Azure Cosmos DB (with graph data model)

Workload Type: Relationship-heavy queries

Data Type: Entities or nodes and their relationships

Example Use Case: Social networks (to find connections, friends of friends, etc.) , Organizational Charts, recommendation engines.

5) Column-family databases

A column-family database organizes data into rows and columns. In its simplest form, a column-family database can appear very similar to a relational database, at least conceptually. The real power of a column-family database lies in its denormalized approach to structuring sparse data.

Envision a wardrobe where you don't store complete outfits together but instead put all shirts in one drawer, all pants in another, etc.

You can think of a column-family database as holding tabular data with rows and columns, but the columns are divided into groups known as column families. Each column family holds a set of columns that are logically related together and are typically retrieved or manipulated as a unit.

The following diagram shows an example with two column families, Identity and Contact Info. The data for a single entity has the same row key in each column-family. This structure, where the rows for any given object in a column family can vary dynamically, is an important benefit of the column-family approach, making this form of data store highly suited for storing structured, volatile data.

Azure Service: Azure Cosmos DB (with column-family data model) or Azure Data Explorer
Workload Type: Big data analytics, fast reads/writes

Most column-family databases perform write operations extremely quickly.
Update and delete operations are rare.
Designed to provide high throughput and low-latency access.
Supports easy query access to a particular set of fields within a much larger record.
Massively scalable.

Data Type: Columns of data

Data is stored in tables consisting of a key column and one or more column families.
Specific columns can vary by individual rows.
Individual cells are accessed via get and put commands

Example Use Case: Event logging systems (because you often query on specific attributes of the event rather than full event details)

Recommendations
Personalization
Sensor data
Telemetry
Messaging

Comparison or conclusion Note:

RDBMS, like Azure SQL Database, is great for scenarios where data integrity and relations are crucial, such as inventory management. You'd want to know how many of a particular product you have, which supplier it came from, and who purchased it, all tied together.

On the other hand, NoSQL models, offered under the umbrella of Azure Cosmos DB and other Azure services, provide flexibility and scalability for varied data structures, which are especially handy when dealing with vast amounts of diverse data or rapidly changing schemas.

6) Search Engine Databases:

Simple Explanation:

A search engine database is designed to make searching very fast, even in huge amounts of data.

Imagine a librarian who not only knows where every book is but can also quickly tell you which books mention a specific topic, phrase, or word, even if it's just in a single page.

A search engine database allows applications to search for information held in external data stores. A search engine database can index massive volumes of data and provide near real-time access to these indexes.

Azure Service: Azure Cognitive Search

Workload Type: Text search, full-text indexing, and complex search operations.

Data Type:

· Semi-structured or unstructured text

· Text with reference to structured data

Example Use Case: E-commerce product search (where users might search for products using various terms, and the system needs to quickly display relevant results).

Search engine databases like Azure Cognitive Search or Elasticsearch aren't typical databases in the RDBMS or NoSQL sense. Instead, they're optimized for searching.

They do this by creating an "index" of the data, much like the index at the back of a book. This index allows them to find relevant results very quickly without having to read through each "page" (or piece of data) every time.

They are often used in conjunction with other databases, where the primary database handles the regular data storage and transactions, while the search engine database handles search queries.

Please share your learning to make this read more impactful and comments of course for feedback.

Pachehra

Data Store Models - An Overview

1) Relational database management systems

Azure services

Workload

3) Key/value stores

Azure services

Workload - Fast access, simple lookups

Data type

Examples

No comments:

Post a Comment

Different industries approach digital transformation on Azure

Labels

Report Abuse