What are Data Store Models?
In simple terms, a data store model is a way data is
structured, saved, and accessed in a database. Think of it like different types
of shelves you'd use to organize items at home: some shelves are good for
books, others for clothes, and some for tools. The type of shelf you choose
depends on what you're storing. Similarly, the type of data store model you
choose depends on the kind of data you're handling. Before we walk through the
common data store models, let's look at why one system often needs several of them.
Modern business systems manage increasingly large volumes of
heterogeneous data. This heterogeneity means that a single data store is
usually not the best approach. Instead, it's often better to store different
types of data in different data stores, each focused on a specific workload
or usage pattern. The term polyglot persistence is used
to describe solutions that use a mix of data store technologies.
The term "polyglot persistence" is like speaking
multiple languages (polyglot means "many tongues" in Greek). It means
that, instead of using one type of database for everything, you use different
databases for different tasks, depending on what's best suited for each task.
Let's use an analogy to understand it better :)
Imagine you're building a house. You wouldn't use just a hammer for every job, right? Sometimes you'd use a saw, sometimes a drill, and other times a wrench. Each tool is best for a particular job. Similarly, in the digital world, you might use a relational database to store customer information, a graph database to understand their preferences and relationships, and a key-value store for fast session storage.
In essence, polyglot persistence is
about using the right data storage tool (or database) for the right job,
sometimes even combining multiple databases in a single solution to harness the
strengths of each. This approach helps in optimizing performance, scalability,
and functionality based on specific use cases and needs.
Let's take an example to cement what we've covered so far:
Imagine a large online shopping platform like Amazon. Such
platforms handle a multitude of tasks, from managing user profiles and product
listings to tracking orders and analyzing user behavior, and a heterogeneous
set of data stores is a natural fit for managing all of that data efficiently. Below are suitable database choices for various aspects of an e-commerce platform.
- User Profiles: A document store like MongoDB is great because each user's data can be stored as a flexible "document" that can evolve over time without rigid schema constraints.
- Products can be organized in a structured way: title, price, description, SKU, etc. This fits well into a relational database, where relationships, like which products fall under which categories, are essential.
- When users add products to their shopping carts, quick access and high availability are crucial. A key-value store like Redis provides lightning-fast access where a user's ID (key) can quickly retrieve their cart items (value).
- To suggest products to users based on their behavior and the behavior of similar users, a graph database is invaluable.
- To analyze trends in user behavior, like which products are most viewed or what search terms are trending, a column-family store like Cassandra can record very large volumes of view and search events and read back just the columns a query needs.
- When users leave reviews or comments on products, the system needs to search, filter, and analyze large amounts of text quickly. Elasticsearch, a search engine that stores data as JSON documents, excels at this and returns near real-time search results.
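Here is a minimal Python sketch that makes this division of labour concrete. It shows how one service might route different kinds of data to different stores. Plain dicts stand in for the real databases, and every function and variable name is hypothetical. It illustrates the routing idea only. It does not describe how Amazon or any specific platform is built.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for three of the stores listed above.
# In production each would be a client for a real system (MongoDB, Redis, a graph DB).
user_profiles = {}              # document store: user_id -> free-form profile dict
carts = defaultdict(list)       # key/value store: user_id -> list of SKUs
also_viewed = defaultdict(set)  # graph: SKU -> SKUs viewed in the same session

def save_profile(user_id, profile):
    """Profiles are schemaless documents, so each user may carry different fields."""
    user_profiles[user_id] = profile

def add_to_cart(user_id, sku):
    """A single-key write: the access pattern key/value stores are built for."""
    carts[user_id].append(sku)

def record_views(skus):
    """Add an edge between every pair of products viewed together."""
    for a in skus:
        for b in skus:
            if a != b:
                also_viewed[a].add(b)

def recommend(sku):
    """Follow one hop of edges from a product to its neighbours."""
    return sorted(also_viewed[sku])

save_profile("u1", {"name": "Asha", "prefs": {"theme": "dark"}})
save_profile("u2", {"name": "Ravi", "loyalty_tier": "gold"})  # different fields, no migration
add_to_cart("u1", "SKU-1")
record_views(["SKU-1", "SKU-2", "SKU-3"])
print(recommend("SKU-1"))  # ['SKU-2', 'SKU-3']
```

Each helper touches only the store suited to its access pattern. That is what lets each store be scaled or swapped independently.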
Benefits of Polyglot Persistence in this Scenario:
- Flexibility: Each part of the e-commerce system can use a database best suited to its needs.
- Scalability: As the platform grows, each database can be scaled independently based on its workload.
- Performance: Each task, from cart management to product recommendations, can be optimized for maximum performance by leveraging the strengths of different databases.
Selecting the right data store for your requirements is a
key design decision. There are literally hundreds of implementations to choose
from among SQL and NoSQL databases. Data stores are often categorized by how
they structure data and the types of operations they support. Let's explore the most widely used models one by one:
1) Relational database management systems
Relational databases organize data as a series of two-dimensional tables
with rows and columns.
Think of them as organized spreadsheets where you store data in rows and
columns, making it easy to relate data from different sheets.
An RDBMS typically implements a transactionally consistent mechanism
that conforms to the ACID (Atomic, Consistent, Isolated, Durable) model for
updating information.
An RDBMS typically supports a schema-on-write model, where the data
structure is defined ahead of time, and all read or write operations must use
the schema.
This model is very useful when strong consistency guarantees are
important — where all changes are atomic, and transactions always leave the
data in a consistent state. However, an RDBMS generally can't scale out
horizontally without sharding the data in some way. Also, the data in an RDBMS
must be normalized, which isn't appropriate for every data set.
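You can see the ACID and schema-on-write behaviour described above with Python's built-in sqlite3 module. This small sketch uses SQLite as a stand-in for any RDBMS, not Azure SQL Database specifically. The `product` and `order_line` tables and the `place_order` helper are hypothetical. The CHECK constraints enforce the schema on every write, and the `with conn:` block makes the order insert and the stock update succeed or fail together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Schema-on-write: structure and constraints are declared before any data arrives.
conn.executescript("""
CREATE TABLE product (
    sku   TEXT PRIMARY KEY,
    stock INTEGER NOT NULL CHECK (stock >= 0)
);
CREATE TABLE order_line (
    id  INTEGER PRIMARY KEY,
    sku TEXT NOT NULL REFERENCES product(sku),
    qty INTEGER NOT NULL CHECK (qty > 0)
);
""")
conn.execute("INSERT INTO product VALUES ('SKU-1', 5)")
conn.commit()

def place_order(sku, qty):
    """Insert the order line and decrement stock atomically: both or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO order_line (sku, qty) VALUES (?, ?)", (sku, qty))
            conn.execute("UPDATE product SET stock = stock - ? WHERE sku = ?", (qty, sku))
        return True
    except sqlite3.IntegrityError:
        return False

print(place_order("SKU-1", 3))  # True: stock goes from 5 to 2
print(place_order("SKU-1", 3))  # False: stock would go negative, so everything rolls back
print(conn.execute("SELECT stock FROM product").fetchone()[0])        # 2
print(conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0])  # 1
```

The second order fails on the stock CHECK constraint. The rollback also discards the `order_line` row that was inserted just before, so the database never holds an order without its matching stock change.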
Azure services
- Azure SQL Database
- Azure Database for MySQL
- Azure Database for PostgreSQL
- Azure Database for MariaDB
Workload
- Records are frequently created and updated.
- Multiple operations have to be completed in a single transaction.
- Relationships are enforced using database constraints.
- Indexes are used to optimize query performance.
Data type
- Database schemas are required and enforced.
- Many-to-many relationships between data entities in the database.
- Constraints are defined in the schema and imposed on any data in the database.
- Data requires high integrity. Indexes and relationships need to be maintained accurately.
- Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Examples
- Inventory management
- Order management
- Reporting database
- Accounting
2) Document databases
A document database stores a collection of documents, where each document consists of named fields and data and is usually retrieved by a unique key. Typically, a document contains the data for a single entity, such as a customer or an order.
- Azure Service: Azure Cosmos DB (with document data model)
- Workload Type: Flexible and schema-less data operations
- Data Type: JSON-like documents
  - The size of individual documents is relatively small.
  - Each document type can use its own schema.
  - Document data is semi-structured, meaning that the data types of each field are not strictly defined.
- Example Use Case: Content management systems (as each content piece might have different attributes), product catalogues
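As a rough illustration, here are two JSON-like documents in one hypothetical `catalog` collection, modelled as Python dicts. Each document carries only the fields it needs. The `find` helper is a hypothetical stand-in for a real query API. Azure Cosmos DB and MongoDB each provide their own query languages.

```python
import json

# Two documents in the same hypothetical collection, each with its own shape.
catalog = [
    {"id": "p1", "type": "book", "title": "Dune", "author": "Frank Herbert",
     "price": 9.99},
    {"id": "p2", "type": "laptop", "title": "Ultrabook 14", "price": 1099.0,
     "specs": {"ram_gb": 16, "ports": ["USB-C", "HDMI"]}},  # nested data, no extra table
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

laptops = find(catalog, type="laptop")
print(json.dumps(laptops[0]["specs"]))  # {"ram_gb": 16, "ports": ["USB-C", "HDMI"]}
```

The book has an `author` and no `specs`, while the laptop has the reverse. Neither needs a schema change to coexist with the other.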
3) Key/value stores
A key/value store associates each data value with a unique key. Most key/value
stores only support simple query, insert, and delete operations.
Picture a giant locker where each compartment has a unique
label, making it quick to find what's inside by its label.
Key/value stores are highly optimized for applications
performing simple lookups, but are less suitable if you need to query data
across different key/value stores. Key/value stores are also not optimized for
querying by value.
A single key/value store can be extremely scalable, as the data store can easily
distribute data across multiple nodes on separate machines.
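Both ideas above, lookups by a single key and spreading data across nodes, fit in a few lines of Python. `PartitionedKV` is a hypothetical class. It is not the API of Redis, Azure Table Storage, or any other product, and its "nodes" are just dicts in one process.

```python
import hashlib

class PartitionedKV:
    """Hypothetical key/value store that spreads keys over N nodes by hashing."""

    def __init__(self, num_nodes):
        # Each "node" is a plain dict here; in practice it would be a separate machine.
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash (unlike Python's salted hash()) so placement is repeatable.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)

store = PartitionedKV(num_nodes=3)
store.put("session:42", {"user": "u1", "cart": ["SKU-1"]})
print(store.get("session:42"))  # {'user': 'u1', 'cart': ['SKU-1']}
store.delete("session:42")
print(store.get("session:42"))  # None
```

Placement depends only on the key, so `get` goes straight to one node. The flip side is the limitation mentioned above: finding an entry by its value means scanning every node. Simple modulo placement also moves most keys when the node count changes. For that reason many production systems use consistent hashing or fixed hash slots instead.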
Azure services
- Azure Cosmos DB for Table and Azure Cosmos DB for NoSQL
- Azure Cache for Redis
- Azure Table Storage
Workload - Fast access, simple lookups
- Data is accessed using a single key, like a dictionary.
- No joins, locks, or unions are required.
- No aggregation mechanisms are used.
- Secondary indexes are generally not used.
Data type
- Each key is associated with a single value.
- There is no schema enforcement.
- No relationships between entities.
Examples
- Session management
- User preference and profile management
- Product recommendation and ad serving
4) Graph databases
Simple Explanation: Think of a web or network showing how
different points are connected, like a family tree.
A graph database stores two types of information, nodes and
edges. Edges specify relationships between nodes. Nodes and edges can have
properties that provide information about that node or edge.
Graph databases can efficiently perform queries across the network of nodes and edges and analyze the relationships between entities.
The following diagram shows an organization's personnel
database structured as a graph. The entities are employees and departments, and
the edges indicate reporting relationships and the departments in which
employees work.
This structure makes it straightforward to perform queries
such as "Find all employees who report directly or indirectly to
Sarah" or "Who works in the same department as John?"
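Both example queries can be written as plain graph traversals in Python. The edge list below is hypothetical; only Sarah and John come from the text. A real graph database would express these queries in its own query language, such as Gremlin in Azure Cosmos DB for Apache Gremlin, rather than in application code.

```python
from collections import defaultdict, deque

# Hypothetical personnel graph: (employee, "reports_to", manager)
# and (employee, "works_in", department) edges.
edges = [
    ("John", "reports_to", "Sarah"),
    ("Priya", "reports_to", "Sarah"),
    ("Mei", "reports_to", "John"),
    ("Luis", "reports_to", "Mei"),
    ("John", "works_in", "Sales"),
    ("Mei", "works_in", "Sales"),
    ("Priya", "works_in", "Marketing"),
    ("Luis", "works_in", "Marketing"),
]

# Adjacency lists let each hop follow edges directly instead of joining tables.
reports = defaultdict(list)   # manager -> direct reports
department_of = {}            # employee -> department
members = defaultdict(set)    # department -> employees
for src, label, dst in edges:
    if label == "reports_to":
        reports[dst].append(src)
    elif label == "works_in":
        department_of[src] = dst
        members[dst].add(src)

def all_reports(manager):
    """Everyone who reports to `manager` directly or indirectly (breadth-first)."""
    found, seen, queue = [], {manager}, deque([manager])
    while queue:
        for person in reports[queue.popleft()]:
            if person not in seen:  # guard against cycles in bad data
                seen.add(person)
                found.append(person)
                queue.append(person)
    return found

def colleagues(name):
    """Who works in the same department as `name`?"""
    return sorted(members[department_of[name]] - {name})

print(all_reports("Sarah"))  # ['John', 'Priya', 'Mei', 'Luis']
print(colleagues("John"))    # ['Mei']
```

The "directly or indirectly" query needs a variable number of hops. In a graph each hop simply follows stored edges, whereas a relational design would need a recursive query or repeated self-joins.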
Azure Service: Azure Cosmos DB (with graph data model)
Workload Type: Relationship-heavy queries
Data Type: Entities or nodes and their relationships
Example Use Case:
Social networks (to find connections, friends of friends, etc.), organizational
charts, recommendation engines.
5) Column-family databases
A column-family database organizes data into rows and columns. In its simplest form, a column-family database can appear very similar to a relational database, at least conceptually. The real power of a column-family database lies in its denormalized approach to structuring sparse data.
Envision a wardrobe where you don't store complete outfits together but instead put all shirts in one drawer, all pants in another, etc.
You can think of a column-family database as holding tabular data with rows and columns, but the columns are divided into groups known as column families. Each column family holds a set of columns that are logically related together and are typically retrieved or manipulated as a unit.
The following diagram shows an example with two column families, Identity
and Contact Info. The data for a single entity has the same row key in each
column family. This structure, where the set of columns stored for any given
object in a column family can vary dynamically, is an important benefit of the
column-family approach, making this form of data store highly suited for
storing structured, volatile data.
- Azure Service: Azure Cosmos DB (with column-family data model) or Azure Data Explorer
- Workload Type: Big data analytics, fast reads/writes
  - Most column-family databases perform write operations extremely quickly.
  - Update and delete operations are rare.
  - Designed to provide high throughput and low-latency access.
  - Supports easy query access to a particular set of fields within a much larger record.
  - Massively scalable.
- Data Type: Columns of data
  - Data is stored in tables consisting of a key column and one or more column families.
  - Specific columns can vary by individual rows.
  - Individual cells are accessed via get and put commands.
- Example Use Case: Event logging systems (because you often query on specific attributes of the event rather than full event details)
  - Recommendations
  - Personalization
  - Sensor data
  - Telemetry
  - Messaging
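Roughly speaking, a column-family table can be modelled as a nested map: row key, then column family, then column. This hypothetical Python sketch reuses the Identity and Contact Info families from the diagram description. The customer data and the `put`/`get` helpers are made up to mirror the cell-level get and put access listed above. They are not any particular product's API.

```python
# Hypothetical column-family table: row key -> column family -> {column: value}.
# The families (Identity, Contact Info) are fixed; the columns inside them are not.
customers = {
    "1001": {
        "Identity": {"first_name": "Ana", "last_name": "Silva"},
        "Contact Info": {"email": "ana@example.com", "phone": "555-0100"},
    },
    "1002": {
        "Identity": {"first_name": "Ben", "last_name": "Okafor", "suffix": "Jr."},
        "Contact Info": {"email": "ben@example.com"},  # no phone column at all
    },
}

def put(table, row_key, family, column, value):
    """Write one cell; a new column can appear without any schema change."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get(table, row_key, family, column=None):
    """Read a whole column family for a row, or a single cell within it."""
    cells = table.get(row_key, {}).get(family, {})
    return dict(cells) if column is None else cells.get(column)

put(customers, "1002", "Contact Info", "twitter", "@ben")
print(get(customers, "1002", "Contact Info"))  # {'email': 'ben@example.com', 'twitter': '@ben'}
print(get(customers, "1001", "Identity", "last_name"))  # Silva
```

Reading the Contact Info family touches only those cells and never the Identity data. This is the "particular set of fields within a much larger record" access pattern described above.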
Comparison note:
An RDBMS, like Azure SQL Database, is great for scenarios where
data integrity and relations are crucial, such as inventory management. You'd
want to know how many of a particular product you have, which supplier it came
from, and who purchased it, all tied together.
On the other hand, NoSQL models, offered under the umbrella
of Azure Cosmos DB and other Azure services, provide flexibility and
scalability for varied data structures, which are especially handy when dealing
with vast amounts of diverse data or rapidly changing schemas.
6) Search engine databases
Simple Explanation:
A search engine database is designed to make searching very
fast, even in huge amounts of data.
Imagine a librarian who not only knows where every book is
but can also quickly tell you which books mention a specific topic, phrase, or
word, even if it appears on just a single page.
A search engine database allows applications to search for information held in external data stores. A search engine database can index massive volumes of data and provide near real-time access to these indexes.
Azure Service: Azure Cognitive Search (now called Azure AI Search)
Workload Type: Text search, full-text indexing, and complex search operations.
Data Type:
- Semi-structured or unstructured text
- Text with reference to structured data
Example Use Case: E-commerce product search (where users might search for products using various terms, and the system needs to quickly display relevant results).
Search engine databases like Azure Cognitive Search or Elasticsearch aren't typical databases in the RDBMS or NoSQL sense. Instead, they're optimized for searching.
They do this by creating an "index" of the data, much like the index at the back of a book. This index allows them to find relevant results very quickly without having to read through each "page" (or piece of data) every time.
They are often used in conjunction with other databases, where the primary database handles the regular data storage and transactions, while the search engine database handles search queries.
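The "index at the back of a book" idea is an inverted index: a map from each term to the records that contain it. Below is a toy Python sketch of one. A dict stands in for the primary database, and the search index returns IDs that are then looked up there. The review data and the `tokenize`/`search` helpers are hypothetical. Real engines such as Elasticsearch also rank results by relevance, handle stemming and synonyms, and much more.

```python
import re
from collections import defaultdict

# Primary store (stand-in for the main database): review_id -> review text.
reviews = {
    1: "Great battery life, fast shipping.",
    2: "Battery died after a week.",
    3: "Fast laptop, great screen.",
}

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: term -> set of review ids containing it. Built once, up front.
index = defaultdict(set)
for review_id, text in reviews.items():
    for term in tokenize(text):
        index[term].add(review_id)

def search(query):
    """Return ids of reviews that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

for rid in search("fast great"):
    print(rid, reviews[rid])  # full text is fetched from the primary store
```

Answering a query only touches the posting sets for its terms and never rescans the review text. That is why the index can stay fast as the data grows.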
Please share your own learnings to make this post more impactful, and of course, leave a comment with your feedback.