首页 » Big Data and AI » Data Analytics » 正文

Data Modeling Tutorial for Beginners

Data Modeling Tutorial for Beginners

The analysis of data objects and their inter-relations is known as data modeling. This process formulates data in a specific and well-configured structure. This well-presented data is further used for analysis and creating reports.

Data Modeling Tutorial for Beginners

What is Data Modeling?

The process of creating a model for the storage of data in a database is termed as data modeling. It is a theoretical presentation of data objects and associations among various data objects. Data modeling is a process of formulating data in an information system in a structured format. It helps in analyzing data easily which will further help in meeting business requirements.

The process of data modeling requires data modelers which will precisely work with stakeholders and prospective users of an information system. Data modeling process ends with the creation of a data model that supports the business information system infrastructure. This process also involves understanding the structure of an organization and proposing a solution that enables the organization to achieve its objectives. It bridges the gaps between technical and functional areas.

Why use a data model?

Significance of Data ModellingPrimary reasons for using a data model are listed below:

  • Visual representation of data helps improve data analysis. It provides a holistic picture of the data which can be used by developers to create a physical database.
  • All important data of an enterprise are accurately presented in the model. The data model reduces the chances of data omission. Data omission can lead to incorrect results and faulty reports.
  • The data model portrays a better understanding of business requirements.
  • It helps in the creation of a robust design that brings the entire data of an organization on the same platform. It assists in identifying the redundant, duplicate, and missing data as well.
  • A qualified data model helps in providing better consistency across all projects of an enterprise.
  • It improves data quality.
  • It helps Project Managers with a better scope and quality management. It also improves performance to the core.
  • It defines relational tables, stored procedures, and primary and foreign keys.

Three Perspectives of a Data Model

Conceptual Model

This level defines what needs to be present in the structure of the model in order to define and organize business concepts. It mainly focuses on business-oriented entries, attributes, and relations. It is basically designed by Data Architects and Business Stakeholders.

Logical Model

The logical model defines how the model should be implemented. It broadly includes all kinds of data that need to be captured such as tables, columns, etc. This model is generally designed by Business Analysts and Data Architects.

Physical Model

The physical model defines how to implement a data model with the help of the database management system. It outlines the implementation methodology in terms of tables, CRUD operations, indexes, partitioning, etc. It is created by Database Administrators and Developers.

Types of Data Models

There are various approaches to data modeling, but the base concept remains the same for all types of models. Let’s take a glance at some of the data models that are popularly used:

Hierarchical Model

This is a database modeling that is based on a tree-like structuring. Here, each of the records has a single root or parent. When it comes to sibling records, they are sorted in a particular order. This order is used as the physical order for storing the database. You can use this type of modeling for many real-world model relationships. During 1960s and 1970s, this database model was all the rage. But due to some inefficiencies, they are very rarely used now.

Relational Model

This was initially proposed as an alternative to the hierarchical model in 1970 by an IBM researcher. It doesn’t require developers to define the data path. Here, data segments are explicitly combined with the help of tables. This model has reduced the program complexity. It requires detailed knowledge of the physical data storage adopted by the organization. Soon after the introduction of this model, it was combined with Structured Query Language (SQL).

Network Model

The network model can be built on the hierarchical model, wherein it allows multiple relationships among linked records which implies that it has multiple parent records. As per the mathematical set theory, we construct the model with sets of related records. Each set consists of a parent record and multiple child records. Each record can belong to multiple sets and allows the model for conveying complex relationships.

Object-oriented Database Model

The object-oriented database model consists of a collection of objects. These objects have associated features and methods. We have various kinds of object-oriented databases, namely multimedia database, hypertext database, and more. This type of a database model is known as a post-relational database model as it is not limited to tables, even though it incorporates tables. We can call such types of database models as hybrid models.

Entity–Relationship Model

As the name indicates, the entity–relationship model is a graphical presentation of entities and their relationships. It is also known as the entity–relationship (E–R) diagram. An entity can be called as a concept, a piece of data, or an object about which the data (and the relations surrounding the data) is stored.

Object-relational Model

We can think of the object-relational model as a relational model having the advanced functionality of the object-oriented database model. What this type of a database model allows is to let designers incorporate the functions into a familiar table structure.

Facts and Dimensions

To learn data modeling, one really needs to understand its facts and dimensions.

Fact Table: It is a table containing measurements and granularity of every measurement. Facts can be additive or semi-additive, for example, sales.

Dimension Table: It is a table that collects fields containing descriptions of business elements and is referred by multiple fact tables.

Dimensional Modeling

Dimensional modeling is a design technique of data warehouse. It uses confirmed dimensions and facts and helps in easy navigation. Dimensional modeling design helps in fast performance query. Dimensional models are casually known as star schemas.

Keys Related to Dimensional Modeling

Keys are important to understand while we learn data modeling. Keys of dimensional modeling are divided into five categories.

  • Business or Natural Keys: It is a field that uniquely identifies an entity. For example, customer ID, employee number, etc.
  • Primary and Alternate Keys: Any field containing a unique record can be called as a primary key. The user needs to select one of all available primary keys, and the remaining become alternate keys.
  • Composite or Compound Keys: When more than one field is used to represent a key, it is referred to as a composite key.
  • Surrogate Keys: It is a field with no business meaning and is usually auto-generated.
  • Foreign Keys: It is a key that points to another key in some other table.

The process of data modeling involves designing and producing all types of data models. These data models are then converted through a data definition language. Data definition language is used to generate a database. This database will then be termed as a fully attributed data model.

Advantages and Disadvantages of Data Models

Advantages

  • Data objects provided by the functional team are presented accurately with data modeling.
  • Data modeling allows you to query data from the database and derive various reports based on the data. It indirectly contributes to data analysis with the help of reports. These reports can be used for improving the quality and productivity of the project.
  • Businesses have plenty of data in a variety of formats. Data modeling provides a structured system for such unstructured forms of data.
  • Data modeling improves business intelligence by making data modelers work closely with the ground realities of the project which include gathering data from multiple unstructured sources, reporting requirements, spending patterns, etc.
  • It improves communication across the organization.
  • It helps in documenting data mapping during the ETL process.

Disadvantages

  • Development of a data model is a very tedious job. One should be aware of the physical characteristics of the data storage.
  • This system involves complex application development and knowledge of biographical truth.
  • The model is not quite user-friendly. Small changes induced in the system require major modification in the entire application.

Data models are developed for the data to be stored in a database. The main objective of these data models is to ensure that data objects created by the functional team are denoted accurately. As said earlier, even the smallest change in the system will require changes in the entire model. However, despite some drawbacks, the data modeling concept is the first and major phase of database design, because it defines data entities, the relations among data objects, etc. A data model holistically talks about the business rules, government policies, and regulatory compliance on the data.