Data Science Notes

4.2 Data Types in Data Science

In data science, data is broadly classified into two types: qualitative (categorical) and quantitative (numerical).

Qualitative or Categorical data

Describes a quality or characteristic of an object that can be labeled or named. It has no inherent numerical value, although categories may be encoded as numbers for storage or analysis. Examples include colors and places.

Types of Qualitative data:

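Ordinal data: categories with a meaningful order (e.g., low, medium, high)
Nominal data: categories with no inherent order (e.g., colors, places)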
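
The practical difference between the two is that ordinal values can be ranked while nominal values can only be counted or compared for equality. The following is a minimal sketch of that idea; the category values and the size scale are made up for illustration.

```python
from collections import Counter

# Nominal data: labels with no inherent order (illustrative values).
colors = ["red", "blue", "green", "blue"]

# Ordinal data: labels drawn from an ordered scale (assumed scale).
SIZE_ORDER = ["small", "medium", "large"]
sizes = ["large", "small", "medium", "small"]

# For nominal data, counting categories is meaningful; ranking is not.
color_counts = Counter(colors)

# For ordinal data, values can be sorted by their position in the scale.
sorted_sizes = sorted(sizes, key=SIZE_ORDER.index)

print(color_counts.most_common(1))
print(sorted_sizes)
```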

Quantitative or Numerical data

Data expressed as numbers that can be counted or measured. Examples include height, weight, and the number of students in a school.

Types of Quantitative data:

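Discrete data: countable values, usually whole numbers (e.g., number of students)
Continuous data: measured values that can take any value within a range (e.g., height, weight)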
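
In code, discrete data usually appears as integers and continuous data as floating-point numbers. The sketch below uses that as a rough heuristic only; the helper `is_discrete` and the sample values are hypothetical, and real datasets may store counts as floats.

```python
# Discrete data: countable values (made-up class sizes).
students_per_class = [28, 31, 25, 30]

# Continuous data: measured values that can fall anywhere in a range.
heights_cm = [162.5, 170.2, 158.0, 181.7]


def is_discrete(values):
    """Rough heuristic: treat data as discrete if every value is an integer."""
    return all(isinstance(v, int) and not isinstance(v, bool) for v in values)


print(is_discrete(students_per_class))
print(is_discrete(heights_cm))
```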

4.2.1 Dataset and Database

A dataset is a structured collection of data, usually associated with a single body of work. A database is an organized collection of data, typically stored as multiple datasets or tables.

Relational Databases

Store structured data in tables with rows and columns.
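
As a minimal sketch of the rows-and-columns model, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table name `students`, its columns, and the inserted rows are assumptions chosen for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Columns define the structure; every row must follow it.
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, height_cm REAL)"
)

# Each tuple becomes one row in the table (illustrative values).
conn.executemany(
    "INSERT INTO students (name, height_cm) VALUES (?, ?)",
    [("Asha", 162.5), ("Ben", 170.2)],
)

# Query the table and fetch the rows as a list of tuples.
rows = conn.execute(
    "SELECT name, height_cm FROM students ORDER BY id"
).fetchall()
print(rows)

conn.close()
```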

Non-Relational Databases

Use various data models such as document, key-value, wide-column, and graph.
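
Two of these models can be imitated with plain Python structures. This is only a sketch of the data shapes, not of any particular database product; the keys, field names, and values are hypothetical.

```python
import json

# Key-value model: a value is looked up by a unique key.
kv_store = {"session:42": "logged_in"}

# Document model: self-describing records whose fields may differ
# from one document to the next (no fixed table schema).
documents = [
    {"_id": 1, "name": "Asha", "tags": ["survey", "2023"]},
    {"_id": 2, "name": "Ben", "address": {"city": "Pune"}},
]

# A simple query: find documents that contain an "address" field.
with_address = [doc for doc in documents if "address" in doc]

print(kv_store["session:42"])
print(json.dumps(with_address, indent=2))
```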

4.2.3 Role of database in data science

Databases are crucial to data science because they were introduced to manage and store large amounts of data effectively.

Key reasons for database popularity:

Better organized data structure
The dependence of data science on data

4.2.4 Data Collection in Data Science

Data collection is the process of gathering information from relevant sources to answer a given statistical inquiry.

Primary data collection methods:

Surveys and Questionnaires
Interviews
Observations
Experiments
Focus groups
Sensors
IoT devices
Biometric devices

Secondary data collection methods:

Published sources
Public databases
Government and institutional records
Surveys and Questionnaires conducted in the past
Social media data/posts
Publicly available data
Past research studies

4.2.5 Data Storage

Effective data storage is essential for managing and analyzing large volumes of data.

Common data storage methods:

Relational/NoSQL databases
Data warehouse
Distributed file systems
Cloud-based data storage
Blockchain

4.2.6 Data Visualization

Data visualization is the graphical representation of data, used to reveal meaningful insights, trends, and patterns. Visual elements include charts, graphs, maps, figures, and dashboards.
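
To illustrate the idea without a plotting library, the sketch below draws a text-based bar chart of category counts. The survey responses and the helper `bar_chart` are made up for this example; in practice a charting library would normally be used instead.

```python
from collections import Counter

# Illustrative survey responses (not real data).
responses = ["yes", "no", "yes", "yes", "maybe", "no"]
counts = Counter(responses)


def bar_chart(counts):
    """Return a text bar chart, one line per category, most frequent first."""
    width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.most_common():
        # Pad labels so the bars line up, then draw one '#' per occurrence.
        lines.append(f"{label.ljust(width)} | {'#' * n} {n}")
    return "\n".join(lines)


chart = bar_chart(counts)
print(chart)
```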

4.2.7 Summary statistics

Summary statistics describe the data in a sample and help in understanding its values. They include the total number of values, the minimum and maximum values, the mean, and the standard deviation of the data.
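
These quantities can be computed with Python's built-in functions and the standard `statistics` module. The sketch below assumes a small made-up sample of heights; note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1), while `statistics.pstdev` would give the population version.

```python
import statistics

# Illustrative sample of heights in centimetres (not real data).
heights_cm = [158.0, 162.5, 170.2, 175.0, 181.7]

summary = {
    "count": len(heights_cm),                # total number of values
    "min": min(heights_cm),                  # minimum value
    "max": max(heights_cm),                  # maximum value
    "mean": statistics.mean(heights_cm),     # arithmetic mean
    "stdev": statistics.stdev(heights_cm),   # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```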