Data Science Notes

4.2 Data Types in Data Science

In data science, data can be classified into two main types: qualitative (categorical) and quantitative (numerical).

Qualitative or Categorical data

Qualitative data describes a quality or characteristic of an object that can be labeled or named rather than measured. Categories are sometimes coded as numbers, but those numbers have no arithmetic meaning. Examples include colors, places, etc.

Types of Qualitative data:

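  • Ordinal data: categories with a meaningful order, such as low, medium, and high
  • Nominal data: categories with no inherent order, such as colors or places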
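The difference between the two kinds of qualitative data can be sketched in plain Python. The color list and the `Satisfaction` scale below are hypothetical examples, not part of the notes. Nominal labels only support counting and equality checks. Ordinal labels can also be sorted and compared, although the gaps between levels are not guaranteed to be equal.

```python
from collections import Counter
from enum import IntEnum

# Nominal: labels with no natural order; counting and equality checks are meaningful.
colors = ["red", "blue", "red", "green", "blue", "red"]
color_counts = Counter(colors)

# Ordinal: labels with a meaningful order (hypothetical satisfaction scale).
class Satisfaction(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

responses = [Satisfaction.HIGH, Satisfaction.LOW, Satisfaction.MEDIUM, Satisfaction.HIGH]

# Because the order is meaningful, sorting and "at least MEDIUM" filters are valid.
ranked = sorted(responses)
at_least_medium = [r for r in responses if r >= Satisfaction.MEDIUM]

print(color_counts.most_common(1))  # [('red', 3)]
print([r.name for r in ranked])     # ['LOW', 'MEDIUM', 'HIGH', 'HIGH']
print(len(at_least_medium))         # 3
```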

Quantitative or Numerical data

Quantitative data consists of numerical values that can be counted or measured, so arithmetic on them (sums, averages) is meaningful. Examples include height, weight, number of students in a school, etc.

Types of Quantitative data:

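  • Discrete data: countable values, usually whole numbers, such as the number of students in a school
  • Continuous data: measured values that can take any value within a range, such as height or weight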
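The following minimal sketch contrasts the two kinds of quantitative data, using made-up class sizes and heights. Discrete observations are whole-number counts, but a summary of them, such as an average, need not be a valid count itself. Continuous measurements, by contrast, can fall anywhere within a range.

```python
import statistics

# Discrete data: counts, so every observation is a whole number (hypothetical class sizes).
students_per_class = [28, 31, 25, 30]

# Continuous data: measurements that can fall anywhere in a range (hypothetical heights in cm).
heights_cm = [162.5, 170.2, 158.9, 175.0]

# Every discrete observation is an integer count...
assert all(isinstance(n, int) for n in students_per_class)

# ...but a summary of discrete data need not be a count itself.
avg_class_size = statistics.mean(students_per_class)
print(avg_class_size)  # 28.5

# Averaging continuous measurements gives another value on the same continuous scale.
avg_height = statistics.mean(heights_cm)
print(f"{avg_height:.1f} cm")
```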

4.2.1 Dataset and Database

A dataset is a structured or organized collection of data, usually associated with a unique body of work. A database is an organized collection of data, often spanning multiple datasets or tables, that is stored and managed by a database management system (DBMS).

Relational Databases

Store structured data in tables of rows and columns; tables are linked through keys and are usually queried with SQL.
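Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database, assuming SQLite is an acceptable stand-in for a relational database server. The `students` and `schools` tables and their rows are hypothetical. Each table has a fixed set of columns, and the two tables are related through a key that SQL can join on.

```python
import sqlite3

# In-memory SQLite database: a lightweight stand-in for a relational database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each table has a fixed set of typed columns; each row is one record.
cur.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, school_id INTEGER)")
cur.executemany("INSERT INTO schools (id, city) VALUES (?, ?)", [(1, "Oslo"), (2, "Lima")])
cur.executemany(
    "INSERT INTO students (id, name, school_id) VALUES (?, ?, ?)",
    [(1, "Alice", 1), (2, "Bob", 2), (3, "Carol", 1)],
)
conn.commit()

# Tables are related through keys (students.school_id -> schools.id) and joined in SQL.
rows = cur.execute(
    "SELECT s.name, c.city FROM students s JOIN schools c ON s.school_id = c.id ORDER BY s.id"
).fetchall()
print(rows)  # [('Alice', 'Oslo'), ('Bob', 'Lima'), ('Carol', 'Oslo')]
conn.close()
```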

Non-Relational Databases

Store data using other data models, such as document, key-value, wide-column, and graph.
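The sketch below illustrates two of these data models with plain Python data structures. It is not a real NoSQL database client, and the keys and fields are hypothetical. A dict stands in for a key-value store. A list of nested dicts stands in for a document collection, where records need not share the same fields.

```python
import json

# Key-value model: each value is retrieved by a unique key (a dict stands in for the store).
kv_store = {"session:42": "user=alice;theme=dark"}

# Document model: records are self-describing, possibly nested documents, and
# documents in the same collection do not have to share the same fields.
students_collection = [
    {"id": 1, "name": "Alice", "school": {"city": "Oslo"}, "clubs": ["chess", "music"]},
    {"id": 2, "name": "Bob", "school": {"city": "Lima"}},  # no "clubs" field
]

# Documents are commonly exchanged as JSON text.
serialized = json.dumps(students_collection[0])
restored = json.loads(serialized)

print(kv_store["session:42"])      # user=alice;theme=dark
print(restored["school"]["city"])  # Oslo
print([doc.get("clubs", []) for doc in students_collection])  # [['chess', 'music'], []]
```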

4.2.3 Role of Databases in Data Science

Databases are crucial in data science because they allow large amounts of data to be stored, managed, and retrieved efficiently and reliably.

Key reasons for database popularity:

  • Better organized data structure
  • The dependence of data science on reliable access to data

4.2.4 Data Collection in Data Science

Data collection is the process of gathering information from relevant sources to answer a given statistical question.

Primary data collection methods:

  • Surveys and Questionnaires
  • Interviews
  • Observations
  • Experiments
  • Focus groups
  • Sensors
  • IoT devices
  • Biometric devices

Secondary data collection methods:

  • Published sources
  • Public databases
  • Government and institutional records
  • Surveys and Questionnaires conducted in the past
  • Social media data/posts
  • Publicly available data
  • Past research studies
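Secondary data usually arrives as an existing file, so the first step is often to load it and check its quality. The sketch below uses Python's `csv` module on an inline string that stands in for a previously published CSV file. The columns and values are made up for illustration. Because secondary data was collected for someone else's purpose, the example counts incomplete rows before use.

```python
from __future__ import annotations

import csv
import io

# Stand-in for a previously published CSV file; the rows are hypothetical.
PUBLISHED_CSV = """year,region,population
2020,North,1200
2020,South,
2021,North,1250
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(PUBLISHED_CSV)

# Check for gaps before analysis: an empty field is read as an empty string.
complete = [r for r in rows if r["population"]]
missing = len(rows) - len(complete)
print(len(rows), missing)  # 3 1
```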

4.2.5 Data Storage

Effective data storage is essential for managing and analyzing large volumes of data.

Common data storage methods:

  • Relational/NoSQL databases
  • Data warehouses
  • Distributed file systems
  • Cloud-based data storage
  • Blockchain

4.2.6 Data Visualization

Data visualization is the graphical representation of data, used to reveal meaningful insights, trends, and patterns. Common visual elements include charts, graphs, maps, figures, and dashboards.
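To show the idea behind a bar chart without depending on a plotting library, the following sketch renders counts as a text-based horizontal bar chart. In practice a library such as matplotlib would normally be used. The function name `text_bar_chart` and the survey counts are hypothetical. Each bar is scaled relative to the largest value, so the categories can be compared at a glance.

```python
from __future__ import annotations


def text_bar_chart(counts: dict[str, int], width: int = 20) -> str:
    """Render counts as a horizontal text bar chart; the largest value spans `width` '#' characters."""
    if not counts:
        return ""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, value in counts.items():
        # Scale each bar relative to the largest value so bars are directly comparable.
        bar_len = round(value / largest * width) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_len} {value}")
    return "\n".join(lines)


# Hypothetical survey counts for a nominal variable (favorite color).
chart = text_bar_chart({"red": 12, "blue": 6, "green": 3}, width=12)
print(chart)
# red   | ############ 12
# blue  | ###### 6
# green | ### 3
```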

4.2.7 Summary Statistics

Summary statistics describe the main features of the data in a sample and help us understand the values at a glance. They typically include the number of values (count), the minimum and maximum values, the mean, and the standard deviation.
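The summary statistics listed above can be computed with Python's standard `statistics` module. In the sketch below, the helper name `summarize` and the sample values are hypothetical. One design choice matters: `statistics.stdev` computes the sample standard deviation (dividing by n − 1), while `statistics.pstdev` computes the population standard deviation (dividing by n).

```python
from __future__ import annotations

import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Return count, min, max, mean, and sample standard deviation of `values`."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # stdev() divides by n - 1 (sample); pstdev() divides by n (population).
        "std": statistics.stdev(values),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
stats = summarize(sample)
print(stats["count"], stats["min"], stats["max"], stats["mean"])  # 8 2 9 5
print(round(stats["std"], 3))     # 2.138
print(statistics.pstdev(sample))  # 2.0
```

With pandas, `DataFrame.describe()` reports a similar set of values (count, mean, std, min, quartiles, max). Like `statistics.stdev`, its `std` is the sample standard deviation by default.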