What is Data?
In everyday life, we see and use data all the time, often without noticing. But what exactly is data, and why does it matter? Since the invention of computers, the term “data” has commonly been used for information that a computer stores or transmits. That is not the only kind of data, however; there are many different types. So how do we define them?
Data can be text or numbers stored as bits and bytes in the memory of an electronic device. It can even be information or ideas held in a person’s mind.
Information gathered by measuring, observing, researching, and analyzing is collectively known as data. It comprises figures, facts, names, numbers, or general descriptions of things. It can be organized and presented in charts, graphs, or tables to make analysis easier. Examples include the data that data scientists collect through data mining. Some other examples of data are listed below:
List of diabetic patients in a city
The population of a country
Number of students in a class
Types of Data
Depending on its attributes and characteristics, data can be classified into two categories: qualitative and quantitative. Once we understand the difference between the two, it becomes easier to know how and where to use each.
Qualitative Data:
Qualitative data consists of observations that cannot be calculated or computed. Instead, it represents attributes or characteristics: descriptions that we can observe but cannot measure numerically. Examples include honesty, wisdom, intelligence, and creativity. Because it records attributes, such data is classified as qualitative. It is more explanatory than conclusive in nature.
Quantitative Data:
Quantitative data can be measured, but it does not by itself explain the reasons behind the measurements. It consists of numbers, so, unlike qualitative data, it can be represented numerically and used in mathematical calculations.
For example, we can find the total number of students who play indoor games and the total number who play outdoor games, or the number of students in a class who have scored above 40 marks in English. The information obtained from such data is numerical.
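The counting described in this example can be sketched in a few lines of Python. The student names, marks, and game preferences below are made up for illustration; only the threshold of 40 marks comes from the example itself.

```python
# Hypothetical records: (name, English marks, preferred type of game).
students = [
    ("Asha", 72, "indoor"),
    ("Ravi", 38, "outdoor"),
    ("Meena", 55, "outdoor"),
    ("Karan", 40, "indoor"),
    ("Divya", 91, "outdoor"),
]

# Marks are numbers, so they can be compared and counted.
above_40 = sum(1 for _, marks, _ in students if marks > 40)

# The game type is a descriptive label, but counting how many students
# fall under each label gives a numerical (quantitative) result.
indoor = sum(1 for _, _, game in students if game == "indoor")
outdoor = sum(1 for _, _, game in students if game == "outdoor")

print("Scored above 40 in English:", above_40)
print("Play indoor games:", indoor)
print("Play outdoor games:", outdoor)
```

Note that a student with exactly 40 marks is not counted, because the condition is "above 40".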
Data Collection
Before collecting any data, we first have to know the problem statement, i.e., why we are collecting the data. What kind of problem are we dealing with, and how do we plan to solve it? Data collection is a systematic way of gathering relevant information from different sources.
Based on its source, collected data can be classified as primary data or secondary data.
Primary Data
The first type is primary data. We use primary data when we deal with a unique problem for which no previous research exists. Primary data collection is therefore an entirely new collection of data, gathered for the very first time. A basic example of primary data is the Census of India.
Take another example. Suppose you want to know the average time employees spend in a cafeteria. No public data will be available for this, so you will have to run a survey yourself. You can interview the employees or observe them to see how much time they spend in the cafeteria. This will be your primary data.
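Once such a survey has been carried out, the average can be computed directly from the responses. The following is a minimal sketch; the six response values are invented for illustration and do not come from any real survey.

```python
from statistics import mean

# Hypothetical survey responses: minutes each employee spent in the cafeteria.
minutes_in_cafeteria = [25, 40, 15, 30, 35, 20]

# The average is the sum of all responses divided by the number of responses.
average_minutes = mean(minutes_in_cafeteria)

print(f"Employees surveyed: {len(minutes_in_cafeteria)}")
print(f"Average time spent: {average_minutes:.1f} minutes")
```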
Secondary Data
The second type is secondary data. This is data from previous research, i.e., one or more people have already studied the topic and published the data on the internet or in articles, magazines, books, and so on. An example is data published by the Government of India.
Difference Between Primary Data and Secondary Data
Quality Checking of the Data
Once the data is ready, we have to check its quality before analyzing it. This important step is often skipped, but poor-quality data can be misleading and can undermine the way we present our findings. A quality check is therefore essential.
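What a quality check looks for depends on the data. As one possible sketch, the function below flags three common problems in a list of student marks: missing values, values outside the allowed range, and repeated names. The function name, the records, and the 0 to 100 range are all assumptions made for this illustration.

```python
def find_quality_issues(records, min_marks=0, max_marks=100):
    """Return a list of (index, problem) pairs for suspicious records."""
    issues = []
    seen_names = set()
    for i, (name, marks) in enumerate(records):
        if marks is None:
            issues.append((i, "missing marks"))
        elif not (min_marks <= marks <= max_marks):
            issues.append((i, "marks out of range"))
        if name in seen_names:
            issues.append((i, "duplicate name"))
        seen_names.add(name)
    return issues


# Hypothetical records: (name, marks). Three of them have problems.
records = [("Asha", 72), ("Ravi", None), ("Meena", 155), ("Asha", 72)]

issues = find_quality_issues(records)
for index, problem in issues:
    print(f"Record {index} {records[index]}: {problem}")
```

Flagged records still need a human decision: they may be corrected, collected again, or left out of the analysis.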
Exploratory Analysis of Data
After the quality check, we can finally analyze the data. Analysis helps us become more familiar with the topic and extract useful insights. Skipping this step might produce inaccurate models, and we might select insignificant variables for our model.
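A common first step in exploring numerical data is to compute a few summary statistics. The sketch below uses Python's standard `statistics` module on a made-up list of marks; it is only one of many ways to start an exploratory analysis.

```python
from statistics import mean, median, stdev

# Hypothetical marks of seven students.
marks = [72, 38, 55, 40, 91, 64, 47]

summary = {
    "count": len(marks),
    "min": min(marks),
    "max": max(marks),
    "mean": round(mean(marks), 2),
    "median": median(marks),
    "stdev": round(stdev(marks), 2),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Comparing the mean with the median, and looking at the minimum and maximum, gives a quick sense of how the values are spread and whether any of them look unusual.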
Representation of the Data
Now comes the most interesting part, the cherry on the cake. The value of all the time and effort spent on research depends on how we represent the results. If we have all the important information but fail to present it well, our data might come across as boring even if it is very informative. That is why the representation of data is so important. There are many ways to represent data, such as bar graphs, pie charts, flow charts, and tables.
Solved Examples
1. In class 8, 25 students are good at sports, 16 are good at art and craft, and 9 are good at drama. In class 9, 22 students are good at sports, 31 are good at art and craft, and 5 are good at drama. In class 10, 12 students are good at sports, 8 are good at art and craft, and only 3 are good at drama. Represent the data in a table.
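Solution There are 3 activities and 3 classes, with 131 students in total, so the distribution of the students across the activities can be represented as:
Class | Sports | Art and craft | Drama
Class 8 | 25 | 16 | 9
Class 9 | 22 | 31 | 5
Class 10 | 12 | 8 | 3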
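The same table can also be built programmatically from the numbers given in the question. The sketch below additionally computes row and column totals, which the question does not ask for but which are a useful check on the counts.

```python
# Counts taken from the problem statement.
data = {
    "Class 8": {"Sports": 25, "Art and craft": 16, "Drama": 9},
    "Class 9": {"Sports": 22, "Art and craft": 31, "Drama": 5},
    "Class 10": {"Sports": 12, "Art and craft": 8, "Drama": 3},
}
activities = ["Sports", "Art and craft", "Drama"]

# Total per class (row totals) and per activity (column totals).
class_totals = {cls: sum(counts.values()) for cls, counts in data.items()}
activity_totals = {a: sum(data[cls][a] for cls in data) for a in activities}
grand_total = sum(class_totals.values())

# Print the table with aligned columns.
header = f"{'Class':<10}" + "".join(f"{a:>15}" for a in activities) + f"{'Total':>8}"
print(header)
for cls, counts in data.items():
    row = f"{cls:<10}" + "".join(f"{counts[a]:>15}" for a in activities)
    print(row + f"{class_totals[cls]:>8}")
footer = f"{'Total':<10}" + "".join(f"{activity_totals[a]:>15}" for a in activities)
print(footer + f"{grand_total:>8}")
```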
2. You are doing a survey on the most favorite type of movie and you get the following result:
Represent the data in a bar graph.
Solution The bar graph for the following table is:
(Image will be Uploaded Soon)
FAQs on Introduction to Data
1. What is the importance of data analytics?
Data analytics is the process of analyzing data to make it more useful. It is important for extracting useful insights from enormous amounts of data, and it matters because the final report is based on the results of the analysis.
2. What is data mining?
Data mining is a method of generating new information by uncovering hidden patterns and relationships in data. It is essentially a discovery process and is closely related to exploratory data analysis. It can be very useful for predicting future trends or making important decisions.
3. What is meant by continuous and discrete data?
Data that can take only distinct, separate values, with clear gaps between them, is known as discrete data, while data that can take any value within a range is known as continuous data. Discrete data is countable, while continuous data is measurable. Bar graphs are typically used to represent discrete data, while histograms or line graphs are used for continuous data.
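The difference shows up in how the two kinds of data are summarized. In the sketch below, discrete values are counted one by one, as a bar graph would show them, while continuous values are grouped into intervals, as a histogram would. Both data sets and the 10 cm interval width are made up for illustration.

```python
from collections import Counter

# Discrete: number of siblings per student (hypothetical whole-number counts).
siblings = [0, 1, 2, 1, 0, 3, 1, 2]
sibling_counts = Counter(siblings)  # one count per distinct value

# Continuous: heights in cm (hypothetical); any value in a range is possible,
# so the values are grouped into intervals of 10 cm.
heights_cm = [142.5, 150.1, 155.8, 149.9, 161.2, 158.4, 147.0]
bin_width = 10
height_bins = Counter(int(h // bin_width) * bin_width for h in heights_cm)

print("Siblings (discrete):")
for value in sorted(sibling_counts):
    print(f"  {value}: {sibling_counts[value]} students")

print("Heights (continuous, grouped):")
for start in sorted(height_bins):
    print(f"  {start} to {start + bin_width} cm: {height_bins[start]} students")
```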
4. What are some of the differences between data and information?
Data refers to raw facts that are unorganized and unrefined, while information is data that has been processed and presented in a meaningful context. Data does not depend on information, whereas information depends on data: it is a group of data that carries a logical meaning. Data is generally measured in bits and bytes, while information is measured in meaningful units such as quantity, time, and so on.
5. What are some of the uses of data?
Data is a very broad term. It allows organizations to determine the cause of problems more effectively and to visualize the relationships between what is happening in different departments, locations, and systems. It is used in many areas, including business management (revenue, profits, sales data, stock prices), finance, surveys of human activity such as censuses of the number of people who are homeless, and governance, where it is used to measure crime rates, literacy rates, unemployment rates, and more.