How to Fill in Missing Values in a Dataset

JAYAMBE
3 min read · Jun 11, 2024


Data preparation is one of the most important skills a data science expert should have. A common and tricky problem in data preparation is choosing which measure (the mean, median, or mode) to use when filling in the missing values in a dataset. If you want to understand how to choose the right approach and fill in the missing information, this article is for you. I'll explain how to fill in missing values in a dataset.

https://colab.research.google.com/drive/1tLIEB7ybV1fLOCeIEQANr5N1PExIcfuL?usp=sharing

Here’s How to Choose Between Mean, Median, and Mode to Fill in Missing Values

Depending on the data you are working with, you can use the mean, median, or mode to fill in the missing values in a dataset. To help you choose between them, consider the following guidelines:

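  1. Mean: When a numerical variable is approximately normally distributed (roughly symmetric, without strong skew or outliers), you can use its mean to fill in the missing values.
  2. Median: When a numerical variable is not normally distributed (for example, it is skewed or has outliers), use its median instead, because the median is much less affected by extreme values than the mean.
  3. Mode: When the variable with missing values is categorical or discrete, you can use its mode (the most frequent value) to fill in the missing values.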
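To see why skewed data calls for the median, here is a small illustration with pandas (an assumption: the article's own code is not shown here). The numbers are made up: one outlier drags the mean far away from most of the values, while the median stays typical. The last lines show the mode of a hypothetical categorical column.

```python
import pandas as pd

# Made-up, right-skewed sample: one large outlier (100) pulls the mean upward.
values = pd.Series([1, 2, 3, 4, 100])
print("mean:  ", values.mean())    # 22.0, far from most of the data
print("median:", values.median())  # 3.0, a typical value

# Made-up categorical sample: the mode is the most frequent category.
colors = pd.Series(["red", "blue", "red", "green"])
print("mode:  ", colors.mode().iloc[0])  # red
```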

So the first step is to check: does your data have missing values?
If it does, check the distribution of each numerical variable that has missing values.
For a numerical variable, use the mean if the variable is normally distributed.
Otherwise, choose the median.
If the variable is categorical or discrete, choose the mode.
So you may need to choose a different measure for each variable.
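The decision steps above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions, not the author's code: it treats a numeric column as "approximately normal" when the absolute value of its skewness (`Series.skew`) is below a hypothetical threshold of 0.5, which is a rule of thumb rather than a formal normality test. It also treats only non-numeric columns as categorical, so discrete numeric columns still need a manual decision. The helper names `choose_fill_strategy` and `plan_imputation` and the example columns are hypothetical.

```python
import numpy as np
import pandas as pd


def choose_fill_strategy(series: pd.Series, skew_threshold: float = 0.5) -> str:
    """Return 'mean', 'median', or 'mode' for a single column (hypothetical helper)."""
    # Non-numeric (categorical) columns: use the most frequent value.
    if not pd.api.types.is_numeric_dtype(series):
        return "mode"
    # Assumption: |skewness| < skew_threshold is treated as "approximately normal".
    # Series.skew() ignores NaN by default.
    if abs(series.skew()) < skew_threshold:
        return "mean"
    return "median"


def plan_imputation(df: pd.DataFrame) -> dict:
    """Map every column that has missing values to a fill strategy."""
    missing_counts = df.isna().sum()  # missing values per column
    return {
        col: choose_fill_strategy(df[col])
        for col in df.columns
        if missing_counts[col] > 0
    }


# Hypothetical example data
df = pd.DataFrame({
    "age": [25, 30, np.nan, 35, 40, 45],                          # symmetric
    "income": [30_000, 32_000, 31_000, np.nan, 33_000, 250_000],  # one large outlier
    "city": ["Paris", None, "Lyon", "Paris", "Paris", "Lyon"],    # categorical
})

plan = plan_imputation(df)
print(plan)  # {'age': 'mean', 'income': 'median', 'city': 'mode'}
```

The threshold is a tunable choice. A histogram or a formal test can replace the skewness check if you need a stricter definition of "normal".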

Now Here’s How to Fill in Missing Values in a Dataset

To show how to use the mean, median, and mode to fill in missing values, let's first create an example dataset with some missing values:
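The original example code is not reproduced here, so the following is a hypothetical stand-in built with pandas and NumPy. It has two numerical columns and one categorical column, each with one missing entry. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 'age' and 'income' are numerical, 'city' is categorical.
# np.nan and None mark the missing entries.
df = pd.DataFrame({
    "age": [25, 30, np.nan, 35, 40, 45],
    "income": [30_000, 32_000, 31_000, np.nan, 33_000, 250_000],
    "city": ["Paris", None, "Lyon", "Paris", "Paris", "Lyon"],
})

print(df)
print(df.isna().sum())  # number of missing values per column
```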

Here’s how to fill in missing values using the mean value:
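A minimal pandas sketch (an assumption, not the article's original code) using a hypothetical, roughly symmetric `age` column. `Series.mean()` skips NaN, and `fillna` replaces each missing entry with that value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 30, np.nan, 35, 40, 45]})

# Replace each NaN in 'age' with the mean of the non-missing values (35.0).
df["age"] = df["age"].fillna(df["age"].mean())
print(df["age"].tolist())  # [25.0, 30.0, 35.0, 35.0, 40.0, 45.0]
```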

Here’s how to fill in missing values using the median value:
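A minimal pandas sketch (an assumption, not the article's original code) using a hypothetical `income` column skewed by one large value. The median of the non-missing values is 32,000, whereas the mean would be 75,200, which is higher than four of the five observed incomes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [30_000, 32_000, 31_000, np.nan, 33_000, 250_000]})

# Replace each NaN in 'income' with the median, which the outlier barely affects.
df["income"] = df["income"].fillna(df["income"].median())
print(df["income"].tolist())  # [30000.0, 32000.0, 31000.0, 32000.0, 33000.0, 250000.0]
```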

And now, here’s how to fill in missing values using the mode value:
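A minimal pandas sketch (an assumption, not the article's original code) using a hypothetical categorical `city` column. `Series.mode()` returns a Series because several values can tie for most frequent, so the sketch takes the first one.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Paris", None, "Lyon", "Paris", "Paris", "Lyon"]})

# mode() ignores missing values and may return several tied values; take the first.
most_frequent = df["city"].mode().iloc[0]  # 'Paris' (3 occurrences vs. 2 for 'Lyon')
df["city"] = df["city"].fillna(most_frequent)
print(df["city"].tolist())  # ['Paris', 'Paris', 'Lyon', 'Paris', 'Paris', 'Lyon']
```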

That is how you can fill in any missing values in your data.
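When different columns need different measures, one option is to compute one fill value per column and pass them all to `DataFrame.fillna` as a dictionary. A minimal sketch with hypothetical columns, where each column's measure was chosen by hand following the guidelines in this article:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, np.nan, 35, 40, 45],
    "income": [30_000, 32_000, 31_000, np.nan, 33_000, 250_000],
    "city": ["Paris", None, "Lyon", "Paris", "Paris", "Lyon"],
})

# One fill value per column, each computed with the measure chosen for it.
fill_values = {
    "age": df["age"].mean(),            # roughly symmetric column -> mean
    "income": df["income"].median(),    # skewed by an outlier -> median
    "city": df["city"].mode().iloc[0],  # categorical column -> mode
}

# fillna accepts a dict that maps column names to replacement values.
df_filled = df.fillna(value=fill_values)
print(df_filled)
print(df_filled.isna().sum().sum())  # 0 missing values remain
```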

In brief,
The first step is to find any missing values in your data. If there are some, check the distribution of each numerical variable that has missing values.
Use the mean for a numerical variable that is normally distributed.
Use the median if it is not.

Use the mode if the variable is discrete or categorical.
This means you may need to choose a different measure for each variable.
I hope you enjoyed this article on filling in missing values in your data. Feel free to ask questions in the comments section below.
