Exploratory Data Analysis using Python

JAYAMBE
4 min read · Jun 4, 2024

Exploratory data analysis (EDA) is the data science practice of examining a dataset to find trends, relationships, and patterns in it. It helps us understand what the data actually contains, guides us toward sound decisions, and helps us come up with solutions for real business problems. If you want to understand exploratory data analysis in a practical way, this post is for you. In this article, I’ll walk you through exploratory data analysis using Python.

Exploratory Data Analysis using Python

I’ll use a dataset based on my Instagram reach to demonstrate how to use Python for exploratory data analysis.
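The dataset file itself is not part of this text, so here is a minimal sketch of the loading step using a tiny inline CSV as a stand-in. The values and the exact column names are assumptions for illustration; with a real export you would pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real Instagram export (values are made up).
csv_text = """Date,Impressions,Likes,Saves,Follows,Profile Visits,Hashtags
2023-01-01,3920,162,98,2,35,#datascience #python
2023-01-02,5394,224,194,10,48,#machinelearning #python
2023-01-03,4021,131,41,12,62,#datascience #ai
"""

# read_csv accepts a file path or any file-like object; parse_dates turns
# the Date column into real datetime values instead of plain strings.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(data.shape)
```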

Let’s now look at the first five rows of the data:
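A minimal sketch of this step, assuming a small made-up DataFrame in place of the real dataset:

```python
import pandas as pd

# Hypothetical sample standing in for the Instagram reach dataset.
data = pd.DataFrame({
    "Impressions": [3920, 5394, 4021, 4528, 2518, 3884],
    "Likes": [162, 224, 131, 213, 123, 144],
    "Follows": [2, 10, 12, 23, 8, 20],
})

# head() returns the first five rows by default.
first_rows = data.head()
print(first_rows)
```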

Let’s now look at all the columns in the dataset:
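Listing the columns could look like this. The column names in the sample are assumptions, not the confirmed schema of the real dataset.

```python
import pandas as pd

# Hypothetical sample; the real dataset has more columns.
data = pd.DataFrame({
    "Impressions": [3920, 5394],
    "Likes": [162, 224],
    "Hashtags": ["#datascience #python", "#machinelearning #python"],
})

# data.columns is an Index object; tolist() gives a plain Python list.
column_names = data.columns.tolist()
print(column_names)
```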

Let’s now look at the column information, such as data types and non-null counts:
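As a rough sketch on made-up data, `info()` prints one line per column with its non-null count and dtype, and `select_dtypes` picks out the numeric columns that most plots will need:

```python
import pandas as pd

# Hypothetical sample mixing numeric and text columns.
data = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
    "Caption": ["Post one", "Post two", "Post three"],
})

# Prints the index range, each column's non-null count and dtype,
# and the memory usage of the DataFrame.
data.info()

# Numeric columns are the ones suitable for statistics and most charts.
numeric_columns = data.select_dtypes("number").columns.tolist()
print(numeric_columns)
```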

Next, we examine the data’s descriptive statistics:
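A minimal sketch of the descriptive statistics step, using made-up numbers so the results are easy to verify by hand:

```python
import pandas as pd

# Hypothetical sample with round numbers.
data = pd.DataFrame({
    "Impressions": [1000, 2000, 3000, 4000],
    "Likes": [100, 200, 300, 400],
})

# describe() reports count, mean, std, min, quartiles and max
# for every numeric column.
summary = data.describe()
print(summary)
```

The mean and the 50% row (the median) are a quick way to spot skew: when they differ a lot, a few unusually large or small posts are pulling the average.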

Before going any further, always check whether your data has any missing values:
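The check could be sketched like this on a small made-up DataFrame with no gaps in it:

```python
import pandas as pd

# Hypothetical sample without missing values.
data = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
    "Caption": ["Post one", "Post two", "Post three"],
})

# isnull() marks every missing cell as True; sum() counts them per column.
missing_counts = data.isnull().sum()
print(missing_counts)

# A single number for the whole dataset: zero means nothing is missing.
total_missing = int(missing_counts.sum())
print("Total missing values:", total_missing)
```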

Luckily, this dataset doesn’t have any missing values.

Always begin your data exploration with the primary feature of your data. For instance, since we are working with a dataset based on Instagram reach, we should first explore the feature that describes reach. The Impressions column in our data holds the reach of each Instagram post. Now let’s see how the Impressions are distributed:
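A histogram is one common way to look at such a distribution. The sketch below uses matplotlib and made-up values; the plotting library is an assumption, since the text does not say which one produced the original chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical impressions for eight posts.
data = pd.DataFrame(
    {"Impressions": [3920, 5394, 4021, 4528, 2518, 3884, 2621, 3541]}
)

fig, ax = plt.subplots(figsize=(8, 4))
# hist() returns the count per bin and the bin edges along with the bars.
counts, bin_edges, _ = ax.hist(data["Impressions"], bins=4, edgecolor="black")
ax.set_title("Distribution of Impressions")
ax.set_xlabel("Impressions")
ax.set_ylabel("Number of posts")
fig.savefig("impressions_distribution.png")
```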

Let’s now look at the number of impressions each post received over time:
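One way to sketch this is a line chart with the post order on the x-axis. Here the row index stands in for time, which assumes the rows are already in chronological order; the values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical impressions, one row per post in chronological order.
data = pd.DataFrame({"Impressions": [3920, 5394, 4021, 4528, 2518, 3884]})

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(data.index, data["Impressions"], marker="o")
ax.set_title("Impressions over time")
ax.set_xlabel("Post number")
ax.set_ylabel("Impressions")
fig.savefig("impressions_over_time.png")
```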

Let’s now look at the other metrics of each post over time, such as Likes, Saves, and Follows:
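Drawing several metrics on one set of axes makes it easy to see whether they rise and fall together. A minimal sketch with made-up values, again assuming matplotlib and chronologically ordered rows:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical metrics, one row per post.
data = pd.DataFrame({
    "Likes": [162, 224, 131, 213, 123],
    "Saves": [98, 194, 41, 172, 96],
    "Follows": [2, 10, 12, 23, 8],
})

metrics = ["Likes", "Saves", "Follows"]
fig, ax = plt.subplots(figsize=(8, 4))
for name in metrics:
    # One line per metric, labelled so the legend can tell them apart.
    ax.plot(data.index, data[name], marker="o", label=name)
ax.set_title("Post metrics over time")
ax.set_xlabel("Post number")
ax.legend()
fig.savefig("metrics_over_time.png")
```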

Now let’s have a look at the distribution of reach from different sources:
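A pie chart of the summed reach per source is one way to show this. The source column names below (`From Home`, `From Hashtags`, `From Explore`, `From Other`) are assumptions about how the dataset is laid out, and the numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical reach per source, one row per post.
data = pd.DataFrame({
    "From Home": [2000, 2500, 1500],
    "From Hashtags": [1000, 1500, 500],
    "From Explore": [500, 1000, 0],
    "From Other": [100, 200, 200],
})

source_columns = ["From Home", "From Hashtags", "From Explore", "From Other"]
# Total impressions contributed by each source across all posts.
reach_by_source = data[source_columns].sum()

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    reach_by_source.to_numpy(),
    labels=reach_by_source.index.tolist(),
    autopct="%1.1f%%",  # print each slice's share as a percentage
)
ax.set_title("Impressions by source")
fig.savefig("reach_sources.png")
```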

Let’s now examine the distribution of sources of engagement:
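The numbers behind such a chart can be computed with pandas alone. This sketch assumes the engagement columns are called `Saves`, `Comments`, `Shares`, and `Likes`, and uses made-up values.

```python
import pandas as pd

# Hypothetical engagement counts, one row per post.
data = pd.DataFrame({
    "Saves": [100, 150, 50],
    "Comments": [10, 5, 5],
    "Shares": [20, 40, 20],
    "Likes": [200, 250, 150],
})

engagement_columns = ["Saves", "Comments", "Shares", "Likes"]
totals = data[engagement_columns].sum()

# Share of all engagement that each source accounts for, largest first.
share_percent = (totals / totals.sum() * 100).round(1)
share_percent = share_percent.sort_values(ascending=False)
print(share_percent)
```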

Let’s now look at the relationship between the number of profile visits and follows:
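A scatter plot with a straight-line fit, plus the Pearson correlation coefficient, is a simple way to examine this. The column name `Profile Visits` and all values are assumptions, and the least-squares line from `np.polyfit` is my choice of trend line, not necessarily what the original chart used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical values, one row per post.
data = pd.DataFrame({
    "Profile Visits": [10, 20, 30, 40, 50],
    "Follows": [2, 5, 5, 9, 12],
})

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear link.
r = data["Profile Visits"].corr(data["Follows"])

# Degree-1 polynomial fit gives the slope and intercept of the trend line.
slope, intercept = np.polyfit(data["Profile Visits"], data["Follows"], 1)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(data["Profile Visits"], data["Follows"])
ax.plot(data["Profile Visits"], slope * data["Profile Visits"] + intercept)
ax.set_xlabel("Profile Visits")
ax.set_ylabel("Follows")
ax.set_title(f"Profile visits vs. follows (r = {r:.2f})")
fig.savefig("visits_vs_follows.png")
```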

Using a word cloud, let’s now look at the kinds of hashtags that were used in the posts:

Let’s now look at the correlation between all the features:
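A minimal sketch of a correlation matrix on made-up data. Only numeric columns can be correlated, so text columns such as captions are filtered out first.

```python
import pandas as pd

# Hypothetical sample; Likes is deliberately an exact multiple of Impressions.
data = pd.DataFrame({
    "Impressions": [1000, 2000, 3000, 4000, 5000],
    "Likes": [100, 200, 300, 400, 500],
    "Follows": [5, 3, 8, 2, 9],
    "Caption": ["a", "b", "c", "d", "e"],
})

# Pairwise Pearson correlation between every pair of numeric columns.
correlations = data.select_dtypes("number").corr()
print(correlations)

# How strongly each feature moves with Impressions, strongest first.
ranking = correlations["Impressions"].sort_values(ascending=False)
print(ranking)
```

Sorting one column of the matrix, as in the last step, is often easier to read than the full grid when you care about one target feature.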

Let’s take a closer look at the hashtag column now. Instagram reach is affected by the various hashtag combinations used in each post. So let’s examine the hashtag distribution to see which hashtag appears most frequently across all posts:
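Counting hashtags could be sketched as follows, assuming the `Hashtags` column stores each post's hashtags as one space-separated string. The sample posts are made up.

```python
import pandas as pd

# Hypothetical hashtag strings, one per post.
data = pd.DataFrame({
    "Hashtags": [
        "#python #datascience",
        "#python #ai",
        "#datascience #python #machinelearning",
    ]
})

# Split each string on whitespace, put every hashtag on its own row,
# then count how often each one appears across all posts.
hashtag_counts = data["Hashtags"].str.split().explode().value_counts()
print(hashtag_counts)
```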

Now let’s look at how likes and impressions are distributed across the posts that contain each hashtag:
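One way to sketch this is to give every hashtag its own row and then sum the metrics per hashtag. The data is made up, and note the caveat that a post with several hashtags is counted once for each of them, so the per-hashtag totals overlap.

```python
import pandas as pd

# Hypothetical posts with their hashtags and metrics.
data = pd.DataFrame({
    "Hashtags": ["#python #datascience", "#python #ai", "#datascience"],
    "Likes": [100, 200, 50],
    "Impressions": [1000, 3000, 500],
})

# One row per (post, hashtag) pair; the post's metrics are repeated.
exploded = data.assign(Hashtag=data["Hashtags"].str.split()).explode("Hashtag")

# Total likes and impressions of all posts that used each hashtag.
per_hashtag = (
    exploded.groupby("Hashtag")[["Likes", "Impressions"]]
    .sum()
    .sort_values("Likes", ascending=False)
)
print(per_hashtag)
```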

So this is how you can perform exploratory data analysis using Python. The kind of graphs you should use to explore your data depends on the type of data you are working with. I hope you now have a solid understanding of how to use Python for EDA.

https://colab.research.google.com/drive/1Yc_S_s5coc7UuPbyHb_0O3PoFF2QnliX?usp=sharing

In a nutshell,
Exploratory data analysis (EDA) means examining a dataset to find relationships, trends, and patterns in it. It helps us understand what the data contains, guides us toward sound decisions, and helps us come up with solutions for real business problems. I hope you enjoyed this article on exploratory data analysis using Python. Feel free to ask questions in the comments section below.
