I find myself in the midst of a global pandemic, stuck at home, looking for an opportunity to work in the data science/analytics field. With so much time on my hands, I do my best to make the most of it. Part of that time is spent working on my data science portfolio, of which this blog is a part. Another part of that portfolio is working on end-to-end data science projects. Sometimes those projects, or parts of them, are a classic exploratory data analysis (EDA) of a dataset.
In this blog post, I’ll be going over 5 points that I believe are essential in performing an exceptional exploratory data analysis at home for your portfolio.
The key theme running through all of them is communication. Most importantly, remember that you’re doing this for yourself, to improve your own skills.
Point 1: Communicate
You can show off all your cool code, your cool graphs, and your cool complex models. However, none of it matters if you cannot communicate it properly. I make it a point to always have an introduction, a middle, and a conclusion in all my work.
At the beginning of your EDA project, always have a short introduction to your dataset and your goals for the analysis. Have a few hypotheses about what you think the data might tell you. You are setting the stage for this investigation of your dataset.
It is also important to go over the features you deem important in the dataset. Explain what these features are, what they can do, and why they matter.
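One way to make that feature walkthrough concrete is a small summary table near the top of the notebook. The sketch below uses only the Python standard library; the rental `rows`, the `descriptions` dictionary, and the `summarize_features` helper are all hypothetical names made up for illustration. In a real project you would likely do the same thing with pandas on your actual data.

```python
# Hypothetical toy rows standing in for a real dataset; None marks a missing value.
rows = [
    {"city": "Lisbon", "bedrooms": 2, "price": 950},
    {"city": "Porto", "bedrooms": 1, "price": 700},
    {"city": "Lisbon", "bedrooms": None, "price": 1200},
    {"city": "Faro", "bedrooms": 3, "price": None},
]

# Hand-written data dictionary: what each feature is and why it matters.
descriptions = {
    "city": "Where the listing is located; likely a strong driver of price.",
    "bedrooms": "Number of bedrooms; a rough proxy for size.",
    "price": "Monthly rent in euros; the quantity the analysis is about.",
}


def summarize_features(rows, descriptions):
    """Return one summary dict per feature: missing count, distinct values, description."""
    summary = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        present = [v for v in values if v is not None]
        summary.append({
            "feature": name,
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "description": descriptions.get(name, "(no description yet)"),
        })
    return summary


feature_summary = summarize_features(rows, descriptions)
for item in feature_summary:
    print(
        f"{item['feature']:<10} missing={item['missing']} "
        f"distinct={item['distinct']}  {item['description']}"
    )
```

Writing the descriptions by hand is the point: it forces you to say, in plain words, why each feature earns a place in the analysis.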
Point 2: Ask the Right Questions
Ask the right questions of your dataset. Start with a hypothesis and explore the data features relevant to it, perhaps with a plot or two, then answer it. Did the data support your hypothesis? Did it not? Why?
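The hypothesis loop above can be sketched in a few lines. This is only a minimal illustration on made-up numbers: the `listings` data and the hypothesis itself are hypothetical, and comparing two averages is a descriptive check, not a formal statistical test.

```python
from statistics import mean

# Hypothetical toy data: (city, monthly rent in euros).
listings = [
    ("Lisbon", 950), ("Lisbon", 1200), ("Lisbon", 1100),
    ("Porto", 700), ("Porto", 820), ("Faro", 760),
]

# Step 1: state the hypothesis in plain words before touching the data.
hypothesis = "Lisbon listings cost more on average than listings elsewhere."

# Step 2: explore only the features relevant to that hypothesis.
lisbon = [price for city, price in listings if city == "Lisbon"]
elsewhere = [price for city, price in listings if city != "Lisbon"]

lisbon_mean = mean(lisbon)
elsewhere_mean = mean(elsewhere)
difference = lisbon_mean - elsewhere_mean

# Step 3: answer the hypothesis explicitly, then ask why.
supported = difference > 0
verdict = "supported" if supported else "not supported"
print(f"Hypothesis: {hypothesis}")
print(f"Lisbon mean: {lisbon_mean:.0f}, elsewhere mean: {elsewhere_mean:.0f}")
print(f"Verdict on this sample: {verdict}")
```

Whatever the verdict, the write-up that follows matters more than the numbers: say what you expected, what you found, and what might explain the gap.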
Go further than the general overview: state an assumption, draw an insight. If the data is incomplete, attempt to explain and infer why the data tells you what it tells you.
Ask follow-up questions and see if you can dig deeper into your analysis.
You want to be inquisitive and curious, almost like a detective unearthing the secrets of the data. Doing so will make you a better analyst.
Point 3: Be a Storyteller
Be a storyteller. Like I said before, you want to have an intro, a middle, and an end. Your EDA needs to flow consistently from start to finish. It needs to be a story, it needs to be engaging, and it needs to have a purpose.
Many people do a standard EDA with no direction to it, hoping only to showcase the technical skills that went into it. That is not everything. The important bit is the value your analysis brings to the table: the key insights that make you think and that make a reader say “that’s really interesting.”
You want to go above and beyond and try to captivate your audience. Your audience could be yourself, people on the same path who are learning to do better data analysis, or experts in your dataset’s domain. It can be anyone. Pick one and tell the story to them through your code, your communication, your questions, and your plots.
Point 4: Sound Authentic
Be authentic. There is no need to act as if you know everything, because you don’t. Write in your own style, and with poise.
Point 5: Enjoy Your EDA
Work on datasets that interest you, or where you have niche knowledge that could make the story you are about to tell more interesting. There is no point if you find yourself struggling to do a meaningful analysis. I don’t mean give up when the dataset is a bit complicated to clean or interact with; do push through that. What I mean is that if the dataset is boring, you’ll be bored, and you’re just wasting time on an unpleasant grind. Find a dataset that you’re interested in and whose secrets you’re curious to uncover.