Photo by Ben White on Unsplash

Data Science Bootcamps Don’t Just Teach Technical Skills

John Paul Hernandez Alcala
3 min read · Apr 5, 2021

Data science bootcamps such as Flatiron School are very open about what you will learn: Python/R, SQL, statistics, A/B testing, machine learning, big data, deep learning, and so on. What many find surprising is that you will not learn only technical skills. Yes, the goal of these programs is to make you highly proficient very quickly. But you are also learning HOW to learn. The skills taught in these programs are the lens you need to read and understand further material on those concepts, which ultimately helps you, for example, build better models and debug errors. Many times while developing my projects I asked myself, “Was there a better way?”

Photo by Juan Rumimpunu on Unsplash

An example of this is when I learned about data imputation. The standard approach taught is that if a categorical column has missing values or outliers, you impute with the mode; for continuous data, you impute with the mean or the median, depending on which appears to distort the distribution less. Why does this matter? Because missing data and outliers can reduce the amount of data available for analysis and bias your model, hurting its predictive ability. Even so, the above method seems too general to apply to every case. After some short Googling, I stumbled upon several articles that talked about the KNNImputer class from scikit-learn. Reading the description, I found it presented as a better approach than naively applying the mean or median of the whole column, because it uses a k-nearest neighbors model: each missing value is filled in using the values of samples that are not missing that feature and that are similar in the other columns. (It only fills in values marked as missing, so outliers would first have to be flagged as missing.) Even while writing this article, I discovered two other methods that might be better than KNNImputer in some cases!
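To make the difference concrete, here is a minimal sketch comparing mean imputation with KNNImputer on a continuous column. The toy array and the choice of `n_neighbors=2` are assumptions made up for illustration, not data or settings from my projects, and the categorical (mode) case is left out.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical toy data: the second column is roughly 10x the first,
# and one value in the second column is missing.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, np.nan],
    [8.0, 80.0],
    [9.0, 90.0],
])

# Naive approach: replace the missing value with the mean of its column.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)

# KNN approach: average the value over the 2 rows that are closest
# to the incomplete row in the columns that are present.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_filled[2, 1])  # 50.0, the column mean
print(knn_filled[2, 1])   # 15.0, the mean of the two nearest rows (10 and 20)
```

In this made-up case the column mean lands far from what the neighboring rows suggest, while the KNN estimate stays close to them. Whether that holds on real data depends on how strongly the columns are related and on the number of neighbors chosen.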

In the end, I would not have been able to understand these better techniques without first having been equipped with the tools to decipher them. The field of data science is all about continuously learning new things and playing around with them to discover how they work. Everything we learn on our data science journey is one more piece we need to solve our original problem.

Photo by Ross Sneddon on Unsplash

John Paul Hernandez Alcala

An intraoperative neuromonitor who tinkers with data to see what interesting nuggets he can find.