NOTE: Skip to PySpark Installation Instructions in Google Colab if you are in a rush

The Not-So-Fun Waiting Game

You are working on a project in a Jupyter Notebook. You are excited to see what comes out of this new analysis. Perhaps you read about a new modeling or data cleaning approach. You are progressing through the data lifecycle, but you find yourself sitting there, waiting for that little asterisk to turn back into a number, which signifies that the computation is complete.


OMDb API Page: http://www.omdbapi.com/

Recap

Last week, in Part 1, we learned how to use the OMDb API step-by-step and automate our requests to the API to mine all the movie data we need. In Part 2, we will see how to improve the function we defined in Part 1 and capture more movies. Finally, we will integrate the new OMDb data with the movie budget dataframe.
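As a quick refresher on what an automated request looks like, here is a minimal sketch that only builds the request URLs and sends nothing over the network. The helper names `build_omdb_url` and `build_all_urls` are hypothetical and are not the functions from Part 1; the sketch assumes OMDb's `t` (title) and `apikey` query parameters and a placeholder key.

```python
from urllib.parse import urlencode

# Endpoint taken from the OMDb API page linked above.
OMDB_URL = "http://www.omdbapi.com/"


def build_omdb_url(title, api_key):
    """Return the request URL for one title lookup (hypothetical helper)."""
    # `t` asks OMDb for a single movie by title; `apikey` is your personal key.
    params = {"t": title, "apikey": api_key}
    return OMDB_URL + "?" + urlencode(params)


def build_all_urls(titles, api_key):
    """Build one URL per title so the requests can be sent in a loop."""
    return [build_omdb_url(title, api_key) for title in titles]


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder, not a working key.
    for url in build_all_urls(["Toy Story", "Macbeth"], "YOUR_KEY"):
        print(url)
```

Each URL could then be passed to an HTTP client of your choice and the JSON response stored per movie.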

The Messy Algorithm

In Part 1, we saw the function defined below:

def moviesDict(movieDetails, movieInfo2, OrgMovieTitle):
    if len(movieDetails) > len(movieInfo2):  # This fills in fields that might not have been present in template…


If you are rushed, just scroll down to OMDb API Step-by-Step

Motivation

Almost every weekend, my girlfriend and I head over to my parents’ house to have dinner, visit, and watch a movie together. This weekend, however, we had great difficulty finding a movie we all considered good; we sifted through different recommendation lists we found online. We looked at three main features:

  • Rating Scores from Rotten Tomatoes and IMDb
  • Genre (because my mother cannot stomach horror and she is not the biggest fan of Sci-Fi)
  • Maturity Rating (used to avoid awkward explicit scenes)

As we checked each…


Photo by Jessica Pamp on Unsplash

While going through my GitHub profile, I ran across my first-ever project from Flatiron School’s data science program. It was a rudimentary analysis of the tragedy Macbeth, but I remember the feeling of accomplishment when I completed it and how exciting it was to see data science applied to something unconventional. Below, I would like to revisit the project and take you on the journey that I embarked on more than a year ago.

NOTE: I will be using lists, dictionaries, conditionals, and matplotlib to visualize the data from the play, so be excited and prepared to see all…


Photo by Laura Chouette on Unsplash

When you hear MLA, you may remember using the MLA format for essays or literary work in your high school or college English classes. “So, what does this have to do with data science?” Well, let us look at the reason why the MLA was founded:

“Founded in 1883, the Modern Language Association of America provides opportunities for its members to share their scholarly findings and teaching experiences with colleagues and to discuss trends in the academy.”

Interesting, so now let us look at how IBM defines data science:

“Data science is a multidisciplinary approach to extracting actionable…


Photo by Ben White on Unsplash

Data science bootcamps such as Flatiron School are very open about what you will learn: Python/R, SQL, Statistics, A/B Testing, Machine Learning, Big Data, Deep Learning, etc. However, many will find it surprising that you will not just learn technical skills. Yes, the goal of these programs is to make you highly proficient very quickly. But you are also learning HOW to learn. The skills learned in these programs are just the lens you need to read and understand further details about the concepts taught, and this ultimately helps you, for example, build better models and fix error messages. …


Photo by Jonathan Kemper on Unsplash

Before enrolling in Flatiron School’s Data Science course, I believed that after the 10 months of training and developing projects, I would be a master of everything data science. A level 99 data scientist, some may say. As each project was completed, I felt more and more confident about my skills in data mining, cleaning, exploring, feature engineering, predictive modeling, and visualization. In my head, each project was like a boss that I had to beat, and each “victory” brought a boost in experience. However, as the rush of getting stuff done by a deadline waned after graduation, I…


My first project through Flatiron School’s Data Science course is done, and I am extremely proud not only of what I have done, but also of what I have learned!

When I was given my first project, I felt pretty overwhelmed by what was being asked of me: help a hypothetical Microsoft better understand the movie industry, explore what types of films are currently doing the best at the box office, and translate my findings into actionable insights. …


Most of the problems that data scientists must solve involve how much or how many? (regression), which category? (classification), what’s wrong with the data? (anomaly detection), and would a user prefer this? (recommender systems). The first step to solving any of these problems is to determine what type of data is at hand. So, if I am asked to compare the number of companies with their associated revenues, which graph would I use?

A bar graph because I am comparing different companies, right? Not exactly. Let’s talk about it. Bar graphs allow comparisons across categories by presenting categorical data as…

John Paul Hernandez Alcala

An intraoperative neuromonitor who tinkers with data to see what interesting nuggets he can find.
