Photo by Laura Chouette on Unsplash

What Do the Modern Language Association (MLA) and Data Science Have in Common?

John Paul Hernandez Alcala
3 min read · Apr 12, 2021


When you hear MLA, you may remember using the MLA format for essays or literary work in your high school or college English classes. “So, what does this have to do with data science?” Well, let us look at why the MLA was founded:

“Founded in 1883, the Modern Language Association of America provides opportunities for its members to share their scholarly findings and teaching experiences with colleagues and to discuss trends in the academy.”

Interesting. Now let us look at how IBM defines data science:

“Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organizations. Data science encompasses preparing data for analysis and processing, performing advanced data analysis, and presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.”

Comparing these two definitions, we see that both center on sharing findings or insights in order to discuss trends or patterns. And just as MLA has a style for documenting these findings and trends, so does data science and, by extension, the typical data science project.

Photo by NEW DATA SERVICES on Unsplash

At Flatiron School, we learn the data science process, or lifecycle, which often includes the following steps:

  1. Business Understanding: Develop a clear understanding of the problem before moving any further in the process: ask relevant questions, identify objectives, and define the desired outcome.
  2. Data Mining: Identify and collect data from multiple sources that will help us answer our questions, meet our objectives, and reach our desired outcome.
  3. Data Cleaning: Make the data consistent by fixing problems such as null/missing values and values that don’t fit the data as a whole. NOTE: the analysis outcome can change significantly depending on the method used to fix these inconsistencies, as is the case with data imputation.
  4. Data Exploration (or Exploratory Data Analysis): Create data visualizations that help highlight patterns and relationships, and form the necessary hypotheses.
  5. Feature Engineering: Select and/or drop certain features and manipulate others to make them more meaningful for analysis than the raw data. NOTE: this must be done with the questions, objectives, and desired outcome in mind. For example, if you want to see how house price is affected by location, then you may need to ensure certain features are not too correlated.
  6. Predictive Modeling: Train models, evaluate their performance and computational cost, and use them to create predictions; this is where balancing performance metrics such as accuracy, recall, precision, F-score, and specificity comes into play.
  7. Data Visualization: Use visualizations to communicate insights quickly and effectively to key stakeholders.
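
To make the notes in steps 3 and 6 a bit more concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the house prices and labels are made-up toy data, and the helper names `impute` and `classification_metrics` are hypothetical. It is one possible way to show the ideas, not a prescribed method.

```python
from statistics import mean, median

# --- Step 3: the imputation method you choose changes the data ---
# Hypothetical house prices (in $1000s); None marks a missing value.
prices = [200, 210, 190, None, 205, 950, None]

def impute(values, strategy):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

mean_filled = impute(prices, "mean")      # missing values become 351
median_filled = impute(prices, "median")  # missing values become 205

print("Average after mean imputation:  ", mean(mean_filled))
print("Average after median imputation:", mean(median_filled))

# --- Step 6: the performance metrics named above ---
# Hypothetical true labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

def classification_metrics(y_true, y_pred):
    """Compute common binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * recall, precision + recall),
    }

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because one toy price (950) is far from the rest, the mean and the median fill the gaps with quite different values, which is the sensitivity the NOTE in step 3 warns about. The metrics likewise disagree with one another on the same predictions, which is why step 6 speaks of balancing them rather than picking just one.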

Also, just as it is important in MLA to cite references to reinforce your argument, it is important in a data science project to document why certain approaches were taken.

The above structure is the recommended way to outline a data science project. Sometimes you may see these steps condensed into fewer steps; however, the purpose remains the same, just as it is in an argumentative essay: share discovered insights and enable those impacted to act on those insights.

Photo by Clem Onojeghuo on Unsplash


John Paul Hernandez Alcala

An intraoperative neuromonitor who tinkers with data to see what interesting nuggets he can find.