Photo by David Clode on Unsplash

Preparing for a Technical Assessment that Requires Data Merging Tasks in Python

John Paul Hernandez Alcala

--

Motivation

Last week, I received an email invitation to participate in a technical assessment. In the email, I was provided some background information about the technical assessment before I had to take it. Basically, I was asked to perform a data merging task linking together different data, and the assessment would also take place on a website called HackerRank. I had the choice to use Python or any other language. In all my previous projects, I had done this before using Python; even so, I knew the smart thing to do in preparation for this assessment would be to review concat(), merge(), and join() from the Pandas package.

Photo by Jay Wennington on Unsplash

Resources Used for Review

The best place to review any functions from a particular package such as Pandas is to look up the official documentation. In most cases, the official documentation has many explanations and examples that will show you the ropes quickly and effectively. To review for this assessment, I looked at Merge, join, concatenate and compare. This documentation coupled with Pandas Joining and merging DataFrame : Exercises, Practice, Solution provided a great place to review and practice.

The Assessment

After doing all the above review, I came into the assessment very refreshed and confident about how to merge data. The assessment however had some surprises that were outside of only merging. I had to know how to import data from comma-separated values (CSV) files. I achieved this by importing the CSV files into Pandas dataframes which is achieved as shown below:

import pandas as pddataframe = pd.read_csv(‘[path to file]’)

Once the data has been imported into dataframes, it is important to check the structure of the dataframes (e.g. datatypes), and the format of the values in each column to look for any abnormalities or just get a feel on what type of data you are messing with. I used dataframe.info() which gives a concise summary of the dataframe and datarame.head() which gives the top 5 rows of the dataframe.

.info() Example from a Previous Project
.head() Example from a Previous Project

From doing the above, one example in my assessment is that I realized the two dataframes shared a common key column, but the values in the key columns were not exactly the same. The key column in one of the dataframes had to be cleaned; that is, there were prefix values that needed to be removed. In my particular case, the key columns were identical with the exception that one of them had numbers and letters that preceded the matching key. So key column values from dataframe 1 were formatted as ‘#*%$/1235–546-adbs’ and key column values from dataframe 2 were formatted as ‘1235–546-adbs’ I cleaned the column by using the below code that iterates through each row of the column and splits off the key from the rest of the preceding information from the ‘/’ and saves just the key value in the key_column:

dataframe1.loc[:, ‘key_column’] = [x.split(‘/’)[-1] for x in dataframe1[key_column]]

Once the dataframes had matching keys and all other cleaning was done, I merged my two dataframes how was asked, and uploaded my results to a master CSV file by doing the following:

master_dataframe.to_csv(‘[name_of_csv_file’], index=False)

And that is it! That is how I prepared for my technical assessment that had me perform data merging tasks. I hope this helps you as well whether it is just for edification or an upcoming technical assessment. Please check out my other blogs for more cool stuff.

Photo by Silvan Arnet on Unsplash

--

--

John Paul Hernandez Alcala
John Paul Hernandez Alcala

Written by John Paul Hernandez Alcala

An intraoperative neuromonitor who tinkers with data to see what interesting nuggets he can find.