My Projects

This is a list of some of my personal projects. By personal, I mean projects that did not come from a job: either graduation projects or side projects.

QUICK LINKS

Human Resources Analytics
Exoplanets Project
Video Games Ratings and Sales Project
This Website
A/B Testing, Udacity's Enrollment Experiment
Visualising World Emissions
Identify Fraud in Enron Financial and Email Data
Determining Quality Factors in Portuguese Wine
Montreal Open Street Map Data Wrangling (OpenStreetMap)
Titanic Survival Exploration
AutoPark
Face Recognition Using Moment Invariants

PROJECTS

2017
Human Resources Analytics
Summary

My third Kaggle dataset. It describes the employees of a company through a few parameters, such as their last evaluation, their satisfaction level and their average monthly hours at work. The original goal of the dataset is to model which employees are likely to leave the company, but I went a little further and explored how the company evaluated its employees and what made the employees happy.

Task:

Exploratory data analysis

Model Building
Challenges:

The dataset had a few interesting features (like clusters and some strange trends), but generally there were no problems with the data itself.

Technologies Used:

Python Markdown Exploratory Data Analysis Data Visualization Machine Learning Modeling Reporting Numpy Pandas MatPlotLib Seaborn Scikit-Learn Jupyter Notebook
Links

Dataset Report: Jupyter Notebook (Kaggle Kernel)

2017
Exoplanets
Summary

My second Kaggle dataset. It describes the planets discovered outside our solar system, giving their known parameters (radius, mass, orbital parameters, etc.). My work on this dataset started with a univariate analysis, then became more freestyle. I explored the exoplanets orbiting within their star's habitable zone, i.e. the region where there is a high probability of liquid water.

Task:

Exploratory data analysis

3D visualisation
Challenges:

The equations! I am not an astronomer, so I had to dig into research papers to understand how to calculate the circumstellar habitable zone (CHZ) and a star's total emitted energy, and to understand planet formation, etc. But I enjoyed the ride so much; it is definitely a very interesting dataset.
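As a rough illustration of the kind of calculation involved, the sketch below estimates a star's luminosity relative to the Sun from its radius and effective temperature (Stefan-Boltzmann scaling), then derives approximate habitable zone edges. The flux limits (1.1 and 0.53 times the flux Earth receives) are commonly quoted conservative values used here as assumptions, and the function names are hypothetical; none of this is taken from the reports themselves.

```python
from math import sqrt

SUN_TEMPERATURE_K = 5772.0  # nominal solar effective temperature

# Assumed stellar flux limits, in units of the flux received by Earth.
INNER_EDGE_FLUX = 1.1
OUTER_EDGE_FLUX = 0.53


def relative_luminosity(radius_solar, temperature_k):
    """Luminosity in solar units: L ~ R^2 * T^4 (Stefan-Boltzmann scaling)."""
    return radius_solar ** 2 * (temperature_k / SUN_TEMPERATURE_K) ** 4


def habitable_zone_au(luminosity_solar):
    """Approximate inner and outer habitable zone edges, in AU."""
    inner = sqrt(luminosity_solar / INNER_EDGE_FLUX)
    outer = sqrt(luminosity_solar / OUTER_EDGE_FLUX)
    return inner, outer


def in_habitable_zone(semi_major_axis_au, luminosity_solar):
    """True if an orbit of the given size lies between the two edges."""
    inner, outer = habitable_zone_au(luminosity_solar)
    return inner <= semi_major_axis_au <= outer


# Usage with a Sun-like star and an Earth-like orbit.
sun_like = relative_luminosity(1.0, 5772.0)
print(habitable_zone_au(sun_like))
print(in_habitable_zone(1.0, sun_like))
```

The edges scale with the square root of the luminosity because the flux a planet receives falls off with the square of its distance from the star.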

Technologies Used:

R Markdown Exploratory Data Analysis Data Visualization Reporting ggplot knitr RGL R Studio
Links

Exoplanets Reports: Markdown Report (Kaggle)
CHZ Planets: Markdown Report (Kaggle)

2017
Video Games Ratings and Sales (VGChartz)
Summary

This dataset was published on Kaggle and comes from VGChartz. I published a univariate and a bivariate analysis on Kaggle, exploring trends in the dataset.

Task:

Univariate Analysis

Bivariate Analysis
Challenges:

Not really a technical challenge, but this was my first interaction with the Kaggle platform. I was not sure what to expect, but things went very well.

Technologies Used:

Python Markdown Exploratory Data Analysis Data Visualisation Numpy Pandas MatPlotLib Seaborn Jupyter Notebook
Links

Part 1: Univariate Analysis (Kaggle)
Part 2: Bivariate Analysis (Kaggle)

2017
This Website!
Summary

I wanted to create a personal website to showcase my work and, in the future, to start blogging through it as well. I am not a front-end developer by experience, so using a web-builder service was the initial plan. On the other hand, I wanted a quick challenge that would entertain me for a couple of weeks.

Task:
- Coding the HTML, CSS and JavaScript.
- Writing the website content.
- Finding images labeled for reuse.
- Generating the background using CSS (I used angryTools; I recommend that you give it a try).
Challenges:

To challenge myself, I built it using pure HTML, CSS and JavaScript, so not even jQuery was used. It took a bit of time to find out how to do some of the effects manually, but in the end it was a satisfying quick challenge.

Technologies Used:

HTML CSS JavaScript js Web Development Web Design
2016
A/B Testing, Udacity's Enrollment Experiment
Summary

In this project, we were given a modified dataset from an experiment Udacity performed a while ago. The experiment was based on the hypothesis that if Udacity asked students who chose the free trial to answer a questionnaire, making them think realistically about whether they would be able to finish the online degree, the rate of students who continue their enrollment past the free trial would increase.

Task:

A/B Test Analysis

Hypothesis Testing
Recommending Changes and Follow-Up Experiment
Challenges:

Choosing the right metrics for the experiment, as not all metrics were suitable as invariant metrics.
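To illustrate what makes a metric usable as an invariant, here is a minimal sketch of a sanity check: if units are split evenly between control and experiment, the share landing in control should stay within a confidence interval around 0.5. The original analysis was done in Microsoft Excel; the function name and the counts below are hypothetical and only show the arithmetic.

```python
from math import sqrt


def invariant_sanity_check(control_count, experiment_count, z=1.96):
    """Check that the control share is consistent with a 50/50 split.

    Returns (lower, upper, observed, passed), where lower/upper bound the
    confidence interval around 0.5 (z=1.96 gives roughly 95%).
    """
    total = control_count + experiment_count
    standard_error = sqrt(0.5 * 0.5 / total)
    margin = z * standard_error
    lower, upper = 0.5 - margin, 0.5 + margin
    observed = control_count / total
    return lower, upper, observed, lower <= observed <= upper


# Hypothetical counts, for illustration only.
print(invariant_sanity_check(10000, 10100))
print(invariant_sanity_check(10000, 11000))
```

A metric that fails this check differs between the groups before the change could have had any effect, so it cannot serve as an invariant.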

Technologies Used:

Microsoft Excel Hypothesis Testing A/B Test Metric Choice
Links

Report: PDF Report

2016
Visualising World Emissions (The World Bank)
Summary

In this project, I used The World Bank dataset to explore the planet's emissions by country. After getting the data, exploring it a bit and settling on what message to deliver, I used my web development skills, and the D3 library on top of them, to create an interactive web visualization of the data.

Task:
- Exploring the Dataset
- Data Development (Creating new metrics not available in the dataset like the "Emissions Productivity")
- Web Development
- Visualization Development
- Esthetics
Challenges:

This project was really fun and challenging at the same time. The challenges started with choosing the questions to answer, and continued with preparing the world map JSON file for colouring, reshaping the data to be suitable for display, and building the sophisticated hand-made visualizations like the spider chart and the sunburst. There were also esthetic and accessibility decisions like colours (I had to add the colour gradient after learning that the rainbow is not accessible for colour-blind people). Every time I thought that I was almost done with the project, I would find myself adding another improvement!

Technologies Used:

HTML CSS Javascript JS Python Exploratory Data Analysis Data Visualization Results Communication Data Wrangling Web Development jQuery D3 Pandas Jupyter Notebook Sublime Text
Links

Project: Github Repo
Report: The Interactive Visualization for You to Explore!

2016
Identify Fraud in Enron Financial and Email Data
Summary

In this project, I had two datasets:

Financial Dataset (employees' income)

The Famous Enron Email Dataset

I explored different machine learning algorithms to see how we can identify those who were involved in the Enron fraud in the late 1990s from their salary and stock shares.
The other part of the project was natural language processing, trying to identify the persons of interest from their emails.

Task:

Data Cleaning

Exploratory Data Analysis

Model Building

Model Validation
Challenges:

It was hard to figure out which algorithm to use, so I ended up running a huge grid search over several learning algorithms to see which one would perform best. For the natural language processing part, the data cleaning and reshaping was probably the hardest part. For example, how do you separate a message's body from a forwarded message? In the end, it was all about playing around with regular expressions.
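The forwarded-message question can be sketched with a single regular expression. The example below assumes that forwarded content starts at a dashed marker line such as "-----Original Message-----" or "---- Forwarded by ... ----"; the pattern and the helper name are illustrative assumptions, not the ones used in the project.

```python
import re

# Assumed marker lines that introduce forwarded or quoted content.
FORWARD_MARKER = re.compile(
    r"^[ \t]*-{2,}[ \t]*(?:Forwarded by\b.*|Original Message)[ \t]*-*[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_forwarded(text):
    """Split an email body into (own_text, forwarded_text).

    Everything from the first marker line onwards is treated as forwarded.
    If no marker is found, the forwarded part is an empty string.
    """
    match = FORWARD_MARKER.search(text)
    if match is None:
        return text.strip(), ""
    return text[:match.start()].strip(), text[match.start():].strip()


# Made-up message, for illustration only.
sample = (
    "Please review before Friday.\n"
    "\n"
    "-----Original Message-----\n"
    "From: someone@example.com\n"
    "Older text here.\n"
)
own, forwarded = split_forwarded(sample)
print(own)
print(forwarded)
```

Real mailboxes contain many more marker variants, so a pattern like this usually grows through trial and error against the actual data.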

Technologies Used:

Python Markdown Regular Expression Data Cleaning Data Structuring Exploratory Data Analysis Hypothesis Testing Model Building Numpy Pandas SciKitLearn SkLearn NLTK Jupyter Notebook
Links

Project: Github Repo
Report: PDF Report

2016
Determining Quality Factors in Portuguese Wine
Summary

In this project, I explored an anonymized dataset of almost 5000 Portuguese wine bottles, each with its chemical make-up and a rating provided by at least 3 wine experts. I used R to explore how different chemicals affected the quality, but without performing any statistical tests.

Task:

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Report Writing in Markdown
Challenges:

As this was my first time using R and RStudio, things were a little uncomfortable at the beginning of the project. I had always programmed in C-inspired languages and Python, so R's syntax seemed a little counter-intuitive. RStudio was also a little buggy (it was version 0.9.xx, now we have 1.x) and crashed, which added some frustration. Now, I really enjoy doing exploratory data analysis using R!

Technologies Used:

R Markdown Exploratory Data Analysis Data Visualization Reporting ggplot knitr R Studio
Links

Project: Github Repo
Report: R Markdown Generated HTML Report

2016
Montreal Open Street Map Data Wrangling (OpenStreetMap)
Summary

This project was about getting Montreal's OpenStreetMap (http://www.openstreetmap.org/) data and cleaning it. After acquiring the data (a few hundred megabytes), I reshaped it into a JSON-based format before feeding it into a MongoDB NoSQL database. From there, I used the database's Python bindings to perform a variety of data cleaning tasks like translation, removing wrong or corrupt data and standardizing entries.
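The reshaping step can be sketched as follows: each OSM XML element (a node or a way, with nested tag elements carrying k/v attributes) becomes a dictionary that can be serialised as JSON. The document layout and function names are hypothetical, chosen only to show the idea; the project's actual schema may differ.

```python
import json
import xml.etree.ElementTree as ET


def element_to_document(element):
    """Turn one OSM <node> or <way> element into a JSON-friendly dict."""
    document = {"type": element.tag, "id": element.get("id"), "tags": {}}
    if element.get("lat") is not None and element.get("lon") is not None:
        document["pos"] = [float(element.get("lat")), float(element.get("lon"))]
    for tag in element.findall("tag"):
        document["tags"][tag.get("k")] = tag.get("v")
    return document


def osm_to_documents(xml_text):
    """Parse OSM XML text and return one dict per node or way."""
    root = ET.fromstring(xml_text)
    return [element_to_document(el) for el in root if el.tag in ("node", "way")]


# Tiny made-up extract, for illustration only.
sample = """<osm>
  <node id="1" lat="45.5" lon="-73.6">
    <tag k="amenity" v="cafe"/>
    <tag k="phone" v="514-555-0123"/>
  </node>
</osm>"""
print(json.dumps(osm_to_documents(sample), indent=2))
```

For a file of a few hundred megabytes, a streaming parser such as `xml.etree.ElementTree.iterparse` would be a more memory-friendly choice than loading the whole document at once.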

Task:

Data Acquisition Through OSM

Data Reshaping

Data Cleaning

Database Side for MongoDB
Challenges:

The hardest problem with Montreal's entries is that half were written in English and the other half in French (even though there were some guidelines explaining how to deal with the local language issue). I had to do some rudimentary translation. The remaining problems were more or less related to the 'uncleanliness' of the data, and were mostly dealt with using regular expressions, like catching phone numbers with the wrong number of digits and standardizing the formats of phone numbers, emails, etc.
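A minimal sketch of the phone number clean-up described above: strip everything that is not a digit, reject numbers with the wrong digit count, and emit one standard format. The target format and the helper name are assumptions made for illustration.

```python
import re


def normalize_phone(raw):
    """Return the number as '+1 NNN-NNN-NNNN', or None if it is malformed."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the North American country code
    if len(digits) != 10:
        return None  # wrong number of digits: flag for manual review
    return "+1 {}-{}-{}".format(digits[:3], digits[3:6], digits[6:])


# Made-up inputs, for illustration only.
for raw in ["(514) 555-0123", "1.514.555.0123", "514-555-012"]:
    print(raw, "->", normalize_phone(raw))
```

Returning None instead of guessing keeps corrupt entries visible, so they can be removed or fixed by hand.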

Technologies Used:

Python MongoDB Javascript Regular Expressions Getting Data API Data Cleaning Data Reshaping MongoDB RegEx PyCharm Open Street Map OSM NoSQL
Links

Project: Github Repo
Report: PDF Report on Github

2016
Titanic Survival Exploration (Kaggle)
Summary

This project was on the Titanic dataset from Kaggle, and was done for Udacity's nanodegree. The dataset is about the Titanic's passengers and whether they survived or not. I explored the dataset thoroughly, then tested whether traveling alone had an effect on the chance of survival. My rationale was that family members can fight for each other and eventually get each other onto the rescue boats. Finally, I tried to reassemble family members together, and grouped the families into those whose members all survived, those whose members partially survived, and those that perished.
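One way to test a question like "does traveling alone affect survival?" is a two-proportion z-test, sketched below. This is an assumption about the method, since the summary does not say which test the report used, and the counts in the usage lines are invented purely to show the call.

```python
from math import erfc, sqrt


def two_proportion_z_test(survived_a, total_a, survived_b, total_b):
    """Two-sided z-test for the difference between two survival rates.

    Returns (z, p_value) under the null hypothesis that both groups
    share the same underlying survival rate.
    """
    rate_a = survived_a / total_a
    rate_b = survived_b / total_b
    pooled = (survived_a + survived_b) / (total_a + total_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Invented counts: group A travels alone, group B travels with family.
z, p_value = two_proportion_z_test(30, 100, 50, 100)
print(z, p_value)
```

A small p-value would suggest that the two survival rates differ by more than chance alone would explain.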

Task:

Exploratory Data Analysis

Data Wrangling

Hypothesis Testing
Challenges:

This was my first project using Pandas, so in the beginning the concepts were only theoretical in my head. Also, finding an interesting question to answer (other than the obvious ones) was challenging, yet enjoyable.

Technologies Used:

Python Markdown Exploratory Data Analysis EDA Data Visualisation Hypothesis Testing Data Wrangling Numpy Pandas MatPlotLib IPython Notebook
Links

Report: iPython Notebook HTML Report

2015
D-Sign
Summary

This project was an American Sign Language to audible speech translator. We hacked together a forearm-worn accelerometer armband called Myo, a finger motion tracker called Leap Motion, Nuance's SDK and 3D printing for our submission to the Wearhacks 2015 hackathon. Our project won the first prize, as well as the community prize (for the project that received the most votes from the participants).

Task:

Improvise a way to get ASL input

Programming the Myo Armband through Python
Challenges:

It was a hackathon, so time was not our best friend, especially since our team met for the first time during the networking party on the first day of the hackathon. It was a sleepless two days, but the experience and the atmosphere were amazing!

Technologies Used:

Python Algorithm Implementation Software Engineering Machine Learning Nuance SDK Myo Armband SDK Leap Motion SDK Myo Armband 3D Printing Leap Motion
Links

Devpost Page: Hackathon Submission Page
Code: Github Repo

2007
AutoPark
Summary

This was our graduation project for the Information Technology Institute's "Embedded Systems" track. The project was inspired by Valeo's Park4U®, which seemed like wizardry to us at the time. Three months after this project, I was hired at Valeo.
The idea of the project was to parallel park a toy car autonomously. We found a research paper by INRIA (unfortunately I cannot find it now) and implemented it using two microcontrollers. When the system is enabled, the car searches for a suitable spot to its right using its ultrasonic sensors, and when it finds one, it parallel parks into it.

Task:

Device Drivers

Algorithm validation in Matlab®

Application Development

Testing
Challenges:

Hacking the RC car was the most challenging part. For example, to know how far the car had moved, we needed an encoder to count the car's wheel revolutions. Next, implementing the code within a very limited ROM and RAM was very hard. We had to split the tasks over two microcontrollers: one for reading the sensors and controlling the car, and the other for computing the parking calculations with all the lookup tables needed to perform them.
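The encoder idea comes down to simple arithmetic: each wheel revolution covers one wheel circumference, so counting encoder pulses gives the distance travelled. The sketch below is written in Python for readability (the project itself used C and Assembly), and the pulses-per-revolution and wheel diameter values are hypothetical.

```python
from math import pi


def distance_travelled(pulse_count, pulses_per_revolution, wheel_diameter_m):
    """Distance in metres covered after a given number of encoder pulses."""
    revolutions = pulse_count / pulses_per_revolution
    return revolutions * pi * wheel_diameter_m


# Hypothetical encoder: 20 pulses per revolution on a 5 cm wheel.
print(distance_travelled(50, 20, 0.05))
```

On a small microcontroller the same calculation would typically be done with integer arithmetic or a lookup table rather than floating point.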

Technologies Used:

Matlab® C Assembly Algorithm Implementation Software Engineering Real Time Systems Device Drivers Pic Microcontroller Atmel AVR Microcontroller
2005
Face Recognition Using Moment Invariants
Summary

This project was my first use of neural networks, back in 2005. The idea was to place the face against a blank background, then compute the image moments, which are immune to translation and rotation. When I look at it now, I can see some naivety in the implementation, like not caring enough about getting enough samples to train the network well, but this was still an excellent graduation project.
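To show why such moments are useful, the sketch below computes the first two Hu moment invariants of a small image in plain Python and checks that they do not change when the shape is shifted or rotated by 90 degrees. Using Hu's invariants specifically is an assumption (the original project was written in Matlab®), and the function names are hypothetical.

```python
def raw_moment(image, p, q):
    """Raw image moment M_pq of a 2D list of pixel intensities."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def hu_first_two(image):
    """Return the first two Hu invariants (phi1, phi2) of the image."""
    m00 = raw_moment(image, 0, 0)
    cx = raw_moment(image, 1, 0) / m00  # centroid, removes translation
    cy = raw_moment(image, 0, 1) / m00

    def eta(p, q):
        # Central moment, normalised by the total intensity.
        mu = sum(
            ((x - cx) ** p) * ((y - cy) ** q) * value
            for y, row in enumerate(image)
            for x, value in enumerate(row)
        )
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
rotated = [list(row) for row in zip(*shape[::-1])]  # 90 degree rotation
print(hu_first_two(shape))
print(hu_first_two(rotated))
```

Values like these could then serve as the input features of a classifier, so the network does not need to learn position or orientation on its own.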

Task:
- Image processing (edge detection, image segmentation, etc.)
- Recognising the faces of people present in the input image from a database of known people.
- Training the neural network for the task.
- Creating a graphical user interface for the project.
Challenges:

Generally, we were driven more by excitement for the idea than by true knowledge of how to properly implement such a system. At the time, I did not even know the term "Machine Learning"; everything for me was simply A.I.

Technologies Used:

Matlab® Machine Learning Neural Networks Image Processing Matlab® Image Processing Toolbox Matlab® GUI Neural Networks