Matt Hurless

Data Scientist | Bay Area, CA | (415) 669-4420 | mrhurless@gmail.com

I'm a solutions-oriented data scientist and natural-born problem solver. I use data analysis, visualization, and machine learning to help clients and companies tell compelling stories with data and understand the deeper issues behind it.

By leveraging my extensive background in customer support, I focus on tackling challenging problems that bridge gaps between people, products, and teams. My approach involves a mix of curiosity, emotional intelligence, patience, and collaboration.

Projects

Understanding Customer Satisfaction

GitHub Repo

Built a Random Forest classification model on real-world data from over 13,000 tickets collected through Zendesk APIs, with the goal of understanding links between ticket characteristics and customer satisfaction ratings.

Cleaned the data, then performed EDA, feature engineering, sentiment analysis with VADER, and modeling. To improve model performance, addressed class imbalance with manual sampling techniques and SMOTE, and compared several scikit-learn feature selection techniques.

Air Quality and Low Birth Weight

GitHub Repo

Worked with a small team of peer data scientists to collect county-level birth data from the CDC and air quality data from the EPA to explore relationships between air quality and low birth weight.

Investigated Neural Network, Logistic Regression, and Random Forest classification models after data cleaning and EDA. By tuning the Neural Network model with grid search, we achieved a 17-point improvement over baseline in predicting whether a county would experience a high rate of low birth weight based on air quality.

Reddit Post Classification

GitHub Repo

Cleaned and analyzed 10,000 posts from each of the Xbox and PlayStation subreddits as the basis for an NLP classification model.

Built and evaluated Decision Tree, Random Forest, K-Nearest Neighbors, and Logistic Regression classification models. After hyperparameter tuning with grid search, the Random Forest model improved 30 points over a baseline accuracy of 52%.

Skills

Languages & Libraries

Python
Pandas
Scikit-learn
NumPy
Matplotlib
Seaborn
SQL

Data Science Concepts

Machine Learning
Regression
Classification
Data Collection
Data Cleaning
Exploratory Data Analysis
Feature Engineering & Selection
Natural Language Processing

Tools

Jupyter Notebook / JupyterLab
Git
GitHub
Streamlit
VS Code
Slack
Zendesk
Salesforce
Microsoft Office

Education

General Assembly

Certificate in Data Science

November 2022 - February 2023

SAE Expression College

Bachelor of Arts in Sound Arts

November 2002 - May 2004

University of Wyoming

General Studies

August 2001 - June 2002

Interests

When I'm not at a computer, you might find me in my car out in the hills of California enjoying a spirited drive on some twisty roads. Exploring this wonderful state and taking in the sights from behind the wheel of an enjoyable car is amazing. From the mountains to the coastline, there is a lot to take in.

Beyond motoring, I enjoy music of almost any genre (especially live shows), movies, TV, and pretending to be a good cook and a sommelier, although I'm an amateur at best.