Matheus Antunes
Portfolio

Data Analytics, SQL, Python, Tableau, Power BI
LinkedIn

Property Rent Calculator Complete Project

Using Selenium, Pandas, Matplotlib, Seaborn, Scikit-learn, and Streamlit

A complete end-to-end project. Data was collected from a real website using web scraping with Selenium, then cleaned and prepared with Pandas and NumPy, with the help of some Matplotlib and Seaborn visualizations. Scikit-learn regression algorithms were used for modeling, with a Random Forest model optimized. Streamlit prepares the project for deployment: new property data can be entered through a web page, and a predicted rent value is returned as output.
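As a rough illustration of the last step described above, the sketch below trains a Random Forest regressor and wraps it in a function that takes new property data and returns a rent estimate. The feature names (`area_m2`, `bedrooms`, `parking_spots`), the synthetic data, and the `predict_rent` helper are assumptions made for this example, not the project's actual schema or code. In a Streamlit app, the inputs would come from form widgets instead of function arguments.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the scraped listings (hypothetical columns and values).
rng = np.random.default_rng(42)
n = 300
listings = pd.DataFrame({
    "area_m2": rng.uniform(30, 200, n),
    "bedrooms": rng.integers(1, 5, n),
    "parking_spots": rng.integers(0, 3, n),
})
# Made-up rent rule plus noise, only so the model has a signal to learn.
listings["rent"] = (
    20 * listings["area_m2"]
    + 300 * listings["bedrooms"]
    + 150 * listings["parking_spots"]
    + rng.normal(0, 100, n)
)

FEATURES = ["area_m2", "bedrooms", "parking_spots"]
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(listings[FEATURES], listings["rent"])


def predict_rent(area_m2: float, bedrooms: int, parking_spots: int) -> float:
    """Return the predicted rent for one new property (hypothetical helper)."""
    new_property = pd.DataFrame(
        [[area_m2, bedrooms, parking_spots]], columns=FEATURES
    )
    return float(model.predict(new_property)[0])


small = predict_rent(40, 1, 0)
large = predict_rent(180, 4, 2)
print(f"small flat: {small:.0f}, large flat: {large:.0f}")
```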

USA 2015 Flight Delays and Cancellations Analysis

Using Google BigQuery and Google Data Studio

Analysis of the USA 2015 Flights dataset, which has almost 6 million rows. The data was cleaned with SQL through the Google BigQuery interface and then used to build dashboard visualizations in Google Data Studio.

Machine Learning Exercises

Using Pandas, Matplotlib, Seaborn and Scikit-learn

Classification and regression exercises using Scikit-learn algorithms. Includes a churn predictor built on a bank dataset with a Random Forest Classifier model, and a house value predictor built on the California housing prices dataset with a Random Forest Regressor model. Both exercises include hyperparameter optimization using Grid Search Cross-Validation.
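A minimal sketch of Grid Search Cross-Validation with a Random Forest Classifier is shown below. It runs on synthetic data from `make_classification` rather than the bank dataset, and the parameter grid is an arbitrary small example, not the grid used in the exercises.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary-classification data standing in for the churn dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small illustrative grid; every combination is scored with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() uses the best estimator, refit on the full training split.
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping a held-out test split outside the search gives an accuracy estimate that was not used to pick the parameters.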

Python Data Visualization Exercises

Using Pandas, Matplotlib, and Seaborn

Data visualization exercises in Python. Includes one Jupyter Notebook focused on Matplotlib visualizations and another focused on Seaborn visualizations.

Titanic Survival Prediction

Using Pandas and Scikit-learn

Uses machine learning in Python to predict the survival of passengers. A Gradient Boosting Classifier is trained on labeled data and then run on hidden-label data to predict whether those passengers survived. The accuracy is reported after submission on Kaggle.com and was 78.2%. Further optimization could still improve this model.
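The train-on-labeled, predict-on-hidden workflow can be sketched as follows. The data here is synthetic, the 200/100 split is arbitrary, and the default model settings are an assumption; the `PassengerId` and `Survived` column names follow the usual Kaggle submission layout for this competition.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data: the first 200 rows play the labeled training set,
# the last 100 rows play the hidden-label set.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_labeled, y_labeled = X[:200], y[:200]
X_hidden = X[200:]  # labels for these rows are treated as unknown

clf = GradientBoostingClassifier(random_state=1)
clf.fit(X_labeled, y_labeled)

# One prediction per hidden-label row, in a submission-style table.
submission = pd.DataFrame({
    "PassengerId": range(1, len(X_hidden) + 1),
    "Survived": clf.predict(X_hidden),
})
print(submission.head())
```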

Movies Industry Dataset Analysis

Using Python/Pandas

Analysis of the Movies Industry dataset, applying some data cleaning and looking for correlations between its features. Built mostly with Pandas, plus some simple visualizations with Matplotlib and Seaborn.
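Looking for correlations between features in Pandas might look like the sketch below. The tiny table and its column names (`budget`, `gross`, `runtime`) are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Tiny made-up table; columns and values are hypothetical.
movies = pd.DataFrame({
    "budget": [10, 50, 100, 150, 200],
    "gross": [25, 90, 260, 310, 520],
    "runtime": [95, 120, 101, 140, 110],
})

# Pairwise Pearson correlation between all numeric columns.
corr = movies.corr()

# Rank the other features by how strongly they correlate with gross.
strongest = corr["gross"].drop("gross").sort_values(ascending=False)
print(strongest)
```

The resulting matrix can be passed to Seaborn's `heatmap` function for a quick visual check.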

Olist Dataset Dashboards

Using Tableau

Dashboards made using the Olist e-commerce dataset available on Kaggle.com, assuming a case where we want to reward the best sellers.