Using Selenium, Pandas, Matplotlib, Seaborn, Scikit-learn, and Streamlit
A complete project that starts with data collection from a real website, using web scraping with Selenium. The data is cleaned and prepared with Pandas and NumPy, with the help of some Matplotlib and Seaborn visualizations. Scikit-learn regression algorithms are used for modeling, with an optimized Random Forest model. Streamlit is used to prepare the project for deployment: new property data can be entered via a web page, and a predicted rent value is returned as output.
Analysis of the USA 2015 Flights Dataset, which has almost 6 million rows. Data cleaning was done in SQL through the Google BigQuery interface, and the cleaned data was then used to build dashboard visualizations in Google Data Studio.
Classification and regression exercises using Scikit-learn algorithms. Includes a churn predictor built on a bank dataset with a Random Forest Classifier model, and a house value predictor built on the California house prices dataset with a Random Forest Regressor model. Both exercises include parameter optimization using grid search with cross-validation.
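The grid search step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic (generated with `make_classification` as a stand-in for the bank dataset), and the parameter grid, fold count, and split size are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the churn data (assumption: not the real bank dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid; the values are illustrative only.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

# Every parameter combination is scored with 3-fold cross-validation,
# then the best one is refit on the full training split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(best_params, round(test_accuracy, 3))
```

The same pattern applies to the regression exercise by swapping in `RandomForestRegressor` and a regression metric such as `scoring="r2"`.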
Data visualization exercises in Python. Includes one Jupyter Notebook focused on Matplotlib visualizations and another focused on Seaborn visualizations.
Using machine learning in Python to predict passenger survival. A Gradient Boosting Classifier is trained on labeled data and then run on hidden-label data to predict whether those passengers survived. The accuracy, reported by Kaggle.com after submission, was 78.2%. The model can still be optimized further.
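The train-then-predict workflow described above might look like the sketch below. It uses synthetic data rather than the Kaggle files, and the split sizes, the default hyperparameters, and the cross-validation step are assumptions added for illustration. The cross-validated accuracy is only a local estimate; the real score comes from Kaggle after submission.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the passenger data (assumption: not the Kaggle dataset).
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_labeled, y_labeled = X[:300], y[:300]  # rows with known outcomes
X_hidden = X[300:]                       # rows whose labels are withheld

model = GradientBoostingClassifier(random_state=1)

# Local accuracy estimate from 5-fold cross-validation on the labeled rows.
cv_accuracy = cross_val_score(
    model, X_labeled, y_labeled, cv=5, scoring="accuracy"
).mean()

# Fit on all labeled rows, then predict the hidden-label rows for submission.
model.fit(X_labeled, y_labeled)
predictions = model.predict(X_hidden)
print(round(cv_accuracy, 3), predictions[:10])
```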
Analysis of the Movies Industry dataset, applying some data cleaning and looking for correlations between features. Made mostly with Pandas, plus some simple visualizations with Matplotlib and Seaborn.
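As a rough illustration of the correlation step, the sketch below computes a Pearson correlation matrix with Pandas. The column names and all values are made up for the example and are not taken from the Movies Industry dataset.

```python
import pandas as pd

# Hypothetical columns and made-up values, for illustration only.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E"],
        "budget": [10.0, 50.0, 120.0, 200.0, 30.0],
        "gross": [25.0, 90.0, 300.0, 450.0, 40.0],
        "runtime": [95, 110, 130, 150, 100],
    }
)

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = movies.select_dtypes("number").corr()
print(corr.round(2))
```

The resulting matrix can be passed directly to `seaborn.heatmap` to visualize which features move together.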
Dashboards made using the Olist e-commerce dataset available on Kaggle.com, assuming a case where we want to reward the best sellers.
Repository of SQL code used to solve challenges, using the Olist e-commerce dataset. Made using SQLite on DB Browser.
Challenges taken from "10 Days of Statistics" on HackerRank.com, solved using Python on a notebook in Google Colab.
Repository of code used to solve Python challenges taken from Awari's Data Science Course. Solved using Python on Google Colab notebooks.