Academic Projects

Avocado Size Preference in US Cities: Principal Component Analysis (PCA) of Volumetric Sales of Avocados in 2020

This project used in-depth exploratory data analysis to uncover trends and preferences in avocado sizes across major US cities. Applying PCA to 2020 volumetric sales data for three avocado sizes (small, medium, and large, sold under PLUs 4046, 4225, and 4770, respectively) revealed that most cities distinctly prefer either small or medium-sized avocados, while a few outlier cities show higher sales in all three sizes. The project was implemented in Python, and the dataset was sourced from www.kaggle.com. I chose this dataset because I am fond of avocados and eat avocado toast for breakfast every day. Because I prefer to use a whole avocado for the toast and am reluctant to store a cut avocado until the next day, I favor small avocados. However, my usual grocery store does not always stock the small avocados sold under PLU 4046. This made me curious about avocado size preferences across US cities.
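The PCA step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `pca_scores`, the choice to standardize each column, and the made-up volumes standing in for the Kaggle data are all assumptions.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PCA scores, loadings, and explained-variance ratios (rows of X = cities)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column so no avocado size dominates purely by scale (assumed choice).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # coordinates for the score plot
    loadings = Vt[:n_components].T                    # contribution of each size to each PC
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical volumes (NOT the Kaggle data): columns are PLU 4046, 4225, 4770.
rng = np.random.default_rng(0)
small_pref = rng.normal([9.0, 3.0, 0.3], 0.5, size=(20, 3))   # cities buying mostly small
medium_pref = rng.normal([3.0, 9.0, 0.3], 0.5, size=(20, 3))  # cities buying mostly medium
volumes = np.vstack([small_pref, medium_pref])

scores, loadings, explained = pca_scores(volumes)
print("explained variance ratios:", np.round(explained, 3))
```

Plotting the two columns of `scores` against each other gives a score plot; in this toy data the first component separates the two hypothetical groups of cities.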

Gaussian Mixture Models (GMMs) Based Density Estimation of US Cities Preferring One Size of Avocados

After the dimensionality of the data was reduced through PCA, GMMs were used to estimate the density distribution of the cities in the score plot, which prefer small, medium, or all sizes of avocados. The model parameters were determined with the expectation-maximization (EM) algorithm, an iterative scheme. The data in the score plot was thereby grouped into three distinct clusters: cities with high sales of small avocados, cities with high sales of medium-sized avocados, and cities with outlier sales of large avocados. The project was coded in Python.
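The EM iteration mentioned above can be sketched in plain NumPy as below. This is an illustrative sketch under stated assumptions, not the project's implementation: the helper names, the farthest-point initialization, the small covariance regularizer, and the synthetic 2-D points standing in for the PCA scores are all hypothetical.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov))
    return np.exp(-0.5 * np.sum((diff @ inv) * diff, axis=1)) / norm

def farthest_point_init(X, k):
    """Deterministic initial means: start at X[0], then repeatedly take the farthest point."""
    idx = [0]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d2)))
    return X[idx].copy()

def e_step(X, weights, means, covs):
    """E-step: responsibility of each component for each point."""
    dens = np.column_stack([w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)])
    return dens / dens.sum(axis=1, keepdims=True)

def m_step(X, resp):
    """M-step: re-estimate weights, means, and covariances from the responsibilities."""
    n, d = X.shape
    nk = resp.sum(axis=0)
    weights = nk / n
    means = (resp.T @ X) / nk[:, None]
    covs = []
    for j in range(resp.shape[1]):
        diff = X - means[j]
        # The 1e-6 term keeps covariances invertible (assumed regularizer).
        covs.append((resp[:, [j]] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d))
    return weights, means, np.array(covs)

def fit_gmm(X, k, n_iter=100):
    """Fit a k-component GMM by alternating E- and M-steps for a fixed number of iterations."""
    weights = np.full(k, 1.0 / k)
    means = farthest_point_init(X, k)
    covs = np.array([np.cov(X.T)] * k)
    for _ in range(n_iter):
        resp = e_step(X, weights, means, covs)
        weights, means, covs = m_step(X, resp)
    return weights, means, covs, e_step(X, weights, means, covs)

# Synthetic stand-in for the PCA score plot: three well-separated groups of points.
rng = np.random.default_rng(1)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
true_labels = np.repeat(np.arange(3), 30)
points = centers[true_labels] + rng.normal(0.0, 0.5, size=(90, 2))

weights, means, covs, resp = fit_gmm(points, k=3)
labels = resp.argmax(axis=1)  # hard cluster assignment from the soft responsibilities
```

A library routine such as scikit-learn's `GaussianMixture` would normally replace the hand-written loop; the explicit version is shown only to make the two alternating steps visible.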

Dense Neural Network Based Categorization of Iris Data

The primary goal of this project was to understand the core principles of neural networks by implementing four fundamental steps: initialization of the weights and biases, feedforward, backpropagation, and optimization using stochastic gradient descent. A small-scale project was undertaken to classify Iris flowers into three classes based on their sepal and petal measurements. A dense neural network for this task was developed and implemented in Python, using a dataset from https://archive.ics.uci.edu/ml/datasets/Iris. The secondary objective was to gain hands-on experience with neural network design and optimization using the Keras library. The stochastic gradient descent (SGD) algorithm with the default learning rate of 0.01 was used to optimize the weights and biases.
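The four steps listed above can be sketched in NumPy as follows. This is a hedged illustration rather than the project's network: the layer sizes, ReLU and softmax activations, mini-batch size, epoch count, and the synthetic three-class data standing in for the Iris measurements are all assumptions, and Keras is not used here.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Step 1: small random weights and zero biases."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Step 2: feedforward pass with a ReLU hidden layer and a softmax output."""
    a1 = np.maximum(X @ params["W1"] + params["b1"], 0.0)
    z2 = a1 @ params["W2"] + params["b2"]
    z2 = z2 - z2.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    probs = np.exp(z2) / np.exp(z2).sum(axis=1, keepdims=True)
    return a1, probs

def backward(params, X, Y, a1, probs):
    """Step 3: backpropagation of the mean cross-entropy loss."""
    dz2 = (probs - Y) / X.shape[0]
    dz1 = (dz2 @ params["W2"].T) * (a1 > 0)  # ReLU passes gradient only where it was active
    return {"W1": X.T @ dz1, "b1": dz1.sum(axis=0), "W2": a1.T @ dz2, "b2": dz2.sum(axis=0)}

def sgd_step(params, grads, lr=0.01):
    """Step 4: SGD update with a learning rate of 0.01."""
    for key in params:
        params[key] -= lr * grads[key]

def cross_entropy(probs, Y):
    return float(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))

# Synthetic stand-in for Iris: 3 classes, 4 features, 50 samples per class.
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(0.0, 0.5, size=(150, 4))
Y = np.eye(3)[y]  # one-hot targets

params = init_params(4, 8, 3, rng)
initial_loss = cross_entropy(forward(params, X)[1], Y)
for epoch in range(200):
    order = rng.permutation(len(X))
    for start in range(0, len(X), 5):  # mini-batches of 5 samples
        batch = order[start:start + 5]
        a1, probs = forward(params, X[batch])
        sgd_step(params, backward(params, X[batch], Y[batch], a1, probs))

final_loss = cross_entropy(forward(params, X)[1], Y)
accuracy = float(np.mean(forward(params, X)[1].argmax(axis=1) == y))
```

In Keras, the same four steps are handled by the layer constructors, `model.compile` with an SGD optimizer, and `model.fit`.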

Classification of Iris Flowers Based on Sepal Length and Petal Length Using Support Vector Machines (SVMs)

The goal was to understand the mathematical formulation of Support Vector Machines (SVMs) as an optimization problem and to gain practical experience by implementing SVMs with pre-built functions from the scikit-learn library. To this end, a project was undertaken to classify Iris flowers into one of three classes (Iris setosa, Iris versicolor, and Iris virginica) based on sepal length and petal length. Three SVMs were set up for this task, each with a distinct kernel and inequality constraints. The SVMs were designed and optimized using the SVC class from scikit-learn, a machine learning library, and the three configurations were compared on accuracy.
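A comparison of this kind might look like the sketch below. The three kernels (linear, RBF, and cubic polynomial), the value `C=1.0`, the feature scaling, and the 70/30 split are assumptions chosen for illustration; the write-up does not state which configurations were actually used. The copy of the Iris data bundled with scikit-learn is loaded instead of the UCI download.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, [0, 2]]  # column 0 = sepal length, column 2 = petal length
y = iris.target           # 0 = setosa, 1 = versicolor, 2 = virginica

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three hypothetical configurations; C controls how strongly margin violations are penalized.
configs = {
    "linear": SVC(kernel="linear", C=1.0),
    "rbf": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "poly": SVC(kernel="poly", degree=3, C=1.0),
}

accuracies = {}
for name, model in configs.items():
    clf = make_pipeline(StandardScaler(), model)  # scale features before fitting the SVM
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)  # mean accuracy on the held-out set

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Holding the split fixed with `random_state=0` keeps the three accuracies comparable, since every configuration sees the same training and test samples.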