Project Overview
This academic project applies multivariate statistical techniques to the physical characteristics of wheat seeds in order to identify the variety best suited for cultivation.
Dataset
- Source: Kaggle (Public Dataset)
- Seed Types: Kama, Rosa, Canadian
- Attributes: Area, Perimeter, Compactness, Kernel Length, Kernel Width, Asymmetry Coefficient, Kernel Groove Length
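Of the attributes above, Compactness is a derived quantity rather than a direct measurement. The sketch below assumes the dataset follows the definition used in the widely distributed wheat seeds data, C = 4*pi*A / P^2, where A is the area and P the perimeter; that definition is an assumption here, since this summary does not state it. The function name is hypothetical, the example is in Python for illustration only (the project itself used R and SPSS), and the values are made-up shapes rather than dataset rows.

```python
import math

def compactness(area, perimeter):
    """Compactness C = 4*pi*A / P^2 (assumed definition).

    Equals 1 for a perfect circle and is smaller for any other shape.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2

# Sanity checks on made-up shapes (not dataset values).
radius = 2.0
circle_c = compactness(math.pi * radius ** 2, 2 * math.pi * radius)
square_c = compactness(1.0, 4.0)  # unit square: area 1, perimeter 4
```

Because Compactness is computed from Area and Perimeter, it is not independent of them, which is one reason dimensionality reduction is a natural step for this data.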
Tools & Techniques
- R & SPSS
- Exploratory Data Analysis (EDA)
- Principal Component Analysis (PCA)
- Factor Analysis
- K-Means & Hierarchical Clustering
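As a rough illustration of the K-Means technique listed above, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not the project's code (the analysis was done in R and SPSS). The function names, the farthest-first seeding rule, and the toy points are all assumptions chosen to keep the example deterministic; in practice the seven attributes would normally be standardized before clustering.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point whose nearest chosen
    centroid is farthest away."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # centroids stopped moving: converged
        centroids = updated
    return centroids, labels

# Made-up 2-D points forming three well-separated groups (not seeds data).
toy_points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
centroids, labels = kmeans(toy_points, 3)
```

On this toy input each group of three points ends up sharing one label. The same function accepts points of any dimension, so a seven-attribute row could be passed as a 7-tuple.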
Key Observations
- Area and Perimeter showed a strong positive correlation
- The first two principal components explained 86% of the total variance
- K-Means clustering with k = 3 (one cluster per seed variety) produced the best separation
- Canadian wheat seeds consistently formed the strongest cluster
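The first two observations are related: when attributes are strongly correlated, a few principal components can carry most of the variance. The sketch below illustrates this with a Pearson correlation and the two-variable case, where the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r. The function names and the toy values are assumptions for illustration, written in Python rather than the R and SPSS used in the project. The 86% figure reported above comes from all seven attributes and is not reproduced by this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cumulative_share(eigenvalues, m):
    """Share of total variance captured by the m largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:m]) / sum(ordered)

# Made-up toy values standing in for two attributes (not the seeds data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson(x, y)

# For two standardized variables the correlation matrix [[1, r], [r, 1]]
# has eigenvalues 1 + r and 1 - r, so the first component's share of the
# variance is (1 + |r|) / 2.
eigenvalues = [1 + r, 1 - r]
first_pc_share = cumulative_share(eigenvalues, 1)
```

With more than two attributes the eigenvalues no longer have this closed form, but the cumulative share is computed the same way from the eigenvalues of the correlation matrix.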
Recommendation
Based on the validated dimensionality reduction and clustering results, the Canadian wheat seed variety is recommended for cultivation because of its superior physical structure and its consistency across models.