Data/Text Mining • Solo Project
Predicting Housing Prices with Course Methods
Author: Daniel Phelps • Email: dphelps9693@floridapoly.edu
Project Summary (Updated)
This project uses the Ames Housing dataset to (1) perform exploratory data analysis (EDA) with clear visualizations, (2) reduce dimensionality with PCA to gain insight into structure and features, (3) segment the market with k‑means, choosing K by average silhouette, and (4) mine association rules on discretized features to explain Low/Medium/High price bands. The scope is intentionally trimmed: it excludes complex prediction models so that the analysis stays interpretable, reproducible, and aligned with course topics.
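As a rough illustration of step (4), the sketch below bands prices into Low/Medium/High and scores a single rule by support, confidence, and lift. It makes several assumptions that are not part of the proposal: the bands are cut at price terciles, the helper names `price_band` and `rule_metrics` are hypothetical, and the six toy homes are invented for illustration rather than taken from Ames.

```python
from statistics import quantiles

def price_band(price, cuts):
    """Map a sale price to Low/Medium/High using two cut points."""
    low_cut, high_cut = cuts
    if price <= low_cut:
        return "Low"
    if price <= high_cut:
        return "Medium"
    return "High"

def rule_metrics(rows, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    rows is a list of item sets; antecedent and consequent are item sets.
    """
    n = len(rows)
    n_a = sum(antecedent <= r for r in rows)                 # rows containing the LHS
    n_c = sum(consequent <= r for r in rows)                 # rows containing the RHS
    n_ac = sum((antecedent | consequent) <= r for r in rows) # rows containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Toy data invented for illustration (not Ames values): (quality label, price)
homes = [("Qual=High", 310000), ("Qual=High", 285000), ("Qual=Mid", 180000),
         ("Qual=Mid", 165000), ("Qual=Low", 98000), ("Qual=Low", 120000)]

prices = [p for _, p in homes]
cuts = quantiles(prices, n=3)  # assumption: bands are split at price terciles
rows = [{q, "Price=" + price_band(p, cuts)} for q, p in homes]

support, confidence, lift = rule_metrics(rows, {"Qual=High"}, {"Price=High"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the antecedent co-occurs with the price band more often than chance would suggest, which is why lift is a natural ranking key for the reported rules.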
Scope & Methods (What I will do)
- EDA & Visualization: price distribution (log), area–price scatter, box plots by neighborhood and overall quality, correlation heatmap.
- PCA: scree plot, 2‑D PCA map colored by price band; interpret top loadings.
- Clustering: k‑means on standardized numeric features for K=2–8, selecting K by average silhouette; write plain‑English cluster profiles (size, median price, typical area, common neighborhoods/quality).
- Association Rules: Apriori / FP‑Growth on discretized features, with the rule consequent (RHS) restricted to the High or Low price band; report top rules by lift, along with support & confidence.
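The K-selection step in the clustering bullet could be sketched as follows. This is a minimal example using scikit-learn, not the project's actual notebook: the data comes from `make_blobs` as a synthetic stand-in for the numeric Ames columns, and the `random_state` and `n_init` values are arbitrary choices made here for repeatability.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric Ames features (assumption: the real
# columns would be loaded from the dataset instead).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# Standardize so that no single feature dominates the Euclidean distances.
X_std = StandardScaler().fit_transform(X)

# Fit k-means for K = 2..8 and record the average silhouette for each K.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    scores[k] = silhouette_score(X_std, labels)

# Pick the K with the highest average silhouette.
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"K={k}: average silhouette = {s:.3f}")
print("selected K:", best_k)
```

On real data the silhouette curve is often flat near its maximum, so it may be worth checking that the neighboring values of K give similar cluster profiles before settling on one.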
Milestones & Deliverables
- Proposal: PDF below.
- Checkpoint I: EDA complete; PCA plots; initial K selection; draft cluster profiles.
- Checkpoint II: association rules complete; refined clusters.
- Final: 9–11 page report (IEEE two‑column), slides, reproducible notebooks and figures in repo.
Dataset
Ames Housing (Kaggle): competition page
Proposal
(Embedded proposal PDF)
Contact
Student: Daniel Phelps
Email: dphelps9693@floridapoly.edu