Data/Text Mining • Solo Project

Predicting Housing Prices with Course Methods

Author: Daniel Phelps • Email: dphelps9693@floridapoly.edu

View Proposal (PDF) • GitHub Repo • Dataset (Kaggle)

Project Summary (Updated)

This project uses the Ames Housing dataset to (1) run exploratory data analysis (EDA) with clear visualizations, (2) reduce dimensionality with PCA to reveal structure and feature relationships, (3) segment the market with k‑means (choosing K by average silhouette), and (4) mine association rules on discretized features to explain Low/Medium/High price bands. The scope is intentionally trimmed to exclude complex prediction models, so the analysis stays interpretable, reproducible, and aligned with course topics.
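As a rough illustration of step (3), the sketch below picks K by average silhouette on a tiny made-up 2-D dataset. It is not the project's notebook code: the points, the helper names (`kmeans`, `average_silhouette`), and the farthest-point initialization are all assumptions chosen to keep the example deterministic and dependency-free. The real analysis would more likely use a library implementation on the standardized Ames numerics.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points; singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical, already-standardized toy data: three well-separated groups.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
    (10.0, 0.0), (10.1, 0.3), (9.8, 0.1), (10.2, -0.2),
]

# Try K = 2..8 (the range named in the scope) and keep the best average silhouette.
scores = {k: average_silhouette(points, kmeans(points, k)) for k in range(2, 9)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real data the silhouette curve is usually flatter than in this toy case, so it helps to inspect the scores for all candidate K values, not only the maximum.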

Scope & Methods (What I will do)

EDA & Visualization: price distribution (log), area–price scatter, box plots by neighborhood and overall quality, correlation heatmap.
PCA: scree plot, 2‑D PCA map colored by price band; interpret top loadings.
Clustering: k‑means on standardized numerics (K=2–8), select K by average silhouette; create plain‑English cluster profiles (size, median price, typical area, common neighborhoods/quality).
Association Rules: Apriori / FP‑Growth on discretized features with RHS ∈ {High, Low} price band; report top rules by lift with support & confidence.
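For the association-rule step, the three reported metrics can be computed directly from counts. The sketch below is a minimal illustration, not the Apriori or FP‑Growth search itself: the `rule_metrics` helper and the item labels (`Qual=High`, `Area=Large`, `Price=High`, and so on) are hypothetical, and the eight rows are invented for the example rather than taken from the Ames data.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs.

    support    = P(lhs and rhs)
    confidence = P(rhs | lhs)
    lift       = confidence / P(rhs)
    """
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    n_lhs = sum(lhs <= t for t in transactions)
    n_rhs = sum(rhs <= t for t in transactions)
    n_both = sum((lhs | rhs) <= t for t in transactions)
    if n == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("rule is undefined when LHS or RHS never occurs")
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift

# Invented, already-discretized rows (one set of items per home).
homes = [
    {"Qual=High", "Area=Large", "Price=High"},
    {"Qual=High", "Area=Large", "Price=High"},
    {"Qual=High", "Area=Small", "Price=Medium"},
    {"Qual=Low", "Area=Small", "Price=Low"},
    {"Qual=Low", "Area=Large", "Price=Medium"},
    {"Qual=Low", "Area=Small", "Price=Low"},
    {"Qual=High", "Area=Large", "Price=High"},
    {"Qual=Low", "Area=Small", "Price=Low"},
]

support, confidence, lift = rule_metrics(homes, {"Qual=High"}, {"Price=High"})
print(support, confidence, lift)
```

A lift above 1 means the left-hand side makes the price band more likely than its base rate, which is why ranking by lift while also reporting support and confidence guards against rules that are strong but rest on very few homes.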

Milestones & Deliverables

Proposal: PDF below.
Checkpoint I: EDA complete; PCA plots; initial K selection; draft cluster profiles.
Checkpoint II: association rules complete; refined clusters.
Final: 9–11 page report (IEEE two‑column), slides, reproducible notebooks and figures in repo.

Dataset

Ames Housing (Kaggle): competition page

Proposal


Contact

Student: Daniel Phelps

Email: dphelps9693@floridapoly.edu