Abstract
While housing price prediction is well studied, large-scale prediction of housing conditions remains underexplored because of data limitations. This paper addresses this gap by developing a machine-learning model to predict housing conditions across the United States. We integrated property-level data from the Warren Group with neighborhood characteristics from the U.S. Census Bureau's American Community Survey and trained three gradient-boosting algorithms: CatBoost, LightGBM, and XGBoost. Although XGBoost achieved slightly higher balanced accuracy, we selected CatBoost as the final model because it was more resistant to overfitting. The final model's predictions were aggregated to census tracts, ZIP Code Tabulation Areas, and a hexagonal grid with 36.13 km² cells for national-scale spatial analysis. The resulting dataset can serve as a resource for researchers and practitioners analyzing the geography of housing quality, with applications in urban planning, disaster management, community resilience, public health, and other fields.