Model Description

Our model aims to provide a simple and efficient approach for predicting HIV across all counties in the US. Because HIV is rare and health data are subject to confidentiality concerns, county-level new HIV diagnosis rates contain a high percentage of zeros and suppressed values. We proposed handling the data with a two-part model: one part classifies zeros, and the other predicts the rate given that the county has a positive HIV diagnosis rate. For each part of the model, we explored multiple methods, some of which take spatial correlation into account and some of which do not. We compared the classification and prediction performance of the different methods and found that logistic regression for estimating the probability of a positive rate, in conjunction with a generalized estimating equation (GEE) with a log link, produced the best prediction results. A detailed description of the model is available at: https://www.sciencedirect.com/science/article/pii/S1877584521000356.
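To make the two-part structure concrete, the following is a minimal sketch of how predictions from the two parts could be combined for a single county. It is not the published implementation: the function names, the coefficient vectors `beta` and `gamma`, the 0.5 classification threshold, and the use of the product of the two parts as an unconditional expected rate are all illustrative assumptions. In practice the coefficients would come from a fitted logistic regression and a fitted log-link GEE.

```python
import math


def positive_probability(x, beta):
    """Part 1 (logistic regression): P(rate > 0 | x).

    beta[0] is the intercept; beta[1:] pair with the covariates in x.
    """
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def positive_mean(x, gamma):
    """Part 2 (log-link mean model): E[rate | rate > 0, x]."""
    eta = gamma[0] + sum(g * xi for g, xi in zip(gamma[1:], x))
    return math.exp(eta)


def predict_rate(x, beta, gamma, threshold=0.5):
    """Combine both parts for one county.

    The threshold and the product p * mu are assumptions of this
    sketch, not values taken from the paper.
    """
    p = positive_probability(x, beta)
    mu = positive_mean(x, gamma)
    return {
        "p_positive": p,                    # output of part 1
        "classified_positive": p >= threshold,
        "conditional_rate": mu,             # output of part 2
        "expected_rate": p * mu,            # unconditional mean
    }


# Hypothetical coefficients and covariates, for illustration only.
example = predict_rate(x=[1.2, 0.4], beta=[-0.5, 0.8, 0.3], gamma=[1.0, 0.2, -0.1])
```

Here `conditional_rate` is the prediction for a county assumed to have a positive rate, while `expected_rate` weights that prediction by the estimated probability that the rate is positive at all.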