Data Summary
When choosing the number of variables, it is important to avoid both over-fitting (low predictive power caused by including too many variables without statistical justification at the model identification stage) and under-fitting (poor performance on both the training and test data caused by including too few variables). We fit the two model parts discussed above separately. Both parts use stepwise regression, selecting from the explanatory variables described in this section. The binary part, which classifies positive-versus-zero outcomes, selects 40 significant variables; the positive part, which predicts diagnosis rates for counties classified as positive, selects 37. Although the two parts are allowed to select different variables, several variables are significant in both: the number of opioid prescriptions, the percentage of people receiving food stamps (SNAP), the percentage of the population in juvenile facilities, the number of federally qualified health centers, knowledge of HIV status, the rate of HIV testing, HCV deaths, the rate of PrEP use, the percentage of households receiving Supplemental Security Income, the rate of illicit drug use, and linkage to HIV care.
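As a rough illustration of how the two fitted parts could be combined at prediction time, the sketch below first classifies a county as positive or zero and then, only for positive counties, applies the second model. It is a minimal sketch under stated assumptions, not the fitting procedure used here: the function name `predict_two_part`, the logistic form of the binary part, the linear form of the positive part, the 0.5 threshold, and every coefficient value are hypothetical.

```python
import math


def predict_two_part(x, binary_coefs, positive_coefs, threshold=0.5):
    """Combine a binary (positive-versus-zero) part with a positive part.

    x              -- dict mapping variable name to its (transformed) value
    binary_coefs   -- dict of coefficients for the binary part
    positive_coefs -- dict of coefficients for the positive part
    Each coefficient dict may carry an "intercept" entry; the two dicts may
    use different variables, mirroring the separate stepwise selections.
    """

    def linear(coefs):
        # Intercept plus the weighted sum over the variables this part selected.
        return coefs.get("intercept", 0.0) + sum(
            weight * x[name] for name, weight in coefs.items() if name != "intercept"
        )

    # Assumption: the binary part is a logistic model.
    p_positive = 1.0 / (1.0 + math.exp(-linear(binary_coefs)))
    if p_positive < threshold:
        return 0.0  # county classified as a zero outcome
    # Assumption: the positive part is linear in the selected variables.
    return linear(positive_coefs)


# Hypothetical coefficients, for illustration only.
binary_coefs = {"intercept": -1.0, "opioid_prsc": 0.5}
positive_coefs = {"intercept": 2.0, "prep_rate_l": 1.5}

high = predict_two_part({"opioid_prsc": 4.0, "prep_rate_l": 2.0}, binary_coefs, positive_coefs)
low = predict_two_part({"opioid_prsc": 0.0, "prep_rate_l": 2.0}, binary_coefs, positive_coefs)
```

Keeping the coefficients in two separate dictionaries lets each part use its own variable set while still sharing variables such as `opioid_prsc` when both parts select them.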
To model new HIV diagnosis rates, we consider a variety of explanatory variables. For the full list of data and descriptions, see the following table.
Variable | Label | Description | Transformation | Source |
---|---|---|---|---|
FIPS | FIPS | FIPS county code | None | Census TIGER |
Year | year | year | None | Census TIGER |
PopEstimate | annual resident population estimates | annual resident population estimates | None | AHRF |
UnemploymentRate16p | unemployment rate | unemployment rate of persons 16+ | None | AHRF |
nDaysMeasuredAirQualityGood | number of days measured air quality good | number of days measured air quality good | None | AHRF |
xGoodAirQualityDays | percentage of good air quality days | percentage of good air quality days | None | AHRF |
NumberHHs | number of households | number of households | None | AHRF |
PercentUrbanPop | percent urban population | percent urban population | None | AHRF |
PercentUrbanHousingUnits | percent urban housing units | percent urban housing units | None | AHRF |
xFamiliesBelowPovertyLevel | percentage of families below poverty | percentage of families below poverty level | None | AHRF |
xDisabledle18CivilNoninstl | percentage of disability for civilian noninstitutionalized population less than 18 | percentage of disability for civilian noninstitutionalized population less than 18 years old (5 year average) | None | AHRF |
xDisabled1864CivilNoninstl | percentage of disability for civilian noninstitutionalized 18-64 years old | percentage of disability for civilian noninstitutionalized 18-64 years old population (5 year average) | None | AHRF |
xDisabled65pCivilNoninstl | percentage of disability for civilian noninstitutionalized population older than 65 | percentage of disability for civilian noninstitutionalized population older than 65 years old (5 year average) | None | AHRF |
xDidNotWork1664 | percentage of people aged 16-64 who didn’t work | percentage of people aged 16-64 who didn’t work | None | AHRF |
xAgricForestFishHuntMineWorkers | percentage of Agricultural/Forestry/Fisheries/Hunting/Mining workers | percentage of Agricultural/Forestry/Fisheries/Hunting/Mining workers | None | AHRF |
xConstructionWorkers | percentage of Construction workers | percentage of Construction workers | None | AHRF |
xEducHealthSocAssWorkers | percentage of Educational Services, and Health Care and Social Assistance workers | percentage of Educational Services, and Health Care and Social Assistance workers | None | AHRF |
xManufacturingWorkers | percentage of Manufacturing workers | percentage of Manufacturing workers | None | AHRF |
xWorkersinOtherIndustries | percentage of workers in other industries | percentage of workers in other industries | None | AHRF |
nWorkersMeantraveltime16p | number of workers, mean travel time, 16+ | number of workers, mean travel time, 16+ | None | AHRF |
xFemalesDivorced | percent of females divorced | percent of females divorced | None | AHRF |
5YrTotInfantMortRatele1yr | total infant mortality | 5 year average total infant mortality rate, less than 1 year old | None | CDC NCHHSTP AtlasPlus |
syphilis_rate | syphilis rate | the number of syphilis cases per 100,000 population | None | CDC NCHHSTP AtlasPlus |
chlamydia_rate | chlamydia rate | the number of chlamydia cases per 10,000 population | None | CDC NCHHSTP AtlasPlus |
gonorrhea_rate | gonorrhea rate | the number of gonorrhea cases per 10,000 population | None | CDC NCHHSTP AtlasPlus |
knowledge | knowledge | percentage of diagnosed infections among persons living with HIV infection | None | CDC NCHHSTP AtlasPlus |
linkage | linkage to care | percentage of diagnosed infections with at least 1 CD4 or viral load test performed ≤1 month after diagnosis | None | CDC NCHHSTP AtlasPlus |
hiv_rate | HIV rate | the number of HIV cases per 100,000 population | None | CDC NCHHSTP AtlasPlus |
suppression_per | viral suppression | percentage of diagnosed infections with a viral load result of <200 copies/mL at the most recent viral load test during the previous year | None | CDC NCHHSTP AtlasPlus |
hivmedic_per | receipt of HIV medication | percentage of diagnosed infections with ≥1 CD4 or viral load tests performed during the year | None | CDC NCHHSTP AtlasPlus |
hcv | HCV death | deaths among persons with acute viral hepatitis C or chronic viral hepatitis C as an underlying cause of death | sqrt(x) | Hepvu |
opioid_prsc | opioid prescriptions | opioid prescriptions dispensed in the U.S. per 100 persons | x/10 | CDC opioid data |
hivtst | HIV test | HIV test | x/100 | BRFSS |
PopMale1519_r | percentage of male population at age group 15-19 | percentage of male population at age group 15-19 | None | AHRF |
PopMale2024_r | percentage of male population at age group 20-24 | percentage of male population at age group 20-24 | None | AHRF |
PopMale2544_r | percentage of male population at age group 25-44 | percentage of male population at age group 25-44 | None | AHRF |
PopMale4564_r | percentage of male population at age group 45-64 | percentage of male population at age group 45-64 | None | AHRF |
PopMale65p_r | percentage of male population at age group 65+ | percentage of male population at age group 65+ | None | AHRF |
PopFemale1519_r | percentage of female population at age group 15-19 | percentage of female population at age group 15-19 | None | AHRF |
PopFemale2024_r | percentage of female population at age group 20-24 | percentage of female population at age group 20-24 | None | AHRF |
PopFemale2544_r | percentage of female population at age group 25-44 | percentage of female population at age group 25-44 | None | AHRF |
PopFemale4564_r | percentage of female population at age group 45-64 | percentage of female population at age group 45-64 | None | AHRF |
PopFemale65p_r | percentage of female population at age group 65+ | percentage of female population at age group 65+ | None | AHRF |
PopTotalMale_r | percentage of the population as Male | percentage of the population as Male | None | AHRF |
PopWhite_r | percentage of White population | percentage of White population | None | AHRF |
PopBlackAfrAm_r | percentage of Black African American population | percentage of Black African American population | None | AHRF |
PopAsian_r | percentage of Asian population | percentage of Asian population | None | AHRF |
PopTotHispLatino_r | percentage of Hispanic population | percentage of Hispanic population | None | AHRF |
PopAmerIndAlaskaNat_r | percentage of American Indian/Alaska Native population | percentage of American Indian/Alaska Native population | None | AHRF |
PopNatHawOthPacIsl_r | percentage of Native Hawaiian/other Pacific Islander population | percentage of Native Hawaiian/other Pacific Islander population | None | AHRF |
PopTwopRaces_r | percentage of Two or more races population | percentage of Two or more races population | None | AHRF |
MedicareEnrollmentAgedDisabledTot_r | percentage of population on Medicare | percentage of population on Medicare. The data include counts of Medicare beneficiaries with Medicare Part A, also known as Hospital Insurance, and Medicare Part B, also known as Medical Insurance. | None | AHRF |
Persons1864HealthInsurance_r | percentage of population between 18-64 with health insurance | percentage of population between 18-64 with health insurance | None | AHRF |
Personsle19HealthInsurance_r | percentage of population less than 19 years old with health insurance | percentage of population less than 19 years old with health insurance | None | AHRF |
PersonsinPoverty_r | percentage of Persons Aged 0-17 in Poverty | percentage of Persons Aged 0-17 in Poverty | None | AHRF |
FoodStampSNAPRecipientEstimate_r | percentage of people as Food Stamp/SNAP Recipients | percentage of people as Food Stamp/SNAP Recipients | None | AHRF |
nHousingUnitsEstimate_r | number of Housing Units per person | number of Housing Units per person | None | AHRF |
Persons25pYrsWleHSDiploma_r | percentage of persons 25 years or older with less than high school diploma | percentage of persons 25 years or older with less than high school diploma | None | AHRF |
Persons25pWHSDiplOrMore_r | percentage of persons 25 years or older with a high school diploma or more | percentage of persons 25 years or older with a high school diploma or more | None | AHRF |
nWorkersDriveAlone16p_r | percentage of workers who drive alone to work | percentage of workers who drive alone to work | None | AHRF |
nWorkersCarpool16p_r | percentage of workers who carpool to work | percentage of workers who carpool to work | None | AHRF |
nWorkersPublicTrans16p_r | percentage of workers who use public transportation to go to work | percentage of workers who use public transportation to go to work | None | AHRF |
nWorkersWalktoWork16p_r | percentage of workers who walk to work | percentage of workers who walk to work | None | AHRF |
nWorkersOtherMeansofTrans16p_r | percentage of workers who use other means of transportation to work | percentage of workers who use other means of transportation to work | None | AHRF |
nWorkersWorkatHome16p_r | percentage of workers who work at home | percentage of workers who work at home | None | AHRF |
nWorkersle5mintoWork16p_r | percentage of workers with mean travel time of less than 5 minutes to work | percentage of workers with mean travel time of less than 5 minutes to work | None | AHRF |
nWorkers59mintoWork16p_r | percentage of workers with mean travel time of 5-9 minutes to work | percentage of workers with mean travel time of 5-9 minutes to work | None | AHRF |
nWorkers1014mintoWork16p_r | percentage of workers with mean travel time of 10-14 minutes to work | percentage of workers with mean travel time of 10-14 minutes to work | None | AHRF |
nWorkers1519mintoWork16p_r | percentage of workers with mean travel time of 15-19 minutes to work | percentage of workers with mean travel time of 15-19 minutes to work | None | AHRF |
nWorkers2029mintoWork16p_r | percentage of workers with mean travel time of 20-29 minutes to work | percentage of workers with mean travel time of 20-29 minutes to work | None | AHRF |
nWorkers3044mintoWork16p_r | percentage of workers with mean travel time of 30-44 minutes to work | percentage of workers with mean travel time of 30-44 minutes to work | None | AHRF |
nWorkers4559mintoWork16p_r | percentage of workers with mean travel time of 45-59 minutes to work | percentage of workers with mean travel time of 45-59 minutes to work | None | AHRF |
nWorkers6089mintoWork16p_r | percentage of workers with mean travel time of 60-89 minutes to work | percentage of workers with mean travel time of 60-89 minutes to work | None | AHRF |
nWorkers90pmintoWork16p_r | percentage of workers with mean travel time of more than 90 minutes to work | percentage of workers with mean travel time of more than 90 minutes to work | None | AHRF |
nOccupiedHousingUnits_r | number of occupied housing units | number of occupied housing units per person | None | AHRF |
3YearIschemicHeartDisDeaths_n | percentage of ischemic heart disease deaths | percentage of ischemic heart disease deaths per person (3-year average) | x/100 | AHRF |
MedianHHIncome_l | Median Household Income | Median Household Income | log(1+x) | AHRF |
PerCapitaPersonalIncome_l | Per Capita Personal Income | Per Capita Personal Income estimates | log(1+x) | NSDUH |
illicit_l | illicit drug use | the use of illicit drugs in the past month | log(1+x) | NSDUH |
marijuana_l | Marijuana Use | Marijuana Use in the Past Year | log(1+x) | NSDUH |
marijuana_init_l | First Use of Marijuana | First Use of Marijuana | log(1+x) | NSDUH |
cocaine_l | Cocaine Use | Cocaine Use in the Past Year | log(1+x) | NSDUH |
alcohol_l | Alcohol Use | Alcohol Use in the Past Month | log(1+x) | NSDUH |
tobacco_l | Tobacco Product Use | Tobacco Product Use in the Past Month | log(1+x) | NSDUH |
cigar_l | Cigarette Use | Cigarette Use in the Past Month | log(1+x) | NSDUH |
mental_severe_l | Serious Mental Illness | Serious Mental Illness in the Past Year | log(1+x) | NSDUH |
mental_l | Any Mental Illness | Any Mental Illness in the Past Year | log(1+x) | NSDUH |
suicide_l | Had Serious Thoughts of Suicide | Had Serious Thoughts of Suicide in the Past Year | log(1+x) | NSDUH |
MedianHomeValue_l | median home value | median home value | log(1+x) | AHRF |
MedianGrossRent_l | median gross rent | median gross rent | log(1+x) | AHRF |
HHIncomeUnder10000_l | household income, less than $10,000 | household income, less than $10,000 | log(1+x) | AHRF |
depress_l | Major Depressive Episode | Major Depressive Episode in the Past Year | log(1+x) | AHRF |
TotalHospitalBeds_r_l | total hospital beds per person | total hospital beds per person | log(1+1000*x) | AHRF |
PhysNFPrimCarePatCareExclHspRsdnts_r_l | number of Physicians, Excluding Hospital Residents | number of Physicians, Non-Federal, Total Patient Care, Excluding Hospital Residents per person | log(1+10000*x) | AHRF |
PhysNFPrimCarePatCareHospRsdnts_r_l | number of Physicians, Hospital Residents per person | number of Physicians, Non-Federal, Total Patient Care, Hospital Residents per person | log(1+10000*x) | AHRF |
MDsNonFedTotGenPractTotal_r_l | number of Medical Doctors, Total General Practice | number of Medical Doctors, Non-Federal, Total General Practice (General Practice and Family Medicine and Subspecialties), Total per person | log(1+10000*x) | AHRF |
MDsNFGenIntMedTotPatCare_r_l | number of Medical Doctors, General Internal Medicine | number of Medical Doctors, Non-Federal, General Internal Medicine, Total Patient Care per person | log(1+10000*x) | AHRF |
PhysicianAssistantswNPI_r_l | number of Physician Assistants | number of Physician Assistants with an NPI per person | log(1+10000*x) | AHRF |
PopinJuvenilleFacilities_r_l | population in Juvenile Facilities | percentage of population in Juvenile Facilities | log(1+10000*x) | AHRF |
AdvPracticeRegisteredNurseswNPIAPRN_r_l | number of Advanced Practice Nurse Midwives | number of Advanced Practice Nurse Midwives, with CMS NPI per person | log(1+10000*x) | AHRF |
NursePractitionerswNPI_r_l | number of Nurse Practitioners | number of Nurse Practitioners, with CMS NPI per person | log(1+10000*x) | AHRF |
STGHospwEmergencyDepartment_r_l | number of short-term general hospitals with emergency department | number of short-term general hospitals with emergency department per person | log(1+100000*x) | AHRF |
nHomeHealthAgencies_r_l | number of Home Health Agencies | number of Home Health Agencies per person | log(1+100000*x) | AHRF |
nCommunityMentalHealthCtrs_r_l | number of Community Mental Health centers | number of Community Mental Health centers per person | log(1+100000*x) | AHRF |
nFedQualifiedHealthCenters_r_l | number of Federally Qualified Health Centers | number of Federally Qualified Health Centers per person | log(1+100000*x) | AHRF |
nCommunityHealthCenters_r_l | number of Community Health Centers | number of Community Health Centers, Grantees Only, per person | log(1+100000*x) | AHRF |
nRuralHealthClinics_r_l | number of Rural Health Clinics | number of Rural Health Clinics per person | log(1+100000*x) | AHRF |
nNHSCPrimaryCareSiteswProv_r_l | number of NHSC Primary Care sites | number of NHSC Primary Care sites per person | log(1+100000*x) | AHRF |
nNHSCMentalHealthSiteswProv_r_l | number of NHSC Mental Health sites | number of NHSC Mental Health sites per person | log(1+100000*x) | AHRF |
TotalNumberHospitals_r_l | number of hospitals | total number of hospitals per person | log(1+100000*x) | AHRF |
AdvPracticeNurseMidwiveswNPI_r_l | number of Advanced Practice Registered Nurses | number of Advanced Practice Registered Nurses (APRN), with CMS NPI per person | log(1+100000*x) | AHRF |
prep_rate_l | PrEP use | number of people who had at least one day of prescribed TDF/FTC for PrEP in a calendar year per 100,000 population | log(1+x) | AIDSVu |
nHHs2Persons_r | percentage of households with 2 persons | percentage of households with 2 persons | None | AHRF |
nHHs3Persons_r | percentage of households with 3 persons | percentage of households with 3 persons | None | AHRF |
nHHs4Persons_r | percentage of households with 4 persons | percentage of households with 4 persons | None | AHRF |
nHHs5Persons_r | percentage of households with 5 persons | percentage of households with 5 persons | None | AHRF |
nHHs6ormorePersons_r | percentage of households with 6 or more persons | percentage of households with 6 or more persons | None | AHRF |
HHswSupplemntlSecurityInc_r | percentage of Households with SSI | percentage of Households with Supplemental Security Income (5 year average) | None | AHRF |
HHswPublicAssistanceInc_r | percentage of Households with Public Assistance Income | percentage of Households with Public Assistance Income (5 year average) | None | AHRF |
nSingleParentHHs_r | percentage of single parent households | percentage of single parent households | None | AHRF |
UnmarriedPartnerHHDiffSex_r | percentage of households as unmarried partner with different sex | percentage of households as unmarried partner with different sex | None | AHRF |
UnmarriedPartnerHHMale_r | percentage of households as unmarried male partner | percentage of households as unmarried male partner | None | AHRF |
UnmarriedPartnerHHFemale_r | percentage of households as unmarried female partner | percentage of households as unmarried female partner | None | AHRF |
StateName | state name | state name | None | Census TIGER |
State | state | state | None | Census TIGER |
CountyName | name of county | name of county | None | Census TIGER |
CountyNameStateAbbrev | name of county with state abbreviation | name of county with state abbreviation | None | Census TIGER |
FIPSStateCode | two-digit state FIPS code | two-digit state FIPS code | None | Census TIGER |
FIPSCountyCode | three-digit county FIPS code | three-digit county FIPS code | None | Census TIGER |
FederalRegionCode | federal region code | These are the codes for the ten Federal Regional Offices from the Department of Health and Human Services. | None | Census TIGER |
CensusRegionCode | census region code | The Census Region Codes and Names and Census Division Codes and Names were taken from the ORP HSA ACCESS System. | None | Census TIGER |
urbanizationizationlevel | urbanization level | urbanization level | None | AHRF |
MSM_count_l | MSM counts | MSM counts | log(1+x) | Jones et al. (2018) |
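
The transformation column can be read as a small lookup from variable name to a function applied to the raw value. The sketch below illustrates this for a handful of variables from the table. It is only an illustration: the names `TRANSFORMS`, `COLUMN_TRANSFORM`, and `transform_row` are hypothetical, the logarithm is assumed to be the natural log (the table does not state the base), and variables not listed in the mapping are passed through unchanged.

```python
import math

# Transformation families that appear in the table.
TRANSFORMS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,                                # sqrt(x)
    "div10": lambda x: x / 10,                        # x/10
    "div100": lambda x: x / 100,                      # x/100
    "log1p": math.log1p,                              # log(1+x), natural log assumed
    "log1p_1e3": lambda x: math.log1p(1_000 * x),     # log(1+1000*x)
    "log1p_1e4": lambda x: math.log1p(10_000 * x),    # log(1+10000*x)
    "log1p_1e5": lambda x: math.log1p(100_000 * x),   # log(1+100000*x)
}

# A few example variables; any variable not listed here is left unchanged.
COLUMN_TRANSFORM = {
    "hcv": "sqrt",
    "opioid_prsc": "div10",
    "hivtst": "div100",
    "prep_rate_l": "log1p",
    "TotalHospitalBeds_r_l": "log1p_1e3",
    "NursePractitionerswNPI_r_l": "log1p_1e4",
    "nRuralHealthClinics_r_l": "log1p_1e5",
}


def transform_row(raw):
    """Return a new dict with each raw value transformed per COLUMN_TRANSFORM."""
    return {
        name: TRANSFORMS[COLUMN_TRANSFORM.get(name, "none")](value)
        for name, value in raw.items()
    }


# Made-up raw values, for illustration only.
example = transform_row(
    {"hcv": 9.0, "opioid_prsc": 50.0, "prep_rate_l": 0.0, "TotalHospitalBeds_r_l": 0.002}
)
```

The scaling inside the logarithm (1000, 10000, or 100000) turns small per-person ratios into per-1,000, per-10,000, or per-100,000 quantities before the log is taken, so that the transformed values are not all squeezed close to zero.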
Abbreviations in the Source column and their associated links.