Data Summary

When choosing the number of variables, it is important to avoid both over-fitting, i.e., low predictive power caused by including too many variables without statistical justification at the model identification stage, and under-fitting, i.e., poor performance on both the training and test data caused by including too few variables. We fit the two model parts discussed above separately. Both model parts use stepwise regression, selecting from the explanatory variables listed in this section.

The binary part of the model, which classifies positive-versus-zero outcomes, selects 40 significant variables; the positive part, which predicts diagnosis rates for counties classified as positive, selects 37. Although the two parts are free to select different variables, several variables are significant in both: the number of opioid prescriptions, the percentage of people receiving food stamps (SNAP), the percentage of the population in juvenile facilities, the number of federally qualified health centers, knowledge of HIV status, the rate of HIV testing, HCV deaths, the rate of PrEP use, the percentage of households receiving Supplemental Security Income, the rate of illicit drug use, and linkage to HIV care.
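The trade-off between over-fitting and under-fitting in stepwise selection can be illustrated with a minimal sketch. The text does not state the selection direction or the stopping criterion, so forward selection by AIC is an assumption here. The ordinary least squares fit stands in for the positive part only (the binary part would need a classifier such as logistic regression), the helper names `ols_rss`, `aic`, and `forward_stepwise` are hypothetical, and the data are synthetic rather than real county values.

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    beta = [M[a][k] / M[a][a] for a in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant (assumed criterion)."""
    return n * math.log(rss / n) + 2 * n_params

def forward_stepwise(data, y):
    """Add the variable that lowers AIC the most; stop when no candidate lowers it."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 1)
    while True:
        trials = []
        for name in data:
            if name in selected:
                continue
            cols = [data[s] for s in selected] + [data[name]]
            trials.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not trials:
            break
        score, name = min(trials)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected

# Synthetic data: y depends on "signal" only; "noise" is unrelated to y.
n_obs = 30
signal = [i / 10 for i in range(n_obs)]
noise = [math.sin(3 * i) for i in range(n_obs)]
y = [2.0 + 3.0 * s + 0.1 * math.cos(7 * i) for i, s in enumerate(signal)]

selected = forward_stepwise({"signal": signal, "noise": noise}, y)
print(selected)
```

The penalty term `2 * n_params` is what guards against over-fitting in this sketch: a variable is admitted only if it reduces the residual error by more than the penalty costs. A stricter criterion, such as BIC or a significance threshold, would generally admit fewer variables.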

To model new HIV diagnosis rates, we consider a variety of explanatory variables. The following table gives the full list of variables with their labels, descriptions, transformations, and sources.

| Variable | Label | Description | Transformation | Source |
| --- | --- | --- | --- | --- |
| FIPS | FIPS | FIPS county code | None | Census TIGER |
| Year | year | year | None | Census TIGER |
| PopEstimate | annual resident population estimates | annual resident population estimates | None | AHRF |
| UnemploymentRate16p | unemployment rate | unemployment rate of persons 16+ | None | AHRF |
| nDaysMeasuredAirQualityGood | number of days measured air quality good | number of days measured air quality good | None | AHRF |
| xGoodAirQualityDays | percentage of good air quality days | percentage of good air quality days | None | AHRF |
| NumberHHs | number of households | number of households | None | AHRF |
| PercentUrbanPop | percent urban population | percent urban population | None | AHRF |
| PercentUrbanHousingUnits | percent urban housing units | percent urban housing units | None | AHRF |
| xFamiliesBelowPovertyLevel | percentage of families below poverty | percentage of families below poverty level | None | AHRF |
| xDisabledle18CivilNoninstl | percentage of disability for civilian noninstitutionalized population less than 18 | percentage of disability for civilian noninstitutionalized population less than 18 years old (5 year average) | None | AHRF |
| xDisabled1864CivilNoninstl | percentage of disability for civilian noninstitutionalized 18-64 years old | percentage of disability for civilian noninstitutionalized 18-64 years old population (5 year average) | None | AHRF |
| xDisabled65pCivilNoninstl | percentage of disability for civilian noninstitutionalized population older than 65 | percentage of disability for civilian noninstitutionalized population older than 65 years old (5 year average) | None | AHRF |
| xDidNotWork1664 | percentage of people aged 16-64 who didn’t work | percentage of people aged 16-64 who didn’t work | None | AHRF |
| xAgricForestFishHuntMineWorkers | percentage of Agricultural/Forestry/Fisheries/Hunting/Mining workers | percentage of Agricultural/Forestry/Fisheries/Hunting/Mining workers | None | AHRF |
| xConstructionWorkers | percentage of Construction workers | percentage of Construction workers | None | AHRF |
| xEducHealthSocAssWorkers | percentage of Educational Services, and Health Care and Social Assistance workers | percentage of Educational Services, and Health Care and Social Assistance workers | None | AHRF |
| xManufacturingWorkers | percentage of Manufacturing workers | percentage of Manufacturing workers | None | AHRF |
| xWorkersinOtherIndustries | percentage of workers in other industries | percentage of workers in other industries | None | AHRF |
| nWorkersMeantraveltime16p | number of workers, mean travel time, 16+ | number of workers, mean travel time, 16+ | None | AHRF |
| xFemalesDivorced | percent of females divorced | percent of females divorced | None | AHRF |
| 5YrTotInfantMortRatele1yr | total infant mortality | 5 year average total infant mortality rate, less than 1 year old | None | CDC NCHHSTP AtlasPlus |
| syphilis_rate | syphilis rate | the number of syphilis cases per 100,000 population | None | CDC NCHHSTP AtlasPlus |
| chlamydia_rate | chlamydia rate | the number of chlamydia cases per 10,000 population | None | CDC NCHHSTP AtlasPlus |
| gonorrhea_rate | gonorrhea rate | the number of gonorrhea cases per 10,000 population | None | CDC NCHHSTP AtlasPlus |
| knowledge | knowledge | percentage of diagnosed infections among persons living with HIV infection | None | CDC NCHHSTP AtlasPlus |
| linkage | linkage to care | percentage of diagnosed infections with at least 1 CD4 or viral load test performed ≤1 month after diagnosis | None | CDC NCHHSTP AtlasPlus |
| hiv_rate | HIV rate | the number of HIV cases per 100,000 population | None | CDC NCHHSTP AtlasPlus |
| suppression_per | viral suppression | percentage of diagnosed infections with a viral load result of <200 copies/mL at the most recent viral load test during the previous year | None | CDC NCHHSTP AtlasPlus |
| hivmedic_per | receipt of HIV medication | percentage of diagnosed infections with ≥1 CD4 or viral load tests performed during the year | None | CDC NCHHSTP AtlasPlus |
| hcv | HCV death | deaths among persons with acute viral hepatitis C or chronic viral hepatitis C as an underlying cause of death | sqrt(x) | Hepvu |
| opioid_prsc | opioid prescriptions | opioid prescriptions dispensed in the U.S. per 100 persons | /10 | CDC opioid data |
| hivtst | HIV test | HIV test | /100 | BRFSS |
| PopMale1519_r | percentage of male population at age group 15-19 | percentage of male population at age group 15-19 | None | AHRF |
| PopMale2024_r | percentage of male population at age group 20-24 | percentage of male population at age group 20-24 | None | AHRF |
| PopMale2544_r | percentage of male population at age group 25-44 | percentage of male population at age group 25-44 | None | AHRF |
| PopMale4564_r | percentage of male population at age group 45-64 | percentage of male population at age group 45-64 | None | AHRF |
| PopMale65p_r | percentage of male population at age group 65+ | percentage of male population at age group 65+ | None | AHRF |
| PopFemale1519_r | percentage of female population at age group 15-19 | percentage of female population at age group 15-19 | None | AHRF |
| PopFemale2024_r | percentage of female population at age group 20-24 | percentage of female population at age group 20-24 | None | AHRF |
| PopFemale2544_r | percentage of female population at age group 25-44 | percentage of female population at age group 25-44 | None | AHRF |
| PopFemale4564_r | percentage of female population at age group 45-64 | percentage of female population at age group 45-64 | None | AHRF |
| PopFemale65p_r | percentage of female population at age group 65+ | percentage of female population at age group 65+ | None | AHRF |
| PopTotalMale_r | percentage of the population as Male | percentage of the population as Male | None | AHRF |
| PopWhite_r | percentage of White population | percentage of White population | None | AHRF |
| PopBlackAfrAm_r | percentage of Black/African American population | percentage of Black/African American population | None | AHRF |
| PopAsian_r | percentage of Asian population | percentage of Asian population | None | AHRF |
| PopTotHispLatino_r | percentage of Hispanic population | percentage of Hispanic population | None | AHRF |
| PopAmerIndAlaskaNat_r | percentage of American Indian/Alaska Native population | percentage of American Indian/Alaska Native population | None | AHRF |
| PopNatHawOthPacIsl_r | percentage of Native Hawaiian/other Pacific Islander population | percentage of Native Hawaiian/other Pacific Islander population | None | AHRF |
| PopTwopRaces_r | percentage of Two or more races population | percentage of Two or more races population | None | AHRF |
| MedicareEnrollmentAgedDisabledTot_r | percentage of population on Medicare | percentage of population on Medicare. The data include counts of Medicare beneficiaries with Medicare Part A, also known as Hospital Insurance, and Medicare Part B, also known as Medical Insurance. | None | AHRF |
| Persons1864HealthInsurance_r | percentage of population between 18-64 with health insurance | percentage of population between 18-64 with health insurance | None | AHRF |
| Personsle19HealthInsurance_r | percentage of population less than 19 years old with health insurance | percentage of population less than 19 years old with health insurance | None | AHRF |
| PersonsinPoverty_r | percentage of Persons Aged 0-17 in Poverty | percentage of Persons Aged 0-17 in Poverty | None | AHRF |
| FoodStampSNAPRecipientEstimate_r | percentage of people as Food Stamp/SNAP Recipients | percentage of people as Food Stamp/SNAP Recipients | None | AHRF |
| nHousingUnitsEstimate_r | number of housing units per person | number of housing units per person | None | AHRF |
| Persons25pYrsWleHSDiploma_r | percentage of persons 25 years or older with less than high school diploma | percentage of persons 25 years or older with less than high school diploma | None | AHRF |
| Persons25pWHSDiplOrMore_r | percentage of persons 25 years or older with a high school diploma or more | percentage of persons 25 years or older with a high school diploma or more | None | AHRF |
| nWorkersDriveAlone16p_r | percentage of workers who drive alone to work | percentage of workers who drive alone to work | None | AHRF |
| nWorkersCarpool16p_r | percentage of workers who carpool to work | percentage of workers who carpool to work | None | AHRF |
| nWorkersPublicTrans16p_r | percentage of workers who use public transportation to go to work | percentage of workers who use public transportation to go to work | None | AHRF |
| nWorkersWalktoWork16p_r | percentage of workers who walk to work | percentage of workers who walk to work | None | AHRF |
| nWorkersOtherMeansofTrans16p_r | percentage of workers who use other means of transportation to work | percentage of workers who use other means of transportation to work | None | AHRF |
| nWorkersWorkatHome16p_r | percentage of workers who work at home | percentage of workers who work at home | None | AHRF |
| nWorkersle5mintoWork16p_r | percentage of workers with mean travel time of less than 5 minutes to work | percentage of workers with mean travel time of less than 5 minutes to work | None | AHRF |
| nWorkers59mintoWork16p_r | percentage of workers with mean travel time of 5-9 minutes to work | percentage of workers with mean travel time of 5-9 minutes to work | None | AHRF |
| nWorkers1014mintoWork16p_r | percentage of workers with mean travel time of 10-14 minutes to work | percentage of workers with mean travel time of 10-14 minutes to work | None | AHRF |
| nWorkers1519mintoWork16p_r | percentage of workers with mean travel time of 15-19 minutes to work | percentage of workers with mean travel time of 15-19 minutes to work | None | AHRF |
| nWorkers2029mintoWork16p_r | percentage of workers with mean travel time of 20-29 minutes to work | percentage of workers with mean travel time of 20-29 minutes to work | None | AHRF |
| nWorkers3044mintoWork16p_r | percentage of workers with mean travel time of 30-44 minutes to work | percentage of workers with mean travel time of 30-44 minutes to work | None | AHRF |
| nWorkers4559mintoWork16p_r | percentage of workers with mean travel time of 45-59 minutes to work | percentage of workers with mean travel time of 45-59 minutes to work | None | AHRF |
| nWorkers6089mintoWork16p_r | percentage of workers with mean travel time of 60-89 minutes to work | percentage of workers with mean travel time of 60-89 minutes to work | None | AHRF |
| nWorkers90pmintoWork16p_r | percentage of workers with mean travel time of more than 90 minutes to work | percentage of workers with mean travel time of more than 90 minutes to work | None | AHRF |
| nOccupiedHousingUnits_r | number of occupied housing units | number of occupied housing units per person | None | AHRF |
| 3YearIschemicHeartDisDeaths_n | percentage of ischemic heart disease deaths | percentage of ischemic heart disease deaths per person (3-year average) | /100 | AHRF |
| MedianHHIncome_l | Median Household Income | Median Household Income | log(1+x) | AHRF |
| PerCapitaPersonalIncome_l | Per Capita Personal Income | Per Capita Personal Income estimates | log(1+x) | NSDUH |
| illicit_l | illicit drug use | the use of illicit drugs in the past month | log(1+x) | NSDUH |
| marijuana_l | Marijuana Use | Marijuana Use in the Past Year | log(1+x) | NSDUH |
| marijuana_init_l | First Use of Marijuana | First Use of Marijuana | log(1+x) | NSDUH |
| cocaine_l | Cocaine Use | Cocaine Use in the Past Year | log(1+x) | NSDUH |
| alcohol_l | Alcohol Use | Alcohol Use in the Past Month | log(1+x) | NSDUH |
| tobacco_l | Tobacco Product Use | Tobacco Product Use in the Past Month | log(1+x) | NSDUH |
| cigar_l | Cigarette Use | Cigarette Use in the Past Month | log(1+x) | NSDUH |
| mental_severe_l | Serious Mental Illness | Serious Mental Illness in the Past Year | log(1+x) | NSDUH |
| mental_l | Any Mental Illness | Any Mental Illness in the Past Year | log(1+x) | NSDUH |
| suicide_l | Had Serious Thoughts of Suicide | Had Serious Thoughts of Suicide in the Past Year | log(1+x) | NSDUH |
| MedianHomeValue_l | median home value | median home value | log(1+x) | AHRF |
| MedianGrossRent_l | median gross rent | median gross rent | log(1+x) | AHRF |
| HHIncomeUnder10000_l | household income, less than $10,000 | household income, less than $10,000 | log(1+x) | AHRF |
| depress_l | Major Depressive Episode | Major Depressive Episode in the Past Year | log(1+x) | AHRF |
| TotalHospitalBeds_r_l | total hospital beds per person | total hospital beds per person | log(1+1000*x) | AHRF |
| PhysNFPrimCarePatCareExclHspRsdnts_r_l | number of Physicians, Excluding Hospital Residents | number of Physicians, Non-Federal, Total Patient Care, Excluding Hospital Residents per person | log(1+10000*x) | AHRF |
| PhysNFPrimCarePatCareHospRsdnts_r_l | number of Physicians, Hospital Residents per person | number of Physicians, Non-Federal, Total Patient Care, Hospital Residents per person | log(1+10000*x) | AHRF |
| MDsNonFedTotGenPractTotal_r_l | number of Medical Doctors, Total General Practice | number of Medical Doctors, Non-Federal, Total General Practice (General Practice and Family Medicine and Subspecialties), Total per person | log(1+10000*x) | AHRF |
| MDsNFGenIntMedTotPatCare_r_l | number of Medical Doctors, General Internal Medicine | number of Medical Doctors, Non-Federal, General Internal Medicine, Total Patient Care per person | log(1+10000*x) | AHRF |
| PhysicianAssistantswNPI_r_l | number of Physician Assistants | number of Physician Assistants with an NPI per person | log(1+10000*x) | AHRF |
| PopinJuvenilleFacilities_r_l | number of populations in Juvenile Facilities | percentage of population in Juvenile Facilities | log(1+10000*x) | AHRF |
| AdvPracticeRegisteredNurseswNPIAPRN_r_l | number of Advanced Practice Nurse Midwives | number of Advanced Practice Nurse Midwives, with CMS NPI per person | log(1+10000*x) | AHRF |
| NursePractitionerswNPI_r_l | number of Nurse Practitioners | number of Nurse Practitioners, with CMS NPI per person | log(1+10000*x) | AHRF |
| STGHospwEmergencyDepartment_r_l | number of short-term general hospitals with emergency department | number of short-term general hospitals with emergency department per person | log(1+100000*x) | AHRF |
| nHomeHealthAgencies_r_l | number of Home Health Agencies | number of Home Health Agencies per person | log(1+100000*x) | AHRF |
| nCommunityMentalHealthCtrs_r_l | number of Community Mental Health centers | number of Community Mental Health centers per person | log(1+100000*x) | AHRF |
| nFedQualifiedHealthCenters_r_l | number of Federally Qualified Health Centers | number of Federally Qualified Health Centers per person | log(1+100000*x) | AHRF |
| nCommunityHealthCenters_r_l | number of Community Health Centers | number of Community Health Centers, Grantees Only, per person | log(1+100000*x) | AHRF |
| nRuralHealthClinics_r_l | number of Rural Health Clinics | number of Rural Health Clinics per person | log(1+100000*x) | AHRF |
| nNHSCPrimaryCareSiteswProv_r_l | number of NHSC Primary Care sites | number of NHSC Primary Care sites per person | log(1+100000*x) | AHRF |
| nNHSCMentalHealthSiteswProv_r_l | number of NHSC Mental Health sites | number of NHSC Mental Health sites per person | log(1+100000*x) | AHRF |
| TotalNumberHospitals_r_l | number of hospitals | total number of hospitals per person | log(1+100000*x) | AHRF |
| AdvPracticeNurseMidwiveswNPI_r_l | number of advanced Practice Registered Nurses | number of advanced Practice Registered Nurses (APRN), with CMS NPI per person | log(1+100000*x) | AHRF |
| prep_rate_l | PrEP use | number of people who had at least one day of prescribed TDF/FTC for PrEP in a calendar year per 100,000 population | log(1+x) | AIDSVu |
| nHHs2Persons_r | percentage of households with 2 persons | percentage of households with 2 persons | None | AHRF |
| nHHs3Persons_r | percentage of households with 3 persons | percentage of households with 3 persons | None | AHRF |
| nHHs4Persons_r | percentage of households with 4 persons | percentage of households with 4 persons | None | AHRF |
| nHHs5Persons_r | percentage of households with 5 persons | percentage of households with 5 persons | None | AHRF |
| nHHs6ormorePersons_r | percentage of households with 6 or more persons | percentage of households with 6 or more persons | None | AHRF |
| HHswSupplemntlSecurityInc_r | percentage of Households with SSI | percentage of Households with Supplemental Security Income (5 year average) | None | AHRF |
| HHswPublicAssistanceInc_r | percentage of Households with Public Assistance Income | percentage of Households with Public Assistance Income (5 year average) | None | AHRF |
| nSingleParentHHs_r | percentage of single parent households | percentage of single parent households | None | AHRF |
| UnmarriedPartnerHHDiffSex_r | percentage of households as unmarried partner with different sex | percentage of households as unmarried partner with different sex | None | AHRF |
| UnmarriedPartnerHHMale_r | percentage of households as unmarried male partner | percentage of households as unmarried male partner | None | AHRF |
| UnmarriedPartnerHHFemale_r | percentage of households as unmarried female partner | percentage of households as unmarried female partner | None | AHRF |
| StateName | state name | state name | None | Census TIGER |
| State | state | state | None | Census TIGER |
| CountyName | name of county | name of county | None | Census TIGER |
| CountyNameStateAbbrev | name of county with state abbreviation | name of county with state abbreviation | None | Census TIGER |
| FIPSStateCode | two-digit state FIPS code | two-digit state FIPS code | None | Census TIGER |
| FIPSCountyCode | three-digit county FIPS code | three-digit county FIPS code | None | Census TIGER |
| FederalRegionCode | These are the codes for the ten Federal Regional Offices from the Department of Health and Human Services. | These are the codes for the ten Federal Regional Offices from the Department of Health and Human Services. | None | Census TIGER |
| CensusRegionCode | The Census Region Codes and Names and Census Division Codes and Names were taken from the ORP HSA ACCESS System. | The Census Region Codes and Names and Census Division Codes and Names were taken from the ORP HSA ACCESS System. | None | Census TIGER |
| urbanizationizationlevel | urbanization level | urbanization level | None | AHRF |
| MSM_count_l | MSM counts | MSM counts | log(1+x) | Jones et al. (2018) |
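The transformation entries in the table above can be read as simple functions of the raw value x. The sketch below maps each label to a function and applies it to a few variables copied from the table. It assumes natural logarithms, since the table does not state the base, and it normalizes the label spelling (lowercase `log`, no spaces). The names `TRANSFORMS`, `VARIABLE_TRANSFORM`, and `transform` are hypothetical, and the raw values are made up for illustration.

```python
import math

# Transformation labels from the table, mapped to functions of the raw value x.
# Natural logarithms are an assumption; the table does not state the base.
TRANSFORMS = {
    "None": lambda x: x,
    "sqrt(x)": math.sqrt,
    "/10": lambda x: x / 10,
    "/100": lambda x: x / 100,
    "log(1+x)": math.log1p,
    "log(1+1000*x)": lambda x: math.log1p(1000 * x),
    "log(1+10000*x)": lambda x: math.log1p(10000 * x),
    "log(1+100000*x)": lambda x: math.log1p(100000 * x),
}

# A few variable-to-transformation pairs copied from the table.
VARIABLE_TRANSFORM = {
    "PopEstimate": "None",
    "hcv": "sqrt(x)",
    "opioid_prsc": "/10",
    "hivtst": "/100",
    "MedianHHIncome_l": "log(1+x)",
    "TotalHospitalBeds_r_l": "log(1+1000*x)",
    "NursePractitionerswNPI_r_l": "log(1+10000*x)",
    "TotalNumberHospitals_r_l": "log(1+100000*x)",
}

def transform(variable, raw_value):
    """Apply the table's transformation for `variable` to a raw value."""
    return TRANSFORMS[VARIABLE_TRANSFORM[variable]](raw_value)

# Made-up raw values, for illustration only (not real county data).
example = {
    "hcv": 9.0,                      # hypothetical death count
    "opioid_prsc": 50.0,             # hypothetical prescriptions per 100 persons
    "TotalHospitalBeds_r_l": 0.002,  # hypothetical beds per person
}
transformed = {name: transform(name, value) for name, value in example.items()}
print(transformed)
```

The scale factors inside the logarithms (1000, 10000, 100000) appear to rescale very small per-person ratios so that `log(1 + ...)` does not compress them toward zero; that reading is an inference from the table rather than something the source states.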

Abbreviations in the Source column and their associated links.