California Counties and COVID-19

There have been several newspaper articles (examples here and here and here) that speculate that population density encourages the spread of COVID-19 by making social distancing more difficult. New York City is the prime example in the U.S. I think it’s fair to say that the YIMBYs and other pro-growth urbanists have taken the position that the problem is not density but “crowding,” which is a more amorphous concept. By crowding I think they mean not enough housing and too many people per housing unit. If that’s the case, then here in California we have the county-level data to examine this question. That’s what I’ve done in the charts below.

The LA Times reports COVID-19 deaths by county and updates the information at least daily. On the evening of April 20, 2020, the online edition reported about 1,200 COVID-19 deaths in California. From the California Dept. of Finance demographics group I found data on county population and average number of people per household. And from Wikipedia I found data on the land area of all 58 California counties. The LA Times excludes five small counties that don’t report data. For the 53 remaining counties I have created a spreadsheet that lists COVID-19 deaths, population, area, and household size, and from these I calculate population density and deaths per million people.
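For readers who prefer code to spreadsheets, the two derived columns can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet itself; the function names are made up for this example, and the inputs are figures quoted elsewhere in this post (Mono County's one death among 13,616 people, and the roughly 20,000 people living in Albany's populated square mile).

```python
def deaths_per_million(deaths, population):
    """COVID-19 deaths scaled to a population of one million."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 1_000_000


def population_density(population, area_sq_miles):
    """People per square mile."""
    if area_sq_miles <= 0:
        raise ValueError("area must be positive")
    return population / area_sq_miles


# Mono County: one death in a population of 13,616.
print(round(deaths_per_million(1, 13_616), 1))  # 73.4

# Albany: about 20,000 people in roughly one populated square mile.
print(population_density(20_000, 1))  # 20000.0
```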

In Chart 1, I winnowed the data down to the 18 counties that have reported five or more deaths. Relative to the 53 reporting counties, this subset of 18 includes 96 percent of the deaths and 84 percent of the population, but only 44 percent of the land area. In other words, I am excluding the large, lightly populated rural and wilderness counties. A quick look at Mono County explains why. This county has the highest death rate in California, at 73.4 per million people. However, Mono County has had only one death in a population of 13,616 people, so a single additional death would double its rate. Los Angeles is the county with the second highest death rate, at 60.4 per million people. But LA County has over 10 million people and 619 deaths. I have excluded these rural counties to make the charts less cluttered.
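The Mono County problem is easy to see in code. The sketch below shows how far the death rate moves when the count changes by a single death, and applies the five-death cutoff. The function names are hypothetical, and the LA County population is rounded down to 10 million as an assumption, since the exact figure isn't quoted here.

```python
MIN_DEATHS = 5  # cutoff used for the charts


def rate_shift_from_one_death(population):
    """Change in deaths per million if the death count changes by one."""
    return 1_000_000 / population


def keep_for_charts(rows, min_deaths=MIN_DEATHS):
    """Keep only (county, deaths, population) rows at or above the cutoff."""
    return [row for row in rows if row[1] >= min_deaths]


rows = [
    ("Mono", 1, 13_616),               # figures quoted in the text
    ("Los Angeles", 619, 10_000_000),  # population is an assumed round number
]

for name, deaths, population in rows:
    # One more death moves Mono by about 73.4 per million, LA by about 0.1.
    print(name, round(rate_shift_from_one_death(population), 1))

print([name for name, _, _ in keep_for_charts(rows)])  # ['Los Angeles']
```

A rate that can jump by 73 per million on one death says very little, which is the statistical reason behind the cutoff.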

Chart 1: COVID-19 deaths vs. population for 17 California counties, Los Angeles excluded.

It’s always a good idea to look at the raw data, which I have done in Chart 1 (above). Here’s a handy summary: California has about 40 million people and 1,200 COVID-19 deaths. That’s a death rate of 30 per million. LA County accounts for about one-quarter of the state’s population and almost half of the COVID-19 deaths. I’ve included in the chart a blue line that indicates the average death rate of 30.69 deaths per million in the 53-county data set. I’ve had to exclude LA County because its size puts it way off the chart. Of the remaining 17 counties that dot the chart, note that two Silicon Valley counties, Santa Clara and San Mateo, lie above the line. Riverside County, just east of LA County, is also above the line. That means their death rates are higher than average.
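The blue line is just the number of deaths a county would have if it matched the average rate. Here is a minimal sketch of that comparison, with hypothetical function names. The 30.69 figure and the statewide totals come from the paragraph above; the LA County population of 10 million is an assumed round number.

```python
# Average rate across the 53 reporting counties, as quoted above.
AVERAGE_RATE_PER_MILLION = 30.69


def statewide_rate(deaths, population):
    """Deaths per million people."""
    return deaths / population * 1_000_000


def expected_deaths(population, rate_per_million=AVERAGE_RATE_PER_MILLION):
    """Height of the average line at a given population."""
    return population * rate_per_million / 1_000_000


def is_above_line(deaths, population):
    """True if a county has more deaths than the average rate predicts."""
    return deaths > expected_deaths(population)


# About 1,200 deaths among about 40 million Californians.
print(round(statewide_rate(1_200, 40_000_000), 1))  # 30.0

# LA County: 619 deaths, population assumed to be 10 million.
print(is_above_line(619, 10_000_000))  # True
```

At the average rate, a county of 10 million would have about 307 deaths, so LA County's 619 is roughly double what the line predicts.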

Orange County is the obvious outlier below the line. This is curious because the county just to the north, LA County, is a huge outlier in the opposite direction. One possible explanation for this is that COVID-19 deaths are counted where they occur in hospitals, and not where the victims lived. If Orange County residents are traveling to LA County hospitals and dying there, that would explain, to some extent, why both counties are outliers. Due to privacy concerns and lack of time during the crisis, coroners may not be reporting deaths based on where people lived. A final thing to note is that San Francisco lies below the average line.

Chart 2: COVID-19 deaths per million vs. population per square mile. San Francisco excluded.

Chart 2 compares the death rate with population density. In this chart, San Francisco County is the excluded outlier, with a population density of almost 19,000 people per square mile, which puts it way off the chart to the right. That’s because San Francisco is only 47 square miles and is the only county in California that is also a city. As we’ve discussed, San Francisco’s death rate is below average (22.6 deaths per million). Again, the obvious outlier is Orange County, which is second only to San Francisco in population density, while LA is third. If you hold your hand over the Orange County dot, it’s more obvious that there is a correlation between population density and the death rate. And as before, note that LA is an outlier in the upper part of the graph.
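Holding a hand over a dot has a simple numerical counterpart: compute the correlation with and without that county. The sketch below uses the ordinary Pearson coefficient, which is my choice for illustration; I did not compute it for these charts, and the function names and row layout are assumptions. No county figures are included here. The rows would come from the spreadsheet.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def correlation_excluding(rows, excluded=()):
    """Correlate density with death rate, skipping the named counties.

    rows: iterable of (county_name, density, deaths_per_million).
    """
    kept = [(density, rate) for name, density, rate in rows
            if name not in excluded]
    return pearson([d for d, _ in kept], [r for _, r in kept])
```

Calling correlation_excluding(rows) and then correlation_excluding(rows, excluded={"Orange"}) on the spreadsheet rows would show how much one county moves the result. With only 17 or 18 points, a single outlier can move it a lot.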

Chart 3: COVID-19 deaths per million people vs. persons per household.

Chart 3 compares the death rate to average persons per household. It contains all 18 counties. Most of the large counties lie near the vertical straight line at 3 persons per household (the state average is 2.986). Almost all of the 18 counties in the sample have average household sizes between 2.8 and 3.2. Note that, as before, LA and Orange counties are outliers, LA with a high death rate and Orange County with a low one. The counties to the right tend to be more rural, located in the Central Valley and the Inland Empire, with lower incomes and perhaps more children.

Note that Tulare County is also an outlier. The county is the home of Sequoia National Park and the agricultural city of Visalia. Its relatively high death rate may be due to the high number of cases in nursing homes in the county. Although the county is rural, Republican and anxious to reopen, its Highway 198 is a major gateway to Sequoia and Kings Canyon National Parks and sees throngs of tourists during the summer months.

The three counties to the left in the graph are an interesting group. Placer County lies along the I-80 corridor from Roseville to Lake Tahoe and includes the NW shore of the lake and the ski resorts south of Truckee. Marin and San Francisco counties are located in the Bay Area and are two of the wealthiest counties in California. Although their population densities are very different (18,806 for San Francisco, 506 for Marin), their persons per household numbers are very similar (2.350 for San Francisco, 2.441 for Marin). It’s important to note that in Marin County, only the north-south Highway 101 corridor is densely populated. Most of the rest of the county consists of dairy farms, protected agricultural land and a vast network of regional, state and national parks.

Meanwhile, in San Francisco, the proportion of children has been shrinking for years (see this). The local wisdom is that twenty-somethings meet in San Francisco, get married and move to the East Bay to raise their families. This partly explains the low number of persons per household. However, cities in general have lower numbers of persons per household. Berkeley has only 2.28 persons per household, while Albany has 2.57 (and fewer than 10 COVID-19 cases). The main message of Chart 3 is that if important aspects of crowding are captured by household size, then crowding (unlike population density) doesn’t have much effect on the COVID-19 death rate.

Now for the caveats: County data is far from ideal. First, California counties vary wildly in size, from San Francisco at 47 square miles, to San Bernardino at 20,062 square miles, the largest county in the contiguous United States. In addition, population density varies within counties. The populated one square mile of my little town of Albany has about 20,000 people, with a density greater than that of San Francisco. Yet the county in which we are located, Alameda, has an average population density of 2,262 people per square mile. Any serious analysis of the spread of COVID-19 will someday require more disaggregated geographical data, perhaps at the city, census tract, or ZIP code level.

The COVID-19 pandemic is far from over. Various counties have taken different approaches at different times to sheltering in place, and that will matter, too. The snapshot in time that I’ve been describing may look very different a month from now. As the virus moves inland from more densely populated coastal regions to the more remote inland counties like Tulare, we may see a dramatic late surge in cases and deaths. While it might seem better to track cases and not deaths, testing remains too scarce and inconsistent. Death is a lagging indicator of the progress of the pandemic, but it is a certain one. You are either dead or you’re not, and by now we know how to tell a COVID-19 death from those caused by other medical problems.

Finally, the distinction between density and crowding is mushy. A commuter may live in a quiet suburb, but commute to San Francisco on a crowded BART train. And for some urbanists, crowding is the point: crowded bars, crowded concerts and crowded sporting events are not considered negatives. But at the aggregated county level, the COVID-19 death rate appears to correlate more closely with population density than with household size.

I’m happy to send the spreadsheet that I used to create these charts to anyone who would like a copy. I’ll update this information every week or so.