Open Building Population Layer (United States) - beta

About this project
The Open Building Population Layer (United States) is a personal project by Maxim Fortin. It is not an official government data source for building footprints or population distribution. No warranty is given as to the accuracy or completeness of the information provided.

1. Introduction

This dataset contains building-level population estimates for all states in the United States (US), calculated from publicly available census tract population data. The data and the accompanying code can be downloaded and used freely under open licenses.

Open Building Population Layer in Los Angeles, CA

2. Why this layer was developed

Population density products have long been used to quantify demographic information and to assess relationships with hazards, ecosystems, human health and infrastructure. However, there is currently a gap for a fine-resolution population density product covering all states in the US.

While proprietary datasets presenting this type of information already exist, they can be expensive. The public availability of remote sensing and free open-source data has significantly increased in recent years, making it now easier than ever to develop good population estimates all the way down to the building level, in the public domain.

3. Example

A possible use of the Open Building Population Layer is to assess the exposure of population to flood hazard as part of risk or priority setting assessments. Flood inundation extent layers are combined with a building population layer to evaluate how many people may potentially be exposed to a particular flooding event.

The image below presents an example for a small community located along the Ottawa River in Canada, where the Open Building Population Layer was overlaid with publicly available historical flood extents (NRCan, 2017).

Flooding in Sainte-Marthe-sur-le-Lac, QC

Through this analysis, the population potentially affected by flooding can easily be estimated by calculating a sum of the population for the buildings exposed to the hazard.
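The overlay described above can be sketched in plain Python. This is only an illustration of the logic, not the workflow used for the figure: the function names, the coordinates and the population values are hypothetical, building centroids are treated as simple (x, y) points, and the flood extent is assumed to be a single polygon ring without holes. In practice a geospatial library would normally do the point-in-polygon step.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon ring."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def exposed_population(buildings, flood_extent) -> float:
    """Sum the population of building centroids that fall inside the flood extent."""
    return sum(
        pop for (x, y, pop) in buildings if point_in_polygon((x, y), flood_extent)
    )


# Hypothetical flood extent (a rectangle) and building centroids (x, y, population).
flood_extent = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
buildings = [
    (2.0, 1.0, 2.5),   # inside the extent
    (8.0, 4.0, 2.5),   # inside the extent
    (5.0, 7.0, 3.1),   # outside (north of the extent)
    (12.0, 2.0, 3.1),  # outside (east of the extent)
]

total_exposed = exposed_population(buildings, flood_extent)
print(f"Population potentially exposed: {total_exposed}")
```

Because the layer stores an average population per building, the result should be read as an estimate for the flooded area as a whole rather than a count for any individual building.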

4. Data sources for the layer

The Open Building Population Layer (United States) is calculated using two data sources:

  • Microsoft US Building Footprints layer: 129,591,852 computer-generated building footprints developed by Microsoft, freely available for download and use under the Open Data Commons Open Database License (ODbL).
  • 2018 US Census: population distribution at the smallest available census geographical unit, in this case census tracts, with 72,837 units distributed across the US. The census tract geometries covering all US states were obtained from the Centers for Disease Control and Prevention (CDC) ATSDR project.

5. Methodology

The building population is estimated in five stages:

  1. Extract centroid points from building footprints
  2. Calculate the number of centroid points for each census tract
  3. Clip centroid points to the state boundaries*
  4. Calculate an average population per building for each census tract using 2015-2018 census estimates
  5. Assign the average population back to each building depending on its location within the census tract distribution

*Note: Unlike in the Canadian version of the Open Building Population Layer, this clip was necessary because each Microsoft state footprint file in the US includes additional buildings in a thin border of approximately 500 meters outside the state boundary.
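The counting and averaging stages can be illustrated with a minimal plain-Python sketch. It assumes the centroids have already been extracted, clipped and matched to a census tract, and it is not the project's actual code; the function name, building identifiers, tract identifiers and population values are all hypothetical.

```python
from collections import Counter


def building_population(building_tracts, tract_population):
    """Return an average population per building.

    building_tracts: dict mapping building id -> census tract id
    tract_population: dict mapping census tract id -> tract population
    """
    # Number of building centroids in each census tract.
    counts = Counter(building_tracts.values())

    # Average population per building for each tract that has buildings.
    average = {
        tract: tract_population[tract] / n
        for tract, n in counts.items()
        if tract in tract_population
    }

    # Assign the tract average back to each building in that tract.
    return {
        building: average[tract]
        for building, tract in building_tracts.items()
        if tract in average
    }


# Hypothetical input: six buildings spread over two census tracts.
building_tracts = {
    "b1": "tract_A", "b2": "tract_A", "b3": "tract_A", "b4": "tract_A",
    "b5": "tract_B", "b6": "tract_B",
}
tract_population = {"tract_A": 100, "tract_B": 30}

population = building_population(building_tracts, tract_population)
print(population["b1"], population["b5"])
```

In this sketch every building in a tract receives the same value, so the building values in a tract add back up to the tract population whenever the tract contains at least one building.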

The image below shows an overview of what census tracts look like in urban settings, along with an indication of the building population at each point.

Close-up view in Victoria, BC

The calculation process is automated in Python. The main packages used for geospatial analysis are GeoPandas and pyogrio.

The code is available in a GitHub repository.

6. Building population files (beta version)

The following files are available for download in zipped GeoPackage format (EPSG:5070 coordinate reference system).

State | Number of Buildings | Zipped size (MB)
Alabama | 2,455,168 | 82
Alaska | 111,042 | 4
Arizona | 2,738,732 | 90
Arkansas | 1,571,198 | 53
California | 11,542,912 | 376
Colorado | 2,185,953 | 73
Connecticut | 1,215,624 | 39
Delaware | 357,534 | 11
District of Columbia | 77,851 | 2
Florida | 7,263,195 | 241
Georgia | 3,981,792 | 132
Hawaii | 252,908 | 8
Idaho | 942,132 | 31
Illinois | 5,194,010 | 170
Indiana | 3,379,648 | 111
Iowa | 2,074,904 | 69
Kansas | 1,614,406 | 54
Kentucky | 2,447,682 | 81
Louisiana | 2,173,567 | 73
Maine | 758,999 | 25
Maryland | 1,657,199 | 56
Massachusetts | 2,114,602 | 69
Michigan | 4,982,783 | 164
Minnesota | 2,914,016 | 97
Mississippi | 1,507,496 | 51
Missouri | 3,190,076 | 107
Montana | 773,199 | 25
Nebraska | 1,187,234 | 40
Nevada | 1,006,278 | 32
New Hampshire | 577,936 | 19
New Jersey | 2,550,308 | 82
New Mexico | 1,037,096 | 34
New York | 4,972,497 | 163
North Carolina | 4,678,064 | 155
North Dakota | 568,213 | 19
Ohio | 5,544,032 | 182
Oklahoma | 2,159,894 | 73
Oregon | 1,873,786 | 61
Pennsylvania | 4,965,213 | 163
Rhode Island | 392,581 | 12
South Carolina | 2,299,671 | 76
South Dakota | 661,311 | 22
Tennessee | 3,212,306 | 107
Texas | 10,678,921 | 364
Utah | 1,081,586 | 35
Vermont | 351,266 | 11
Virginia | 3,079,351 | 102
Washington | 3,128,258 | 102
West Virginia | 1,055,625 | 34
Wisconsin | 3,173,347 | 105
Wyoming | 386,518 | 13

7. Limitations and potential areas of improvement

If you find errors in the dataset or have ideas for potential contributions, please don’t hesitate to contact me using the contact form accessible from the site menu. An updated version will soon be calculated combining more recent population estimates at the census block level.

The following elements have been identified as limitations and potential areas of improvement.

Population detail: The dataset provides an overview of the population density within a census tract, averaged over the number of buildings located within that census tract. The building population layer can be used for high-level assessments, but should not be used as a tool to try and directly assess the number of people living in specific building units.

Gaps in building layer: The Microsoft building footprint layer is one of the most complete, if not the most complete, building footprint layers publicly available. However, there might still be populated areas where footprints are missing. When a gap in coverage spans an entire census tract, the population of that tract cannot be assigned to any building. This is the main reason why the building population totals are slightly lower than the census population data. Potential solutions include combining multiple building footprint data sources to fill the gaps.
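The size of this shortfall can be checked by listing the census tracts that contain no building centroid. The sketch below is a plain-Python illustration with hypothetical names and values; it is not taken from the project's code.

```python
def unassigned_population(tract_population, building_tracts):
    """Find tracts without buildings and the population they leave unassigned.

    tract_population: dict mapping census tract id -> tract population
    building_tracts: dict mapping building id -> census tract id
    """
    tracts_with_buildings = set(building_tracts.values())
    # Tracts whose population cannot be distributed to any building.
    missing = {
        tract: pop
        for tract, pop in tract_population.items()
        if tract not in tracts_with_buildings
    }
    return missing, sum(missing.values())


# Hypothetical input: tract_C is populated but has no building footprints.
tract_population = {"tract_A": 100, "tract_B": 30, "tract_C": 45}
building_tracts = {"b1": "tract_A", "b2": "tract_A", "b3": "tract_B"}

missing_tracts, shortfall = unassigned_population(tract_population, building_tracts)
share = shortfall / sum(tract_population.values())
print(f"Unassigned population: {shortfall} ({share:.1%} of the census total)")
```

A list like `missing_tracts` could also point to the areas where a second footprint source would be most useful.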

Types of buildings: No differentiation is made with regard to the type of building (residential, commercial, industrial, institutional, etc.). As such, population is distributed evenly across all buildings in a census tract, including non-residential buildings. Additional information related to zoning and land use could be integrated in the future to account for that aspect. Some researchers have also investigated the use of additional parameters to allocate the population, such as relationships between the surface area of the building footprint and the population.
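As an illustration of the footprint-area idea, the sketch below distributes a tract's population in proportion to each building's footprint area instead of evenly. This is a hypothetical variant, not the method used for the current layer, and the function name, identifiers, areas and populations are invented for the example.

```python
from collections import defaultdict


def area_weighted_population(buildings, tract_population):
    """Allocate tract population in proportion to footprint area.

    buildings: list of (building id, census tract id, footprint area in m2)
    tract_population: dict mapping census tract id -> tract population
    """
    # Total footprint area per census tract.
    total_area = defaultdict(float)
    for _, tract, area in buildings:
        total_area[tract] += area

    result = {}
    for building, tract, area in buildings:
        # Skip tracts with no population record or zero total area.
        if tract in tract_population and total_area[tract] > 0:
            result[building] = tract_population[tract] * area / total_area[tract]
    return result


# Hypothetical input: one tract of 100 people with a small and a large building.
buildings = [
    ("b1", "tract_A", 100.0),
    ("b2", "tract_A", 300.0),
]
tract_population = {"tract_A": 100}

weighted = area_weighted_population(buildings, tract_population)
print(weighted)
```

With an even split both buildings would receive 50 people; weighting by area shifts most of the population to the larger footprint, which may or may not be more realistic depending on the building type.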

Census tracts: There are census units smaller than census tracts, namely block groups and census blocks (a census tract is composed of block groups, which are in turn composed of census blocks). However, the population for census blocks was not readily available in a single downloadable file. An updated version will soon be calculated combining more recent population estimates at the census block level.

Year of census data: The layer was calculated using the 2015-2018 census population data.

8. License and attribution

The Open Building Population Layer (United States) has been produced by Maxim Fortin as open and free data.

Dataset license

The dataset is released under the Open Database License (ODbL). This license allows you to freely use, distribute, and modify the dataset provided you attribute the source and share any modifications under the same license.

When using the dataset, please attribute as follows:
Fortin, Maxim (2022): Open Building Population Layer - United States, derived from open-source computer-generated footprints and 2018 census data, URL: https://www.maximfortin.com/project/obpl-us-2018/

No warranty is given as to the accuracy or completeness of the information provided. While the dataset provides estimates of population for each building in the United States, please note that these are derived approximations and do not represent exact population counts. Use caution when interpreting and utilizing the data for decision-making purposes.

Code license

The code is released under the Apache 2.0 License. This license allows you to freely use, distribute, and modify the code for both commercial and non-commercial purposes, with limited liability and warranty.

When using the code, please attribute as follows: Fortin, Maxim (2024): Python code for the Open Building Population Layer - United States, derived from open-source computer-generated footprints and 2018 census data, URL: https://www.maximfortin.com/project/obpl-us-2018/

No warranty is given as to the accuracy or completeness of the information provided. We provide no warranties and assume no responsibility for any liabilities associated with the use of this code.

Maxim Fortin
Water Resources Engineer

My professional interests include hydrological and hydraulic modeling, flood mapping and geospatial data analysis.
