Reproduction of: Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA
Original study by Kang, J. Y., A. Michels, F. Lyu, Shaohua Wang, N. Agbodo, V. L. Freeman, and Shaowen Wang. 2020. Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA. International Journal of Health Geographics 19 (1):1–17. DOI:10.1186/s12942-020-00229-x.
Reproduction Authors: Joe Holler, Derrick Burt, Kufre Udoh, with contributions from Peter Kedron, Drew An-Pham, William Procter, and the Spring 2021/Fall 2023 Open Source GIScience classs at Middlebury
Reproduction Materials Available at: github.com/HEGSRR/RPr-Kang-2020
Created: 2021-06-01
Revised: 2023-12-4
The original Kang et al.(2020) investigates access to COVID-19 healthcare resources in Chicago using a network analysis and service ratio approach. They combine and analyze data on the the point locations of hospitals, hospital capacity (specifically ICU beds and ventilators), a road network of Chicago, COVID-19 cases, and census population data on at-risk individuals (over age 50). Specifically, they generate hospital catchment areas in the vicinity of each hospital using a road network and estimated travel time to each hospital; these are effectively service polygons for each hospital, which are each one is for a specific driving distance from each hospital. Centroid points are determined for population and COVID-19 case data (for each zip code). A spatial join (joining the population point data to the each service polygon) is then used to calculate a weighted service ratio for each hospital catchment area. This data is then unioned to a "raster" grid of hexagons, to produce a continuous map of service ratios in Chicago...which is effectively an accessibility measure. This accessibility measure is weighted on a [0,1] scale, where 1 is the greatest accessibility to critical COVID-19 healthcare resources (ICU beds or ventilators). The output maps of this study compare spatial accessibility measures for COVID-19 patients to those of population at risk, and identifes which geographic areas need additional healthcare resources to improve access.
This reproduction seeks to test two amendments to the original methodology.
Improve speed limit information. The current network analysis methodology uses known speed limits on roads in the network, and employs a simple distance to hospital divided by the speed limit to calculate the travel distance to the nearest hospital, which informs the spatial extent of the service area polygons for each hospital. However, for roads that do not have a known speed limit in the data (of which there are many), the original authors arbitrarily assign a speed limit of 35mph, even though this may not be realistic (especially for highways). Similarly, some roads have a range of speed limits. Thus, we believe that there is a better methodology to computing travel time. To do this, plan to replace the network_setting function with use of the osmnx.speed module. Note how this changes the speed limit data checks before & after network_setting. To gauge the effect of making this methodological change, we plan to compare graphs of hospital catchment areas between the two methods, and also create a difference map of the hexagon grid to illustrate the changes that this new methodology has.
Improve translation from hospital catchments into hexagons by employing Area-Weighted Reaggregation. Rather than use as simple 50% overlap threshold to determine whether service polygon fragments inform the value of a hexagon, we will take a more holistic approach and better weight the role of each catchment area on a given hexagon based on its % overlap. For instance, if 3 catchment areas overlap the same hexagon, but all have different overlap amounts. To do this, focus on the overlap_calc function. Try commenting every line of this function with its purpose before changing anything. To gauge the impact of this methodological change, again, we plan to compare graphs of hospital catchment areas between the two methods, and also create a difference map of the hexagon grid to illustrate the changes that this new methodology has. We will also do a correlation analysis of these two methodologies of hexagons.
Because I change multiple aspects of the survey methodology, I will have to make two comparison. For simplicity, I will only make comparisons between maps of at-risk population (50+) and ICU beds. Although the COVID cases and ventilator data is relevant, it will make comparison between old methodology and new methodology more complex. Thus, I will keep the population_dropdown and resource_dropdown unchanged from at-risk population and ICU beds. Therefore, when I refer to "service ratio" in any discussion, I will be referring to the number of ICU beds per number of people over the age of 50.
To compare the results of improving the speed limit information: After implementing the OSMNX speed and time functions to calculate the travel times to each hospital, I will use the original 50% threshold approach to assign service ratio values to the hexagons. I will then compare the new resulting classified hexagon map (using OSMNX functions) to the classified hexagon map (that does not use the OSMNX functions) from the original study, and visually discuss the differences in spatial variation of hospital service accessibility between the two.
To compare the results of using AWR to translate the service ratios from hospital catchments to the hexagons: I will translate the service ratios to the hexagons twice, once using the old 50% threshold and another time using the new AWR approach. I will visualize the results of each one and compare. Additionally, I will difference the service ratio between these two hexagon layers to create a new difference layer. I will map this, and the shading of the hexagons will represent the difference in service ratio value for that hexagon between the AWR approach and the 50% threshold approach. NOTE: I will create the old 50% threshold service ratios and the new AWR service ratios using the new OSMNX-generated catchment areas. In other words, the comparison between the 50% threshold service ratio hexagons and the new AWR service ratio polygons will be using the improved speed limit data.
Because intuitively the OSMNX approach should only improve the accuracy of our spatial accessibility analysis, I assert that comparing the 50% threshold and AWR approach given the speed limit improvements is valid.
To perform the ESFCA method, three types of data are required, as follows: (1) road network, (2) population, and (3) hospital information. The road network can be obtained from the OpenStreetMap Python Library, called OSMNX. The population data is available on the American Community Survey. Lastly, hospital information is also publically available on the Homelanad Infrastructure Foundation-Level Data.
Import necessary libraries to run this model.
See environment.yml
for the library versions used for this analysis.
# Import modules
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
import re
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium
import itertools
import os
import time
import warnings
import IPython
import requests
from IPython.display import display, clear_output
warnings.filterwarnings("ignore")
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
numpy==1.22.0 pandas==1.3.5 geopandas==0.10.2 networkx==2.6.3 osmnx==1.1.2 re==2.2.1 folium==0.12.1.post1 IPython==8.3.0 requests==2.27.1
Because we have restructured the repository for replication, we need to check our working directory and make necessary adjustments.
# Check working directory
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020/procedure/code'
# Use to set work directory properly
if os.path.basename(os.getcwd()) == 'code':
os.chdir('../../')
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020'
If you would like to use the data generated from the pre-processing scripts, use the following code:
covid_data = gpd.read_file('./data/raw/public/Pre-Processing/covid_pre-processed.shp')
atrisk_data = gpd.read_file('./data/raw/public/Pre-Processing/atrisk_pre-processed.shp')
# Read in at risk population data
atrisk_data = gpd.read_file('./data/raw/public/PopData/Illinois_Tract.shp')
atrisk_data.head()
GEOID | STATEFP | COUNTYFP | TRACTCE | NAMELSAD | Pop | Unnamed_ 0 | NAME | OverFifty | TotalPop | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17091011700 | 17 | 091 | 011700 | Census Tract 117 | 3688 | 588 | Census Tract 117, Kankakee County, Illinois | 1135 | 3688 | POLYGON ((-87.88768 41.13594, -87.88764 41.136... |
1 | 17091011800 | 17 | 091 | 011800 | Census Tract 118 | 2623 | 220 | Census Tract 118, Kankakee County, Illinois | 950 | 2623 | POLYGON ((-87.89410 41.14388, -87.89400 41.143... |
2 | 17119400951 | 17 | 119 | 400951 | Census Tract 4009.51 | 5005 | 2285 | Census Tract 4009.51, Madison County, Illinois | 2481 | 5005 | POLYGON ((-90.11192 38.70281, -90.11128 38.703... |
3 | 17119400952 | 17 | 119 | 400952 | Census Tract 4009.52 | 3014 | 2299 | Census Tract 4009.52, Madison County, Illinois | 1221 | 3014 | POLYGON ((-90.09442 38.72031, -90.09360 38.720... |
4 | 17135957500 | 17 | 135 | 957500 | Census Tract 9575 | 2869 | 1026 | Census Tract 9575, Montgomery County, Illinois | 1171 | 2869 | POLYGON ((-89.70369 39.34803, -89.69928 39.348... |
# Read in covid case data
covid_data = gpd.read_file('./data/raw/public/PopData/Chicago_ZIPCODE.shp')
covid_data['cases'] = covid_data['cases']
covid_data.head()
ZCTA5CE10 | County | State | Join | ZONE | ZONENAME | FIPS | pop | cases | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 60660 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 43242 | 78 | POLYGON ((-87.65049 41.99735, -87.65029 41.996... |
1 | 60640 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 69715 | 117 | POLYGON ((-87.64645 41.97965, -87.64565 41.978... |
2 | 60614 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 71308 | 134 | MULTIPOLYGON (((-87.67703 41.91845, -87.67705 ... |
3 | 60712 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 12539 | 42 | MULTIPOLYGON (((-87.76181 42.00465, -87.76156 ... |
4 | 60076 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 31867 | 114 | MULTIPOLYGON (((-87.74782 42.01540, -87.74526 ... |
Note that 999 is treated as a "NULL"/"NA" so these hospitals are filtered out. This data contains the number of ICU beds and ventilators at each hospital.
# Read in hospital data
hospitals = gpd.read_file('./data/raw/public/HospitalData/Chicago_Hospital_Info.shp')
hospitals.head()
FID | Hospital | City | ZIP_Code | X | Y | Total_Bed | Adult ICU | Total Vent | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | Methodist Hospital of Chicago | Chicago | 60640 | -87.671079 | 41.972800 | 145 | 36 | 12 | MULTIPOINT (-87.67108 41.97280) |
1 | 4 | Advocate Christ Medical Center | Oak Lawn | 60453 | -87.732483 | 41.720281 | 785 | 196 | 64 | MULTIPOINT (-87.73248 41.72028) |
2 | 13 | Evanston Hospital | Evanston | 60201 | -87.683288 | 42.065393 | 354 | 89 | 29 | MULTIPOINT (-87.68329 42.06539) |
3 | 24 | AMITA Health Adventist Medical Center Hinsdale | Hinsdale | 60521 | -87.920116 | 41.805613 | 261 | 65 | 21 | MULTIPOINT (-87.92012 41.80561) |
4 | 25 | Holy Cross Hospital | Chicago | 60629 | -87.690841 | 41.770001 | 264 | 66 | 21 | MULTIPOINT (-87.69084 41.77000) |
# Plot hospital data
m = folium.Map(location=[41.85, -87.65], tiles='cartodbpositron', zoom_start=10)
for i in range(0, len(hospitals)):
folium.CircleMarker(
location=[hospitals.iloc[i]['Y'], hospitals.iloc[i]['X']],
popup="{}{}\n{}{}\n{}{}".format('Hospital Name: ',hospitals.iloc[i]['Hospital'],
'ICU Beds: ',hospitals.iloc[i]['Adult ICU'],
'Ventilators: ', hospitals.iloc[i]['Total Vent']),
radius=5,
color='blue',
fill=True,
fill_opacity=0.6,
legend_name = 'Hospitals'
).add_to(m)
legend_html = '''<div style="position: fixed; width: 20%; heigh: auto;
bottom: 10px; left: 10px;
solid grey; z-index:9999; font-size:14px;
"> Legend<br>'''
m
# Read in and plot grid file for Chicago
grid_file = gpd.read_file('./data/raw/public/GridFile/Chicago_Grid.shp')
grid_file.plot(figsize=(8,8))
<AxesSubplot:>
If Chicago_Network_Buffer.graphml
does not already exist, this cell will query the road network from OpenStreetMap.
Each of the road network code blocks may take a few mintues to run.
%%time
# To create a new graph from OpenStreetMap, delete or rename data/raw/private/Chicago_Network_Buffer.graphml
# (if it exists), and set OSM to True
OSM = True
# if buffered street network is not saved, and OSM is preferred, # generate a new graph from OpenStreetMap and save it
if not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml") and OSM:
print("Loading buffered Chicago road network from OpenStreetMap. Please wait... runtime may exceed 9min...", flush=True)
G = ox.graph_from_place('Chicago', network_type='drive', buffer_dist=24140.2)
print("Saving Chicago road network to raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
ox.save_graphml(G, './data/raw/private/Chicago_Network_Buffer.graphml')
print("Data saved.")
# otherwise, if buffered street network is not saved, download graph from the OSF project
elif not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
print("Downloading buffered Chicago road network from OSF...", flush=True)
url = 'https://osf.io/download/z8ery/'
r = requests.get(url, allow_redirects=True)
print("Saving buffered Chicago road network to file...", flush=True)
open('./data/raw/private/Chicago_Network_Buffer.graphml', 'wb').write(r.content)
# if the buffered street network is already saved, load it
if os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
print("Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
G = ox.load_graphml('./data/raw/private/Chicago_Network_Buffer.graphml')
print("Data loaded.")
else:
print("Error: could not load the road network from file.")
Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait... Data loaded. CPU times: user 55.1 s, sys: 3.83 s, total: 58.9 s Wall time: 58.7 s
%%time
ox.plot_graph(G, node_size = 1, bgcolor = 'white', node_color = 'black', edge_color = "#333333", node_alpha = 0.5, edge_linewidth = 0.5)
CPU times: user 1min 8s, sys: 646 ms, total: 1min 9s Wall time: 1min 9s
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
Display all the unique speed limit values and count how many network edges (road segments) have each value. We will compare this to our cleaned network later.
%%time
# Turn nodes and edges into geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Get unique counts of road segments for each speed limit
print(edges['maxspeed'].value_counts())
print(str(len(edges)) + " edges in graph")
# can we also visualize highways / roads with higher speed limits to check accuracy?
# the code above converts the graph into an edges geodataframe, which could theoretically be filtered
# by fast road segments and mapped, e.g. in folium
nodes
25 mph 4793 30 mph 3555 35 mph 3364 40 mph 2093 45 mph 1418 20 mph 1155 55 mph 614 60 mph 279 50 mph 191 40 79 15 mph 76 70 mph 71 65 mph 54 10 mph 38 [40 mph, 45 mph] 27 [30 mph, 35 mph] 26 45,30 24 [40 mph, 35 mph] 22 70 21 25 20 [55 mph, 45 mph] 16 25, east 14 [45 mph, 35 mph] 13 [30 mph, 25 mph] 10 [45 mph, 50 mph] 8 50 8 [40 mph, 30 mph] 7 [35 mph, 25 mph] 6 [55 mph, 60 mph] 5 20 4 [70 mph, 60 mph] 3 [65 mph, 60 mph] 3 [40 mph, 45 mph, 35 mph] 3 [70 mph, 65 mph] 2 [70 mph, 45 mph, 5 mph] 2 [40, 45 mph] 2 [35 mph, 50 mph] 2 35 2 [55 mph, 65 mph] 2 [40 mph, 50 mph] 2 [15 mph, 25 mph] 2 [40 mph, 35 mph, 25 mph] 2 [15 mph, 40 mph, 30 mph] 2 [20 mph, 25 mph] 2 [30 mph, 25, east] 2 [65 mph, 55 mph] 2 [20 mph, 35 mph] 2 [55 mph, 55] 2 55 2 [15 mph, 30 mph] 2 [45 mph, 30 mph] 2 [15 mph, 45 mph] 2 [55 mph, 45, east, 50 mph] 2 [20 mph, 30 mph] 1 [5 mph, 45 mph, 35 mph] 1 [55 mph, 35 mph] 1 [5 mph, 35 mph] 1 [55 mph, 50 mph] 1 Name: maxspeed, dtype: int64 384240 edges in graph CPU times: user 39.4 s, sys: 154 ms, total: 39.6 s Wall time: 39.3 s
y | x | osmid | highway | ref | geometry | |
---|---|---|---|---|---|---|
osmid | ||||||
261095436 | 42.105710 | -87.902373 | 261095436 | NaN | NaN | POINT (-87.90237 42.10571) |
261095437 | 42.105404 | -87.901983 | 261095437 | NaN | NaN | POINT (-87.90198 42.10540) |
261095439 | 42.105109 | -87.901586 | 261095439 | NaN | NaN | POINT (-87.90159 42.10511) |
261095445 | 42.104685 | -87.902326 | 261095445 | turning_circle | NaN | POINT (-87.90233 42.10468) |
261095446 | 42.104630 | -87.901830 | 261095446 | NaN | NaN | POINT (-87.90183 42.10463) |
... | ... | ... | ... | ... | ... | ... |
4332716015 | 42.026807 | -87.806467 | 4332716015 | NaN | NaN | POINT (-87.80647 42.02681) |
2293235696 | 41.794604 | -87.627795 | 2293235696 | NaN | NaN | POINT (-87.62780 41.79460) |
237502450 | 41.466574 | -87.667652 | 237502450 | NaN | NaN | POINT (-87.66765 41.46657) |
261095421 | 41.644896 | -87.728298 | 261095421 | NaN | NaN | POINT (-87.72830 41.64490) |
261095422 | 41.644933 | -87.726149 | 261095422 | NaN | NaN | POINT (-87.72615 41.64493) |
142318 rows × 6 columns
edges.head()
osmid | highway | oneway | length | name | geometry | lanes | ref | bridge | maxspeed | access | service | tunnel | junction | width | area | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
u | v | key | ||||||||||||||||
261095436 | 261095437 | 0 | 24067717 | residential | False | 46.873 | NaN | LINESTRING (-87.90237 42.10571, -87.90198 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
261095437 | 261095439 | 0 | 24067717 | residential | False | 46.317 | NaN | LINESTRING (-87.90198 42.10540, -87.90159 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
261095436 | 0 | 24067717 | residential | False | 46.873 | NaN | LINESTRING (-87.90198 42.10540, -87.90237 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
261109275 | 0 | 24069424 | residential | False | 34.892 | NaN | LINESTRING (-87.90198 42.10540, -87.90227 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
261109274 | 0 | 24069424 | residential | False | 47.866 | NaN | LINESTRING (-87.90198 42.10540, -87.90156 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Cleans the OSMNX network to work better with drive-time analysis.
First, we remove all nodes with 0 outdegree because any hospital assigned to such a node would be unreachable from everywhere. Next, we remove small (under 10 node) strongly connected components to reduce erroneously small ego-centric networks. Lastly, we ensure that the max speed is set and in the correct units before calculating time.
Args:
Returns:
#view all highway types
print(edges['highway'].value_counts())
residential 296481 secondary 30909 tertiary 29216 primary 19277 motorway_link 2322 unclassified 1840 motorway 1449 trunk 843 primary_link 833 secondary_link 356 living_street 238 trunk_link 157 tertiary_link 121 [residential, unclassified] 69 [tertiary, residential] 66 [secondary, primary] 15 [secondary, tertiary] 10 [motorway, motorway_link] 6 [tertiary, unclassified] 6 [motorway, trunk] 4 [residential, living_street] 4 [secondary, secondary_link] 3 busway 2 [motorway, primary] 2 [tertiary, motorway_link] 2 emergency_bay 2 [trunk, primary] 2 [tertiary, tertiary_link] 1 [trunk, motorway] 1 [primary, motorway_link] 1 [secondary, motorway_link] 1 [primary_link, residential] 1 Name: highway, dtype: int64
# two things about this function:
# 1) the work to remove nodes is hardly worth it now that OSMnx cleans graphs by default
# the function is now only pruning < 300 nodes
# 2) try using the OSMnx speed module for setting speeds, travel times
# https://osmnx.readthedocs.io/en/stable/user-reference.html#module-osmnx.speed
# just be careful about units of speed and time!
# the remainder of this code expects 'time' to be measured in minutes
# def network_setting(network):
# _nodes_removed = len([n for (n, deg) in network.out_degree() if deg ==0])
# network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg ==0])
# for component in list(nx.strongly_connected_components(network)):
# if len(component)<10:
# for node in component:
# _nodes_removed+=1
# network.remove_node(node)
# for u, v, k, data in tqdm(G.edges(data=True, keys=True),position=0):
# if 'maxspeed' in data.keys():
# speed_type = type(data['maxspeed'])
# if (speed_type==str):
# # Add in try/except blocks to catch maxspeed formats that don't fit Kang et al's cases
# try:
# if len(data['maxspeed'].split(','))==2:
# data['maxspeed_fix']=float(data['maxspeed'].split(',')[0])
# elif data['maxspeed']=='signals':
# data['maxspeed_fix']=30.0 # drive speed setting as 35 miles
# else:
# data['maxspeed_fix']=float(data['maxspeed'].split()[0])
# except:
# data['maxspeed_fix']=30.0 #miles
# else:
# try:
# data['maxspeed_fix']=float(data['maxspeed'][0].split()[0])
# except:
# data['maxspeed_fix']=30.0 #miles
# else:
# data['maxspeed_fix']=30.0 #miles
# data['maxspeed_meters'] = data['maxspeed_fix']*26.8223 # convert mile per hour to meters per minute
# data['time'] = float(data['length'])/ data['maxspeed_meters'] # meters / meters per minute = minutes
# print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, _nodes_removed/float(network.number_of_nodes())))
# print("Number of nodes: {}".format(network.number_of_nodes()))
# print("Number of edges: {}".format(network.number_of_edges()))
# return(network)
# Imputes edge speed in kilometers per hour to roads that are missing speed limits
# _nodes_removed = len([n for (n, deg) in network.out_degree() if deg ==0])
# network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg ==0])
# for component in list(nx.strongly_connected_components(network)):
# if len(component)<10:
# for node in component:
# _nodes_removed+=1
# network.remove_node(node)
# print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, _nodes_removed/float(network.number_of_nodes())))
# print("Number of nodes: {}".format(network.number_of_nodes()))
# print("Number of edges: {}".format(network.number_of_edges()))
# return(network)
def network_setting(network):
_nodes_removed = len([n for (n, deg) in network.out_degree() if deg ==0])
network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg ==0])
for component in list(nx.strongly_connected_components(network)):
if len(component)<10:
for node in component:
_nodes_removed+=1
network.remove_node(node)
ox.speed.add_edge_speeds(network)
ox.speed.add_edge_travel_times(network)
print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, _nodes_removed/float(network.number_of_nodes())))
print("Number of nodes: {}".format(network.number_of_nodes()))
print("Number of edges: {}".format(network.number_of_edges()))
return network
print(G)
MultiDiGraph named 'Chicago' with 142044 nodes and 383911 edges
%%time
# G, hospitals, grid_file, pop_data = file_import (population_dropdown.value)
G = network_setting(G)
# Create point geometries for each node in the graph, to make constructing catchment area polygons easier
for node, data in G.nodes(data=True):
data['geometry']=Point(data['x'], data['y'])
# Modify code to react to processor dropdown (got rid of file_import function)
Removed 274 nodes (0.0019%) from the OSMNX network Number of nodes: 142044 Number of edges: 383911 CPU times: user 52.4 s, sys: 837 ms, total: 53.3 s Wall time: 53 s
Display all the unique speed limit values and count how many network edges (road segments) have each value. Compare to the previous results.
%%time
## Get unique counts of speed limits for each road network...after imputing
# Turn nodes and edges in geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Convert from kph to mph
#edges['speed_mph'] = edges['speed_kph']*0.62137119223733
# Count
#print(edges['speed_mph'].value_counts())
print(str(len(edges)) + " edges in graph")
print(edges['travel_time'].value_counts()) #in seconds
383911 edges in graph 9.3 14185 9.2 11922 18.6 8012 9.4 7209 18.5 6608 ... 199.5 1 115.5 1 145.7 1 122.3 1 136.9 1 Name: travel_time, Length: 1183, dtype: int64 CPU times: user 48.2 s, sys: 251 ms, total: 48.5 s Wall time: 48.3 s
print(edges['speed_kph'].value_counts())
39.2000000000 291413 48.3000000000 29822 56.7000000000 26353 60.1000000000 14985 40.2000000000 5604 56.3000000000 3364 86.3000000000 2200 64.4000000000 2093 32.2000000000 1872 42.9000000000 1793 72.4000000000 1418 69.8000000000 654 88.5000000000 606 90.1000000000 565 96.6000000000 277 80.5000000000 191 51.0000000000 118 40.0000000000 80 24.1000000000 76 112.7000000000 61 104.6000000000 42 16.1000000000 38 25.0000000000 34 68.0000000000 29 52.0000000000 26 45.3000000000 24 60.0000000000 24 70.0000000000 21 64.0000000000 18 80.0000000000 16 44.0000000000 12 56.0000000000 9 76.0000000000 8 50.0000000000 8 48.0000000000 8 36.0000000000 6 92.0000000000 5 96.0000000000 4 71.0000000000 4 20.0000000000 4 104.0000000000 3 32.0000000000 3 72.0000000000 3 45.0000000000 3 100.0000000000 3 52.4000000000 2 55.0000000000 2 108.0000000000 2 35.0000000000 2 53.0000000000 2 84.0000000000 1 Name: speed_kph, dtype: int64
def hospital_setting(hospitals, G):
# Create an empty column
hospitals['nearest_osm']=None
# Append the neaerest osm column with each hospitals neaerest osm node
for i in tqdm(hospitals.index, desc="Find the nearest network node from hospitals", position=0):
hospitals['nearest_osm'][i] = ox.get_nearest_node(G, [hospitals['Y'][i], hospitals['X'][i]], method='euclidean') # find the nearest node from hospital location
print ('hospital setting is done')
return(hospitals)
Converts geodata to centroids
Args:
Returns:
def pop_centroid (pop_data, pop_type):
pop_data = pop_data.to_crs({'init': 'epsg:4326'})
# If pop is selected in dropdown, select at risk pop where population is greater than 0
if pop_type =="pop":
pop_data=pop_data[pop_data['OverFifty']>=0]
# If covid is selected in dropdown, select where covid cases are greater than 0
if pop_type =="covid":
pop_data=pop_data[pop_data['cases']>=0]
pop_cent = pop_data.centroid # it make the polygon to the point without any other information
# Convert to gdf
pop_centroid = gpd.GeoDataFrame()
i = 0
for point in tqdm(pop_cent, desc='Pop Centroid File Setting', position=0):
if pop_type== "pop":
pop = pop_data.iloc[i]['OverFifty']
code = pop_data.iloc[i]['GEOID']
if pop_type =="covid":
pop = pop_data.iloc[i]['cases']
code = pop_data.iloc[i].ZCTA5CE10
pop_centroid = pop_centroid.append({'code':code,'pop': pop,'geometry': point}, ignore_index=True)
i = i+1
return(pop_centroid)
Function written by Joe Holler + Derrick Burt. It is a more efficient way to calculate distance-weighted catchment areas for each hospital. The algorithm runs quicker than the original one ("calculate_catchment_area"). It first creates a dictionary (with a node and its corresponding drive time from the hospital) of all nodes within a 30 minute drive time (using single_cource_dijkstra_path_length function). From here, two more dictionaries are constructed by querying the original one. From this dictionaries, single part convex hulls are created for each drive time interval and appended into a single list (one list with 3 polygon geometries). Within the list, the polygons are differenced from each other to produce three catchment areas.
Args:
Returns:
def dijkstra_cca_polygons(G, nearest_osm, distances, distance_unit = "travel_time"):
'''
Before running: must assign point geometries to street nodes
# create point geometries for the entire graph
for node, data in G.nodes(data=True):
data['geometry']=Point(data['x'], data['y'])
'''
## CREATE DICTIONARIES
# create dictionary of nearest nodes
nearest_nodes_30 = nx.single_source_dijkstra_path_length(G, nearest_osm, distances[2], distance_unit) # creating the largest graph from which 10 and 20 minute drive times can be extracted from
# extract values within 20 and 10 (respectively) minutes drive times
nearest_nodes_20 = dict()
nearest_nodes_10 = dict()
for key, value in nearest_nodes_30.items():
if value <= distances[1]:
nearest_nodes_20[key] = value
if value <= distances[0]:
nearest_nodes_10[key] = value
## CREATE POLYGONS FOR 3 DISTANCE CATEGORIES (10 min, 20 min, 30 min)
# 30 MIN
# If the graph already has a geometry attribute with point data,
# this line will create a GeoPandas GeoDataFrame from the nearest_nodes_30 dictionary
points_30 = gpd.GeoDataFrame(gpd.GeoSeries(nx.get_node_attributes(G.subgraph(nearest_nodes_30), 'geometry')))
# This line converts the nearest_nodes_30 dictionary into a Pandas data frame and joins it to points
# left_index=True and right_index=True are options for merge() to join on the index values
points_30 = points_30.merge(pd.Series(nearest_nodes_30).to_frame(), left_index=True, right_index=True)
# Re-name the columns and set the geodataframe geometry to the geometry column
points_30 = points_30.rename(columns={'0_x':'geometry','0_y':'z'}).set_geometry('geometry')
# Create a convex hull polygon from the points
polygon_30 = gpd.GeoDataFrame(gpd.GeoSeries(points_30.unary_union.convex_hull))
polygon_30 = polygon_30.rename(columns={0:'geometry'}).set_geometry('geometry')
# 20 MIN
# Select nodes less than or equal to 20 mins (1200 secs)
points_20 = points_30.query("z <= 1200")
# Create a convex hull polygon from the points
polygon_20 = gpd.GeoDataFrame(gpd.GeoSeries(points_20.unary_union.convex_hull))
polygon_20 = polygon_20.rename(columns={0:'geometry'}).set_geometry('geometry')
# 10 MIN
# Select nodes less than or equal to 10 mins (600 secs)
points_10 = points_30.query("z <= 600")
# Create a convex hull polygon from the points
polygon_10 = gpd.GeoDataFrame(gpd.GeoSeries(points_10.unary_union.convex_hull))
polygon_10 = polygon_10.rename(columns={0:'geometry'}).set_geometry('geometry')
# Create empty list and append polygons
polygons = []
# Append
polygons.append(polygon_10)
polygons.append(polygon_20)
polygons.append(polygon_30)
# Clip the overlapping distance ploygons (create two donuts + hole)
for i in reversed(range(1, len(distances))):
polygons[i] = gpd.overlay(polygons[i], polygons[i-1], how="difference")
return polygons
Measures the effect of a single hospital on the surrounding area. (Uses dijkstra_cca_polygons
)
Args:
Returns:
def hospital_measure_acc (_thread_id, hospital, pop_data, distances, weights):
# Create polygons
polygons = dijkstra_cca_polygons(G, hospital['nearest_osm'], distances)
# Calculate accessibility measurements
num_pops = []
for j in pop_data.index:
point = pop_data['geometry'][j]
# Multiply polygons by weights
for k in range(len(polygons)):
if len(polygons[k]) > 0: # To exclude the weirdo (convex hull is not polygon)
if (point.within(polygons[k].iloc[0]["geometry"])):
num_pops.append(pop_data['pop'][j]*weights[k])
total_pop = sum(num_pops)
for i in range(len(distances)):
polygons[i]['time']=distances[i]
polygons[i]['total_pop']=total_pop
polygons[i]['hospital_icu_beds'] = float(hospital['Adult ICU'])/polygons[i]['total_pop'] # proportion of # of beds over pops in 10 mins
polygons[i]['hospital_vents'] = float(hospital['Total Vent'])/polygons[i]['total_pop'] # proportion of # of beds over pops in 10 mins
polygons[i].crs = { 'init' : 'epsg:4326'}
polygons[i] = polygons[i].to_crs({'init':'epsg:32616'})
print('{:.0f}'.format(_thread_id), end=" ", flush=True)
return(_thread_id, [ polygon.copy(deep=True) for polygon in polygons ])
Parallel implementation of accessibility measurement.
Args:
Returns:
def hospital_acc_unpacker(args):
return hospital_measure_acc(*args)
# WHERE THE RESULTS ARE POOLED AND THEN REAGGREGATED
def measure_acc_par (hospitals, pop_data, network, distances, weights, num_proc = 4):
catchments = []
for distance in distances:
catchments.append(gpd.GeoDataFrame())
pool = mp.Pool(processes = num_proc)
hospital_list = [ hospitals.iloc[i] for i in range(len(hospitals)) ]
print("Calculating", len(hospital_list), "hospital catchments...\ncompleted number:", end=" ")
results = pool.map(hospital_acc_unpacker, zip(range(len(hospital_list)), hospital_list, itertools.repeat(pop_data), itertools.repeat(distances), itertools.repeat(weights)))
pool.close()
results.sort()
results = [ r[1] for r in results ]
for i in range(len(results)):
for j in range(len(distances)):
catchments[j] = catchments[j].append(results[i][j], sort=False)
return catchments
Calculates and aggregates accessibility statistics for one catchment on our grid file.
Args:
Returns:
from collections import Counter
def overlap_calc(_id, poly, grid_file, weight, service_type):
value_dict = Counter()
if type(poly.iloc[0][service_type])!=type(None):
value = float(poly[service_type])*weight
intersect = gpd.overlay(grid_file, poly, how='intersection')
intersect['overlapped']= intersect.area
intersect['percent'] = intersect['overlapped']/intersect['area']
intersect=intersect[intersect['percent']>=0.5]
intersect_region = intersect['id']
for intersect_id in intersect_region:
try:
value_dict[intersect_id] +=value
except:
value_dict[intersect_id] = value
return(_id, value_dict)
def overlap_calc_unpacker(args):
return overlap_calc(*args)
Calculates how all catchment areas overlap with and affect the accessibility of each grid in our grid file.
Args:
Returns:
def overlapping_function (grid_file, catchments, service_type, weights, num_proc = 4):
grid_file[service_type]=0
pool = mp.Pool(processes = num_proc)
acc_list = []
for i in range(len(catchments)):
acc_list.extend([ catchments[i][j:j+1] for j in range(len(catchments[i])) ])
acc_weights = []
for i in range(len(catchments)):
acc_weights.extend( [weights[i]]*len(catchments[i]) )
results = pool.map(overlap_calc_unpacker, zip(range(len(acc_list)), acc_list, itertools.repeat(grid_file), acc_weights, itertools.repeat(service_type)))
pool.close()
results.sort()
results = [ r[1] for r in results ]
service_values = results[0]
for result in results[1:]:
service_values+=result
for intersect_id, value in service_values.items():
grid_file.loc[grid_file['id']==intersect_id, service_type] += value
return(grid_file)
Normalizes our result (Geodataframe) for a given resource (res).
def normalization (result, res):
result[res]=(result[res]-min(result[res]))/(max(result[res])-min(result[res]))
return result
Imports all files we need to run our code and pulls the Illinois network from OSMNX if it is not present (will take a while).
NOTE: even if we calculate accessibility for just Chicago, we want to use the Illinois network (or at least we should not use the Chicago network) because using the Chicago network will result in hospitals near but outside of Chicago having an infinite distance (unreachable because roads do not extend past Chicago).
Args:
Returns:
def output_map(output_grid, base_map, hospitals, resource):
ax=output_grid.plot(column=resource, cmap='PuBuGn',figsize=(18,12), legend=True, zorder=1)
# Next two lines set bounds for our x- and y-axes because it looks like there's a weird
# Point at the bottom left of the map that's messing up our frame (Maja)
ax.set_xlim([314000, 370000])
ax.set_ylim([540000, 616000])
base_map.plot(ax=ax, facecolor="none", edgecolor='gray', lw=0.1)
hospitals.plot(ax=ax, markersize=10, zorder=1, c='blue')
Below you can customize the input of the model:
import ipywidgets
from IPython.display import display
processor_dropdown = ipywidgets.Dropdown( options=[("1", 1), ("2", 2), ("3", 3), ("4", 4)],
value = 4, description = "Processor: ")
population_dropdown = ipywidgets.Dropdown( options=[("Population at Risk", "pop"), ("COVID-19 Patients", "covid") ],
value = "pop", description = "Population: ")
resource_dropdown = ipywidgets.Dropdown( options=[("ICU Beds", "hospital_icu_beds"), ("Ventilators", "hospital_vents") ],
value = "hospital_icu_beds", description = "Resource: ")
hospital_dropdown = ipywidgets.Dropdown( options=[("All hospitals", "hospitals"), ("Subset", "hospital_subset") ],
value = "hospitals", description = "Hospital:")
display(processor_dropdown,population_dropdown,resource_dropdown,hospital_dropdown)
Dropdown(description='Processor: ', index=3, options=(('1', 1), ('2', 2), ('3', 3), ('4', 4)), value=4)
Dropdown(description='Population: ', options=(('Population at Risk', 'pop'), ('COVID-19 Patients', 'covid')), …
Dropdown(description='Resource: ', options=(('ICU Beds', 'hospital_icu_beds'), ('Ventilators', 'hospital_vents…
Dropdown(description='Hospital:', options=(('All hospitals', 'hospitals'), ('Subset', 'hospital_subset')), val…
if population_dropdown.value == "pop":
pop_data = pop_centroid(atrisk_data, population_dropdown.value)
elif population_dropdown.value == "covid":
pop_data = pop_centroid(covid_data, population_dropdown.value)
distances=[600, 1200, 1800] # Distances in travel time (these are in seconds, so 10min, 20min, 30min)
weights=[1.0, 0.68, 0.22] # Weights where weights[0] is applied to distances[0]
# Other weighting options representing different distance decays
# weights1, weights2, weights3 = [1.0, 0.42, 0.09], [1.0, 0.75, 0.5], [1.0, 0.5, 0.1]
# it is surprising how long this function takes just to calculate centroids.
# why not do it with the geopandas/pandas functions rather than iterating through every item?
Pop Centroid File Setting: 100%|██████████| 3121/3121 [05:23<00:00, 9.64it/s]
If you have already run this code and changed the Hospital selection, rerun the Load Hospital Data block.
# Set hospitals according to hospital dropdown
if hospital_dropdown.value == "hospital_subset":
hospitals = hospital_setting(hospitals[:1], G)
else:
hospitals = hospital_setting(hospitals, G)
resources = ["hospital_icu_beds", "hospital_vents"] # resources
# this is also slower than it needs to be; if network nodes and hospitals are both
# geopandas data frames, it should be possible to do a much faster spatial join rather than iterating through every hospital
Find the nearest network node from hospitals: 100%|██████████| 66/66 [01:52<00:00, 1.71s/it]
hospital setting is done
# Create point geometries for entire graph
# what is the pupose of the following two lines? Can this be deleted?
# for node, data in G.nodes(data=True):
# data['geometry']=Point(data['x'], data['y'])
# which hospital to visualize?
fighosp = 0
# Create catchment for hospital 0
poly = dijkstra_cca_polygons(G, hospitals['nearest_osm'][fighosp], distances)
# Reproject polygons
for i in range(len(poly)):
poly[i].crs = { 'init' : 'epsg:4326'}
poly[i] = poly[i].to_crs({'init':'epsg:32616'})
# Reproject hospitals
# Possible to map from the hospitals data rather than creating hospital_subset?
hospital_subset = hospitals.iloc[[fighosp]].to_crs(epsg=32616)
fig, ax = plt.subplots(figsize=(12,8))
min_10 = poly[0].plot(ax=ax, color="royalblue", label="10 min drive")
min_20 = poly[1].plot(ax=ax, color="cornflowerblue", label="20 min drive")
min_30 = poly[2].plot(ax=ax, color="lightsteelblue", label="30 min drive")
hospital_subset.plot(ax=ax, color="red", legend=True, label = "hospital")
# Add legend
ax.legend()
<matplotlib.legend.Legend at 0x7f19d234f730>
poly
[ geometry 0 POLYGON ((443456.283 4609874.589, 441585.172 4..., geometry 0 POLYGON ((433443.581 4600237.316, 427780.923 4..., geometry 0 POLYGON ((438932.445 4588484.312, 431706.358 4...]
%%time
catchments = measure_acc_par(hospitals, pop_data, G, distances, weights, num_proc=processor_dropdown.value)
Calculating 66 hospital catchments... completed number: 0 5 15 10 1 6 16 11 2 7 17 12 8 3 13 18 4 9 19 14 20 25 35 30 26 21 36 31 27 22 37 32 28 23 38 33 29 24 39 40 34 45 50 41 55 46 51 42 56 47 52 43 57 48 58 53 44 49 59 54 60 65 61 62 63 64 CPU times: user 4.05 s, sys: 1.33 s, total: 5.38 s Wall time: 3min 35s
%%time
for j in range(len(catchments)):
catchments[j] = catchments[j][catchments[j][resource_dropdown.value]!=float('inf')]
result=overlapping_function(grid_file, catchments, resource_dropdown.value, weights, num_proc=processor_dropdown.value)
result
CPU times: user 8.45 s, sys: 1.04 s, total: 9.49 s Wall time: 32.5 s
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.4160873027 | 4638515.4028352341 | 441420.7663564924 | 4638015.4028352341 | 4158 | 216506.3509461179 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.0035463701 |
1 | 440843.4160873027 | 4638015.4028352341 | 441420.7663564924 | 4637515.4028352341 | 4159 | 216506.3509461179 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.0036375099 |
2 | 440843.4160873027 | 4639515.4028352341 | 441420.7663564924 | 4639015.4028352341 | 4156 | 216506.3509461179 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.0036381016 |
3 | 440843.4160873027 | 4639015.4028352341 | 441420.7663564924 | 4638515.4028352341 | 4157 | 216506.3509461179 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.0035738208 |
4 | 440843.4160873027 | 4640515.4028352341 | 441420.7663564924 | 4640015.4028352341 | 4154 | 216506.3509461179 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.0037206598 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
3274 | 440843.4160873027 | 4643015.4028352341 | 441420.7663564924 | 4642515.4028352341 | 4149 | 216506.3509461179 | POLYGON ((440843.416 4642765.403, 440987.754 4... | 0.0035694996 |
3275 | 440843.4160873027 | 4644515.4028352341 | 441420.7663564924 | 4644015.4028352341 | 4146 | 216506.3509461179 | POLYGON ((440843.416 4644265.403, 440987.754 4... | 0.0034804217 |
3276 | 440843.4160873027 | 4644015.4028352341 | 441420.7663564924 | 4643515.4028352341 | 4147 | 216506.3509461179 | POLYGON ((440843.416 4643765.403, 440987.754 4... | 0.0034804217 |
3277 | 440843.4160873027 | 4645515.4028352341 | 441420.7663564924 | 4645015.4028352341 | 4144 | 216506.3509461179 | POLYGON ((440843.416 4645265.403, 440987.754 4... | 0.0034293724 |
3278 | 440843.4160873027 | 4645015.4028352341 | 441420.7663564924 | 4644515.4028352341 | 4145 | 216506.3509461179 | POLYGON ((440843.416 4644765.403, 440987.754 4... | 0.0034410113 |
3279 rows × 8 columns
%%time
result = normalization (result, resource_dropdown.value)
CPU times: user 10.5 ms, sys: 4.4 ms, total: 14.9 ms Wall time: 11.7 ms
result.head()
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.4160873027 | 4638515.4028352341 | 441420.7663564924 | 4638015.4028352341 | 4158 | 216506.3509461179 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.9032740792 |
1 | 440843.4160873027 | 4638015.4028352341 | 441420.7663564924 | 4637515.4028352341 | 4159 | 216506.3509461179 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.9296141212 |
2 | 440843.4160873027 | 4639515.4028352341 | 441420.7663564924 | 4639015.4028352341 | 4156 | 216506.3509461179 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.9297851328 |
3 | 440843.4160873027 | 4639015.4028352341 | 441420.7663564924 | 4638515.4028352341 | 4157 | 216506.3509461179 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.9112075113 |
4 | 440843.4160873027 | 4640515.4028352341 | 441420.7663564924 | 4640015.4028352341 | 4154 | 216506.3509461179 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.9536450358 |
# add weight field to each catchment polygon
for i in range(len(weights)):
catchments[i]['weight'] = weights[i]
# combine the three sets of catchment polygons into one geodataframe
geocatchments = pd.concat([catchments[0], catchments[1], catchments[2]])
geocatchments
geometry | time | total_pop | hospital_icu_beds | hospital_vents | weight | |
---|---|---|---|---|---|---|
0 | POLYGON ((446359.955 4637144.048, 444654.345 4... | 600 | 789023.7400000009 | 0.0000456260 | 0.0000152087 | 1.0000000000 |
0 | POLYGON ((438353.601 4609853.779, 432065.727 4... | 600 | 717916.3799999998 | 0.0002730123 | 0.0000891469 | 1.0000000000 |
0 | POLYGON ((442878.135 4648745.067, 441056.875 4... | 600 | 469346.5200000000 | 0.0001896254 | 0.0000617880 | 1.0000000000 |
0 | POLYGON ((423900.989 4621140.151, 421031.920 4... | 600 | 732753.5999999985 | 0.0000887065 | 0.0000286590 | 1.0000000000 |
0 | POLYGON ((443322.063 4615428.578, 438387.446 4... | 600 | 716375.1200000002 | 0.0000921305 | 0.0000293143 | 1.0000000000 |
... | ... | ... | ... | ... | ... | ... |
0 | POLYGON ((440431.675 4605335.203, 415910.447 4... | 1800 | 1015836.6400000000 | 0.0000265791 | 0.0000088597 | 0.2200000000 |
0 | MULTIPOLYGON (((418680.569 4620247.323, 411754... | 1800 | 754080.5999999993 | 0.0000596753 | 0.0000185657 | 0.2200000000 |
0 | POLYGON ((421589.871 4617483.974, 415910.447 4... | 1800 | 973822.9600000000 | 0.0000862580 | 0.0000277258 | 0.2200000000 |
0 | POLYGON ((438852.888 4603453.515, 415910.447 4... | 1800 | 936876.3000000000 | 0.0000651100 | 0.0000213475 | 0.2200000000 |
0 | POLYGON ((416049.771 4606193.128, 411127.822 4... | 1800 | 815589.7800000018 | 0.0001152540 | 0.0000367832 | 0.2200000000 |
198 rows × 6 columns
%%time
# set weighted to False for original 50% threshold method
# switch to True for area-weighted overlay
weighted = True
# if the value to be calculated is already in the hegaxon grid, delete it
# otherwise, the field name gets a suffix _1 in the overlay step
if resource_dropdown.value in list(grid_file.columns.values):
grid_file = grid_file.drop(resource_dropdown.value, axis = 1)
# calculate hexagon 'target' areas
grid_file['area'] = grid_file.area
# Intersection overlay of hospital catchments and hexagon grid
print("Intersecting hospital catchments with hexagon grid...")
fragments = gpd.overlay(grid_file, geocatchments, how='intersection')
# Calculate percent coverage of the hexagon by the hospital catchment as
# fragment area / target(hexagon) area
fragments['percent'] = fragments.area / fragments['area']
# if using weighted aggregation...
if weighted:
print("Calculating area-weighted value...")
# multiply the service/population ratio by the distance weight and the percent coverage
fragments['value'] = fragments[resource_dropdown.value] * fragments['weight'] * fragments['percent']
# if using the 50% coverage rule for unweighted aggregation...
else:
print("Calculating value for hexagons with >=50% overlap...")
# filter for only the fragments with > 50% coverage by hospital catchment
fragments = fragments[fragments['percent']>=0.5]
# multiply the service/population ration by the distance weight
fragments['value'] = fragments[resource_dropdown.value] * fragments['weight']
# select just the hexagon id and value from the fragments,
# group the fragments by the (hexagon) id,
# and sum the values
print("Summarizing results by hexagon id...")
sum_results = fragments[['id', 'value']].groupby(by = ['id']).sum()
# join the results to the hexagon grid_file based on hexagon id
print("Joining results to hexagons...")
result_new = pd.merge(grid_file, sum_results, how="left", on = "id")
# rename value column name to the resource name
result_new = result_new.rename(columns = {'value' : resource_dropdown.value})
result_new
Intersecting hospital catchments with hexagon grid... Calculating area-weighted value... Summarizing results by hexagon id... Joining results to hexagons... CPU times: user 17.9 s, sys: 165 ms, total: 18 s Wall time: 18 s
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.4160873027 | 4638515.4028352341 | 441420.7663564924 | 4638015.4028352341 | 4158 | 216506.3509461179 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.0035635364 |
1 | 440843.4160873027 | 4638015.4028352341 | 441420.7663564924 | 4637515.4028352341 | 4159 | 216506.3509461179 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.0036172187 |
2 | 440843.4160873027 | 4639515.4028352341 | 441420.7663564924 | 4639015.4028352341 | 4156 | 216506.3509461179 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.0036257313 |
3 | 440843.4160873027 | 4639015.4028352341 | 441420.7663564924 | 4638515.4028352341 | 4157 | 216506.3509461179 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.0035670686 |
4 | 440843.4160873027 | 4640515.4028352341 | 441420.7663564924 | 4640015.4028352341 | 4154 | 216506.3509461179 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.0036764802 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
3274 | 440843.4160873027 | 4643015.4028352341 | 441420.7663564924 | 4642515.4028352341 | 4149 | 216506.3509461179 | POLYGON ((440843.416 4642765.403, 440987.754 4... | 0.0035832212 |
3275 | 440843.4160873027 | 4644515.4028352341 | 441420.7663564924 | 4644015.4028352341 | 4146 | 216506.3509461179 | POLYGON ((440843.416 4644265.403, 440987.754 4... | 0.0034729771 |
3276 | 440843.4160873027 | 4644015.4028352341 | 441420.7663564924 | 4643515.4028352341 | 4147 | 216506.3509461179 | POLYGON ((440843.416 4643765.403, 440987.754 4... | 0.0034993702 |
3277 | 440843.4160873027 | 4645515.4028352341 | 441420.7663564924 | 4645015.4028352341 | 4144 | 216506.3509461179 | POLYGON ((440843.416 4645265.403, 440987.754 4... | 0.0034270073 |
3278 | 440843.4160873027 | 4645015.4028352341 | 441420.7663564924 | 4644515.4028352341 | 4145 | 216506.3509461179 | POLYGON ((440843.416 4644765.403, 440987.754 4... | 0.0034446158 |
3279 rows × 8 columns
%%time
result_new = normalization (result_new, (resource_dropdown.value))
CPU times: user 3.46 ms, sys: 2.15 ms, total: 5.61 ms Wall time: 4.4 ms
# New result using AWR
result_new.head()
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.4160873027 | 4638515.4028352341 | 441420.7663564924 | 4638015.4028352341 | 4158 | 216506.3509461179 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.9147547364 |
1 | 440843.4160873027 | 4638015.4028352341 | 441420.7663564924 | 4637515.4028352341 | 4159 | 216506.3509461179 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.9304634378 |
2 | 440843.4160873027 | 4639515.4028352341 | 441420.7663564924 | 4639015.4028352341 | 4156 | 216506.3509461179 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.9329544297 |
3 | 440843.4160873027 | 4639015.4028352341 | 441420.7663564924 | 4638515.4028352341 | 4157 | 216506.3509461179 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.9157883242 |
4 | 440843.4160873027 | 4640515.4028352341 | 441420.7663564924 | 4640015.4028352341 | 4154 | 216506.3509461179 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.9478047543 |
result.head()
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.4160873027 | 4638515.4028352341 | 441420.7663564924 | 4638015.4028352341 | 4158 | 216506.3509461179 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.9032740792 |
1 | 440843.4160873027 | 4638015.4028352341 | 441420.7663564924 | 4637515.4028352341 | 4159 | 216506.3509461179 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.9296141212 |
2 | 440843.4160873027 | 4639515.4028352341 | 441420.7663564924 | 4639015.4028352341 | 4156 | 216506.3509461179 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.9297851328 |
3 | 440843.4160873027 | 4639015.4028352341 | 441420.7663564924 | 4638515.4028352341 | 4157 | 216506.3509461179 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.9112075113 |
4 | 440843.4160873027 | 4640515.4028352341 | 441420.7663564924 | 4640015.4028352341 | 4154 | 216506.3509461179 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.9536450358 |
# Difference is AWR approach - 50% threshold approach
diff = gpd.GeoDataFrame({
'column_id': result_new['id'],
'geometry': result_new['geometry'],
'hospital_icu_beds_result_new': result_new[resource_dropdown.value],
'hospital_icu_beds_result': result[resource_dropdown.value],
'difference': result_new[resource_dropdown.value] - result[resource_dropdown.value]
})
diff
column_id | geometry | hospital_icu_beds_result_new | hospital_icu_beds_result | difference | |
---|---|---|---|---|---|
0 | 4158 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.9147547364 | 0.9032740792 | 0.0114806572 |
1 | 4159 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.9304634378 | 0.9296141212 | 0.0008493167 |
2 | 4156 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.9329544297 | 0.9297851328 | 0.0031692969 |
3 | 4157 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.9157883242 | 0.9112075113 | 0.0045808129 |
4 | 4154 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.9478047543 | 0.9536450358 | -0.0058402815 |
... | ... | ... | ... | ... | ... |
3274 | 4149 | POLYGON ((440843.416 4642765.403, 440987.754 4... | 0.9205149708 | 0.9099586646 | 0.0105563061 |
3275 | 4146 | POLYGON ((440843.416 4644265.403, 440987.754 4... | 0.8882549218 | 0.8842145142 | 0.0040404077 |
3276 | 4147 | POLYGON ((440843.416 4643765.403, 440987.754 4... | 0.8959781608 | 0.8842145142 | 0.0117636466 |
3277 | 4144 | POLYGON ((440843.416 4645265.403, 440987.754 4... | 0.8748030804 | 0.8694608955 | 0.0053421850 |
3278 | 4145 | POLYGON ((440843.416 4644765.403, 440987.754 4... | 0.8799557428 | 0.8728246317 | 0.0071311111 |
3279 rows × 5 columns
pd.set_option('display.float_format', '{:.10f}'.format)
result_new.hospital_icu_beds
0 0.9147547364 1 0.9304634378 2 0.9329544297 3 0.9157883242 4 0.9478047543 ... 3274 0.9205149708 3275 0.8882549218 3276 0.8959781608 3277 0.8748030804 3278 0.8799557428 Name: hospital_icu_beds, Length: 3279, dtype: float64
result.hospital_icu_beds
0 0.9032740792 1 0.9296141212 2 0.9297851328 3 0.9112075113 4 0.9536450358 ... 3274 0.9099586646 3275 0.8842145142 3276 0.8842145142 3277 0.8694608955 3278 0.8728246317 Name: hospital_icu_beds, Length: 3279, dtype: float64
OSMNX assigns speedlimits to roads with unknown speed limits based on
Depends on overestimation or underestimation of speed limit
This is a threat to spatial heterogeneity
# Classified map function
def output_map_classified(output_grid, hospitals, resource):
ax=output_grid.plot(column=resource,
scheme='Equal_Interval',
k=5,
linewidth=0,
cmap='Blues',
figsize=(18,12),
legend=True,
label="Acc Measure",
zorder=1)
# Next two lines set bounds for our x- and y-axes because it looks like there's a weird
# Point at the bottom left of the map that's messing up our frame (Maja)
ax.set_xlim([325000, 370000])
ax.set_ylim([550000, 600000])
hospitals.plot(ax=ax,
markersize=10,
zorder=2,
c='black',
legend=False,
)
%%time
hospitals = hospitals.to_crs({'init': 'epsg:26971'})
result = result.to_crs({'init': 'epsg:26971'})
output_map(result, pop_data, hospitals, resource_dropdown.value)
plt.savefig('./results/figures/reproduction/{}_{}_continuous_OSMNX_50pct.png'.format(population_dropdown.value, resource_dropdown.value))
CPU times: user 4.6 s, sys: 187 ms, total: 4.79 s Wall time: 4.59 s
# Classified
output_map_classified(result, hospitals, resource_dropdown.value)
plt.savefig('./results/figures/reproduction/{}_{}_classified_OSMNX_50pct.png'.format(population_dropdown.value, resource_dropdown.value))
# Load accessibility map using old speed limit approach
import matplotlib.image as mpimg
old_speed = mpimg.imread('./results/figures/reproduction/pop_icu_beds_buffTrue_continuous.png')
fig = plt.figure(figsize=(18, 12))
plt.imshow(old_speed)
plt.axis('off') # Hide the axis
plt.show()
# Load classified accessibility map using old speed limit approach
old_speed = mpimg.imread('./results/figures/reproduction/pop_icu_beds_buffTrue_classified.png')
fig = plt.figure(figsize=(18, 12))
plt.imshow(old_speed)
plt.axis('off') # Hide the axis
plt.show()
%%time
hospitals = hospitals.to_crs({'init': 'epsg:26971'})
result_new = result.to_crs({'init': 'epsg:26971'})
output_map(result_new, pop_data, hospitals, resource_dropdown.value)
plt.savefig('./results/figures/reproduction/{}_{}_buff{}_continuous_OSMNX_AWR.png'.format(population_dropdown.value, resource_dropdown.value, str(buffered)))
CPU times: user 2.12 s, sys: 180 ms, total: 2.3 s Wall time: 2.08 s
# Classified
output_map_classified(result_new, hospitals, resource_dropdown.value)
plt.savefig('./results/figures/reproduction/{}_{}_buff{}_continuous_OSMNX_AWR.png'.format(population_dropdown.value, resource_dropdown.value, str(buffered)))
def output_map2(output_grid, base_map, hospitals, resource):
ax=output_grid.plot(column=resource, cmap='Oranges',figsize=(18,12), legend=True, zorder=1, vmin=0, vmax=0.2)
# Next two lines set bounds for our x- and y-axes because it looks like there's a weird
# Point at the bottom left of the map that's messing up our frame (Maja)
ax.set_xlim([314000, 370000])
ax.set_ylim([540000, 616000])
base_map.plot(ax=ax, facecolor="none", edgecolor='gray', lw=0.1)
hospitals.plot(ax=ax, markersize=10, zorder=1, c='blue')
hospitals = hospitals.to_crs({'init': 'epsg:26971'})
diff = diff.to_crs({'init': 'epsg:26971'})
output_map2(diff, pop_data, hospitals, 'difference') #output_grid, base_map, hospitals, resource
plt.savefig('./results/figures/reproduction/{}_{}_continuous_OSMNX_diff.png'.format(population_dropdown.value, resource_dropdown.value))
sorted_diff = diff.sort_values(by='difference', ascending=False)
sorted_diff
column_id | geometry | hospital_icu_beds_result_new | hospital_icu_beds_result | difference | |
---|---|---|---|---|---|
2105 | 5965 | POLYGON ((360576.626 579918.721, 360717.117 58... | 0.4926583679 | 0.2016816631 | 0.2909767048 |
2104 | 5964 | POLYGON ((360568.854 580418.855, 360709.344 58... | 0.3000853745 | 0.1029163752 | 0.1971689993 |
2971 | 5077 | POLYGON ((356027.491 593855.007, 356167.971 59... | 0.1468630885 | 0.0625784116 | 0.0842846769 |
2476 | 5432 | POLYGON ((357845.646 588380.560, 357986.129 58... | 0.1748529493 | 0.0941488470 | 0.0807041023 |
1760 | 5877 | POLYGON ((360124.066 581162.325, 360264.556 58... | 0.4963949888 | 0.4484850513 | 0.0479099375 |
... | ... | ... | ... | ... | ... |
2246 | 5167 | POLYGON ((356487.878 592111.304, 356628.359 59... | 0.2383980792 | 0.2930415765 | -0.0546434973 |
1530 | 6241 | POLYGON ((362011.917 571186.528, 362152.414 57... | 0.7406021406 | 0.7975966453 | -0.0569945048 |
1758 | 5878 | POLYGON ((360131.839 580662.191, 360272.329 58... | 0.7469170091 | 0.8139199099 | -0.0670029008 |
1916 | 5700 | POLYGON ((359218.935 583649.523, 359359.422 58... | 0.4712873710 | 0.5428271240 | -0.0715397530 |
2192 | 5344 | POLYGON ((357393.060 589624.144, 357533.543 58... | 0.1954999714 | 0.3030607634 | -0.1075607920 |
3279 rows × 5 columns
As mentioned above, for simplicity, I will only be discussing the results of the changes made to the study in the context of the at-risk population and ICU beds variables.
Introduction of the OSMNX speed and time functions on speed limit:
Using the OSMNX functions to fill in missing speed limit data by imputing speed limits based on road type yielded visible results to the accessibility map. The added speed limit data generally increased ICU bed accessibility in the central neighborhoods of Chicago. This is especially true along the coastline and some of the neighborhoods along the central-southwest areas of the map. However, it seems to have slightly reduced ICU bed accessibility in the more southern neighborhoods and a few hexagons in the northeast corner of the map along the coast. Largely, these findings imply that roads in the central part of the map that were previously missing speed were likely having their speeds underestimated by the original study. By imputing their speed limits based on similar road types, they gain speed limits higher than 35mph and thus increase their spatial accessibility to hospital resources. Overall, the map shows higher zones of accessibility across Chicago because the average speed limit increased after imputation.
Introduction of an AWR approach to assigning service ratio values to the hexagon grid:
Using AWR to more precisely assign service ratios from the catchments to the empty hexagons instead of using a simple 50% overlap threshold yields very little change to the accessibility map. While I expected that for hexagons that had multiple hospital catchments overlapping it, weighting the role of each catchment area based on its % overlap rather than just assigning the catchment with the most overlap would have significantly impacted the service ratio value assigned to the hexagon, there is ultimately little effect. The difference in service ratio assigned to the hexagon between the approaches is very small, with the vast majority of hexagons only seeing a difference of between -0.05 to 0.05. There are a couple outlier hexagons along the coast that saw between a 0.05-0.20 point increase in accessibility by using the AWR approach. Some of these coastal hexagons with greater differences seem to be parks (with fewer roads), which could make them more sensitive to methodological changes. Additionally, I suspect that they are close to multiple hospitals, some of which have a very high number of ICU beds and are high capacity trauma center hospitals. On average, most hexagons seem to see a small increase in accessibility (between 0-0.05) from using the AWR approach.
Sources of Geographic Uncertainty:
The initial Kang study identified various geographical threats to validity, as outlined by Schmitt in 1978. Some of these challenges were addressed through modifications made during the replication process, while others might remain unresolved due to limitations in the existing data.
Spatial dependence is ingrained in the nature of the study methodology. We assume that a given patient will go directly to the nearest hospital to receive treatment. Initially, this seems like the most rational thing to do. However in reality, individuals might select a farther hospital over a closer one due to quality of care, whether a given hospital accepts their insurance, etc. Without completely redesigning the underlying methodology of this study, the sources of uncertainty from spatial dependence are difficult to address
Partition distortion (or the MAUP) poses another major geographic threat to the validity of this study. Because we rely on county centroids to transfer population data to the hospital service areas to create the catchments, we inherently assume an even distribution of population across the county. However, by introducing the AWR approach to transferring service ratio to the hexagons, we reduce the partition distortion that occurred previously using the 50% threshold approach.
Boundary distortions are present on both the city and state level scales in the original study. Ignoring hospitals or roads outside of these boundaries ignores the reality that there may be closer hospitals outside of these boundaries that a person would more likely go to than a hospital within this boundary. Ultimately, Burt and Udoh fix this by introducing a buffer around the city to allow for hospitals and roads (and thus catchments) outside of the formal confines of Chicago. However, this reproduction still does not address potential state-level boundary distortion for the state of Illinois. Future reproductions could try to buffer the state boundary as well,but because of inconsistency of Covid tracking data between states, expanding the buffer outside of Illinois could introduce greater uncertainty.
Our modification/imputation of speed limits using the OSMNX functions reduced space-time interaction threats to validity of this study, since it now more accurately accounts for the speed at which people can travel via car and thus determines more accurate travel times to hospitals. However, this study still models accessibility based on the assumption that personal vehicles are used to drive to the nearest hospital. It does not consider public transit, biking, or walking. Using a more intermodal approach would better reflect accessibility for residents that do not have access to a car or a drivers license. Additionally, this approach does not account for traffic congestion, construction zones, road closures, and detours that could impact the shortest route to a hospital.
Classified Accessibility Outputs
Maintaining a threshold approach to assign service ratios to hexagons, the reproduced accessibility maps using the imputed speed limit data show that accessibility zones are in the same general areas (with north central Chicago being most accessible and south Chicago being less accessible), but increased accessibility does spread outward from central Chicago.
Maintaining the use of the imputed speed limit data, using an AWR approach to assign service ratios to hexagons instead of a threshold approach has almost no change on the service ratios assigned to hexagons. The accessibility maps look very similar. A handful of hexagons along the coast gain greater accessibility.
Luo, W., & Qi, Y. (2009). An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians. Health & place, 15(4), 1100-1107.