Abstract

Middlebury College’s Human Geography with GIS course (GEOG 0261) regularly conducts an analysis on “Flood Hazard Vulnerability in Vermont’s Mobile Homes” using QGIS; the GEOG 0261 analysis builds on Baker et al.’s 2011 study on Rapid Flood Exposure Assessment of Vermont Mobile Home Parks Following Tropical Storm Irene (see bottom of this report for formal citation).

In this report, I reproduce the analysis conducted in GEOG 0261, but I use a code-based approach to spatial analysis in R instead of QGIS. The motivation for this study is to see whether a basic spatial analysis assignment geared towards beginner geography students can be reproduced using a code-based approach. Additionally, I seek to improve the internal validity of the study by reducing the impact of a boundary distortion along the Connecticut River. The following passage gives background on the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment given to students in GEOG 0261:

“Accurate assessment of risk is an essential for effective response to any natural disaster. The methodologies used to assess risk can end up underestimating vulnerabilities. Tropical Storm Irene offers an example of inadequate assessment of risk, which then leads to inadequate planning for and response to a disaster. The storm inundated Vermont with unprecedented rainfall on August 28 and 29 of 2011. The storm destroyed 480 bridges and 960 culverts (where streams cross under a road), causing $350 million in road damage and cutting off road access to 13 mountain communities. Even Vermont’s emergency management offices were flooded! Some of the most affected people were living in mobile homes, whether on individual parcels of land or in mobile home parks. At least 130 mobile homes were destroyed and an additional 300 severely damaged (Figure 1). Our problem will evaluate assessments of flooding risks with a focus on mobile homes in Vermont. There are two different ways of assessing flooding risk in Vermont: one is by the federal agency, FEMA (The Federal Emergency Management Association), and one by a state agency, Vermont Rivers Program. The federal agency, FEMA, estimates flood risk in terms of inundation from rising water levels in stable river channels. Based on existing channels, FEMA hydrologists estimate the region of land that would be potentially flooded by a 1% (100-year) flood. The residents with mortgages in that region are required to purchase flood insurance. The state of Vermont’s River Corridors Program estimates flooding risk differently, using river corridors. After Irene, the state of Vermont recognized that the most damaging flooding in Vermont is not due to inundation but rather due to fluvial erosion: the erosion of riverbanks as the river channel widens or migrates to form new channels (Figure 1 and Figure 2). By this estimation, regions where rivers may erode and migrate to in the future are also at risk of flooding.”

Materials and procedure

Computational environment

Set up file path shortcuts

Data and variables

There are six layers for this analysis: five drawn from public primary sources and one that I created (layer 6). The layers are:

1. e911pts.shp - point - epsg: 32145. e911 point location data for all residences and buildings in Vermont, for use with emergency response. The data file can be found on the Vermont Open GeoData Portal: http://geodata.vermont.gov/
  • SITETYPE: type of building structure or use of the structure. “MOBILEHOME” is the SITETYPE value that indicates a site is a mobile home.

  • Title: e911pts.shp
  • Abstract: Point location data for all locations of residences and buildings in Bennington, Rutland, Windham, and Windsor counties for use with emergency response
  • Spatial Coverage: Bennington, Rutland, Windham, and Windsor counties
  • Spatial Resolution: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or or distance of a raster GRID size
  • Spatial Reference System: EPSG 32145 - NAD 83 Vermont
  • Temporal Coverage: Some time 2014-2022
  • Temporal Resolution: N/A
  • Lineage: Downloaded from the VT Open GeoData Portal by GEOG 0261 instructors, and cleaned to only include the 4 southernmost counties, and to select only the sitetype variable.
  • Distribution: The raw data is publicly available from the VT Open GeoData Portal, although the e911 point data is intended for use with emergency response
  • Constraints: N/A
  • Data Quality: Assumed to be accurate and representative of all residences and structures in Southern VT
  • Variables: each variable is described by the following fields, which are summarized in the table below
    • Label: variable name as used in the data or code
    • Alias: intuitive natural language name
    • Definition: Short description or definition of the variable. Include measurement units in description.
    • Type: data type, e.g. character string, integer, real
    • Accuracy: e.g. uncertainty of measurements
    • Domain: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
    • Missing Data Value(s): Values used to represent missing data and frequency of missing data observations
    • Missing Data Frequency: Frequency of missing data observations: not yet known for data to be collected
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
OBJECTID unique identifier for each unique structure/residence numeric n/a n/a
sitetype Type of location, of which one category is ‘MOBILE HOME’ character n/a n/a
geometry point geometry unknown n/a n/a

2. FEMA_100yr.shp - polygon - epsg: 32145. FEMA Flood Zone polygons with codes. The data file can be found on the Vermont Open GeoData Portal: http://geodata.vermont.gov/
  • FLD_ZONE: contains FEMA Flood Zone codes. If a polygon’s code begins with “A”, that polygon is a 100-year flood zone.

  • Title: FEMA_100yr.shp
  • Abstract: multipart polygons of flood zones determined by FEMA
  • Spatial Coverage: Bennington, Rutland, Windham, and Windsor counties
  • Spatial Resolution: N/A
  • Spatial Reference System: EPSG 32145 - NAD 83 Vermont
  • Temporal Coverage: Some time 2014-2022
  • Temporal Resolution: N/A
  • Lineage: Downloaded from the VT Open GeoData Portal by GEOG 0261 instructors, and cleaned to only include the 4 southernmost counties.
  • Distribution: The raw data is publicly available from the VT Open GeoData Portal
  • Constraints: N/A
  • Data Quality: Assumed to be accurate and representative of all determined FEMA flood zones in southern VT
  • Variables: see the table below; the fields are as defined for e911pts.shp above
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
FLD_ZONE Identifies type of flood zone…FEMA Flood Zone Codes All codes beginning with ‘A’ are included in the 1% flood risk zone (100-year flood risk) character n/a n/a
geometry polygon geometry unknown n/a n/a

3. river_corridors.shp - polygon - epsg: 32145 Vermont river corridor polygons, as defined by Flood Ready Vermont. This flood hazard approach includes streams (with a 50 foot buffer) and rivers with watersheds more than 2km. The data file can be found on the Vermont Open GeoData Portal http://geodata.vermont.gov/

  • Title: river_corridors.shp
  • Abstract: multipart polygon layer of river corridors determined by Flood Ready Vermont
  • Spatial Coverage: Bennington, Rutland, Windham, and Windsor counties
  • Spatial Resolution: N/A
  • Spatial Reference System: EPSG 32145 - NAD 83 Vermont
  • Temporal Coverage: Some time 2014-2022
  • Temporal Resolution: N/A
  • Lineage: Downloaded from the VT Open GeoData Portal by GEOG 0261 instructors, and cleaned to only include the 4 southernmost counties
  • Distribution: The raw data is publicly available from the VT Open GeoData Portal
  • Constraints: N/A
  • Data Quality: Assumed to be complete for all river corridors determined by Flood Ready Vermont
  • Variables: see the table below; the fields are as defined for e911pts.shp above
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
OBJECTID Unique identifier for each individual river corridor numeric n/a n/a
GNIS_NAME name of the river/creek/stream that defines the river corridor character n/a n/a
OBJECTID_1 n/a numeric n/a n/a
ReachCode n/a numeric n/a n/a
geometry polygon geometry unknown n/a n/a

4. block_groups.shp - polygon - epsg: 32145. Census block group polygons in southern Vermont, with data on housing. The data file was acquired from the US Census ACS Survey 2014-2018: https://data.census.gov/
  • mobileHU: estimated total number of mobile home housing units within the block group
  • totalHU: estimated total number of all housing units within the block group
  • county: name of county in which the block group is located

  • Title: block_groups.shp
  • Abstract: multipart polygon layer of the block groups and counties for the 4 southernmost counties in Vermont
  • Spatial Coverage: Bennington, Rutland, Windham, and Windsor counties
  • Spatial Resolution: N/A
  • Spatial Reference System: EPSG 32145 - NAD 83 Vermont
  • Temporal Coverage: 2014-2018 ACS estimates
  • Temporal Resolution: N/A
  • Lineage: Downloaded from the US Census American Community Survey by GEOG 0261 instructors, and cleaned to only include the 4 southernmost counties, and select the variables for number mobile housing units and number of total housing units
  • Distribution: The raw data is publicly available from the US Census/ACS
  • Constraints: N/A
  • Data Quality: Assumed to be complete (so far as the ACS estimates are complete) for the 4 southern VT counties
  • Variables: see the table below; the fields are as defined for e911pts.shp above
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
fid unique identifier numeric n/a n/a
GEOID unique identifier numeric n/a n/a
mobileHU estimated total of mobile home housing units in each block group numeric n/a n/a
totalHU estimated total of all housing units in each block group numeric n/a n/a
COUNTYFP code indicating which county the block group is in numeric n/a n/a
county name of county in which the block_group is located character n/a n/a
geometry polygon geometry unknown n/a n/a

5. towns.shp - polygon - epsg: 32145. Downloaded from the US Census Bureau: https://data.census.gov/
  • townName: name of the town

  • Title: towns.shp
  • Abstract: multipart polygon layer of the towns within the 4 southernmost counties in Vermont
  • Spatial Coverage: Bennington, Rutland, Windham, and Windsor counties
  • Spatial Resolution: N/A
  • Spatial Reference System: EPSG 32145 - NAD 83 Vermont
  • Temporal Coverage: 2014-2018 ACS estimates
  • Temporal Resolution: N/A
  • Lineage: Downloaded from the US Census American Community Survey by GEOG 0261 instructors, and cleaned to only include the towns in the 4 southernmost counties.
  • Distribution: The raw data is publicly available from the US Census/ACS
  • Constraints: N/A
  • Data Quality: Assumed to be complete (so far as the ACS estimates are complete) for the 4 southern VT counties
  • Variables: see the table below; the fields are as defined for e911pts.shp above
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
fid unique identifier numeric n/a n/a
COUNTYFP code indicating which county the town is in numeric n/a n/a
COUSUBFP unknown numeric n/a n/a
GEOID unique identifier numeric n/a n/a
townName name of the town character n/a n/a
geometry polygon geometry unknown n/a n/a

6. CT_corridor_final_dissolved_buffered.shp

  • Title: CT_corridor_final_dissolved_buffered.shp
  • Abstract: singlepart polygon of width 1080 meters of an estimated river corridor for the Connecticut River that I created
  • Spatial Coverage: along Windham and Windsor Counties…Connecticut River mainstem from Wilder Dam down to confluence of Sugar River, NH/Ascutney, VT
  • Spatial Resolution: N/A
  • Spatial Reference System: EPSG 32145 - NAD 83 Vermont
  • Temporal Coverage: December 2023
  • Temporal Resolution: N/A
  • Lineage: Modified (buffered to 1080m width) version of a linestring layer for the CT River downloaded from the VT Open GeoData Portal
  • Distribution: The raw data (linestring) is publicly available from the VT Open GeoData Portal
  • Constraints: N/A
  • Data Quality: This is my best estimate of what the river corridor for areas along the Connecticut River would look like
  • Variables: see the table below; the fields are as defined for e911pts.shp above
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
OBJECTID Unique identifier for each individual river corridor numeric n/a n/a
WBID unknown character n/a n/a
WBDESC description of spatial extent character n/a n/a
geometry polygon geometry unknown n/a n/a

I received pre-processed versions of these data from the instructors of Middlebury College’s GEOG 0261 so that I could reproduce the students’ analysis using the exact same data that the students receive; the instructors downloaded these data from the sources listed. The exact dates on which the instructors downloaded and pre-processed these data are unknown, as are the exact pre-processing steps they took. For the purpose of this reproduction, I assume that a competent GIS analyst or data analyst could easily download the raw files from the sources mentioned above and pre-process them into the format that is ultimately provided to the students. The .shp files included in the data/derived folder of the repository are the pre-processed files that the students receive, with two exceptions: the towns.shp file, which I downloaded directly from the ACS because the copy that I received from an instructor was corrupted, and CT_corridor_final_dissolved_buffered.shp, which I produced in QGIS using the methodology outlined below. The root source of this CT corridor layer is a publicly accessible linestring layer of the Connecticut River, which I downloaded from the Vermont Open GeoData Portal.

You can find the full metadata for each of these layers in the data/metadata section of the repository.

Prior observations

I, the author of this reproduction study, have spent the past 3.5 years living as a student in Vermont. I am familiar with the flood risk faced in Vermont, the geography of the state, and how the state makes data publicly accessible. Thus, I have prior experience with the entirety of this study, although this is not a concern given that no statistical tests are conducted and no models are built. The goal of this study is merely to reproduce in R a study that is usually conducted in QGIS.

I was also a student in GEOG 0261 (formerly GEOG 0120) and I conducted this study in January, 2022 in QGIS.

Bias and threats to validity

Going into this study, I know that there is a boundary distortion along the eastern edge of the state, along the Connecticut River, that compromises internal validity. The Vermont River Corridors shapefile does not include a river corridor model for the Connecticut River. I will attempt to estimate my own river corridor for the Connecticut River.

Also, I know going into this study that there are issues with small numbers of mobile homes in some towns that are used as denominators in calculating percentages, and this will lead to overly sensitive and overly inflated percentages in some towns. I do not plan to change this, as that would require calculating the area of towns and counties, which I did not have time for when completing this study.
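The small-denominator issue can be illustrated with a short standalone sketch. It is written in Python purely for illustration (the analysis in this report was run in R), the function names are hypothetical, and the two towns and their counts are taken from Table 2 of this report.

```python
# Counts taken from Table 2 of this report: (mobile homes, mobile homes at risk).
towns = {"Sandgate": (7, 4), "Jamaica": (100, 49)}


def pct_at_risk(total, at_risk):
    """Share of a town's mobile homes that are at risk, as a proportion."""
    return at_risk / total


def swing_from_one_home(total):
    """Change in the proportion if one more (or one fewer) home is at risk."""
    return 1 / total


for name, (total, at_risk) in towns.items():
    print(name, round(pct_at_risk(total, at_risk), 3), round(swing_from_one_home(total), 3))
```

With only 7 mobile homes, reclassifying a single home moves Sandgate's proportion by about 14 percentage points, while the same change moves Jamaica's by 1 point. This is why towns with very few mobile homes can appear near the top of a ranking by percentage.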

Lastly, the original QGIS study utilizes an area weighted re-aggregation for determining a number of mobile homes at risk in the FEMA flood zones, based on ACS 2014-2018 survey data and assuming an even distribution of mobile homes across counties. However, the GEOG 0261 course has repeatedly demonstrated that this is an inaccurate approach to the research question, and thus I will not try to reproduce this part of the study. This is an issue of a modifiable areal unit problem.

Data transformations


Read in the layers

Exploratory Data Analysis

Visualize the FEMA flood zones and river corridors

## tmap mode set to plotting
## Map saved to C:\Users\wprocter\Documents\GitHub\VT-Mobile-Home-Flooding\results\figures\FEMA_flood_zone_map.pdf
## Size: 6.25 by 7.819444 inches
## Map saved to C:\Users\wprocter\Documents\GitHub\VT-Mobile-Home-Flooding\results\figures\river_corridor_map.pdf
## Size: 6.25 by 7.819444 inches

Clean up the flood zone and river corridor layers into more usable formats

Clean up the FEMA data to identify all the zones that start with “A” (aka in the 100 year flood zone)…treat them all the same

## 
##    A   AE   AO 
##  567 1832    2
## Simple feature collection with 2401 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 424802.5 ymin: 25228.79 xmax: 523718.9 ymax: 158609.9
## Projected CRS: NAD83 / Vermont
## # A tibble: 2,401 × 3
##    FLD_ZONE                                                       geometry flood
##  * <chr>                                                <MULTIPOLYGON [m]> <lgl>
##  1 AE       (((446276.7 124090.2, 446286.8 124096.6, 446295.7 124103.9, 4… TRUE 
##  2 AE       (((446494.3 124089.5, 446493.5 124098, 446491.4 124108.5, 446… TRUE 
##  3 AE       (((444273.9 123609.2, 444286.7 123605.9, 444296.5 123602.8, 4… TRUE 
##  4 AE       (((442981.8 122314.2, 442982.7 122310.9, 442986.9 122303.6, 4… TRUE 
##  5 AE       (((448164.3 123952.9, 448168.3 123952.4, 448171.6 123951.9, 4… TRUE 
##  6 AE       (((444449.3 123728.5, 444452.7 123721.8, 444458.8 123713.4, 4… TRUE 
##  7 A        (((453734.8 130105.7, 453734.8 130137.5, 453734.8 130172.4, 4… TRUE 
##  8 AE       (((443409.4 123216.8, 443410.1 123219.9, 443412.3 123223.9, 4… TRUE 
##  9 AE       (((448766.4 123935.4, 448768.5 123934.6, 448772.2 123933.4, 4… TRUE 
## 10 AE       (((448860.4 123953.6, 448859.9 123957.1, 448854 123964.4, 448… TRUE 
## # ℹ 2,391 more rows
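The classification rule behind the flood column can be sketched as follows. This is a standalone Python illustration rather than the R code used in this report. The sample list of codes is hypothetical, and the "X" code is included only to show a zone that the rule excludes.

```python
from collections import Counter

# Hypothetical sample of FLD_ZONE codes. "X" is included only to show a
# zone being excluded; it is an assumption, not a value reported above.
fld_zone = ["AE", "A", "AE", "AO", "X", "AE"]

# A polygon is treated as a 100-year flood zone if its code starts with "A".
flood = [code.startswith("A") for code in fld_zone]

# Tabulate only the codes flagged as 100-year flood zones.
counts = Counter(code for code, is_flood in zip(fld_zone, flood) if is_flood)
print(flood)
print(counts)
```

All "A", "AE", and "AO" polygons receive the same TRUE flag, which is what allows them to be dissolved into a single multipart flood zone in the next step.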

Group by flood (all of the 100 year flood zones) and dissolve the geometry to get a single multipart flood zone

And do the same thing for river corridors…create a single multipart river corridor

Calculating table columns to reproduce Table 1

Column 1 (step 1): calculate the total number of mobile homes in each county from the ACS data - aggregate from the block groups

Column 1: Number of Mobile Homes in Each County (ACS)
county number_of_MHs
Bennington 1277
Rutland 1992
Windham 1833
Windsor 2427

Columns 3 and 4 (steps 3 and 4): mobile homes at risk

I exclude column 2, which was created using area weighted re-aggregation (AWR) of mobile homes. The GEOG 0261 instructors discovered that this approach is less accurate than using the e911 point data to identify mobile homes at risk, so I forgo reproducing the AWR approach.

Isolate only the points/residences that are mobile homes from the e911 data

Add county and town variable (from the census/ACS data) to the mobile homes points, joining by location

NOTE: I attempted also to sum the number of e911 mobile home points for each county. This yielded a different number of mobile homes than Column 1 indicates. Because the ACS measurement of mobileHU is a survey-based estimate, it does not represent the true number of mobile home structures. This is a source of geographic uncertainty to this analysis, specifically an issue of spatial heterogeneity and construct validity.

I will proceed with the Total Number of Mobile Homes column from the ACS data to be consistent with the GEOG 0261 analysis, but a future version of this analysis may want to revisit this choice.

Add buffer (18.3 meters aka 60 feet) to mobile home points to account for their structure sizes
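The unit conversion and the idea of testing a buffered point against a flood polygon can be sketched in a few lines. This is a simplified Python illustration, not the R workflow used here: the flood zone is reduced to a hypothetical axis-aligned rectangle, and the function names and coordinates are assumptions made for the example.

```python
import math

FEET_TO_METERS = 0.3048  # exact, by definition of the international foot
buffer_m = 60 * FEET_TO_METERS  # 18.288 m, reported as 18.3 m in the text


def distance_to_rectangle(x, y, xmin, ymin, xmax, ymax):
    """Shortest distance from a point to an axis-aligned rectangle (0 if inside)."""
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)


def buffered_point_intersects(x, y, radius, rect):
    """True if a circular buffer around (x, y) touches or overlaps the rectangle."""
    return distance_to_rectangle(x, y, *rect) <= radius


# Hypothetical flood-zone rectangle and home locations, in projected meters.
zone = (0.0, 0.0, 100.0, 50.0)
print(round(buffer_m, 1))
print(buffered_point_intersects(110.0, 25.0, buffer_m, zone))  # home 10 m from the zone
print(buffered_point_intersects(130.0, 25.0, buffer_m, zone))  # home 30 m from the zone
```

A home whose point lies 10 m outside the zone is counted as at risk once buffered, while one 30 m away is not. Buffering therefore captures structures whose footprint plausibly overlaps a flood zone even when the e911 point itself falls just outside it.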

Identify the mobile homes within FEMA flood zone, and group by county to get total number of MHs in the flood zone by county

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
Column 3: Number of MHs at Risk (E911 and FEMA Flood Zones)
county mobile_home_count
Bennington 189
Rutland 164
Windham 299
Windsor 198

Now do the same thing but for river corridors…

Identify the mobile homes within river corridors, and group by county to get total number of MHs in the river corridors by county

Column 4: Number of MHs at Risk (E911 and River Corridors)
county mobile_home_count
Bennington 130
Rutland 204
Windham 298
Windsor 353

Join all columns together to get the final table! Table 1:

Table 1. Expected Results Table of Mobile Home Flood Risk
county number_of_MHs MHs_at_risk_FEMA MHs_at_risk_River_Corridors FEMA_rate RC_rate
Bennington 1277 189 130 0.1480031 0.1018011
Rutland 1992 164 204 0.0823293 0.1024096
Windham 1833 299 298 0.1631206 0.1625750
Windsor 2427 198 353 0.0815822 0.1454471

Unplanned deviation for reproduction: I decided to calculate “risk rates” to indicate what proportion of a county’s mobile homes lie within the FEMA flood zones and the River Corridors, respectively (the last two columns on the right-hand side of the table). The two rates are nearly identical for Windham County. In Bennington County, the FEMA risk rate is higher than the River Corridor risk rate. In Windsor County and Rutland County, the River Corridor risk rate is higher than the FEMA risk rate.
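The rate columns are simple ratios of the count columns, which can be checked with a standalone Python sketch. The counts are copied from Table 1; the variable names are chosen for this illustration only.

```python
# Counts copied from Table 1:
# (ACS mobile homes, at risk per FEMA zones, at risk per river corridors).
table1 = {
    "Bennington": (1277, 189, 130),
    "Rutland": (1992, 164, 204),
    "Windham": (1833, 299, 298),
    "Windsor": (2427, 198, 353),
}

# Each rate divides an e911-based count by the ACS-based county total.
rates = {
    county: {"FEMA_rate": fema / total, "RC_rate": rc / total}
    for county, (total, fema, rc) in table1.items()
}

for county, r in rates.items():
    print(f"{county}: FEMA_rate={r['FEMA_rate']:.7f}, RC_rate={r['RC_rate']:.7f}")
```

Note that the numerators come from e911 points while the denominators come from ACS estimates, so the rates inherit the mismatch between the two sources discussed in the note above.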

These results perfectly match the results achieved using the QGIS approach in GEOG 0261. The reproduction of this table was a success!

Visualize MH flood risk by town:

First, I need to find the number of mobile homes by town

Next, I need to find which mobile homes are in EITHER a FEMA flood zone OR River Corridor

This will cast a wider net than if we look at just flood zones or river corridors individually, as this will maximize the number of mobile homes that are determined to be at risk. At this stage in the analysis, we care more about seeing which towns have the highest vulnerability of mobile homes to flooding, not whether the VT River Corridor or FEMA Flood Zone approach is more accurate. Thus, including both flood risk identification metrics is a safer approach to ensure we identify all mobile homes that are at some level of risk to flooding.
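The “either/or” logic amounts to a set union, sketched here in standalone Python. The point identifiers are hypothetical and stand in for e911 mobile home points flagged by each approach.

```python
# Hypothetical e911 point identifiers flagged by each approach.
in_fema_zone = {101, 102, 105}
in_river_corridor = {102, 103, 105, 108}

# Set union: a home counts once even if both approaches flag it.
at_risk_either = in_fema_zone | in_river_corridor

print(sorted(at_risk_either))
print(len(at_risk_either))
```

The union is never smaller than either input set, and homes flagged by both approaches are not double counted. Simply adding the FEMA and River Corridor counts together would overstate the number of mobile homes at risk.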

Create a table to show the 10 towns with the highest mobile home flooding risk (Table 2)

Table 2. Towns with the Highest % of Mobile Homes at Risk to Flooding (either FEMA or RCs)
town mobile_home_count at_risk_count pct_mh_at_risk
Woodford 27 23 0.8518519
Woodstock 70 48 0.6857143
Sandgate 7 4 0.5714286
Jamaica 100 49 0.4900000
Windsor 63 30 0.4761905
Killington 13 6 0.4615385
Pittsfield 9 4 0.4444444
Plymouth 18 8 0.4444444
Wilmington 94 38 0.4042553
Proctor 15 6 0.4000000

Results differ slightly between this and the GEOG 0261 results because I directly downloaded the towns.shp file from the VT Open GeoData Portal, while the GEOG 0261 class uses a pre-cleaned towns shapefile provided by the instructors. I could not use the provided layer due to file corruption issues. The layer provided to the class distinguishes between Rutland Town and Rutland City, while the layer that I downloaded treats the two as a combined town of “Rutland.” In the class results, Rutland Town has a pct_mh_at_risk value of 57.89%, so if my towns file distinguished just the town portion, Rutland Town would rank in the top 10 towns for mobile homes at risk.

However, overall, this is basically a perfect reproduction! Not much more to add here, other than a celebration that the code-based approach in R seems to be doing an excellent job at producing the QGIS results from GEOG 0261.

Plot a choropleth map of the % of mobile homes in each town that are at risk to flooding, and export (“Map 1”)

## tmap mode set to plotting
## Map saved to C:\Users\wprocter\Documents\GitHub\VT-Mobile-Home-Flooding\results\figures\pct_mh_at_risk_by_town.pdf
## Size: 9.125 by 5.361111 inches

Technically an optional map in GEOG 0261, but most students did create this.

Aside from the Rutland Town boundary issue, this map matches the QGIS output from GEOG 0261. Another win for R!

Unplanned deviation for reproduction: identify areas where FEMA 100 year Flood Zones and VT River Corridors do not line up

I was curious about the discrepancy between the two metrics, so I created a layer that takes the difference of the two polygon layers and plotted it over a satellite basemap. Notably, the River Corridors include fewer lakes, ponds, and large rivers. This is significant because these water bodies can still cause severe flooding if an influx of water causes them to overspill their banks. Additionally, note that the Connecticut River is not included at all in the River Corridors.

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
## tmap mode set to interactive viewing

Planned deviation for reproduction: Identify how many mobile homes would be likely to be in a River Corridor of the Connecticut River, if the creators of the River Corridor shapefile had included the Connecticut River.

This will help reduce the geographic uncertainty caused by the boundary effect of the Connecticut River technically lying fully within New Hampshire, not Vermont.

Number of Mobile Homes Within My Estimated Connecticut River Corridor, By County
county mobile_home_count
Windham 61
Windsor 283

While this is likely an overestimate, it highlights how the VT River Corridor model’s omission of the Connecticut River excludes a substantial number of mobile homes. The large number of mobile home parks close to the Connecticut River is a contributing factor. In total, 61 mobile homes in Windham County and 283 mobile homes in Windsor County intersect my estimated river corridor for the Connecticut River and are therefore at risk of flooding under this estimate.

To estimate a Connecticut River Corridor polygon layer (ct_river_corridor), I performed the following in QGIS:
  • I loaded a line layer for the centerline of the river channel, which I downloaded from the VT Open GeoData Portal.
  • I estimated the average width of the river channel by taking measurements with the measurement tool at various points along the river, and I assumed an average channel width of 180 meters (this is probably a bit of an overestimate). I then buffered the centerline by 90 meters on either side to get a polygon representing a river channel that is consistently 180 meters wide.
  • I then added an additional 450 meter buffer on either side. This creates a polygon representing the river corridor that is consistently 1080 meters wide, or 6x the 180 meter average channel width.

The documentation on how River Corridors were constructed, on the Flood Ready Vermont website, says that a good rule of thumb is that a river corridor is 6x the width of the stream or river channel that it is based around. While the authors also take slope and geologic traits into account to provide a more specific river corridor profile, that is beyond the scope of this analysis, and I will make do with the 6x width approach.
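The buffer arithmetic behind the 6x rule of thumb can be checked with a short sketch. It is written in Python purely for illustration (the analysis itself was done in R and QGIS), and the function names are hypothetical. Assuming a 180 meter channel, a 1080 meter corridor implies a total half-width of 540 meters, so the second buffer on top of the 90 meter channel buffer would be 450 meters. A 580 meter second buffer would instead give a 1340 meter corridor, so one of the two figures reported above may be a typo.

```python
def corridor_buffers(channel_width_m, multiplier=6):
    """Return (channel half-width, extra buffer, corridor width) in meters.

    Assumes the corridor is `multiplier` times the channel width and is
    centered on the channel centerline (the rule of thumb described above).
    """
    channel_half = channel_width_m / 2
    corridor_width = multiplier * channel_width_m
    extra_buffer = corridor_width / 2 - channel_half
    return channel_half, extra_buffer, corridor_width


def total_width(centerline_buffer_m, extra_buffer_m):
    """Width of a corridor built from two successive buffers on each side."""
    return 2 * (centerline_buffer_m + extra_buffer_m)


half, extra, width = corridor_buffers(180)
print(half, extra, width)    # 90.0 450.0 1080
print(total_width(90, 580))  # 1340
```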

Check out the polygon that I made to estimate the River Corridor for the Connecticut River

## tmap mode set to interactive viewing

Results

Table 1: Mobile Homes at Risk by County (FEMA vs. River Corridors)

Ultimately, an R-based code approach to this spatial analysis study yields the exact same results that are found by Middlebury’s GEOG 0261 class using the QGIS GUI. I find that the number of total mobile homes (ACS data), mobile homes at risk (e911 and FEMA data), and mobile homes at risk (e911 and river corridor data) for Bennington, Rutland, Windham, and Windsor Counties are exactly the same. In Bennington County, the FEMA approach identifies significantly more mobile homes to be at risk than the River Corridor approach does. In Windsor County, the two approaches predict about the same. In Rutland and Windsor Counties, the River Corridor approach identifies significantly more mobile homes to be at risk than the FEMA approach. These findings were true for both total number of mobile homes at risk and proportion of mobile homes in the county that are at risk.

Map 1 and Table 2: Percentage of Towns at Risk to Flooding (Either FEMA or River Corridor), by Town

A slight variation, as discussed earlier in the report, arises from a difference in the town shapefile layer. The town layer that was provided to the GEOG 0261 class distinguishes between Rutland Town and Rutland City, while the town layer that I downloaded from the American Community Survey (since I was having corrupt file issues with the towns.shp used by the GEOG 0261 class) just shows Rutland as a single town. It is possible that the course instructors and I both started with the same town shapefile, but the instructors could have split Rutland into two towns during their data processing. Another reason could be that the town borders changed between the time the instructors downloaded the town shapefile and the time I downloaded it (i.e., Rutland was broken into Rutland City and Rutland Town at one point and not the other). It is difficult to know exactly without knowing when the instructors downloaded their data and what they did to process it.

Reproducing the town-based risk map yielded the same exact results as the GEOG 0261 QGIS approach, except for Rutland. When sorting towns based on the highest percentage of their mobile homes at risk in the QGIS approach (Table 2), Rutland Town had the third highest % of mobile homes at risk out of all of the towns in the four counties. However, when considering Rutland as a single entity in my R reproduction, it does not make the top 10 list. Instead, all the towns below it get shifted up one place, with Proctor, VT being the town with the 10th highest risk. There are some threats to validity here, namely the small number problem, which I will address in the Discussion section.

Woodford and Woodstock have the first and second highest % of mobile homes at risk in both R and QGIS, with 85.18% and 68.57% respectively of their mobile homes at some risk to flooding, as determined by FEMA or River Corridors.

R code vs. QGIS GUI:

Ultimately, the main goal of this reproduction was to find a way using R code to do all the spatial analysis steps that are taught to GEOG 0261 students in QGIS in the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment for the course. I followed the exact same workflow provided by the instructors of GEOG 0261 to create Table 1. While I technically could have done a select by location to identify the mobile homes in the flood zones and river corridors, I chose to use a join by location and then a filter because of a warning that was thrown and because it was computationally faster.
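The "join by location, then filter" pattern described above can be illustrated with a minimal, self-contained sketch. This is not the code used in this reproduction (which is written in R); it is a Python illustration with hypothetical function names and made-up coordinates, using a simple ray-casting test in place of a real spatial library.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_by_location(points, zones):
    """Left join: attach the first containing zone name (or None) to each point."""
    joined = []
    for pt in points:
        zone = next(
            (name for name, poly in zones.items() if point_in_polygon(pt["xy"], poly)),
            None,
        )
        joined.append({**pt, "zone": zone})
    return joined


# Hypothetical data: one square hazard polygon and three mobile home points.
zones = {"flood_zone_A": [(0, 0), (4, 0), (4, 4), (0, 4)]}
homes = [{"id": 1, "xy": (1, 1)}, {"id": 2, "xy": (5, 5)}, {"id": 3, "xy": (3, 2)}]

joined = join_by_location(homes, zones)                  # every home is kept
at_risk = [h for h in joined if h["zone"] is not None]   # filter step
print([h["id"] for h in at_risk])  # [1, 3]
```

Unlike a select by location, the join keeps every home and records which zone (if any) contains it, so the same joined table can be filtered or summarized by county afterwards.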

I relied on the stars and tmap libraries to work with the spatial data.

Addition of the estimated Connecticut River corridor:

Based on my estimation of the river corridor for the Connecticut River, I found that 61 mobile homes in Windham and 283 in Windsor are missed when using the existing river corridor shapefile that is published on the VT Open GeoData Portal and is provided to the GEOG 0261 students. Although I likely overestimate the extent of the river corridor a bit (since I did not account for geologic conditions or topography/slope…I only considered meander belt width), this shows that VT River Corridors do miss out on identifying mobile homes that could be at risk. Fortunately, the FEMA Flood Zones do seem to contain most of the mobile homes along the Connecticut River that do appear to be at risk (see the presence of FEMA Flood Zones in the first map presented at the top of this report…the map of Flood Zones).

Discussion

Sources of uncertainty and geographic threats to validity:

There are several threats to validity and sources of uncertainty, as proposed by Schmitt (1978), that are inherent to the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment, as well as to this reproduction study of it. Although this reproduction study partially addresses the boundary distortion problem posed by the lack of published River Corridors along the Connecticut River, it by no means fixes all the sources of uncertainty.

There are a few potential threats to construct validity that this reproduction study does not directly address.

Notably, Table 2 and Map 1 illustrate how the study runs into a small number problem when calculating the % of mobile homes at risk in each town. Because some towns have very few mobile homes to begin with (e.g., Sandgate has 7, Pittsfield has 9), any rate or % calculation that uses the number of mobile homes as its denominator is highly sensitive and easily distorted. Because Sandgate and Pittsfield each have 4 mobile homes at risk, their % of mobile homes at risk is incredibly high. Sandgate is over 50%! This makes it seem like they have a huge problem with their mobile homes being at risk. Sandgate has the 3rd highest at-risk rate of any town, and Pittsfield has the 7th highest, despite the two of them having only 8 mobile homes at risk combined. Meanwhile, Pownal, which has 135 mobile homes at risk out of its 359 total mobile homes (37.6%), and Brattleboro, which has 127 mobile homes at risk out of its 401 total mobile homes (31.7%), don’t even crack the top 10 list, and the choropleth map makes their risk rates seem less severe.

If the study instead made Map 1 and sorted Table 2 based on the total number of mobile homes at risk (instead of the rate), then we would run into the opposite problem, where the largest towns with the most area would have the most mobile homes at risk…basically running into the Modifiable Areal Unit Problem. It might make more sense in future studies to divide by the area of the town to normalize for town size, rather than dividing the number of mobile homes at risk by the total number of mobile homes. The study needs a better way to normalize. Table 1 (which reports on counties, not towns) does not experience this problem as much because the number of mobile homes is over 1000 in each county, which is a large enough denominator to keep rate calculations from being distorted.

These matters all highlight the importance of scale and areal unit, and are examples of partition distortions and scale distortions affecting the internal validity of the study.
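The sensitivity of small denominators can be made concrete with the four towns mentioned above. The sketch below is a Python illustration, not part of the R analysis. It uses only the counts reported in this section (reading the 359 figure as the town's total number of mobile homes) and asks how much each town's rate would move if a single additional mobile home were classified as at risk.

```python
# (mobile homes at risk, total mobile homes), as reported in this section
towns = {
    "Sandgate": (4, 7),
    "Pittsfield": (4, 9),
    "Pownal": (135, 359),
    "Brattleboro": (127, 401),
}


def pct_at_risk(at_risk, total):
    """Percentage of a town's mobile homes that are at risk."""
    return 100 * at_risk / total


for name, (at_risk, total) in towns.items():
    rate = pct_at_risk(at_risk, total)
    # Change in the rate if one more home were classified as at risk
    shift = pct_at_risk(at_risk + 1, total) - rate
    print(f"{name}: {rate:.1f}% at risk; one more home adds {shift:.2f} points")
```

One additional at-risk home moves Sandgate's rate by roughly 14 percentage points, while the same change moves the larger town of Brattleboro by about a quarter of a point, which is why the town-level ranking is so unstable for small towns.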

While Table 1 is more resilient to distortions than Table 2, a concern that I discovered with Table 1 is that it uses ACS estimates to determine the total number of mobile homes in each county, rather than spatially determining which e911 mobile home points fall within each county and counting those. This is a question of measurement and spatial heterogeneity, as the ACS surveys might not adequately represent or weight the population living in mobile homes in Vermont. There are big differences between living in a mobile home park and living in an independent mobile home, and I suspect that the ACS surveys do not always take these distinctions into account.

Another threat to validity of the original analysis was the boundary distortion caused along the Connecticut River for creating River Corridors. According to Flood Ready Vermont,

“During the initial development of the river corridor base map, the DEC recognized the Connecticut River flows in a unique geologic and geographic setting and is influenced by numerous impoundments. In order to create an appropriate River Corridor for the Vermont side of the Connecticut, the Rivers Program will conduct a separate analysis in 2019 to review the influence of features such as escarpments and impoundments that affect fluvial processes, valley bottom lands, floodplains, river planform, and corridor widths. For Vermont projects being reviewed while this River Corridor is being developed DEC will make site specific river corridor and floodway determinations in accordance with the Flood Hazard Area & River Corridor Protection Procedure. When the Connecticut River Corridor is developed it will be available for a public review and comment similar to the release of the Statewide River Corridor Base Map.” LINK

I believe that the Connecticut River also may not have been included because the border between VT and NH actually lies along the VT shoreline of the river, and not at the centerline. Hence, the Connecticut River is technically entirely outside of the state of Vermont.

However, upon searching the VT Open GeoData Portal, there does not seem to be any evidence that a Connecticut River corridor was ever added in 2019. As I discussed earlier in the report, I crudely estimated what a river corridor for the Connecticut River might look like, and identified mobile homes that fell within this estimated corridor. Identifying these mobile homes, which the published River Corridor data would have missed, helps address the threat to internal validity caused by the boundary distortion on the eastern edge of the state, where no River Corridors are defined for the Connecticut River. Additionally, since FEMA flood zones do contain most of the mobile homes along the Connecticut River, there is still some level of acknowledgement that these mobile homes are at risk of flooding. If the FEMA Flood Zones were similarly absent along the Connecticut River, this would be a different story.

For this study, I relied on the data that is provided by the instructors to the students of GEOG 0261, which is preprocessed by the instructors. The formal student instructions for the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” mention that the data is all downloaded from either the VT Open GeoData Portal or from the US Census/American Community Survey website, both of which are freely accessible to a public audience. I used this data because that is what the students receive in class, and I wanted to see if it was possible for a student to reproduce the analysis in the R coding environment. However, I found that the “towns.shp” file from when I was a student in GEOG 0261 (formerly GEOG 0120) was corrupted, and thus I re-downloaded the file from the ACS. However, the file that I downloaded did not distinguish between Rutland Town and Rutland City, as the pre-processed data did.

Using the pre-processed data provided by the instructors certainly makes this work less reproducible and introduces potential sources of uncertainty if one were to download the data from the source and process it oneself. It is challenging to offer more insight into this without knowing the exact files that the instructors downloaded, when they downloaded them, and what pre-processing steps they took.

I have provided in the data/raw/public folder the unprocessed data layers that I downloaded from the VT Open GeoData Portal and ACS, which I believe were what the instructors used. Users of this reproduction study and its repository can investigate these data sources, but I am not completely confident that all the right data layers are there. Use caution and use these files at your own risk.

The methodology of this study, which is largely adopted from that of Baker et al. (2014), seems to have strong external validity given the assumption that you have point locations for mobile homes (or other structures, if that is what you are investigating the flood risk of). However, since not every county in the US uses E911 (as of 2014, when the Baker et al. paper was published), this methodology might not be reproducible in counties without point data on mobile homes. As the GEOG 0261 instructors found, an area-weighted aggregation approach is certainly possible for identifying mobile homes in flood hazard areas, but this would assume an even distribution of mobile homes within a county and is thus less accurate.
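To make the even-distribution assumption behind area-weighted aggregation explicit, here is a minimal Python sketch (for illustration only; it is not taken from the GEOG 0261 materials). The function name and the numbers are hypothetical. The estimate is simply the county's total mobile home count scaled by the share of the county's area that lies inside the hazard zone.

```python
def area_weighted_estimate(total_homes, zone_area_km2, county_area_km2):
    """Estimate homes inside a hazard zone, assuming homes are spread
    evenly across the county (the key simplification of this approach)."""
    if county_area_km2 <= 0:
        raise ValueError("county area must be positive")
    return total_homes * zone_area_km2 / county_area_km2


# Hypothetical county: 1000 mobile homes, 2000 km^2, with 50 km^2 of hazard zone
estimate = area_weighted_estimate(total_homes=1000, zone_area_km2=50, county_area_km2=2000)
print(estimate)  # 25.0
```

If mobile homes cluster near rivers, as mobile home parks along the Connecticut River appear to, this estimate would understate the number at risk, which is why point locations are preferable when they are available.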

Because this methodology does not involve much statistical analysis, model building, or statistical testing, there are not many threats to statistical validity. The methodology is very straightforward, and relies mostly on simple spatial analysis techniques such as buffering, joining by location, and selecting by location.

Future additions to the study:

Future additions to the study could work to reduce the sources of uncertainty and threats to validity. Notably, a future study could count the number of mobile home points (from e911 data) in each county to use as the first column in Table 1 (and thus as the denominator in the rate columns). Currently, the table relies on ACS estimates of mobile homes to provide values for the mobile_home_count variable. However, the ACS figure is just a well-informed estimate, whereas e911 points are known locations of mobile homes.
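A minimal sketch of that alternative denominator is shown below, in Python for illustration and with entirely made-up records. It assumes that each e911 mobile home point has already been assigned a county and an at-risk flag, for example by a join by location. The field names are hypothetical.

```python
from collections import Counter

# Hypothetical e911 mobile home records after a join by location
e911_homes = [
    {"county": "Windham", "at_risk": True},
    {"county": "Windham", "at_risk": False},
    {"county": "Windham", "at_risk": False},
    {"county": "Windham", "at_risk": False},
    {"county": "Windsor", "at_risk": True},
    {"county": "Windsor", "at_risk": False},
]

# Denominator: count of e911 points per county (instead of an ACS estimate)
totals = Counter(h["county"] for h in e911_homes)
# Numerator: count of at-risk e911 points per county
at_risk = Counter(h["county"] for h in e911_homes if h["at_risk"])

rates = {county: 100 * at_risk[county] / totals[county] for county in totals}
print(rates)  # {'Windham': 25.0, 'Windsor': 50.0}
```

Because the numerator and denominator then come from the same point layer, the rate can never exceed 100%, which is not guaranteed when e911 counts are divided by a survey-based estimate.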

If Flood Ready Vermont ever does create a river corridor for the Connecticut River, that would warrant another reproduction attempt. Or, someone with strong geologic knowledge could try to better estimate a river corridor for the Connecticut River and use that. See the docs/presentation folder for two reports from the Vermont Agency of Natural Resources on how River Corridors are constructed. Or, view them online HERE and HERE.

Additionally, since it is unknown when the GEOG 0261 data was actually constructed, someone could reproduce this study with all newly downloaded data, especially newer ACS data (newer than 2014-2018).

Pedagogical implications:

Using R to teach this analysis certainly has its advantages over QGIS. Because all of the R code is documented in a single file, all of the code can easily be run together and re-run as many times as necessary. This makes R especially well suited to teaching how reproduction studies work. Meanwhile, although QGIS is open source and easily accessible to students, it involves clicking lots of buttons while following a workflow, with lots of room to go wrong by clicking the wrong button, missing a step, etc. It is easier to make mistakes in QGIS, and it is challenging to see where you went wrong. Using R for basic spatial analysis like in this study is very straightforward, allows various sections of the analysis to be run all at once, and makes it easy to record and track how you created your maps, data frames, variables, and tables. Performing this analysis in R yielded the same results as QGIS (except for the Rutland issue…but that’s because I used a different towns layer), meaning that it is completely valid to teach students this study using R. It would not impact their results. This reproduction of the GEOG 0261 study provides strong evidence in support of teaching intro geography students spatial R. There is room for both tools in teaching GIS.

Policy implications:

Both the original study and this reproduction study demonstrate how across Vermont, mobile homes tend to be at a high risk of flooding. Given that residents of mobile homes tend to be of lower income and marginalized communities, this warrants targeted policy efforts (especially in the towns and counties that this study shows to be higher risk) to mitigate their vulnerability to flooding.

Connecticut River:

As I have implied already, many mobile homes along the Connecticut River would not be identified as at risk of flooding when using the published River Corridor approach, so they appear to be at an increased risk of being overlooked. Policymakers should ensure that these homes are not left out of any flood program, flood insurance, or other policy efforts designed to mitigate flood risk for vulnerable communities. Fortunately, FEMA does acknowledge most of these CT River mobile homes in its 100-year flood zones.

Conclusion

I successfully reproduced the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment from Middlebury College’s GEOG 0261 course (which is taught using QGIS) using a code-based approach in R. I found that the core spatial analysis concepts used are analogous between programs and that using R can be advantageous for minimizing mistakes in the analysis (and for more streamlined, organized, and clear workflows).

Additionally, I find that the original assignment/analysis, which is modeled off of the methodology employed by Baker et al. (2014), utilized a Vermont River Corridor layer that does not exist along the eastern boundary of the state - the Connecticut River. Thus, a not-insignificant number of mobile homes in Windsor and Windham counties are not identified as at risk of flooding when using the Vermont River Corridor approach to determining flood vulnerability.

The original study, as well as this reproduction, indicates the high levels of vulnerability to flooding that mobile homes in Vermont face. This is especially relevant given the catastrophic flooding that Vermont experienced during the summer of 2023, and warrants policy actions to protect those who are most vulnerable.

Lastly, as the capstone project for my GEOG 0361 course, this reproduction has been produced in a way that is intended to be reproducible and accessible. Open Science is the future of all science. I hope that future authors, and even instructors of the GEOG 0261 course, can take advantage of the code and data that I will make publicly accessible as part of this reproduction.

Integrity Statement

I followed and accomplished what I set out to do in my preanalysis plan. I did make a few slight deviations to improve the code and the interpretability of my report/findings:

I - the author of this reproduction study - state that I completed what I outlined in my preregistration to the best of my knowledge and that no other preregistration exists pertaining to the same hypotheses and research.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Baker, D., Hamshaw, S. D., & Hamshaw, K. A. (2014). Rapid flood exposure assessment of Vermont mobile home parks following Tropical Storm Irene. Natural Hazards Review, 15(1), 27–37. https://doi.org/10.1061/(ASCE)NH.1527-6996.0000112

Schmitt, R. R. (1978). Threats to validity involving geographic space. Socio-Economic Planning Sciences, 12(4), 191–195. https://doi.org/10.1016/0038-0121(78)90044-7