The purpose of this R Markdown doc is to show the work I did for a GGWash post about air pollution.
If you notice errors, have questions, or want to team up to make a street car-free, please email me at edwardpierrerodrigue [at] gmail [dot] com.
First, here’s a bit about the hardware I used. My setup was super cheap & required almost no coding skills. Anyone can get this working!
Below is what the particulate matter sensor looks like. It’s just a Raspberry Pi Pico plugged into a particulate matter sensor “hat” from SB Components. (To be clear: I was not paid or otherwise compensated by any of these companies to hawk their stuff; these are just the components I ended up using. You can probably find similar setups that work just as well.) The Pico microcontroller runs about $10. The sensor costs about $50.
The people who make the sensor have some good example code on GitHub that’ll get your sensor working in a few minutes. I tweaked that code slightly to save the readings to a text file.
The Raspberry Pi Pico just gets plugged into the “hat”; then you plug the Pico into your laptop, run the code, and start getting readings. A laser in the sensor basically just counts how many particles pass through its beam, giving you an estimate of the “parts per million” (ppm) of particles in the air.
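(By the way, once the readings are sitting in a text file, pulling them into R for the analysis below is a one-liner. The file name and column layout here are hypothetical; match them to whatever your logging code writes.)

# hypothetical file name and column layout; adjust to match your logging code
readings <- read.csv("pm25_readings.txt", header = FALSE,
                     col.names = c("time", "pm25"))
head(readings)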
Below is my general pollution-sniffing sensor. I was feeling super cheap after blowing a few hundred dollars on truck parts, trying & failing to build a DIY NOx sensor, so this build runs about $20: $10 for the sensor and $10 for the off-brand Arduino Uno R3 microcontroller. The sensor is called an MQ-135. It’s a clever little device that changes the resistance of the electrical circuit it’s plugged into, depending on how much pollution (e.g., gases like ammonia, NOx, and benzene) is interacting with its surface (made of tin dioxide). You can read more about the science here.
Like the setup above, you just wire the MQ-135 sensor to the microcontroller (here, an Arduino board rather than a Raspberry Pi board), plug the microcontroller board into your laptop, upload the code, and hit run.
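If you want to turn the raw analog reading into something more physical, you can back out the sensor’s resistance with standard voltage-divider math. Here’s a rough sketch in R; the 10-bit ADC range and the 10 kΩ load resistor are assumptions about a typical MQ-135 breakout board, so check yours.

# back out the MQ-135's resistance (Rs) from a raw Arduino analog reading
# assumptions: 10-bit ADC (0-1023) and a 10 kOhm load resistor (RL)
adc_to_rs_kohm <- function(adc_value, rl_kohm = 10, adc_max = 1023) {
  vout_frac <- adc_value / adc_max          # Vout as a fraction of Vcc
  rl_kohm * (1 - vout_frac) / vout_frac     # Rs = RL * (Vcc - Vout) / Vout
}
adc_to_rs_kohm(350)  # e.g., a mid-range reading works out to roughly 19 kOhm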
One quick note: if you have any questions, feel free to ping me; my email is above!
Anyway, here’s my data analysis.
First, the particulate data:
Map the particulate matter data I collected:
library(leaflet)
library(dplyr)

# color scale from green (low) to red (high) over the range of readings;
# df (the PM2.5 readings, with lat/lon) comes from an earlier chunk
pal <- colorNumeric(
  palette = colorRampPalette(c("green", "orange", "red"))(length(df$average_pm25)),
  domain = df$average_pm25)

df %>%
  leaflet(options = leafletOptions(zoomControl = TRUE,
                                   zoomSnap = 0.25,
                                   zoomDelta = 1)) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # CartoDB Positron basemap tiles
  addCircleMarkers(lng = ~Lon, lat = ~Lat,
                   radius = 1,
                   color = ~pal(average_pm25),
                   popup = ~as.character(average_pm25),
                   opacity = 0.8)
Plot the data as a time series (for the GGWash post, I ended up just using Google Sheets to make the graph).
below_axis_val = -18
below_axis_text_val = -23

p <-
  ggplot() +
  geom_line(data = df, aes(x = row_id, y = pm25)) +
  theme_minimal() +
  xlab("") +
  ylab("PM 2.5 reading (parts per million)") +
  scale_x_continuous(breaks = seq(1, max(df$row_id), 5*60),
                     labels = df$hm[seq(from = 1, to = max(df$row_id), by = 5*60)]) +
  ggtitle("Walking home, getting caught in clouds of exhaust") +
  coord_cartesian(ylim = c(0, max(df$pm25)), clip = "off") +
  # labeled segments under the x-axis marking each leg of the walk
  geom_segment(aes(x = min(df$row_id),
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "In my office building"]),
                   yend = below_axis_val),
               linewidth = 1) +
  annotate("text",
           x = min(df$row_id) + 250,
           y = below_axis_text_val, label = "In my office") +
  geom_segment(aes(x = max(df$row_id[df$Segment == "In my office building"]) + 50,
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "12 & Q"]),
                   yend = below_axis_val),
               linewidth = 1) +
  annotate("text",
           x = max(df$row_id[df$Segment == "In my office building"]) + 600,
           y = below_axis_text_val, label = "Walking north on 12th") +
  geom_segment(aes(x = max(df$row_id[df$Segment == "12 & Q"]) + 50,
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "Malcolm X Park"]),
                   yend = below_axis_val),
               linewidth = 1) +
  annotate("text",
           x = max(df$row_id[df$Segment == "12 & Q"]) + 400,
           y = below_axis_text_val, label = "Logan Cir / U st") +
  geom_segment(aes(x = max(df$row_id[df$Segment == "Malcolm X Park"]) + 50,
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "In apt"]),
                   yend = below_axis_val),
               linewidth = 1) +
  annotate("text",
           x = max(df$row_id[df$Segment == "Malcolm X Park"]) + 350,
           y = below_axis_text_val, label = "Adams Morgan") +
  # arrows pointing at the exhaust spikes
  annotate("curve",
           x = 1750, y = 109,
           xend = 1011 + 100, yend = 110,
           linewidth = 1,
           curvature = -0.2,
           arrow = arrow(length = unit(0.5, "cm")),
           color = "darkgrey") +
  annotate("curve",
           x = 2864 - 650, y = 108,
           xend = 2864 - 120, yend = 100,
           linewidth = 1,
           curvature = 0.2,
           arrow = arrow(length = unit(0.5, "cm")),
           color = "darkgrey") +
  annotate("curve",
           x = 2009, y = 101,
           xend = 2009, yend = 91,
           linewidth = 1,
           curvature = 0,
           arrow = arrow(length = unit(0.5, "cm")),
           color = "darkgrey") +
  annotate("text", x = 2000, y = 109,
           size = 4,
           label = "Plumes of exhaust\nfrom vehicles",
           color = "darkgrey") +
  annotate("curve",
           x = 3128 + 100, y = 95,
           xend = 3128 + 10, yend = 85,
           linewidth = 1,
           curvature = -0.35,
           arrow = arrow(length = unit(0.5, "cm")),
           color = "darkgrey") +
  annotate("text", x = 3128 + 200, y = 103,
           size = 4,
           label = "Holding the sensor inches\nfrom moped exhaust pipe",
           color = "darkgrey") +
  theme(plot.margin = unit(c(2, 1, 2, 0), "lines"))
p
Here’s the data from the MQ-135 sensor:
# load the data
df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vT3ExapTe02141QeOi6aJwFL-WqqW689JExaWsNNtoC3XPt9f5TSQbypNMmj4DA8I1n1VPWYYoR4yiC/pub?gid=557052309&single=true&output=csv")
df$Lat <- as.numeric(df$Lat)

# flag the rows where I actually recorded a location, then
# interpolate lat/lon between those points
df$original_point <- 0
df$original_point[!is.na(df$Lon)] <- 1
row_indices_of_original_points <- which(df$original_point == 1)
df <- interpolate_lat_lon(df, row_indices_of_original_points)

# average reading within each segment of the walk
df <-
  df %>%
  group_by(Segment) %>%
  mutate(avg_reading = mean(mq135_sensor_value))

df$row_id <- 1:nrow(df)
df$hm <- df$time
pal <- colorNumeric(
  palette = colorRampPalette(c("green", "orange", "red"))(length(df$mq135_sensor_value)),
  domain = df$mq135_sensor_value)

df %>%
  leaflet(options = leafletOptions(zoomControl = TRUE,
                                   zoomSnap = 0.25,
                                   zoomDelta = 1)) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # CartoDB Positron basemap tiles
  addCircleMarkers(lng = ~Lon, lat = ~Lat,
                   radius = 1,
                   color = ~pal(mq135_sensor_value),
                   popup = ~as.character(mq135_sensor_value),
                   opacity = 0.8)
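A quick aside on interpolate_lat_lon(), used above: the gist is linear interpolation of latitude and longitude between the rows where I actually recorded a location. Here’s a minimal sketch of that idea using base R’s approx(); it’s illustrative, not necessarily line-for-line what my helper does.

# minimal sketch: linearly interpolate lat/lon between recorded points;
# assumes Lat/Lon are NA everywhere except the rows in original_rows
interpolate_lat_lon_sketch <- function(df, original_rows) {
  all_rows <- seq_len(nrow(df))
  df$Lat <- approx(x = original_rows, y = df$Lat[original_rows], xout = all_rows)$y
  df$Lon <- approx(x = original_rows, y = df$Lon[original_rows], xout = all_rows)$y
  df
}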
Here’s the time series of that data:
ggplot() +
  geom_line(data = df, aes(x = row_id, y = mq135_sensor_value)) +
  theme_minimal() +
  xlab("") +
  ylab("MQ-135 sensor value") +
  scale_x_continuous(breaks = seq(1, max(df$row_id), 5*60),
                     labels = df$hm[seq(from = 1, to = max(df$row_id), by = 5*60)]) +
  ggtitle("Walking home, getting caught in clouds of exhaust")
The code below grabs data from the EPA’s Air Quality System (AQS) API, which stores historical data from the EPA’s network of air monitoring sites, and summarizes it by year. I do this in a lazy way (just taking the median value of the arithmetic mean column), but you get the point: air quality has gotten better over time. You can re-run this using the “min” or “max” values and the basic results are unchanged: there’s still a downward trend in pollution (yay), but we’re still above the WHO guideline for annual average PM2.5, which is 5 µg/m³ (boo). Here’s the API documentation: https://aqs.epa.gov/aqsweb/documents/data_api.html#annual
Basically, the API call below gets the annual summary data for one DC monitoring site for each year; then I take the median of the “arithmetic mean” column.
Note: you’ll need a free account to use this API.
library(jsonlite)

# credentials for the EPA AQS API (first line of the file: email; second line: API key)
YOUR_EMAIL = readLines('airnow_info.txt', warn = FALSE)[1]
YOUR_API_KEY = readLines('airnow_info.txt', warn = FALSE)[2]

years <- c()
pm25values <- c()
for (yyyy in 2000:2023) {
  # annual summary for PM2.5 (parameter 88101) at one DC monitoring site
  json <- fromJSON(
    paste0("https://aqs.epa.gov/data/api/annualData/bySite?email=", YOUR_EMAIL,
           "&key=", YOUR_API_KEY,
           "&param=88101&bdate=", yyyy, "0618&edate=", yyyy, "0618",
           "&state=11&county=001&site=0043")
  )
  df <- json$Data
  val <- median(df$arithmetic_mean)  # this is kind of lazy but you get the idea
  years <- c(years, yyyy)
  pm25values <- c(pm25values, val)
  print(yyyy)
}
## [1] 2000
## [1] 2001
## [1] 2002
## [1] 2003
## [1] 2004
## [1] 2005
## [1] 2006
## [1] 2007
## [1] 2008
## [1] 2009
## [1] 2010
## [1] 2011
## [1] 2012
## [1] 2013
## [1] 2014
## [1] 2015
## [1] 2016
## [1] 2017
## [1] 2018
## [1] 2019
## [1] 2020
## [1] 2021
## [1] 2022
## [1] 2023
mydata <- data.frame(years, pm25values)

# print the rows in a format I can paste straight into the Highcharts graph
for (ii in 1:nrow(mydata)) {
  cat("[", mydata$years[ii], ", ", mydata$pm25values[ii], "],\n")
}
## [ 2000 , 15.3805 ],
## [ 2001 , 15.50391 ],
## [ 2002 , 15.45752 ],
## [ 2003 , 14.35158 ],
## [ 2004 , 14.45549 ],
## [ 2005 , 14.73675 ],
## [ 2006 , 12.95483 ],
## [ 2007 , 13.10266 ],
## [ 2008 , 11.68634 ],
## [ 2009 , 10.24413 ],
## [ 2010 , 10.53363 ],
## [ 2011 , 10.87261 ],
## [ 2012 , 11.67135 ],
## [ 2013 , 9.091899 ],
## [ 2014 , 9.928852 ],
## [ 2015 , 9.491379 ],
## [ 2016 , 8.317021 ],
## [ 2017 , 9.980488 ],
## [ 2018 , 8.661003 ],
## [ 2019 , 8.00084 ],
## [ 2020 , 6.208333 ],
## [ 2021 , 7.661972 ],
## [ 2022 , 6.916949 ],
## [ 2023 , 8.721644 ],
You can see what the graph looks like here: https://codepen.io/pete-rodrigue/pen/xxoNNRj (I made that graph in Highcharts.)
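If you’d rather stay in R than fire up Highcharts, a quick ggplot version of the same trend looks something like this (the dashed line is the WHO annual PM2.5 guideline of 5 µg/m³):

ggplot(mydata, aes(x = years, y = pm25values)) +
  geom_line() +
  geom_hline(yintercept = 5, linetype = "dashed") +  # WHO annual PM2.5 guideline
  theme_minimal() +
  labs(x = "", y = "Median annual mean PM2.5 (µg/m³)",
       title = "PM2.5 at one DC monitoring site, 2000-2023")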