The purpose of this R Markdown doc is to show the work I did for a GGWash post about air pollution.

If you notice errors, have questions, or want to team up to make a street car-free, please email me at edwardpierrerodrigue [at] gmail [dot] com.

First, here’s a bit about the hardware I used. My setup was super cheap & required almost no coding skills. Anyone can get this working!

Below is what the particulate matter sensor looks like. It’s just a Raspberry Pi Pico plugged into a particulate matter sensor “hat” from SB Components. (To be clear: I was not paid, compensated, whatever, by any of these companies to hawk their stuff; these are just the components I ended up using. You can probably find similar setups that work just as well.) The Pico microcontroller runs about $10. The sensor costs about $50.

The people who make the sensor have some good example code on GitHub that’ll get your sensor working in a few minutes. I tweaked that code slightly to save the readings into a text file.
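The tweak itself is just a few lines of MicroPython on the Pico, so I won’t reproduce it here. But once you have a text file of readings, pulling it into R looks something like this (the file name and two-column layout are assumptions for illustration, not my exact format):

# hypothetical log format: one "timestamp,pm25" pair per line
readings <- read.csv("pm25_log.txt", header = FALSE,
                     col.names = c("timestamp", "pm25"))
head(readings)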

The Raspberry Pi Pico just gets plugged into the “hat”; then you plug the Pico into your laptop, run the code, and start getting readings. A laser in the sensor basically just counts how many particles pass through its beam, giving you an estimate of the “parts per million” (ppm) of particles in the air.

Here’s my particulate matter sensor setup.

Below is my general pollution-sniffing sensor. I was feeling super cheap after blowing a few hundred dollars on truck parts, trying & failing to build a DIY NOx sensor, so this build runs about $20: $10 for the sensor and $10 for the off-brand Arduino Uno R3 microcontroller. The sensor is called an MQ-135 sensor. It’s a clever little device that changes the resistance of the electrical circuit it’s plugged into, depending on how much pollution (things like ammonia, NOx, benzene, and smoke) is interacting with its surface (made of tin dioxide). You can read more about the science here.
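If you want the math: the sensor sits in a simple voltage divider, so the Arduino reads a voltage, and you can back out the sensor’s resistance from that. Here’s a rough sketch in R; the 10 kΩ load resistor, 5 V supply, and 10-bit ADC are typical values for these breakout boards, not something I measured on mine:

# hypothetical conversion from a raw Arduino reading to sensor resistance.
# assumes a 10-bit ADC (0-1023), 5 V supply, and a 10 kOhm load resistor (RL).
adc_to_resistance <- function(adc_value, vcc = 5, rl_kohm = 10) {
  vout <- adc_value / 1023 * vcc    # voltage across the load resistor
  rl_kohm * (vcc - vout) / vout     # voltage divider: Rs = RL * (Vcc - Vout) / Vout
}

adc_to_resistance(400)  # lower resistance generally means more pollution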

Like the setup above, you just wire the MQ-135 sensor to the microcontroller (here, an Arduino board rather than a Raspberry Pi board), plug the microcontroller board into your laptop, upload the code, and hit run.

A couple of notes:

  1. With this sensor, you need to run it for 24 hours before first use. Just leave it plugged into your laptop, running. It’ll heat up a little; that’s fine. You should see the readings drop over time.
  2. Don’t put too much weight on the sensor’s absolute values. These cheap sensors can really only give you useful relative readings (e.g., where / in what circumstances do the readings go up?). One way to make that comparison is sketched below.
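For example, here’s one way to put the readings on a relative footing in R. This isn’t from my original analysis; the 301-reading window is arbitrary, and the zoo package is just one convenient way to get a rolling median:

library(zoo)

# compare each reading to a rolling median baseline, so you can ask
# "how far above normal is the sensor right now?"
baseline <- rollmedian(df$mq135_sensor_value, k = 301, fill = NA, align = "center")
df$relative_reading <- df$mq135_sensor_value / baseline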

Here’s my general pollution-sniffing sensor.

If you have any questions, feel free to ping me; my email is above!

Anyway, here’s my data analysis.

First, the particulate data:

Map the particulate matter data I collected:
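(One note: the chunk that loads and preps the PM2.5 readings isn’t shown in this section; it mirrors the MQ-135 loading code further down. Roughly, with the sheet URL left out because it isn’t shown here:)

# sketch of the loading step this section assumes has already run:
# df <- read.csv("<published Google Sheets csv url>")
# df$Lat <- as.numeric(df$Lat)
# df <- interpolate_lat_lon(df, which(!is.na(df$Lon)))  # helper sketched below
# df <- df %>% group_by(Segment) %>% mutate(average_pm25 = mean(pm25))
# df$row_id <- 1:nrow(df)
# df$hm <- df$time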

pal <- colorNumeric(
  palette = colorRampPalette(c('green', 'orange', 'red'))(length(df$average_pm25)),
  domain = df$average_pm25)
df %>%
  leaflet(options = leafletOptions(zoomControl = TRUE,
                                   zoomSnap = 0.25,
                                   zoomDelta = 1)) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # add CartoDB Positron basemap tiles
  addCircleMarkers(lng=~Lon, lat=~Lat,
                   radius=1,
                   color=~pal(average_pm25),
                   popup=~as.character(average_pm25),
                   opacity=.8)

Plot the data as a time series (for the GGWash post, I ended up just using Google Sheets to make the graph).

below_axis_val = -18
below_axis_text_val = -23


p <-
  ggplot() +
  geom_line(data = df, aes(x=row_id, y=pm25)) +
  theme_minimal() +
  xlab("") +
  ylab("PM 2.5 reading (parts per million)") +
  scale_x_continuous(breaks=seq(1, max(df$row_id), 5*60),
                     labels=df$hm[seq(from=1, to=max(df$row_id), by=5*60)] ) +
  ggtitle("Walking home, getting caught in clouds of exhaust") +
  coord_cartesian(ylim = c(0, max(df$pm25)), clip="off") +
  
      geom_segment(aes(x = min(df$row_id),
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "In my office building"]),
                   yend = below_axis_val),
                linewidth=1
                ) +
  annotate("text", 
           x = min(df$row_id)+250, 
           y = below_axis_text_val, label = "In my office") +
  
        geom_segment(aes(x = max(df$row_id[df$Segment == "In my office building"])+50,
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "12 & Q"]),
                   yend = below_axis_val),
                linewidth=1
                ) +
  annotate("text", 
           x = max(df$row_id[df$Segment == "In my office building"])+600, 
           y = below_axis_text_val, label = "Walking north on 12th") +
  
   geom_segment(aes(x = max(df$row_id[df$Segment == "12 & Q"])+50,
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "Malcolm X Park"]),
                   yend = below_axis_val),
                linewidth=1
                ) +
  annotate("text", 
           x = max(df$row_id[df$Segment == "12 & Q"])+400, 
           y = below_axis_text_val, label = "Logan Cir / U st") +
  
  geom_segment(aes(x = max(df$row_id[df$Segment == "Malcolm X Park"])+50,
                   y = below_axis_val,
                   xend = max(df$row_id[df$Segment == "In apt"]),
                   yend = below_axis_val),
                linewidth=1
                ) +
  annotate("text", 
           x = max(df$row_id[df$Segment == "Malcolm X Park"])+350, 
           y = below_axis_text_val, label = "Adams Morgan") +

  
   annotate(
        'curve',
        x = 1750,
        y = 109,
        yend = 110,
        xend = 1011+100,
        linewidth = 1,
        curvature = -0.2,
        arrow = arrow(length = unit(0.5, 'cm')),
        color='darkgrey'
      ) +
  
    annotate(
        'curve',
        x = 2864-650,
        y = 108,
        yend = 100,
        xend = 2864-120,
        linewidth = 1,
        curvature = 0.2,
        arrow = arrow(length = unit(0.5, 'cm')),
        color='darkgrey'
      ) +
 
  annotate(
        'curve',
        x = 2009,
        y = 101,
        yend = 91,
        xend = 2009,
        linewidth = 1,
        curvature = 0,
        arrow = arrow(length = unit(0.5, 'cm')),
        color='darkgrey'
      ) +
      annotate("text", x = 2000, y = 109,
               size=4,
               label = "Plumes of exhaust\nfrom vehicles",
               color='darkgrey') +  
  
  annotate(
        'curve',
        x = 3128+100,
        y = 95,
        yend = 85,
        xend = 3128+10,
        linewidth = 1,
        curvature = -0.35,
        arrow = arrow(length = unit(0.5, 'cm')),
        color='darkgrey'
      ) +
      annotate("text", x = 3128+200, y = 103,
               size=4,
               label = "Holding the sensor inches\nfrom moped exhaust pipe",
               color='darkgrey') +
  
  
  theme(plot.margin = unit(c(2,1,2,0), "lines"))
p

Here’s the data from the MQ-135 sensor:

# load the data 
df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vT3ExapTe02141QeOi6aJwFL-WqqW689JExaWsNNtoC3XPt9f5TSQbypNMmj4DA8I1n1VPWYYoR4yiC/pub?gid=557052309&single=true&output=csv")

df$Lat <- as.numeric(df$Lat)

# interpolate between the points I recorded
df$original_point <- 0
df$original_point[!is.na(df$Lon)] <- 1
row_indices_of_original_points <- which(df$original_point == 1)
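
# interpolate_lat_lon() isn't defined in this section; it's a little helper
# that draws straight lines between the GPS points I actually recorded.
# A minimal sketch of what it does (my reconstruction, not the exact code):
#
# interpolate_lat_lon <- function(df, row_indices) {
#   for (jj in 1:(length(row_indices) - 1)) {
#     from <- row_indices[jj]
#     to   <- row_indices[jj + 1]
#     df$Lat[from:to] <- seq(df$Lat[from], df$Lat[to], length.out = to - from + 1)
#     df$Lon[from:to] <- seq(df$Lon[from], df$Lon[to], length.out = to - from + 1)
#   }
#   df
# }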

df <- interpolate_lat_lon(df, row_indices_of_original_points)

df <-
  df %>%
  group_by(Segment) %>%
  mutate(avg_reading  = mean(mq135_sensor_value))

df$row_id <- 1:nrow(df)
df$hm <- df$time

pal <- colorNumeric(
  palette = colorRampPalette(c('green', 'orange', 'red'))(length(df$mq135_sensor_value)),
  domain = df$mq135_sensor_value)
df %>%
  leaflet(options = leafletOptions(zoomControl = TRUE,
                                   zoomSnap = 0.25,
                                   zoomDelta = 1)) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # add CartoDB Positron basemap tiles
  addCircleMarkers(lng=~Lon, lat=~Lat,
                   radius=1,
                   color=~pal(mq135_sensor_value),
                   popup = ~as.character(mq135_sensor_value),
                   opacity=.8)

Here’s the time series of that data:

  ggplot() +
  geom_line(data = df, aes(x=row_id, y=mq135_sensor_value)) +
  theme_minimal() +
  xlab("") +
  ylab("Sensor value)") +
  scale_x_continuous(breaks=seq(1, max(df$row_id), 5*60),
                     label=df$hm[seq(from=1, to=max(df$row_id), by=5*60)] ) +
  ggtitle("Walking home, getting caught in clouds of exhaust")

The code below grabs data from the EPA’s Air Quality System (AQS) API (not AirNow.gov, despite my file names), which stores historical data from the EPA’s network of air monitoring sites, and summarizes it by year. I do this in a lazy way (just taking the median value of the arithmetic mean column), but you get the point: air quality has gotten better over time. You can re-run this using the “min” or “max” values and the basic results are unchanged: there’s still a downward trend in pollution (yay), but we’re still above the WHO guideline (boo). Here’s the API documentation: https://aqs.epa.gov/aqsweb/documents/data_api.html#annual

Basically, the API call below gets the annual summary data for each year. Then I just take the median of the “arithmetic mean” column.

Note that you’ll need a free account (an email plus API key) to use this API.

library(jsonlite)

YOUR_EMAIL <- readLines('airnow_info.txt')[1]
## Warning in readLines("airnow_info.txt"): incomplete final line found on
## 'airnow_info.txt'
YOUR_API_KEY <- readLines('airnow_info.txt')[2]
## Warning in readLines("airnow_info.txt"): incomplete final line found on
## 'airnow_info.txt'
years <- c()
pm25values <- c()

for (yyyy in 2000:2023) {
  year <- yyyy
  json <- fromJSON(
    paste0("https://aqs.epa.gov/data/api/annualData/bySite?email=", YOUR_EMAIL, "&key=", YOUR_API_KEY, "&param=88101&bdate=", year, "0618&edate=", year, "0618&state=11&county=001&site=0043")
  )
  
  df <- json$Data
  val <- median(df$arithmetic_mean)  # this is kind of lazy but you get the idea
  
  years <- c(years, yyyy)
  pm25values <- c(pm25values , val)
  
  print(yyyy)
  
}
## [1] 2000
## [1] 2001
## [1] 2002
## [1] 2003
## [1] 2004
## [1] 2005
## [1] 2006
## [1] 2007
## [1] 2008
## [1] 2009
## [1] 2010
## [1] 2011
## [1] 2012
## [1] 2013
## [1] 2014
## [1] 2015
## [1] 2016
## [1] 2017
## [1] 2018
## [1] 2019
## [1] 2020
## [1] 2021
## [1] 2022
## [1] 2023
mydata <- data.frame(years, pm25values)

# print the data as [year, value] pairs I can paste into highcharts
for (ii in 1:nrow(mydata)) {
  cat("[", mydata$years[ii], ", ", mydata$pm25values[ii], "],\n")
}
## [ 2000 ,  15.3805 ],
## [ 2001 ,  15.50391 ],
## [ 2002 ,  15.45752 ],
## [ 2003 ,  14.35158 ],
## [ 2004 ,  14.45549 ],
## [ 2005 ,  14.73675 ],
## [ 2006 ,  12.95483 ],
## [ 2007 ,  13.10266 ],
## [ 2008 ,  11.68634 ],
## [ 2009 ,  10.24413 ],
## [ 2010 ,  10.53363 ],
## [ 2011 ,  10.87261 ],
## [ 2012 ,  11.67135 ],
## [ 2013 ,  9.091899 ],
## [ 2014 ,  9.928852 ],
## [ 2015 ,  9.491379 ],
## [ 2016 ,  8.317021 ],
## [ 2017 ,  9.980488 ],
## [ 2018 ,  8.661003 ],
## [ 2019 ,  8.00084 ],
## [ 2020 ,  6.208333 ],
## [ 2021 ,  7.661972 ],
## [ 2022 ,  6.916949 ],
## [ 2023 ,  8.721644 ],

You can see what the graph looks like here: https://codepen.io/pete-rodrigue/pen/xxoNNRj

(I made this graph in Highcharts.)
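If you’d rather stay in R instead of Highcharts, a quick ggplot version of the same graph could look like the sketch below. The dashed line is WHO’s 2021 annual PM2.5 guideline of 5 µg/m³ (AQS reports parameter 88101 in µg/m³); the label placement is just a guess at what looks decent:

ggplot(mydata, aes(x = years, y = pm25values)) +
  geom_line() +
  geom_hline(yintercept = 5, linetype = "dashed", color = "darkgrey") +
  annotate("text", x = 2002, y = 5.7, label = "WHO annual guideline", color = "darkgrey") +
  theme_minimal() +
  xlab("") +
  ylab("Annual median PM 2.5 (µg/m³)") +
  ggtitle("DC air quality: better over time, still above the WHO guideline")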