This Rmarkdown document summarizes the methodology used for this small survey of shopkeepers and shoppers on Columbia Road between Mintwood Place & Biltmore Street. The document shows all of the data analysis conducted.

The map above shows the block on which the survey was done.



Methodology

I surveyed roughly 130 people on the sidewalk and 12 “shopkeepers” (both staff in stores and in restaurants) over several days between July 14 2024 and July 27 2024.

The specific methodology changed slightly over time (I realized I needed to ask a few additional questions as I conducted the survey), so I show the results here broken out a number of ways. These tweaks didn’t change the top-line results. If you omit the first day of surveying, for example, or only count people who self-reported as regular shoppers on the last day, the main result (that people in general over-estimate the prevalence of driving and under-estimate walking) remains unchanged.

So what was the methodology?

I did two surveys: one of staff in stores and restaurants, and one of people walking by on the street. I only surveyed people and businesses on the NW side of Columbia Rd between Mintwood and Biltmore, with one exception: Julia’s Empanadas, which I omit from all of the analysis since it’s actually on 18th St. (If you include the data from Julia’s, the results don’t change.)

You can find the survey document here: https://drive.google.com/drive/folders/1iljrrzEF3Ax8WQACVvPGmcji2t2A8Z3V?usp=sharing

That folder also has all of the raw data.

The shopkeeper survey questions were:

  1. How would you estimate the percentage of each mode of transportation that your customers use to get to your business?
  2. How do you get to work?

The passerby questions were:

  1. How would you estimate the percentage of each mode of transportation that people use to get to this strip?
  2. How did you get here today?
  3. How many times a week do you shop on this strip?

On the first day of surveying, I only stopped people going into or out of So’s Your Mom. Originally, I was thinking of doing this survey on a store-by-store basis. But I quickly realized this methodology would take forever to implement; there just aren’t that many people going into any single store in a given hour, relative to the total number of people passing by on the street.

So on day two of surveying, I asked everyone who passed by if they’d take the survey.

After day two of surveying, I was worried that I wouldn’t be able to tell, based on the data I was collecting, who actually shopped at these stores, and who was just passing through.

So on day 3 I started asking people “do you actually shop on this strip?” Finally, on day 4, I asked a better, more specific version of this question: “how many times a month do you shop on this strip?”

I should’ve been asking that question from the beginning! But live and learn. For what it’s worth, the main result actually gets more pronounced if you only use the data from this last day of surveying, looking only at people that report being regular shoppers on the strip. (See analysis below).

What can we reasonably infer from this?

This is a relatively small study, so the results should be interpreted with caution, especially vis-à-vis the smaller subgroups. The main result seems to be that, on average, passersby, regular shoppers, and staff at businesses all tend to over-estimate how much people drive to visit these businesses, and under-estimate walking, transit use, and biking/scootering.
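One way to make "over-estimate" and "under-estimate" concrete is to subtract the observed share for each mode from the average estimated share. Here is a minimal sketch using made-up numbers (these are not the survey results, and the data frame `toy` and its column names are hypothetical):

```r
# Illustrative values only; these are NOT the survey results.
toy <- data.frame(
  mode              = c("walk", "car", "transit", "bike or scooter"),
  observed_pct      = c(70, 10, 12, 8),   # share of respondents who actually used each mode
  mean_estimate_pct = c(58, 25, 11, 6)    # average of respondents' guesses for each mode
)

# Positive gap = the mode is over-estimated; negative gap = under-estimated.
toy$gap_pct_points <- toy$mean_estimate_pct - toy$observed_pct

# Sort so the most over-estimated mode comes first.
toy[order(-toy$gap_pct_points), ]
```

In this toy example, driving has the largest positive gap and walking the largest negative gap, which is the same qualitative pattern described above.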

That said! These are averages. Some individual staff at businesses gave estimates that were actually pretty close to the average. And it’s possible that the staff at businesses that gave lower walking estimates are actually correct for their specific business (it might really be the case that more people drive to their business).

(It would be super interesting if any of these businesses collected a little data on how people get to them!)

Otherwise, staff appear more likely to live farther away and drive to the business. People also seem to have a slight tendency to favor their own mode of transportation when estimating the average shares.
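A simple way to look for that "favor your own mode" tendency is to average the estimates separately for each group of respondents, grouped by how they themselves travelled. A rough sketch with invented responses (not survey data; `toy_resp`, `own_mode`, and `car_estimate` are hypothetical names):

```r
# Invented responses for illustration; NOT the survey data.
toy_resp <- data.frame(
  own_mode     = c("car", "car", "walk", "walk", "walk"),
  car_estimate = c(40, 30, 20, 25, 15)   # each person's guess at the % who drive
)

# Mean car estimate, split by the respondent's own mode of travel.
by_mode <- aggregate(car_estimate ~ own_mode, data = toy_resp, FUN = mean)
by_mode
```

If drivers' average guess for the car share is noticeably higher than walkers' (as in this toy data), that is consistent with people leaning toward their own mode. With subgroups this small, it is only suggestive.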

Other caveats

I don’t have hard data on this, but I wonder if drivers/car-users were slightly under-sampled; it seemed like they were less likely to walk down the block to their destination (often a restaurant) and more likely to hop out of an Uber directly in front of it. This could’ve made them more likely to decline the survey (“sorry my reservation is for right now”) and less likely to pass my survey station (on the days I wasn’t posted directly in front of the restaurant, for example). That said, I was posted directly in front of Perry’s on day 4, a Friday evening, and the reported % of drivers was still only 12%, well below the 20-25% that passersby and shopkeepers estimated. And for that matter, you could use this same logic to argue that cyclists & scooter-ers are under-counted, since they were impossible to survey while biking/scootering, and they often lock up right in front of their destination, which may be down the block from my survey station.
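As a rough sensitivity check on the under-sampling worry, here is a sketch of how many additional car respondents it would take for the observed car share to reach the low end of the estimates (20%). It uses the overall counts from the travel-mode table further down (11 car respondents out of 130) and assumes, purely for illustration, that every missed respondent was a driver. The variable names are placeholders.

```r
car_n   <- 11     # car respondents in the overall sample
total_n <- 130    # all passerby responses
target  <- 0.20   # low end of the 20-25% estimated by respondents

# Solve (car_n + x) / (total_n + x) = target for x, then round up to a whole person.
extra_drivers <- ceiling((target * total_n - car_n) / (1 - target))
extra_drivers

# Car share if that many additional drivers had been surveyed.
round(100 * (car_n + extra_drivers) / (total_n + extra_drivers), 1)
```

Under that assumption, it would take 19 additional drivers on top of the 11 actually surveyed, so the under-sampling would have to be fairly large to close the gap on its own.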

Finally, one should interpret the mode of transport shares carefully in general. It would be easy to look at these results and think “huh well I guess no one bikes or scooters” and then jump to “I guess no one wants to use those modes of transit.” But that would be falling victim to the survivorship fallacy. If there were better, safer biking infrastructure, more people would bike (there have been umpteen studies on this, e.g.: https://www.ncbi.nlm.nih.gov/search/research-news/13155/#.)

Of course, the same could be said for driving! (If you turned Kalorama Park into a parking lot, more people might drive to Lapis.) Personally I don’t think increasing parking is a good idea, but you get the point: current travel patterns don’t necessarily reflect what we’d want in a better/ideal world.

If you have any questions about the survey methodology, I’d be happy to chat. I am a mega dork who enjoys this stuff. You can reach me at edwardpierrerodrigue at gmail dot com

Finally, huge thanks to everyone who participated in the survey! It was really cool to talk to you all!

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width=18, fig.height=10) 

# load the packages we need
library(tidyverse)
library(ggplot2)

# Load the potential shopper data ("psd") from google drive. 
# You can download the original potential shopper survey data as a CSV file using the link below.
psd <- read.csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQrLc_Jw_iuH2UPN01VHxnGnrz8tIRZrrgSAxCxtN9JPu0FzNh-a7i6g14QKxfLkBC0cOt2N4b-zC0t/pub?gid=1149018298&single=true&output=csv')
## Now we'll add shopkeeper survey info (skd='shopkeeper data'):
skd <- read.csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQrLc_Jw_iuH2UPN01VHxnGnrz8tIRZrrgSAxCxtN9JPu0FzNh-a7i6g14QKxfLkBC0cOt2N4b-zC0t/pub?gid=0&single=true&output=csv')

# You can also find all of the data for shopkeepers and potential shoppers here: 
# https://drive.google.com/drive/folders/1iljrrzEF3Ax8WQACVvPGmcji2t2A8Z3V?usp=sharing
# first, remove the response from the cool guy that works at Julia's Empanadas. 
# I'd talked to him originally thinking the survey would cover more of the Adams Morgan / Columbia Rd area
# but it was a fair amount of work just focusing on the strip of Columbia between Mintwood and Biltmore
# so ultimately kept the survey focused on that small but defined area.
# (he estimated 80% of customers drive and 20% walk, so if you include his responses in the analysis, it just makes the main results more pronounced)
skd <- skd[skd$store != "Julias Empanadas",]

cat("\n\nBelow are some basic summary stats\n\n")
## 
## 
## Below are some basic summary stats
# Total number of hours spent surveying potential shoppers:
cat('Approximate total time spent surveying shoppers/passersby: ', sum(psd$total_time[psd$response_num==1]), 'hours.')
## Approximate total time spent surveying shoppers/passersby:  9.38 hours.
# Total number of responses:
cat('\nTotal number of shopper survey responses: ', nrow(psd))
## 
## Total number of shopper survey responses:  130
# Total number of responses, omitting the first day:
cat('\nTotal number of shopper responses, omitting the first day: ', nrow(psd[psd$date != 'July 14 2024',]))
## 
## Total number of shopper responses, omitting the first day:  100
# Total number of non-responses:
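# take the max per day (this assumes num_non_response is a running tally within each day),
# then add the daily totals together
nonresponses <- psd %>%
  group_by(date) %>%
  summarise(nonresponses = max(num_non_response)) %>%
  summarise(sum = sum(nonresponses)) %>%
  pull(sum)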
cat('\nTotal number of shoppers that declined the survey: ', nonresponses)
## 
## Total number of shoppers that declined the survey:  248
# Total nonresponse rate:
cat('\nTotal nonresponse rate: ', round(nonresponses / (nrow(psd) + nonresponses) * 100, 0), "%\n", 
    'This is actually not a bad response rate!\n')
## 
## Total nonresponse rate:  66 %
##  This is actually not a bad response rate!
# Average responses per hour:
cat('\nAverage number of shopper responses per hour: ', round(nrow(psd) / sum(psd$total_time[psd$response_num==1]), 1))
## 
## Average number of shopper responses per hour:  13.9
# Total number of shopkeeper responses:
cat('\nTotal number of shopkeeper responses: ', nrow(skd))
## 
## Total number of shopkeeper responses:  12
## Clean up so NAs are zero in the perception columns
## (If someone didn't list a particular mode, that should count as "zero")
for (col_name in c("car.estimate..includes.uber.", "transit..bus.or.metro..estimate",
     "walk.estimate", "bike.estimate", "other.estimate",
     "did.not.specify.if.did.not.provide...that.sum.to.100.")) {
  # set every NA in this column to 0 in one vectorized step
  psd[[col_name]][is.na(psd[[col_name]])] <- 0
}

## group together biking and scootering. there was only one scooter respondent
# and they're similar modes and are both done in bike lanes
psd$persons_method_of_travel[psd$persons_method_of_travel == 'scooter'] <- 'bike'
psd$persons_method_of_travel[psd$persons_method_of_travel == 'bike'] <- 'bike or scooter'

# group together biking and scootering in the perceptions columns: 
psd$bike.estimate[psd$Notes %in% c('"other" means scooter', 'Other was "scooter"',
                                 'This person said "scooter" for the "other" category')] <- 
  psd$bike.estimate[psd$Notes %in% c('"other" means scooter', 'Other was "scooter"',
                                   'This person said "scooter" for the "other" category')] +
  psd$other.estimate[psd$Notes %in% c('"other" means scooter', 'Other was "scooter"',
                                   'This person said "scooter" for the "other" category')]

names(psd)[names(psd) == 'bike.estimate'] <- 'bike.or.scooter.estimate'

# remove those scooter values from the other category so we don't double count:
psd$other.estimate[psd$Notes %in% c('"other" means scooter', 'Other was "scooter"',
                                  'This person said "scooter" for the "other" category')] <- 0

The code below charts the data and shows how the results wiggle around if you subset the data in different ways. The main results don’t change.

Plot showing how shopkeepers get to work:

skd %>%
  group_by(how_respondent_gets_to_work) %>%
  mutate(count = n(),
         group_index=row_number()) %>%
  filter(group_index == 1) %>%
  ungroup() %>%
  mutate(percent = count / sum(count) * 100,
         how_respondent_gets_to_work = ifelse(test=how_respondent_gets_to_work=='scooter', 
                                              yes='bike or scooter', 
                                              no=how_respondent_gets_to_work)) %>%
  ggplot() +
  geom_bar(aes(x=how_respondent_gets_to_work, y=percent), stat='identity', fill='#109c37') +
  ggtitle('58% of shopkeepers interviewed drive to work') +
  xlab('How respondent gets to work') +
  labs(caption = paste0("N=", nrow(skd))) +
  theme_minimal() +
  theme(plot.title = element_text(size = 24, face = "bold"),
        axis.title = element_text(size=20),
        axis.text = element_text(size=20),
        plot.caption = element_text(size=20, hjust=0)) +
  scale_y_continuous(breaks = seq(0, 70, 10)) 

Plot showing how passersby were travelling, including all data:

plot_potential_shopper_mot <- function(df, title) {
  # rv1: one row per travel mode, with its count and percent of all responses
  rv1 <-
    df %>%
    group_by(persons_method_of_travel) %>%
    mutate(count = n(),
           group_index = row_number()) %>%
    filter(group_index == 1) %>%
    ungroup() %>%
    mutate(percent = count / sum(count) * 100)

  # rv2: bar chart of the percent by travel mode
  rv2 <-
    rv1 %>%
    ggplot() +
    geom_bar(aes(x = persons_method_of_travel, y = percent), stat = 'identity', fill = '#109c37') +
    ggtitle(title) +
    xlab('How shopper travelled') +
    theme_minimal() +
    theme(plot.title = element_text(size = 24, face = "bold"),
          axis.title = element_text(size = 20),
          axis.text = element_text(size = 20),
          plot.caption = element_text(size = 20, hjust = 0)) +
    scale_y_continuous(breaks = seq(0, max(rv1$percent) + 10, 10)) +
    labs(caption = paste0("N=", nrow(df)))

  # return the summary table first and the chart second
  rv1 <- rv1 %>% select(persons_method_of_travel, count, percent)
  return(list(rv1, rv2))
}

rv = plot_potential_shopper_mot(psd, title='75% of shoppers arrived on foot')
table = rv[[1]]
chart = rv[[2]]
table
## # A tibble: 4 × 3
##   persons_method_of_travel count percent
##   <chr>                    <int>   <dbl>
## 1 walk                        98   75.4 
## 2 bike or scooter              8    6.15
## 3 car                         11    8.46
## 4 transit                     13   10
chart
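Because the sample is small, it can help to put a rough interval around a headline share like the walking percentage. The sketch below uses the counts from the table above (98 walkers out of 130 responses) and base R's `binom.test`. This interval is an illustration rather than part of the original analysis, and it treats respondents as a simple random sample, which a sidewalk intercept survey is not, so read it as a loose guide at best.

```r
walkers     <- 98    # respondents who arrived on foot (from the table above)
n_responses <- 130   # total passerby responses

# Exact (Clopper-Pearson) 95% confidence interval for the walking share.
ci <- binom.test(walkers, n_responses)$conf.int

# Lower and upper bounds, as percentages.
round(100 * as.numeric(ci), 1)
```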