A major benefit of life in DC is the vibrant music scene. Generally speaking, ticket prices are reasonable, and the wide range of venue sizes makes the city an attractive stop on the tour schedules of both big-name artists and those yet to be discovered. Two summers ago I convinced a friend to head to DC9, one of the District's smaller alternative venues, to check out Royal Blood. A year later we barely managed to snag tickets for their sold-out 9:30 Club performance. This experience inspired me to ask some questions about music venues' size, location and pricing, which I will explore over the next few posts. I'll go over the findings and graphs first and the coding/procedure at the bottom!
The first and simplest questions any frequent concertgoer would ask:
- Where do we listen to music?
- How much does it cost?
To answer these questions, I turned to common online ticket retailers for concert data. Unfortunately, Ticketmaster, TicketFly and other primary-market ticket sellers make it difficult to learn about event offerings; my web scrapers for those sites frequently hit 404 errors. Instead, I turned to the secondary ticket marketplace SeatGeek, which provides an awesome API and detailed documentation on how to pull down ticket data. Using secondary-market data for concert tickets does come with limitations, since scalped tickets are often several times more expensive than box office prices. Regardless, the SeatGeek dataset can answer some questions for us:
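For anyone who wants to try the SeatGeek pull themselves, here is a minimal standard-library sketch, not the original script. It assumes the public v2 events endpoint (`https://api.seatgeek.com/2/events`), a `client_id` API key, a `taxonomies.name=concert` filter, and the response field names (`events`, `venue.postal_code`, `venue.location`, `stats.average_price`) as I understand SeatGeek's documentation; check the current docs before relying on any of them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of SeatGeek's v2 events endpoint.
EVENTS_URL = "https://api.seatgeek.com/2/events"


def events_url(client_id, page, per_page=100):
    """Build the URL for one page of concert events (parameter names assumed)."""
    params = {
        "client_id": client_id,        # your SeatGeek API key
        "taxonomies.name": "concert",  # assumed filter: concerts only
        "per_page": per_page,
        "page": page,
    }
    return EVENTS_URL + "?" + urllib.parse.urlencode(params)


def flatten_event(event):
    """Keep only the fields needed for venue/price analysis (field names assumed)."""
    venue = event.get("venue", {})
    location = venue.get("location", {})
    stats = event.get("stats", {})
    return {
        "event_id": event.get("id"),
        "venue": venue.get("name"),
        "postal_code": venue.get("postal_code"),
        "lat": location.get("lat"),
        "lon": location.get("lon"),
        "average_price": stats.get("average_price"),
    }


def fetch_all_events(client_id, per_page=100):
    """Request successive pages until the API returns an empty event list."""
    rows, page = [], 1
    while True:
        with urllib.request.urlopen(events_url(client_id, page, per_page)) as resp:
            payload = json.load(resp)
        events = payload.get("events", [])
        if not events:
            break
        rows.extend(flatten_event(e) for e in events)
        page += 1
    return rows
```

`fetch_all_events("YOUR_CLIENT_ID")` returns a list of flat dicts that `csv.DictWriter` can write out for R's `read.csv`. SeatGeek may cap how deep you can page, so a full pull might need to be split into several smaller queries.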
1. Music venues are geographically clustered. In the SeatGeek data, 60% of US music venues are located within 5 miles of another venue. My analysis suggests that these "neighborhood clusters" often have a consistent mix of small, medium and large venues, which presumably cater to artists of differing popularity. The clustering is apparent in the city-level graphs shown above: certain avenues and neighborhoods contain nearly all of a city's venues. Venues likely congregate because of cities' specific sound ordinances, which has the positive side effect of putting them close to bars and clubs.
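One way to compute a figure like the 60% above is to measure the great-circle (haversine) distance between every pair of venues and count the venues that have at least one neighbor within 5 miles. This is a sketch assuming each venue is a (latitude, longitude) pair taken from the API; it is not necessarily the exact procedure behind the number above, and the coordinates in the usage note are rough, hypothetical values.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def share_with_neighbor(venues, radius_miles=5.0):
    """Fraction of venues with at least one *other* venue within radius_miles."""
    if len(venues) < 2:
        return 0.0
    clustered = 0
    for i, (lat1, lon1) in enumerate(venues):
        # Compare against every other venue; O(n^2), fine for a few thousand venues.
        if any(
            haversine_miles(lat1, lon1, lat2, lon2) <= radius_miles
            for j, (lat2, lon2) in enumerate(venues)
            if j != i
        ):
            clustered += 1
    return clustered / len(venues)
```

With two points about a block apart in DC, `(38.917, -77.024)` and `(38.918, -77.023)`, plus one in Los Angeles, `(34.05, -118.24)`, `share_with_neighbor` returns 2/3. For much larger venue lists, a spatial index such as a k-d tree would avoid the pairwise loop.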
Even with this clustering, the effect of sound ordinances on venue location is still a hot topic. Officials in DC are currently considering extending stricter residential-area sound laws into business/commercial areas. According to the owner of Black Cat, a DC rock club, the move "could effectively shut down D.C.'s live music venues." A better understanding of how tightly venues cluster could help DC officials recognize the value of preserving more relaxed sound-level laws for "rock-n-roll avenue".
2. More people = more music. This might seem like a no-brainer, but it's interesting to see the effect of population size on the availability of music venues. To explore this, I mapped each venue in the SeatGeek dataset to its county FIPS code (think ZIP code, but for counties) and pulled in 2010 US Census Bureau population counts.
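The original mapping was done in R with the noncensus library; a rough Python equivalent of the ZIP-to-county step and the per-county summary might look like this. It assumes a one-to-one ZIP-to-FIPS crosswalk and a FIPS-to-population lookup, both passed in as plain dicts. In reality some ZIP codes straddle county lines, so any crosswalk has to pick one county per ZIP. `summarize_by_fips` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize_by_fips(venues, zip_to_fips, population):
    """Aggregate venue-level rows to one row per county FIPS code.

    venues: list of dicts with "postal_code" and "average_price" keys
    zip_to_fips: dict mapping 5-digit ZIP -> 5-digit county FIPS (assumed one-to-one)
    population: dict mapping FIPS -> census population
    """
    counts = defaultdict(int)
    price_sums = defaultdict(float)
    priced = defaultdict(int)
    for v in venues:
        fips = zip_to_fips.get(str(v["postal_code"])[:5])  # drop any ZIP+4 suffix
        if fips is None:
            continue  # ZIP missing from the crosswalk
        counts[fips] += 1
        if v.get("average_price") is not None:
            price_sums[fips] += v["average_price"]
            priced[fips] += 1
    # avg_price is an unweighted mean of venue-level average prices.
    return {
        fips: {
            "venues": counts[fips],
            "population": population.get(fips),
            "avg_price": price_sums[fips] / priced[fips] if priced[fips] else None,
        }
        for fips in counts
    }
```

The resulting `venues` and `population` columns are what a venue-count correlation would use, and `avg_price` feeds the ticket-price comparison.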
More populous areas tend to have more venues, a trend clearly visible on the all-US map at the start of the post: larger cities are easily reached by musicians via air and highway and have the greatest concentration of music fans. (The Pearson correlation between population and number of venues within a FIPS area is 0.82, with p < 0.001.)
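A p-value printed as 0.000 is a rounded value, meaning p < 0.001. Here is a small sketch of the Pearson coefficient and the t statistic that its p-value comes from, t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. If SciPy is available, `scipy.stats.pearsonr` returns r and the p-value directly; this version only shows the arithmetic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

As a hypothetical illustration, with just 100 counties r = 0.82 gives t ≈ 14.2, far beyond the roughly 2.0 needed for significance at the 5% level, which is why the p-value rounds to zero.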
Tickets in population-heavy areas (cities) don't cost more than in less populated areas ('burbs/country). There is no statistically significant relationship between population size and average ticket price. Concert cost is relatively consistent across venues nationally, likely because nationwide tours keep prices similar between cities. How far this observation extends to the real world, however, is limited by the dataset, which contains mostly city-based venues. A larger, historical dataset analyzed as a time series might even show the opposite trend.
Where are the most expensive concerts in the nation? The choropleth map below (think heat map by geographic area) only partially answers this question. The large grey swatches show that we don't have data for every FIPS county in the US. Even with the missing data, we can see that, in our dataset of SeatGeek secondary-market tickets, the tip of Florida, New York City and southern California have some of the most expensive tickets. The price scale, however, is fairly narrow, and the correlation findings above showed no strong relationship between population and ticket price in these more populated areas.
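Choropleth colors come from binning each county's average price into a handful of classes. A minimal sketch using quantile breaks, so each color covers roughly the same number of counties; the binning scheme and the palette are my assumptions, not necessarily what the map uses. Counties without a price are left out so they can stay grey.

```python
def quantile_breaks(values, k):
    """Return k - 1 cut points splitting the sorted values into k roughly equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    return [ordered[(i * n) // k] for i in range(1, k)]


def color_class(value, breaks):
    """Index of the color class for value (0 = cheapest class)."""
    for i, cut in enumerate(breaks):
        if value < cut:
            return i
    return len(breaks)


def assign_colors(avg_price_by_fips, palette):
    """Map each FIPS code that has a price to a palette color; skip missing prices."""
    priced = {f: p for f, p in avg_price_by_fips.items() if p is not None}
    breaks = quantile_breaks(list(priced.values()), len(palette))
    return {f: palette[color_class(p, breaks)] for f, p in priced.items()}
```

When prices sit in a narrow range, quantile breaks spread the colors more evenly than equal-width bins would, at the cost of making small price differences look larger. A light-to-dark palette such as `["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]` (hypothetical choice) would give four classes.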
Without a more robust dataset of primary-market ticket prices, we only get a partial glimpse of the nation's ticket prices, but it does let us visualize their geographic distribution.
So how did I make these graphs? I used a mix of Python, R (primarily the ggmap and sqldf libraries), SVG manipulation, Excel and coffee. Back in college I used ArcGIS a bit, but on a tight time frame and zero budget I went the freeware route. I did try some freeware alternatives to ArcGIS, but they were slow and clunky.
Ultimately the process was as follows. I am more than happy to share code and data if requested!
- Use Python to pull all events from the SeatGeek API and push them to R.
- Import, stack and munge the events data. Summarize individual events to the venue level. Explore correlations and plot points for specific cities and for the US as a whole. Use the noncensus library to crosswalk ZIP codes to FIPS codes and pull in 2010 census population values. Summarize average ticket price by FIPS code and export to a .txt file. Other packages used: ggmap, ggplot2, sqldf, Rcmdr, Hmisc.
- Use Nathan Yau's method from FlowingData and Python to edit a FIPS-coded SVG (scalable vector graphic) file from Wikipedia. Import the SVG into Adobe Illustrator and spruce it up a bit.
- Listen to lots of Mo Lowda & the Humble while writing the post. This band is like a jazzy Kings of Leon out of Temple University, and they absolutely rock out.
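Nathan Yau's FlowingData approach recolors the Wikipedia county map by parsing the SVG, looking up each county path by its FIPS code and rewriting its style. Below is a standard-library sketch of that idea using `xml.etree.ElementTree`. It assumes that each county `<path>` carries its five-digit FIPS code as its `id` (non-numeric ids such as state lines are skipped); the stroke styling and the grey no-data color are placeholder choices, and a real SVG with DOCTYPE entity declarations may need extra handling before parsing.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write the SVG namespace back as the default

# Placeholder styling; the final look was tuned in Illustrator anyway.
BASE_STYLE = "stroke:#ffffff;stroke-width:0.1;fill:"
NO_DATA_COLOR = "#d0d0d0"  # grey for counties missing from the data


def recolor_counties(svg_text, colors_by_fips):
    """Return SVG text with each county <path> filled by its FIPS color."""
    root = ET.fromstring(svg_text)
    for path in root.iter(f"{{{SVG_NS}}}path"):
        fips = path.get("id")
        if fips is None or not fips.isdigit():
            continue  # skip state lines, separators and other non-county paths
        color = colors_by_fips.get(fips, NO_DATA_COLOR)
        path.set("style", BASE_STYLE + color)
    return ET.tostring(root, encoding="unicode")
```

Passing the map's SVG text and a dict mapping FIPS codes to hex colors, then writing the returned string to a `.svg` file, gives a recolored map ready for polishing in Illustrator.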