Back in 2011, I happened across a Rolling Stone article about Britney Spears’ album, Femme Fatale. This was the first album since her public meltdown in 2007, and from the Q&A portion of the article it was clear to me that Britney was positioning herself for a big 2011 comeback.
The biggest details that stuck out to me were the producer names she dropped: will.i.am, Dr. Luke and Max Martin. I had heard of will.i.am from his days in The Black Eyed Peas, but the others were unfamiliar, so I read up on their work and was absolutely blown away.
Max Martin himself had a hand in nearly every memorable song from my adolescence through my college years: Backstreet Boys’ “I Want It That Way” (1999), Britney’s “…Baby One More Time” (1999), Bon Jovi’s “It’s My Life” (2000), and even Katy Perry’s “I Kissed a Girl” (2008), Maroon 5’s “One More Night” (2012), Taylor Swift’s “Blank Space” (2014) and The Weeknd’s “Can’t Feel My Face” (2015).
Who the HELL is Max Martin?
With my eyes open to the behind-the-scenes world of pop songwriting and production, I began to see the same names over and over. In particular, Dr. Luke was reportedly holding my personal favourite pop star, Ke$ha, as a contract hostage after she had accused him of assault. Clearly these producers/songwriters have a huge and highly personal hand in making pop hits big, and I wanted to know how this world operated at scale.
The central question I answer in this post is: “Who Writes the Hits?”
I’m not interested in any artist who received public credit for their role in the song. For example, Taylor Swift received a songwriting credit for 2014’s Blank Space along with Max Martin and “Shellback” but only Taylor was listed on the Billboard charts as the “artist”. In the forthcoming analysis I try to assemble a list of producers/writers/composers on a song and then remove any name publicly associated with the track (including “ft.”, “feat.”, “v” etc). I coined the phrase “Billboard Ghosts” to describe the remaining persons with a semi-private hand in the creation of the biggest pop hits.
To the right is a screenshot from Taylor Swift's 1989 digital booklet; you can clearly see the "written by" chunks list the individuals who collaborated to bring each track to life. In this post we want to filter out the public names to uncover the ghosts behind the scenes.
I set about collecting a dataset that listed the credits of every song that surpassed rank 50 of the Billboard Hot 100 lists. Surprisingly, this data is not available in a readily downloadable and tidy format. First, I downloaded every Billboard Hot 100 track from 2007-2017 (same process as my last blog post).
Based on my findings in the last blog post, I knew I wanted to focus on the "cream of the crop" of Billboard Hot 100 tracks rather than throwing in all of them and the kitchen sink. I isolated the song title and artist name for any track that rose at least to rank 50 during its lifetime.
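A minimal sketch of that filtering step, assuming the weekly chart data is a list of rows with `title`, `artist` and `rank` fields (those field names, the function name and the sample rows are made up for illustration, and "rose to rank 50" is read here as a best rank of 50 or better):

```python
from collections import defaultdict


def tracks_reaching_rank(chart_rows, cutoff=50):
    """Return the (title, artist) pairs whose best weekly rank is <= cutoff."""
    best_rank = defaultdict(lambda: float("inf"))
    for row in chart_rows:
        key = (row["title"], row["artist"])
        # A lower number is a better chart position, so keep the minimum.
        best_rank[key] = min(best_rank[key], row["rank"])
    return sorted(key for key, rank in best_rank.items() if rank <= cutoff)


# Made-up rows purely for illustration (not real chart positions).
sample_rows = [
    {"week": "w1", "title": "Song A", "artist": "Artist X", "rank": 72},
    {"week": "w2", "title": "Song A", "artist": "Artist X", "rank": 41},
    {"week": "w1", "title": "Song B", "artist": "Artist Y", "rank": 88},
    {"week": "w2", "title": "Song B", "artist": "Artist Y", "rank": 63},
]

top_tracks = tracks_reaching_rank(sample_rows)
print(top_tracks)
```

Only "Song A" survives here, because it is the only track whose best week reached the top 50.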
Next, I needed the credits for each song. There are some services out there like Music Reports that advertise song credit databases for muy dinero, or I could have tried to knit together the Song Rights repositories from giant rights organizations and record labels like BMI, WBRecords and Capitol Records, but there would be a lot of ambiguity and overlap.
Instead I tried to use open-source and public information. The crowd-sourced site DISCOGs maintains credits information for albums and tracks, but I found the data to be of mediocre quality and the 1-request-per-second throttling to be absurd (I needed to make on average 4 API requests per track, which would have taken forever).
Ultimately, I turned to Wikipedia. I wrote a scraper in Python that:
- Searched for each track’s Wikipedia page
- Downloaded the “InfoBox” quick information (see the above picture for Blank Space’s InfoBox)
- Stored this table in a semi-structured dictionary
- Searched for any instance of “producer”, “composer”, “writer” in the infobox and accumulated a distinct list of names.
- Used a simple and rough crosswalk from DISCOGs to translate pseudonyms and abbreviations to full names.
- Filtered out the “public names” to isolate the “ghosts” on a track.
- Pinged the Spotify API for each track’s UID and automatically generated a public playlist of each ghost’s hits.
- Some weird versions (a la Glee) snuck in here but overall it was fine with me.
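The credit-gathering and ghost-filtering steps above could look roughly like the sketch below. It is not the actual scraper: it assumes the infobox has already been parsed into a dictionary of label to list of names, and the separator list, function names and the second example's names are all hypothetical.

```python
import re

# Infobox labels treated as credit fields (assumed; real infobox labels vary).
CREDIT_KEYWORDS = ("producer", "composer", "writer")

# Separators that split a chart artist string into public names (assumed list).
ARTIST_SPLIT = re.compile(
    r"\s+(?:featuring|feat\.?|ft\.?|vs\.?|v\.?)\s+|\s*[&,]\s*",
    flags=re.IGNORECASE,
)


def normalize(name):
    """Lowercase and collapse whitespace so names compare loosely."""
    return " ".join(name.lower().split())


def credited_names(infobox):
    """Collect a distinct, ordered list of names from credit-like fields."""
    names = []
    for label, values in infobox.items():
        if any(keyword in label.lower() for keyword in CREDIT_KEYWORDS):
            for value in values:
                if value not in names:
                    names.append(value)
    return names


def find_ghosts(artist_string, infobox, crosswalk=None):
    """Return credited names that are not publicly listed as an artist."""
    crosswalk = crosswalk or {}  # pseudonym -> full name
    public = {
        normalize(crosswalk.get(part, part))
        for part in ARTIST_SPLIT.split(artist_string)
        if part.strip()
    }
    ghosts = []
    for name in credited_names(infobox):
        full_name = crosswalk.get(name, name)
        if normalize(full_name) not in public and full_name not in ghosts:
            ghosts.append(full_name)
    return ghosts


# Songwriting credits for "Blank Space" as described earlier in this post.
blank_space_ghosts = find_ghosts(
    "Taylor Swift",
    {"Songwriter(s)": ["Taylor Swift", "Max Martin", "Shellback"]},
)

# Entirely made-up track showing a featured artist and a pseudonym crosswalk.
demo_ghosts = find_ghosts(
    "Singer A ft. Singer B",
    {
        "Producer(s)": ["Prod P", "Singer B"],
        "Songwriter(s)": ["Singer A", "Prod P", "W. Writer"],
        "Length": ["3:30"],
    },
    crosswalk={"W. Writer": "Wanda Writer"},
)
print(blank_space_ghosts, demo_ghosts)
```

Splitting on "&" and "," is deliberately rough: a band name that contains either character would be broken apart, which is one way public names can leak through as ghosts.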
How it Works
One way of visualizing the world of Billboard Ghosts is considering producer collaborations and overall influence. In this sense, the dataset I generated by combining the Hot-100 list, Wikipedia data, DISCOGs and Spotify is a network of pop influencers.
I generated an interactive graphic that conveys this network:
- Each node is a “ghost producer”:
- Size and color indicate the number of “hits” the ghost is associated with
- Each connection is a collaboration:
- Line thickness and color indicate the number of collaborations those two ghosts share.
- Layout was chosen using a node placement algorithm:
- Because there is no dependent relationship between any ghosts, this network is undirected.
- I used the Force Atlas algorithm in Gephi to determine where each node should be placed.
- The infobox on the right gives some basic information about the Ghost and has a Spotify playlist of some of their hits that you can follow!
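To make the node and edge definitions above concrete, here is a small sketch that turns per-track ghost lists into hit counts (node size) and undirected collaboration counts (edge weight). The track and ghost names are invented, and the `Source,Target,Weight` edge-list layout is my assumption about a spreadsheet format Gephi can import, not necessarily the export used for the graphic.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def build_network(ghosts_by_track):
    """Count hits per ghost and collaborations per unordered ghost pair."""
    node_hits = Counter()
    edge_weights = Counter()
    for ghosts in ghosts_by_track.values():
        unique = sorted(set(ghosts))
        node_hits.update(unique)
        # Sorting first means each pair is always stored as (a, b) with a < b,
        # so (a, b) and (b, a) are never counted as separate edges.
        edge_weights.update(combinations(unique, 2))
    return node_hits, edge_weights


def edges_to_csv(edge_weights):
    """Serialize the edges as a Source,Target,Weight table (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edge_weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Invented data purely for illustration.
ghosts_by_track = {
    "Track 1": ["Ghost A", "Ghost B"],
    "Track 2": ["Ghost A", "Ghost B", "Ghost C"],
    "Track 3": ["Ghost C"],
}

node_hits, edge_weights = build_network(ghosts_by_track)
csv_text = edges_to_csv(edge_weights)
print(csv_text)
```

In this toy data, Ghost A and Ghost B share two tracks, so their connection would be drawn thicker than either one's connection to Ghost C.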
Some interesting things I’ve noticed are the slight separation of “country”, “pop”, “hip hop” and “house/techno” communities. I also like how some communities only made the cut based on a few tracks.
There are some peculiar finds too: some ghosts only appear on the graphic after lawsuits mandated royalty sharing or labels listed heavy influences on the credits to avoid lawsuits (see: Kid Rock’s “All Summer Long”).
Besides the big names, I learned of “Starrah” who - since 2015 - has worked with Rihanna, Nicki Minaj, Drake, Lil Wayne, Calvin Harris and Katy Perry. Her playlist is definitely worth checking out!
FYI: The viz can be zoomed, hovered, clicked, filtered etc!
- Each dot represents a "hidden" producer/writer/composer who was listed on a track's credits but not publicly.
- For example, Taylor Swift would only be credited here for tracks she produced without being listed as the artist, a featured artist, a starring act etc.
- By hovering over each "ghost" you can see the other "ghosts" with whom they frequently collaborate.
- When a ghost is clicked, the names of their collaborators appear and a playlist of some of their hits appears. Unclick to resume exploring!
- Scroll to zoom on a node to view a ghost's connections with more detail!
There are some data issues in the viz; for example, “Beyonce Knowles” is still listed as a ghost for “Beyonce” tracks, “Adam Levine” gets producer credits for “Maroon 5” tracks, and “Chris Brown” is recorded as “Christoper Brown”.
As always, the munging/cleaning/parsing process of this analysis took forever and - while I fixed a ton of issues - some definitely slipped through the cracks. I may go back with a more manual review in the future but I think that for the most part this viz provides an interesting insight without perfect data.
A few posts ago I presented some network graphs showing how genre could be inferred from “related artist” connections on Spotify. After I posted the graphics on Reddit I received a lot of feedback asking for more interactivity (search, hover, infobox etc.).
Finally: shoutout to Spotify for making it super easy to search their API, create playlists and embed media players on the fly. As always, the code is available via the GitHub link up top, so feel free to drop a question/feedback below!