I think I first discovered Tool in middle school, when I borrowed my dad’s CDs to listen to on the bus ride to school. This was right when Lateralus came out, and I remember listening to “Schism” over and over and over, staring out of the dingy bus window every morning. Ever since then, I’ve been a huge fan – not just of Tool, but also of Maynard James Keenan’s other bands: A Perfect Circle and Puscifer. Plus also just of him as a person, in the least stalkery way possible.

Not only is each groups’ music incredible, but they all bring something different, and each has evolved over time. Contemporary Tool brings social commentary and reflections on epistemology layered over heavy bass, soaring guitar, and mind-boggling drumming. Puscifer serves up ironic industrial rock songs about intercourse with country music legends (the group occasionally broaches other topics, too). And APC offers a softer (maybe?), more melodic contrast to Tool. But suffice to say that I’m a big fan of all of these projects, and that the point of this post isn’t to review anything or offer some new critical take on Maynard’s oeuvre.

Anyway, within the past few years, I’ve also become a big fan of data analysis and visualization. And I recently learned about the {spotifyr} package for R, which let’s you pull in tons of data from Spotify to analyze. So, I figured this would be a pretty good chance to play around with some analysis and visualization while exploring some of my favorite bands.

I’m including all of my code here for those interested. One other note is that, if you want to do some of your own analysis or replicate this one, you’ll need to create a developer account with Spotify. The Readme file for the {spotifyr} package (linked above) includes instructions on how to do this.

Anyway, let’s get to it. First, let’s load the packages we’ll need. With the exception of {hrbrthemes} and {ggtext}, all of these are available through CRAN.

knitr::opts_chunk$set(echo = TRUE, warning = FALSE)


## -- Attaching packages ------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(hrbrthemes) #install.packages("hrbrthemes", repos = "")
## Registering fonts with R
library(ggtext) #devtools::install_github("clauswilke/ggtext")

## You will likely need to install these fonts on your system as well.
## You can find them in [C:/Users/erice/Documents/R/win-library/3.6/hrbrthemes/fonts/roboto-condensed]

Next, let’s get all of the info for Maynard’s main projects: Tool, A Perfect Circle, and Puscifer.

bands <- c("tool", "a perfect circle", "puscifer")

maynard_raw <- bands %>%
  map(~get_discography(artist = .)) %>%
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
## Joining, by = c("track_title", "track_n", "track_url")
#it looks like we have some duplicate tracks plus some remixes, so let's get rid of those
maynard <- maynard_raw %>%
  ungroup() %>%
  mutate_if(is.character, ~str_to_lower(.)) %>% #making everything lowercase so I don't have to deal with capitalization
  filter(str_detect(track_name, "mix", negate = TRUE),
         str_detect(album_name, "mix|load|amotion", negate = TRUE)) %>%
  distinct(track_name, .keep_all = TRUE) %>%
  mutate(duration_s = duration_ms/1000,
         artist_name = str_replace_all(artist_name, c("tool" = "Tool", "a perfect circle" = "APC", "puscifer" = "Puscifer")))
#changing artist name back to title case so it looks better in plots

Spotify provides several metrics for each track (for more, see this documentation). The ones I’m interested in here are:

  • Energy: " Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy."

  • Loudness: “The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.”

  • Valence: “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).”

  • Tempo: “The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.”

  • Duration: “The duration of the track in milliseconds.” We converted this to seconds above to make it more intuitive.

Track Duration

As an initial analysis, let’s look at track duration across different artists and albums.

#this will get us an ordering of the album names just in case we need to use it later
album_order <- maynard %>%
  distinct(album_name) %>%

ggplot(maynard, aes(x = fct_relevel(album_name, album_order), y = duration_s, color = artist_name)) +
  geom_point(alpha = .8) +
  scale_color_ft() +
  scale_fill_ft() +
  coord_flip() +
  labs(title = "Length of Songs by Album",
       y = "Length in Seconds",
       x = "")

Let’s make a boxplot to get a sense of the distributions here.

ggplot(maynard, aes(x = fct_relevel(album_name, album_order), y = duration_s, color = artist_name)) +
  geom_boxplot(alpha = 0) +
  scale_color_ft() +
  scale_fill_ft() +
  coord_flip() +
  labs(title = "Length of Songs by Album",
       y = "Length in Seconds",
       x = "")

From this, we can see that Tool songs are, on average, longer than APC or Puscifer songs, and that Tool songs tend to be getting longer in each subsequent album (note that albums are arranged chronologically by release within each artist). This isn’t really a surprise, since it feels like almost every song on Fear Inoculum is 10+ minutes, minus the little interludes. We can also see, though, between these last two plots, that there’s a lot of variance in song length on Fear Inoculum in comparison to every other album.


Let’s see if loudness varies across Maynard’s groups and albums. My guess is that Puscifer, an industrial & almost dance-y rock group, will have the loudest songs, whereas APC is more known for melodic things and will be the quietest. Tool songs will probably also have the flattest distributions. One thing to remember is that the data here gives average loudness per song, so it won’t really capture all of the intra-song changes (which are a hallmark for Tool in particular).

ggplot(maynard, aes(x = loudness, y = fct_relevel(album_name, album_order), fill = artist_name, color = artist_name)) +
  geom_density_ridges(alpha = .9) +
  scale_color_ft() +
  scale_fill_ft() +
    title = "Distribution of Song Loudness by Album",
    y = "",
    x = "Loudness (dB)"
## Picking joint bandwidth of 1.72

So, we definitely see more variation in the loudness of Tool songs, particularly on the later albums, but it looks like the APC album Eat the Elephant is probably the loudest. I wouldn’t have guessed that. There are definitley some louder songs (e.g “Get the Lead Out,” “The Doomed”), but others are quieter or take time to build. Loudness of Puscifer songs seems to be more normally distributed (by album) than either APC or Tool songs.


Ok, so let’s see if we see similar distributions for energy.

ggplot(maynard, aes(x = energy, y = fct_relevel(album_name, album_order), fill = artist_name, color = artist_name)) +
  geom_density_ridges(alpha = .9) +
  scale_color_ft() +
  scale_fill_ft() +
    title = "Distribution of Song Energy by Album",
    y = "",
    x = "Energy"
## Picking joint bandwidth of 0.0976

Sort of? The distributions for Tool albums look somewhat similar – keeping in mind that the metrics are on different scales – as do the APC distributions. The distributions for energy in Puscifer albums appear somewhat flatter, though. Maybe looking at the correlations between these will be helpful.

ggplot(maynard, aes(x = loudness, y = energy)) +
  geom_point(aes(color = artist_name)) +
  scale_color_ft() +
    title = "Relationship between Song Loudness and Energy"

It looks like there’s a strong relationship here, but the bigger issue is that all of the songs cluster around the same level of loudness, with a few exceptions from Tool (and a smaller number of APC exceptions), so it’s a little bit hard to tell.

Just to check, we can take a look at the estimated bivariate correlation.$energy, maynard$loudness), plot = FALSE)
## Call:corCi(x = x, keys = keys, n.iter = n.iter, p = p, overlap = overlap, 
##     poly = poly, method = method, plot = plot, minlength = minlength, 
##     n = n)
##  Coefficients and bootstrapped confidence intervals 
##    C1   C2  
## R1 1.00     
## R2 0.78 1.00
##  scale correlations and bootstrapped confidence intervals 
##       lower.emp lower.norm estimate upper.norm upper.emp p
## NA-NA       0.7        0.7     0.78       0.84      0.84 0

.78 is a pretty strong correlation!


I’m going to guess valence is pretty low across all the board. Puscifer maybe will have the highest valence just because some of the songs are kinda jokey, and there are a smattering of uplifting Puscifer jams (“Grand Canyon” comes to mind). But my assumption overall is that valence will be loooow for everyone.

Let’s look at this using a histogram and faceting by artist because we haven’t done that yet.

ggplot(maynard, aes(x = valence, fill = artist_name, color = artist_name)) +
  geom_histogram() +
  facet_wrap(~ artist_name, ncol = 1) +
  scale_fill_ft() +
  scale_color_ft() +
    title = "Valence of Songs"
  ) +
    legend.position = "none"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Who’s this one little guy hanging out at maximum valence in the Tool tracks?

maynard %>%
  filter(valence == max(maynard$valence)) %>%
## [1] "intermission"

Well that one doesn’t really count. If you’ve listened to Tool, you know that “Intermission” is a circus-y little interlude that lasts for < 1 minute. I’m not sure I’d classify it as a real song.


Finally, let’s take a look at tempo. According to Spotify, tempo is the estimated beats per minute of a given song. Again, because Puscifer tends to be more industrial rock and dance-y, my intuition is that on average those songs will have the highest tempo, particularly songs on the earlier albums. Likewise, my guess is that APC will have the lowest tempo, and probably especially on Thirteenth Step. Tool songs are going to be all over the place, and the within-song variability makes it hard for me to even guess what the averages will look like. 7empest will probably be pretty high up there.

Let’s make a plot similar to the first plot we made to take a look at this. The X on each plot will represent the average.

ggplot() +
  geom_point(data = maynard, aes(x = fct_relevel(album_name, album_order), y = tempo, color = artist_name), alpha = .8) +
  scale_color_ft() +
  scale_fill_ft() +
  coord_flip() +
  labs(title = "Tempo of Songs by Album",
       y = "Tempo (BPM)",
       x = "") +
  geom_point(data = maynard %>%
               group_by(artist_name, album_name) %>%
               summarize(tempo = mean(tempo, na.rm = TRUE)) %>%
             aes(x = fct_relevel(album_name, album_order), y = tempo, color = artist_name),
             shape = "x", size = 6)

And what are the 5 songs with the fastest tempos?

maynard %>%
  top_n(n = 5, wt = tempo) %>%
  select(artist_name, album_name, track_name, tempo) %>%
## # A tibble: 5 x 4
##   artist_name album_name                track_name        tempo
##   <chr>       <chr>                     <chr>             <dbl>
## 1 APC         "emotive"                 gimme gimme gimme  204.
## 2 APC         "eat the elephant"        eat the elephant   200.
## 3 Tool        "ænima"                   useful idiot       188.
## 4 Puscifer    "conditions of my parole" toma               184.
## 5 Puscifer    "\"v\" is for vagina"     dozo               182.

There’s a ton more we could do with this, but I’m going to wrap it up for now. Later on, I might return to this and work with some of the song lyrics. To wrap up, though, I’ll put together a plot that revisits song duration in a bit of a different, more art-y way.

maynard_plot <- maynard %>%
  distinct(artist_name, album_name) %>%
  group_by(artist_name) %>%
  mutate(shading = rev(row_number())) %>%
  select(artist_name, album_name, shading) %>%
  left_join(x = maynard, y = ., by = c("artist_name", "album_name")) %>%
  arrange(track_name) %>%
  mutate(id = row_number())

pad <- 1000
line_color <- "#464950"
bckgrnd <- "#252a32"

labels <- maynard_plot %>%
  top_n(n = 5, wt = duration_s) %>%
  mutate(track_name = str_to_title(track_name),
         hjust = if_else(id > median(id), 1, 0))

#setting up text to go in the middle of the circle
center_text <- tibble(
  labelz = "**<p style='color:white;font-size:11pt'>Duration of songs by Maynard James Keenan's bands: <span style='color:#ff0055'>Tool</span>, <span style='color:#909495'>A Perfect Circle</span>, and <span style='color:#0b53c1'>Puscifer.</span><br><br><span style='font-size:08pt'>*The 5 longest songs are all by <span style='color:#ff0055'>Tool.</span> The three rings correspond to 1 min, 5 mins, and 10 mins.</span>*</p>**",
  x = 0,
  y2 = -170

mjk_plot <- ggplot(maynard_plot, aes(id, duration_s, color = artist_name)) +
  geom_hline(aes(yintercept = (60)), color = line_color) +
  geom_hline(aes(yintercept = (5*60)), color = line_color) +
  geom_hline(aes(yintercept = (10*60)), color = line_color) +
  geom_segment(aes(x = id, xend = id, y = 0, yend = duration_s)) +
  geom_point() +
  geom_richtext(data = labels,
                aes(x = id, y = duration_s + 50, color = artist_name, label = track_name, hjust = hjust),
                size = 3,
                family = "Roboto Condensed",
                label.color = NA,
                fill = NA
                ) +
  geom_textbox(data = center_text,
               aes(x = x, y = y2, label = labelz),
               fill = NA, color = NA,
               width = unit(45, "mm"),
               box.hjust = .5,
               family = "Roboto Condensed",
               size = 3,
               hjust = .5) +
  coord_polar() +
  ylim(-.5*pad, max(maynard_plot$duration_s) + 850) +
  scale_color_ft() +
  theme_void() +
    panel.background = element_rect(fill = bckgrnd),
    plot.background = element_rect(fill = bckgrnd),
    legend.position = "none",
    plot.margin = margin(-15, -15, -15, -15, "lines")