What's Good: An Analysis of Pitchfork and Anthony Fantano Album Reviews

CMSC320 Final Tutorial

By Amy Hein

Part 1: Introduction

Reviews are important to us. Before we choose where to eat or what to buy online, the first thing many people do is check the reviews. Indubitably, these reviews affect how we act, be it on Amazon, Google Maps, or otherwise. Not only do reviews affect how people consume food or other goods, they can also affect how some people consume music. There are two major music reviewing entities in online music discussion circles: Pitchfork, a long-established online publication with over 18 thousand reviews, and a bald-headed YouTube creator by the name of Anthony Fantano. Fantano's influence may seem surprising given that he is just one person talking to a camera from the suburbs of Connecticut, but he has garnered a large online cult following of people who take what he says about music to heart. More so than telling you what to listen to or not listen to, the reviews from these two sources mainly serve as talking points in online music discussion circles. Indeed, reviews about art should be treated differently than reviews about material goods. People can agree and disagree with these reviews, and talk about their differing opinions. Additionally, people in the online music community often look to entities like Pitchfork and Fantano when they want recommendations on music to listen to.

There are many questions we can ask about this phenomenon from a data science perspective. Does either source have a bias towards certain time periods? Is one entity "nicer" than the other? How often do they agree? Disagree? And at the very core of this: are there objective quantities that we can look at to determine how much people will like a piece of music, or will art always be subjective? On this premise, I will take you through the data science life cycle. This will include data collection and processing, exploratory data analysis, hypothesis testing, and machine learning.

One thing I will note is that there are several blocks of code that are commented out. The output that these blocks produce is always exported into a CSV file, which I include in the repository. Some of these blocks take a very long time to run, and it's much more convenient if we don't have to do that every time.

Part 2: Data Collection and Processing

Our goal is to get Spotify data for all the albums that have been reviewed by both Anthony Fantano and Pitchfork. First we have to find the albums in common, then get the Spotify data for each track, then average the data per album. Ultimately there are three datasets that we have to work with to explore this topic: Pitchfork's review data, Anthony Fantano's review data, and Spotify's music data.

Pitchfork Data

The dataset that we will be using for Pitchfork reviews is available on Kaggle here. The attributes of this dataset are as follows:

Let us download database.sqlite from that page and place it handily in our project directory. Rather than querying it in SQL, we want our data in a pandas DataFrame for easier processing. We can do that by opening a connection to the database and using pandas' read_sql_query() function.
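As a minimal sketch, this step might look like the following; the table name "reviews" is an assumption based on the Kaggle dataset's documented schema, and pf_df is just an illustrative name:

```python
import sqlite3
import pandas as pd

# Open a connection to the downloaded SQLite file
conn = sqlite3.connect("database.sqlite")

# Pull the review table into a DataFrame (the table name "reviews"
# assumes the Kaggle dataset's schema)
pf_df = pd.read_sql_query("SELECT * FROM reviews", conn)
conn.close()

pf_df.head()
```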

Some albums have multiple artists, such as the 3rd review on the above list, "First Songs" by Kleenex, Liliput. I happen to know that those are the same band under two different names, but we're not sure which name the album will be listed under on Spotify. So, if artist takes the form "artist 1, artist 2", we want to list the same album twice, once under each artist. If we don't find a match for either of them later, we can drop it.
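Here is one way to sketch that split with pandas' explode(), assuming the DataFrame and column names from the step above:

```python
# Turn comma-separated artist strings into lists, then give each
# artist its own row, so the album is listed once per artist
pf_df["artist"] = pf_df["artist"].str.split(", ")
pf_df = pf_df.explode("artist").reset_index(drop=True)
```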

Fantano Data

Now for the second dataset, we can start to look at Anthony Fantano's review data. The attributes of the dataset are as follows:

That can be retrieved from kaggle here and, once again, handily placed in our project directory.

There are a few more columns here than we need, so we will go ahead and drop those, then merge with the Pitchfork data.

Merge Pitchfork data with Fantano data

Okay, here is our merged data, but it's not perfect. We can see that there are multiple albums of the same name. You may have noticed that we did not do anything to handle the dual-artist albums in the Anthony Fantano data. This is because they are separated with the word "and," and it is plausible that an artist name will contain "and" in it. So, splitting on "and" would have ruled out a lot of good data. This is also why we didn't merge on artist. Instead, we will check whether the Pitchfork artist pf_artist is contained in the string for the Fantano artist af_artist, and delete non-matching rows based on that.
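A sketch of that containment check, assuming the merged DataFrame is called merged and carries the pf_artist and af_artist columns named above:

```python
# Keep only rows where the Pitchfork artist string appears somewhere
# inside the Fantano artist string (case-insensitive)
mask = merged.apply(
    lambda row: row["pf_artist"].lower() in row["af_artist"].lower(),
    axis=1,
)
merged = merged[mask].reset_index(drop=True)
```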

Spotify Data

Alright! Now we must fetch the appropriate data from Spotify using their API. This is going to be a little bit more involved. We first have to make a Spotify Developer account. That can be done here. Once you have done that, you'll come to a dashboard that will prompt you to "Create an App." Click on that and enter some details. Once you create an app from there, Spotify will take you to an overview of your app. This is where you can find your client ID and client secret, as well as a few visual aids that have to do with the data your app is handling.

Now that we have that set up, we can start to work on connecting to Spotify's API and retrieving our relevant data. An important library that is going to help us achieve this is spotipy, a Python library for the Spotify API. Spotify's API takes IDs rather than names as keys for data, but our data does not contain Spotify IDs. We can use spotipy to look up these IDs before asking Spotify for anything else.
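Connecting with spotipy's client-credentials flow might look like this; the placeholder strings stand in for the client ID and secret from your own app's dashboard:

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Authenticate using the app credentials from the Spotify dashboard
auth = SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)
sp = spotipy.Spotify(auth_manager=auth)
```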

Why might I do this for artists and not for albums? Two albums sharing a name is more probable than two artists sharing a name. Therefore, we might get the best matching data by first fetching data per artist reviewed, then filtering by the albums reviewed. Let's go ahead and get the IDs of all the artists in the Pitchfork database. Because this code takes a while to run, I have saved the output into a CSV file and commented out the lines that generate it. They can easily be uncommented and run if needed.
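A sketch of that lookup, assuming the sp client from above and the merged DataFrame with its pf_artist column; the CSV filename is also just illustrative:

```python
import pandas as pd

artist_ids = {}
for name in merged["pf_artist"].unique():
    # Search Spotify for each artist name and keep the top hit, if any
    result = sp.search(q=name, type="artist", limit=1)
    items = result["artists"]["items"]
    if items:
        artist_ids[name] = items[0]["id"]

# Cache the lookups so this slow loop only has to run once
pd.DataFrame(list(artist_ids.items()), columns=["artist", "artist_id"]) \
    .to_csv("artist_ids.csv", index=False)
```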

Now that we have all the artist IDs, we must get all the album IDs. We will write these to a CSV as well, to save more time.
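Again as a sketch under the same assumptions (and ignoring pagination past Spotify's 50-album page size):

```python
album_rows = []
for artist, artist_id in artist_ids.items():
    # Fetch each matched artist's studio albums (up to one page of 50)
    response = sp.artist_albums(artist_id, album_type="album", limit=50)
    for album in response["items"]:
        album_rows.append(
            {"artist": artist, "album": album["name"], "album_id": album["id"]}
        )

album_df = pd.DataFrame(album_rows)
album_df.to_csv("album_ids.csv", index=False)
```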

Now that we have all the album IDs, let's get some album metadata and match it with our reviewed album titles and artists. Then we will know which album IDs are relevant to us.

Album ID data merged with review dataset

These are all the albums we could successfully find a match for on Spotify. It's not perfect, though: it seems like Spotify sometimes has multiple copies of the same album. We'll assume that any album with a matching title and artist is sufficient, and just use the ID of the first match.
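In pandas, keeping only the first match per title/artist pair is a one-liner (column names assumed as above):

```python
# Spotify often carries several copies of an album (reissues, deluxe
# editions); keep only the first ID for each title/artist pair
album_df = album_df.drop_duplicates(subset=["artist", "album"], keep="first")
```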

Now that we have all those album IDs attached to commonly reviewed albums, we can get the tracks from those albums and average some of their sound features.

Almost there! What we have right now is every Spotify track ID from every album that has received a review from both Pitchfork and Anthony Fantano (that we could find a match for). Let's get the track metadata and audio features. To do this, I'm going to use a code snippet from Angelica Dietzel's article on BetterProgramming.pub about extracting artist data using Spotify's API.
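In the same spirit as that snippet, here is a hedged sketch of the whole fetch: list every track on each matched album, then pull audio features in batches, since audio_features accepts up to 100 IDs per call. matched stands in for the DataFrame of commonly reviewed albums with their Spotify IDs:

```python
import pandas as pd

track_rows = []
for album_id in matched["album_id"]:
    # List the tracks on each matched album
    for track in sp.album_tracks(album_id)["items"]:
        track_rows.append({"album_id": album_id, "track_id": track["id"]})

track_df = pd.DataFrame(track_rows)

# Request audio features in batches of 100, the endpoint's maximum,
# skipping tracks for which Spotify returns no features
ids = track_df["track_id"].tolist()
features = []
for i in range(0, len(ids), 100):
    features.extend(f for f in sp.audio_features(ids[i : i + 100]) if f)

features_df = pd.DataFrame(features).rename(columns={"id": "track_id"})
track_df = track_df.merge(features_df, on="track_id")
```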

That dataset took a long time to make, so I do not recommend trying that code out for yourself. Anyway, at last, we can consolidate by album, and merge with our other dataset.
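Consolidating is a groupby-and-mean. The feature names below are real Spotify audio features, while the DataFrame names carry over from the sketches above, and final_df is an illustrative name for the result:

```python
feature_cols = ["danceability", "energy", "speechiness", "acousticness",
                "instrumentalness", "liveness", "valence", "tempo"]

# Average the per-track audio features within each album, then
# attach them to the commonly reviewed albums
album_features = track_df.groupby("album_id")[feature_cols].mean().reset_index()
final_df = matched.merge(album_features, on="album_id")
```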

Folks... we have our dataset! At last, let's do some exploratory data analysis.

Part 3: Exploratory Data Analysis

Matplotlib will come in handy for graphs and visualizations.

How do the two entities compare?

Let's first take a look at how Fantano's reviews compare to Pitchfork reviews. Is he usually more forgiving or less forgiving? We can find out by plotting Fantano's album scores against Pitchfork's album scores. Each dot represents an album that they both rated.

We're going to separate the data into two groups: albums that earned a Best New Music tag from Pitchfork and those that didn't.
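A minimal matplotlib sketch of that scatter, assuming score columns named pf_score and af_score and a best_new_music flag in final_df:

```python
import matplotlib.pyplot as plt

bnm = final_df[final_df["best_new_music"] == 1]
rest = final_df[final_df["best_new_music"] == 0]

# One dot per commonly reviewed album, split on the Best New Music tag
plt.scatter(rest["pf_score"], rest["af_score"], color="black", label="Other")
plt.scatter(bnm["pf_score"], bnm["af_score"], color="red", label="Best New Music")
plt.xlabel("Pitchfork score")
plt.ylabel("Fantano score")
plt.legend()
plt.show()
```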

The Best New Music tag didn't give us much new information other than that the score is above roughly 8.3. One thing that's important to note here, given the shape of this graph, is that Fantano usually only gives integer scores to albums, while Pitchfork gives scores to one decimal point. Let's add a regression line to the plot.
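One way to do that is to fit a first-degree polynomial with numpy and redraw the scatter with the line laid over it (same assumed column names as above):

```python
import numpy as np

# Least-squares line through the paired scores
slope, intercept = np.polyfit(final_df["pf_score"], final_df["af_score"], 1)

plt.scatter(final_df["pf_score"], final_df["af_score"], color="black", s=10)
xs = np.linspace(0, 10, 100)
plt.plot(xs, slope * xs + intercept, color="blue")
plt.xlabel("Pitchfork score")
plt.ylabel("Fantano score")
plt.show()
```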

According to this line, on average, a score that Fantano gives will be about a third of the Pitchfork score, plus 4. This means that a 1 from Pitchfork would be about a 4 from Fantano, a Pitchfork 5 is a Fantano 5 to 6, and a Pitchfork 10 is about a Fantano 7. We can gather that Fantano is a little bit more hesitant to give scores on the extreme ends of the 0 to 10 scale. Let's see if we can find more patterns that support this, starting with histograms of each of their scores.

In general, red will be used to represent Pitchfork data, while black will be used to represent Fantano data.

The red in this graph represents Pitchfork scores, and the black represents Fantano scores. This tells us a slightly different story: Pitchfork is more apt to give scores in the 7 and 8 range, while Fantano's ratings are a little bit more normally distributed.

We can also extend this sort of graph to all the albums that either entity has reviewed, not just the ones that have been reviewed by both parties.

The shapes of the graphs are similar, but not exactly the same. This means that missing data has some effect on summary statistics like the average scores. I believe that the data is Missing at Random. Because Anthony Fantano is only one person and Pitchfork is a long-running publication with a whole team, Fantano may only have reviewed albums that were especially relevant during the time he has been an active content creator, which is much shorter than the amount of time that Pitchfork has been around. There are also albums that Fantano has reviewed that Pitchfork hasn't, and it's hard to say what the reasons might be. It's possible that he really likes certain albums that a large publication doesn't have on its radar, or any number of other things.

Back to data analysis: the way that Pitchfork and Fantano rate their albums lends itself well to a violin plot, given Fantano's integer-only album scores. We'll use seaborn to help us with the violin plot.
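A sketch of the seaborn call, drawing one violin of Pitchfork scores per integer Fantano score (column names assumed as before):

```python
import seaborn as sns

# Distribution of Pitchfork scores at each integer Fantano score
sns.violinplot(x="af_score", y="pf_score", data=final_df)
plt.xlabel("Fantano score")
plt.ylabel("Pitchfork score")
plt.show()
```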

Although there appears to be a correlation between how Fantano and Pitchfork score things (we'll explore this more later), the two sources clearly don't always agree.

Patterns with Spotify data

As mentioned previously, Spotify gives us a number of track features that have been averaged per album. We can look at those and determine whether Fantano or Pitchfork prefers music with these qualities.

I will be the first to admit that none of these plots are all that interesting. One thing we can notice, though, is that Anthony's black line is lower than Pitchfork's red line for every quality that Spotify gives us. However, how far apart the lines are varies from quality to quality. We already knew that Anthony was generally less nice to these albums than Pitchfork, so to get the most out of this data, it might be best to normalize the review scores. I believe that we should normalize around only the shared albums and not the respective review datasets. We should also use mean normalization, because the ranges of the data are the same.
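As a sketch, mean normalization is (x - mean) / (max - min); since both scales run 0 to 10, the denominator mostly just rescales, and the mean shift is what lines the two sets up:

```python
# Mean-normalize each score column, computed over the shared albums only
for col, norm in [("pf_score", "pf_norm"), ("af_score", "af_norm")]:
    scores = final_df[col]
    final_df[norm] = (scores - scores.mean()) / (scores.max() - scores.min())
```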

This looks like data we can compare more accurately. Let's try making the same sorts of plots with the normalized data instead.

Just from eyeballing, these graphs point towards there not being a strong correlation between any one Spotify audio feature and a high or low rating. Before we move on to hypothesis testing, there are a couple of other things I would like to plot using our time data. Did either source get nicer over time? And do they prefer certain eras of music?

None of these graphs seem to point towards any trend over time. Nothing here suggests that either source has changed its rating criteria over time, or that they have biases for records from older or more recent years.

Part 4: Hypothesis Testing

For this part, we are first going to check if our datasets are normally distributed. The result will help us know what other tests are appropriate to run. We are first going to perform a Shapiro-Wilk test on both review datasets. The null hypothesis in this case is that the data is normally distributed. Source: https://machinelearningmastery.com/a-gentle-introduction-to-normality-tests-in-python/
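With scipy, the test is one call per score column (column names assumed as before):

```python
from scipy.stats import shapiro

# Null hypothesis: the sample comes from a normal distribution
for col in ["pf_score", "af_score"]:
    stat, p = shapiro(final_df[col])
    print(f"{col}: stat={stat:.3f}, p={p:.3g}")
```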

We can tell that the data is not normally distributed from the low p-values, and reject the null hypothesis. Let's try it with the normalized data from earlier.

Even the normalized data is not actually normally distributed. It has the same shape as before, just shifted a little so the means line up, which is why we get exactly the same statistic and p-values. Because the data is not normally distributed, we have to run nonparametric tests. We can run a Spearman correlation analysis to see if the Pitchfork and Fantano album scores are correlated. The null hypothesis in this case is that the Fantano and Pitchfork album scores are not correlated.
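The Spearman test is similarly compact in scipy:

```python
from scipy.stats import spearmanr

# Nonparametric rank correlation between the two sets of scores
rho, p = spearmanr(final_df["pf_score"], final_df["af_score"])
print(f"Spearman rho={rho:.3f}, p={p:.3g}")
```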

The Spearman correlation test gave us a very low p-value, indicating that we can reject the null hypothesis that the two sets are not correlated. This suggests that there are qualities about an album or piece of music that can help earn good scores from both entities. It's not clear whether these scores can be predicted given the audio features that Spotify provides to us. That is where the machine learning comes in.

Part 5: Machine Learning

We are going to use the averaged Spotify audio features to train a multivariate linear regression model.

Models to predict Pitchfork rating
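A hedged sketch of what that model fit might look like with scikit-learn, reusing the assumed column names from earlier; popularity is assumed to be averaged from the Spotify track metadata:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

feature_cols = ["danceability", "energy", "speechiness", "acousticness",
                "instrumentalness", "liveness", "valence", "tempo", "popularity"]
X = final_df[feature_cols]
y = final_df["pf_score"]

# Hold out a test set so R-squared reflects unseen albums
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(dict(zip(feature_cols, model.coef_)))
print("R-squared:", model.score(X_test, y_test))
```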

Some coefficients are much larger in magnitude than others, which gives a rough sense of how important the corresponding features are. Popularity seems to have the least effect on score (its coefficient might as well be 0), and speechiness seems to have the most effect on how Pitchfork scores albums. Let's look at the R-squared value to see how accurate this model is.

The R-squared is very low. It means that only a very small portion of the variation in Pitchfork scores can be explained by the independent audio variables. Let's see if and how much the model improves if we consider Anthony Fantano's score as well.

Let's see what the average difference between the actual score and the predicted score is.
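That "average difference" is the mean absolute error, which scikit-learn computes directly for whichever model was just fit:

```python
from sklearn.metrics import mean_absolute_error

# Average absolute gap between predicted and actual scores
print(mean_absolute_error(y_test, model.predict(X_test)))
```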

Not so bad, but still not great. It almost feels like cheating, but let's see what happens when we account for the best_new_music column too.

Definitely a lot better, but how could it not be? Let's see if Fantano's scores are any more predictable.

Models to predict Fantano rating

And with pitchfork score?

Only slightly better, but still pretty far off. Let's see what the average difference in rating between the model prediction and the actual score is.

That number alone is not a great measure of how good the model is, since we both overshoot and undershoot a lot. Last but not least, let's try using the k-nearest neighbors algorithm to predict whether something is tagged Best New Music or not by Pitchfork, based on the sound attributes and Fantano score.
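A sketch of that classifier, again with the assumed column names and an arbitrary k of 5:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Predict the Best New Music tag from audio features plus Fantano score
X = final_df[feature_cols + ["af_score"]]
y = final_df["best_new_music"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```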

It is accurate about 73% of the time. That's better than just guessing!

Part 6: Conclusion

This has been a journey through the Data Science pipeline, through the lens of music data and criticism. We learned many valuable lessons along the way. We learned how to wrangle lots of data despite the odds. We learned that API calls can take a long time. We learned that critics don't always agree, but they do sometimes. Most importantly, we've learned that art is truly subjective, and we mustn't try to quantify personal taste. To each their own, indeed.