August 30, 2007

The Theoretical Foundation of Musedot

The consumption of music is a highly personal and subjective activity that has not been, and cannot be, adequately served by algorithmic approaches based on subjective musical taxonomic systems (Pandora), listening habit aggregation and analysis (last.fm), rudimentary social-network sharing (MySpace), or purchase histories (Amazon.com and others). In my opinion, all these approaches are fundamentally flawed and offer little true utility. They poorly mimic how people seek and discover music in the “real” world (as opposed to being marketed to by traditional means), and they do little to augment these natural behaviors. Their fundamental flaws are that they are highly impersonal and reinforce pre-internet mass marketing techniques. They are undermined primarily by two flaws that I call popularity amplification and the similarity paradox.

Recommendation technologies such as those offered by Amazon.com suffer acutely from the flaw of popularity amplification. It is certainly possible for Amazon.com to make recommendations by matching a user’s purchase history with that of other customers. This works to a reasonable degree because the preferences of music listeners do correlate in taste cliques (Liu & Maes, 2005). However, one can easily foresee problems with this algorithmic approach when there is too little purchase data on which to base the recommendations. This will inevitably be the case for debut or obscure artists. Thus the system ends up weighted towards already “popular” artists with substantial catalogs and sales, regardless of the true value of a specific artist to a specific user.
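To make the mechanism concrete, here is a minimal sketch of purchase-history matching. It is not Amazon.com's actual algorithm; the customer names, artist names, scoring rule, and the `recommend` function are all hypothetical and chosen only to illustrate the effect.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of artists bought.
purchases = {
    "ann": {"Big Act", "Mid Act"},
    "bob": {"Big Act", "Mid Act", "Other Act"},
    "cat": {"Big Act", "Other Act"},
    "dan": {"Big Act", "Debut Act"},
}


def recommend(user_history, purchases, top_n=3):
    """Rank unseen artists by co-purchase overlap with other customers.

    Assumed scoring rule: each customer who shares purchases with the
    user "votes" for the artists the user does not yet own, weighted by
    the number of shared purchases.
    """
    scores = Counter()
    for history in purchases.values():
        overlap = len(user_history & history)
        if overlap == 0:
            continue  # no shared purchases, so no evidence from this customer
        for artist in history - user_history:
            scores[artist] += overlap
    return [artist for artist, _ in scores.most_common(top_n)]


# A listener who bought only "Big Act": the debut artist, with a single
# recorded sale, ranks below every artist with more sales.
ranking = recommend({"Big Act"}, purchases)
```

In this toy data, the artist with one recorded sale can never outrank artists with more sales, however well it might suit the listener. That is the amplification effect in miniature.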

Last.fm is a leading, but by no means exclusive, example of the similarity paradox. The technology underlying this service is based on the theory that users are more likely to be interested in music that is “similar” to an artist that the user specifies, either directly or through the tracking of listening habits. Other than in broad genre-level taxonomies (indie pop would be an example of such a classification), what constitutes “similar” in music cannot be reduced to objective criteria. Yet much effort has gone into creating systems for this purpose or, in the case of last.fm, into a listening habit aggregation system that ferrets out these similarities.

The similarity-based system posits that if a listener likes band A, and band B is “similar” to band A, the listener is more likely to like band B as well. However, once one moves beyond broad genre classifications, any criteria used to establish similarity between artists are inherently subjective, and therefore algorithmically irreducible. In practice, this approach produces a collection of artists that are functionally equivalent to one another, already quite well known to the user, or, most critically, perceived by the user as dissimilar. In other words, much more often than not, a list of “similar” artists is of absolutely no value to the user.

The bottom line is that neither “popularity”, as defined and shaped by the music industry, nor “similarity” has any practical predictive value for any individual listener. They are impersonal criteria, and their derivative technologies merely enhance pre-internet market aggregation techniques and theories. It is my contention that it is absolutely impossible for any such algorithmic approach to predict, with any useful degree of accuracy, what music a listener will actually like, at least not with any technology available today or in the foreseeable future. The lack of rapid organic growth among the current crop of music utilities I’ve mentioned is itself sufficient evidence of this.

The Solution: Social Data Mining

One of the most important, and least predicted, developments in Internet technology over the last few years has been the rise of social networks such as MySpace and Facebook. These online “communities” have empowered users to map and augment the social relationships of the real world, and have brought innovations in basic human activities like communicating and social organizing. It is through these sites that relationships between people are for the first time being modeled in useful and easily accessible ways. These, and other related technologies, are often referred to as Web 2.0, or Internet technologies focused on improving the social mechanisms of communities such as collaboration and sharing.

The advent of these social networks is absolutely critical to the challenge of creating music discovery technology because their value lies in the relationships that are represented, and these relationships map to and shape preferences in all consumable media. For the first time in history, we now have a wealth of data sourced from the human relationships that enable the production and support the consumption of media, and of music in particular. And this data is growing in quality and quantity at a staggering rate. Therefore, I propose that the creative application of data analysis techniques to these new data sets, or social data mining (Liu & Maes, 2005), will result in innovative internet applications. These applications will provide users with truly valuable personalized tools, driven by the same underlying social networks that they create and rely on in the real world, and will succeed in systematically exposing them to music they care about.
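As a purely illustrative sketch of what the simplest form of social data mining could look like, the following ranks artists by how many of a user's direct friends list them as favorites. The friend graph, the profile data, and the `recommend_from_friends` function are all hypothetical. This is neither the InterestMap method of Liu & Maes nor a specification of any particular product.

```python
from collections import Counter

# Hypothetical social graph: user -> set of direct friends.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"ann", "dan"},
    "dan": {"cat"},
}

# Hypothetical profile data: user -> artists listed as favorites.
favorites = {
    "ann": {"Big Act"},
    "bob": {"Big Act", "Debut Act"},
    "cat": {"Debut Act", "Local Act"},
    "dan": {"Other Act"},
}


def recommend_from_friends(user, friends, favorites, top_n=3):
    """Rank artists the user has not listed by how many friends list them."""
    known = favorites.get(user, set())
    scores = Counter()
    # Sorting makes the tie-breaking order deterministic.
    for friend in sorted(friends.get(user, set())):
        for artist in sorted(favorites.get(friend, set()) - known):
            scores[artist] += 1
    return [artist for artist, _ in scores.most_common(top_n)]


suggestions = recommend_from_friends("ann", friends, favorites)
```

Here the evidence comes from the user's own relationships rather than from aggregate sales, so an obscure artist can surface as soon as a couple of friends care about it. A real system would presumably weigh the strength of each relationship and look beyond direct friends; this sketch does neither.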

References

Liu, H. and Maes, P. (2005). InterestMap: Harvesting Social Network Profiles for Recommendations. Workshop: Beyond Personalization, San Diego, 2005. http://ambient.media.mit.edu/assets/_pubs/BP2005-hugo-interestmap.pdf