Pandora.com creates personal online radio stations for users who input several artists and songs they enjoy. It works by coding each of the hundreds of thousands of songs in its library as a set of features such as "aggressive guitar solo" and "vocal-based," and, presumably, presenting users with songs whose features are comparable to those of the songs for which they have indicated a preference.
This is not the only method for song selection that Pandora could implement. Another possibility is to forgo any feature-based system and rely strictly on networks of user preference. For instance, if I tell Pandora that I like Opeth, and Pandora's data shows that users who like Opeth have a probability of 0.8 of also liking Children of Bodom, Pandora would be likely to insert a Children of Bodom song in my station.
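To make the idea concrete, here is a minimal sketch of how such a probability could be estimated from raw "likes." This is not Pandora's actual method; the function name, the user IDs, and the toy data are all made up for illustration. It simply counts how often two artists are liked by the same user.

```python
from collections import defaultdict
from itertools import permutations


def conditional_preferences(user_likes):
    """Estimate P(likes b | likes a) for every ordered pair of artists.

    user_likes maps a (hypothetical) user ID to the set of artists that
    user has marked as liked.
    """
    liked_by = defaultdict(int)        # artist -> number of users who like it
    liked_together = defaultdict(int)  # (a, b) -> number of users who like both
    for likes in user_likes.values():
        for artist in likes:
            liked_by[artist] += 1
        for a, b in permutations(likes, 2):
            liked_together[(a, b)] += 1
    return {(a, b): n / liked_by[a] for (a, b), n in liked_together.items()}


# Toy data, invented for this example: five Opeth fans, four of whom
# also like Children of Bodom.
user_likes = {
    "u1": {"Opeth", "Children of Bodom"},
    "u2": {"Opeth", "Children of Bodom"},
    "u3": {"Opeth", "Children of Bodom"},
    "u4": {"Opeth", "Children of Bodom"},
    "u5": {"Opeth", "The Beatles"},
    "u6": {"Children of Bodom"},
}

probs = conditional_preferences(user_likes)
print(probs[("Opeth", "Children of Bodom")])
```

With this toy data, four of the five Opeth fans also like Children of Bodom, which gives the 0.8 from the example above.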
The preference-network method is more appealing than the feature-based method for two reasons. First, it is simpler. The preference method eschews the induced artificiality of creating features and of categorizing songs by these features. Second, the preference method doesn't digitize aesthetic preference, but rather captures a richer sense of the patterns of music taste. Having my music tastes converted into a feature matrix was initially a big turn-off for me as a Pandora user.
Josh and Diana support the feature-based method, though. Networks of music preference are surely scale-free, they argue, and thus are subject to hubs and high clustering coefficients. Hubs in the Pandora network are songs that large numbers of people enjoy, for instance "Come Together" by the Beatles. If Pandora chooses songs for us based on preference, it will be biased towards hubs, and thus present us with more popular music at the expense of introducing us to the obscure. A high clustering coefficient refers to the prevalence of "neighborhoods" of strong connectivity within the network. One such neighborhood might approximate the death metal genre. If I tell Pandora that I like an Opeth song and a Children of Bodom song, I may find my station stuck in a sort of musical ghetto, where Pandora presents me only with songs within the death metal neighborhood.
Furthermore, Josh is concerned with the issue of new additions to the Pandora library. By Josh's model, in a developing scale-free network, older nodes will accumulate more connections, while new additions will have fewer connections than they would in a system based on features alone.
I want to put forth some responses to these arguments. To solve the problem introduced by hubs and generally counteract the popularity bias described, we could divide the raw "preference probability" of each song by its overall popularity. Say that 80% of all people who like Opeth also like Children of Bodom (because the bands are similar), and 80% of people who like Opeth also like the Beatles (because the Beatles are popular across fans of a huge swath of genres). If we divide each probability by the total popularity of the band in question, the Beatles quantity would be greatly diminished relative to the Children of Bodom quantity, and the preference-based Pandora would deliver us Opeth fans the truly similar Children of Bodom and not the merely popular Beatles.
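The arithmetic can be sketched in a few lines. The overall popularity figures below (5% and 70%) are numbers I made up for illustration, and the function name is hypothetical. The ratio itself is what association-rule mining calls "lift": how much more likely Opeth fans are to like an artist than listeners in general.

```python
def popularity_adjusted(cond_prob, popularity):
    """Divide each conditional preference probability by overall popularity.

    cond_prob:  artist -> P(likes artist | likes the seed artist)
    popularity: artist -> P(likes artist) across all users
    """
    return {artist: p / popularity[artist] for artist, p in cond_prob.items()}


# P(likes X | likes Opeth), taken from the example in the text.
given_opeth = {"Children of Bodom": 0.8, "The Beatles": 0.8}

# Overall popularity: assumed values, not real data.
overall = {"Children of Bodom": 0.05, "The Beatles": 0.70}

scores = popularity_adjusted(given_opeth, overall)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under these assumed numbers, Children of Bodom scores about 16 while the Beatles score only a little above 1, so Children of Bodom ranks first even though the raw probabilities are identical.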
I agree with the assertion that the preference-based network may be subject to a high clustering coefficient. I think, though, that the (presumably uniform rather than scale-free) network formed by feature-based Pandora is also subject to a high clustering coefficient. For example, I have a station based on the music of Ratatat, Aphex Twin, Daft Punk, and Amon Tobin which has delivered to me naught but a constant stream of electronica.
The clustering coefficient problem in both systems could be solved by adding a "random jump" component to the song-selecting algorithm of each. This would be similar to the "random surfer" model behind Google's PageRank, in which a hypothetical surfer moves through the Web by following links from page to page. The surfer might get stuck in a neighborhood or a loop, so every once in a while the surfer instead "jumps" to a random page. Pandora's algorithm could (and indeed may) similarly jump to a random song every so often to avoid reinforcing a user's neighborhood of preference.
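A random jump of this kind might look something like the sketch below. Everything here is an assumption on my part: the function name, the tiny song graph, and the jump probability of 0.15 are illustrative choices, not anything Pandora has published.

```python
import random


def next_song(current, neighbors, library, jump_prob=0.15, rng=random):
    """Pick the next song for a station.

    With probability jump_prob (or when the current song has no known
    neighbors), jump to a uniformly random song in the library.
    Otherwise, stay in the neighborhood by following a preference link.
    """
    if rng.random() < jump_prob or not neighbors.get(current):
        return rng.choice(library)
    return rng.choice(neighbors[current])


# Hypothetical graph: songs A and B link only to each other, so a walk
# without jumps can never reach C or D.
library = ["A", "B", "C", "D"]
neighbors = {"A": ["B"], "B": ["A"]}

rng = random.Random(0)
song = "A"
station = []
for _ in range(20):
    song = next_song(song, neighbors, library, jump_prob=0.15, rng=rng)
    station.append(song)
print(station)
```

Setting jump_prob to 0 reproduces the stuck station, which only ever alternates between A and B. Raising it trades some immediate relevance for a chance of escaping the neighborhood.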
I think the biggest challenge to a preference-based Pandora is the problem of new music in the Pandora catalogue. It does seem that new music, and any music with less total exposure, will be subject to wildly varying preference ratings, and those songs which by the laws of chance initially receive low preference ratings might be buried in the system and never experience wide circulation on Pandora. To partially combat this, we might require that all songs receive a minimum quantity of ratings by random users before their preference matrices are calculated. This manipulation, though, certainly ruins some of the simplicity and purity I originally attributed to the preference-based system.
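As a rough sketch of that minimum-ratings rule: the threshold of 50, the function name, and the song titles are all assumptions for illustration. Songs below the threshold get no preference score yet and are instead queued for exposure to random users.

```python
def split_catalogue(stats, min_ratings=50):
    """Separate songs with enough ratings from songs still on trial.

    stats maps song -> (likes, total_ratings).
    Returns (scored, needs_exposure): scored maps each sufficiently rated
    song to its like fraction; needs_exposure lists songs that should
    still be played for random users before being scored.
    """
    scored = {}
    needs_exposure = []
    for song, (likes, ratings) in stats.items():
        if ratings < min_ratings:
            needs_exposure.append(song)
        else:
            scored[song] = likes / ratings
    return scored, needs_exposure


# Invented numbers: the new song's 1-out-of-3 start says very little,
# so it is held back from scoring rather than buried.
stats = {
    "Established Song": (400, 500),
    "New Song": (1, 3),
}

scored, needs_exposure = split_catalogue(stats)
print(scored)
print(needs_exposure)
```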
While it seems to me that a preference-based Pandora would maximize total user enjoyment ratings for a static library of songs, the feature-based Pandora works better for a growing library and does a better job exposing users to obscure songs. This is in keeping with its motto: Pandora purports not to play music I like, but "to help me discover more music I like."