Group Think

Entertainment & Media / by Elizabeth Cline /

A Tel Aviv University professor melds math and sociology of the Internet to predict the next big thing in music.


Professor Yuval Shavitt, of Tel Aviv University’s School of Electrical Engineering, is melding math and sociology to describe mass behavior on the Internet. He is the principal investigator of DIMES, a project begun four years ago that aims to map the structure and topology of the Internet. And for the past year, he has used data-mining tools to collect and interpret massive amounts of data from file-sharing networks. By applying to the online world a decades-old sociological theory that describes how information spreads in social networks, he has developed a predictive algorithm that identifies musicians who will ascend from local popularity to national stardom.

Shavitt and a team of graduate students began by collecting half a billion search-query strings from Gnutella, a peer-to-peer file-sharing network. Non-music-related searches and searches for already-popular musicians were eliminated, and the remaining queries were tagged and sorted by the city or region from which they originated, as determined from IP addresses. These searches are dubbed “geo-aware query strings.”
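The filtering and tagging steps described above can be sketched roughly as follows. This is only an illustration, not the team's actual pipeline: the IP-prefix table, the artist lists, and the function names are all hypothetical stand-ins for a real IP-geolocation database and music catalog.

```python
from collections import Counter, defaultdict

# Hypothetical lookup tables (assumptions for illustration only).
IP_PREFIX_TO_REGION = {"10.1.": "Atlanta", "10.2.": "Seattle"}
KNOWN_ARTISTS = {"artist a", "artist b", "superstar"}
ALREADY_POPULAR = {"superstar"}


def region_for_ip(ip):
    """Map an IP address to a region, or None if it cannot be located."""
    for prefix, region in IP_PREFIX_TO_REGION.items():
        if ip.startswith(prefix):
            return region
    return None


def geo_aware_counts(queries):
    """Count music queries per region, skipping non-music and popular artists.

    `queries` is an iterable of (ip_address, query_text) pairs.
    Returns a mapping: region -> Counter of artist -> number of queries.
    """
    counts = defaultdict(Counter)
    for ip, text in queries:
        artist = text.strip().lower()
        if artist not in KNOWN_ARTISTS:
            continue  # drop non-music-related searches
        if artist in ALREADY_POPULAR:
            continue  # drop searches for already-popular musicians
        region = region_for_ip(ip)
        if region is None:
            continue  # drop queries that cannot be geolocated
        counts[region][artist] += 1
    return counts
```

Each per-region counter is, in effect, a small table of geo-aware query strings that a later step can compare against the national totals.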

The geographic location of an emerging artist is the key to predicting their success, explains Shavitt. “If an artist has the potential to be successful, people will first start noticing them in the small geographical area where they live and perform.” In fact, a potential pop star will typically enjoy thousands of downloads a day on a local level, while remaining relatively unheard of on a national level. A large gap between local and global popularity, quantified with the Kullback-Leibler (K-L) divergence, is a strong indicator of star potential. The algorithm measures the K-L divergence to produce a short list of potentials, of which 15 to 30 percent will go on to reach national popularity within weeks.
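For readers unfamiliar with the measure: the K-L divergence of a local distribution P from a global distribution Q is the sum, over all artists a, of P(a) * log(P(a) / Q(a)). The sketch below shows one way such a comparison could be computed from query counts. It is an assumption-laden illustration rather than the method in Shavitt's paper; the function names, the smoothing constant, and the idea of ranking artists by their individual terms in the sum are all hypothetical.

```python
import math


def normalize(counts):
    """Turn raw query counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty set of counts")
    return {artist: n / total for artist, n in counts.items()}


def kl_contributions(local_counts, global_counts, smoothing=1e-9):
    """Per-artist terms P(a) * log(P(a) / Q(a)) of the K-L divergence.

    P is the local distribution and Q the global one. `smoothing` is a
    hypothetical floor used when an artist is missing from the global counts.
    """
    p = normalize(local_counts)
    q = normalize(global_counts)
    return {
        artist: p_a * math.log(p_a / q.get(artist, smoothing))
        for artist, p_a in p.items()
        if p_a > 0
    }


def kl_divergence(local_counts, global_counts):
    """K-L divergence of the local distribution from the global one."""
    return sum(kl_contributions(local_counts, global_counts).values())
```

An artist who accounts for a large share of local queries but a small share of global ones contributes a large positive term, which is the pattern the article describes as a sign of star potential.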

According to Shavitt’s paper on the subject, “Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings,” presented at the International Conference on Knowledge Discovery and Data Mining last August in Las Vegas, his predictive algorithm is based on the groundbreaking sociological theories of Mark Granovetter, who first described in the 1970s how micro-level interactions between individuals affect macro-level phenomena. From Granovetter’s work emerged the small-world model, which is able to predict a product’s success based on its adoption by a small network of people, assuming that the “main driver behind a product’s growth is communication between individuals.”

The use of geo-aware peer-to-peer query strings presents a potentially major shift in music hit-prediction software, most of which, like Hit Song Science, collects data on the sound of a song, then compares features of a potential hit, such as melody, tempo, and lyrics, to a database of established hits. “Our algorithm never hears the actual song; it is based on the Internet mirroring of the social word of mouth of people spreading their interest in the song,” says Shavitt. “It will be interesting to compare the success rates of both approaches.”

But Shavitt’s algorithm may have wider implications. He and his team of researchers have been contemplating using the algorithm to predict the success potential of a homegrown politician, for example. Text mining would be needed to data-mine searches on politicians, as their Internet presence is best measured by their popularity as discussion topics in forums and chat rooms. It’s much trickier to data-mine text than numbers, and to determine whether what’s being written about public figures online is positive, “but it’s certainly doable,” says Shavitt. With the growing sophistication and popularity of online social networking sites and file-sharing services, Shavitt demonstrates how math can describe and harness mass behavior in online environments. The applications could be endless.

Originally published December 22, 2008

