Measuring Personalization of Web Search

Aniko Hannak, Piotr Sapieżyński, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson
To appear in Proceedings of the 22nd International World Wide Web Conference (WWW'13), Rio de Janeiro, Brazil, May 2013.
[PDF] [BibTeX]

Data

The link below will take you to the data we collected in our experiments. The data is described in more detail in Section 5 of our paper. We provide this data as a resource to the scientific community; if you use it, please cite our paper [BibTeX].
Each folder is named after the feature (e.g., gender) that was tested in that experiment. Inside each folder you can find the data for every day the test was run and for every value of that feature (e.g., male, female, other, control). Each HTML file contains the first page of Google Search results returned for the search term given in its filename.
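To load the data programmatically, it can help to index every HTML file by feature, day, feature value, and search term. The sketch below is only a sketch: it assumes a hypothetical nesting of `<feature>/<day>/<value>/<search term>.html`, and the function names `parse_result_path` and `index_results` are illustrative, not part of the released code. Check the actual folder layout and filename encoding before relying on it.

```python
import os


def parse_result_path(rel_path):
    """Split an ASSUMED 'feature/day/value/term.html' path into its parts.

    Returns (feature, day, value, search_term), or None if the path does not
    match the hypothetical four-level layout.
    """
    parts = rel_path.replace("\\", "/").split("/")
    if len(parts) != 4 or not parts[3].endswith(".html"):
        return None
    feature, day, value, filename = parts
    return feature, day, value, filename[: -len(".html")]


def index_results(root):
    """Walk the data folder and map each parsed key to the file's full path."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            key = parse_result_path(os.path.relpath(full_path, root))
            if key is not None:  # skip files outside the assumed layout
                index[key] = full_path
    return index
```

With such an index, the page for a given feature value and the matching control page for the same day and search term can be looked up side by side.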

The search terms used in these tests were drawn from Google Zeitgeist and WebMD; they are also described in more detail in the paper.

Browse the Data or go directly to data used to generate:

Please note that the data for the real-world Amazon Mechanical Turk (AMT) experiment is not released due to privacy considerations.

Code

The "Code" folder contains the PhantomJS scripts we used to gather our data, as well as some Python scripts that help parse the HTML files. Again, if you use our code, please cite our paper [BibTeX].
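As an illustration of the kind of parsing involved (not the released scripts themselves), the sketch below pulls result links out of a saved results page using only Python's standard `html.parser`, then compares two pages with one simple overlap measure, the Jaccard index of their link sets. It assumes that each result title is an `<h3>` element containing a link, which should be checked against the actual HTML files; links may also appear as Google redirect URLs rather than target URLs. The names `ResultLinkParser`, `result_links`, and `jaccard` are hypothetical.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <h3> headings.

    ASSUMPTION: each search result title is an <h3> wrapping a link.
    """

    def __init__(self):
        super().__init__()
        self._h3_depth = 0  # >0 while inside an <h3> element
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._h3_depth += 1
        elif tag == "a" and self._h3_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3" and self._h3_depth > 0:
            self._h3_depth -= 1


def result_links(html_text):
    """Return result links in page order."""
    parser = ResultLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two link collections (1.0 if both empty)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

For example, reading a feature-value page and the control page for the same day and search term (e.g., with `open(path, encoding="utf-8")`, assuming UTF-8 files) and calling `jaccard(result_links(page_a), result_links(page_b))` gives 1.0 when both pages list the same set of links. Note that this measure ignores result order.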

Browse the Code