Measuring Personalization of Web Search
Aniko Hannak, Piotr Sapieżyński, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson
To appear in Proceedings of the 22nd International World Wide Web Conference (WWW'13), Rio de Janeiro, Brazil, May 2013.
[PDF] [BibTeX]
Data
The link below leads to the data we collected in our experiments. The data is described in more detail in Section 5 of our paper.
We provide this data as a resource to the scientific community. If you use it, please cite us as [BibTeX].
The names of the folders represent the specific feature (e.g. gender) that was tested in that experiment. Inside the folder you can find the data for each day the test was run and each value of the given feature (e.g. male, female, other, control). The HTML files contain the first page of Google Search results returned for search terms according to the filenames.
The keywords chosen for these tests come from Google Zeitgeist and WebMD; they are described in more detail in the paper.
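As a rough illustration of how the folder layout described above might be traversed, the sketch below groups the HTML result pages by their top-level feature folder. The function name `index_result_pages` is hypothetical and not part of the released code, and the exact nesting of the day and feature-value folders is an assumption, so the intermediate folder names are kept as-is rather than interpreted.

```python
from collections import defaultdict
from pathlib import Path


def index_result_pages(root):
    """Group HTML result pages by their top-level (feature) folder.

    Assumes a layout such as <root>/<feature>/.../<search term>.html.
    Returns {feature: [(intermediate_folders, search_term), ...]}.
    """
    root = Path(root)
    index = defaultdict(list)
    for page in sorted(root.rglob("*.html")):
        relative = page.relative_to(root)
        if len(relative.parts) < 2:
            continue  # skip HTML files that sit directly in the root
        feature = relative.parts[0]
        # Folders between the feature folder and the file (assumed to
        # encode the day and the feature value) are kept uninterpreted.
        intermediate = relative.parts[1:-1]
        # The file stem is taken to be the search term.
        index[feature].append((intermediate, page.stem))
    return dict(index)
```

Calling `index_result_pages("data")` on a local copy of the data (the path is a placeholder) would return one entry per feature folder, each listing the pages found beneath it.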
Browse the Data
or go directly to data used to generate:
Please note that the data for the real-world AMT experiment is not released due to privacy considerations.
Code
The folder "Code" contains the PhantomJS scripts we used to gather our data, as well as some Python scripts that help parse the HTML files. Again, if you use our code, please cite us as [BibTeX].
- compare_pages.py page1.html page2.html - run this script with two HTML pages containing search results to calculate the Jaccard and edit distance between the result lists
- print_parse.py page.html - run this script to extract the results and categories from a Google results page and print them to screen
- login_search_no_click_scenario2.js - run this script with PhantomJS to perform a series of searches for specified keywords without clicking on any results. See the source for instructions on usage
- login_search_click_scenario2.js - run this script with PhantomJS to perform a series of searches for specified keywords and click on specified results. See the source for instructions on usage
- login_browse.js - run this script with PhantomJS to browse a specified list of Web pages and click links within their domains. See the source for instructions on usage
- google.py, tools.py, jquery-1.7.2.js are necessary to run the scripts listed above but should not be run on their own.
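For readers unfamiliar with the two metrics mentioned for compare_pages.py, the following is a minimal sketch of how they could be computed on two ordered lists of result URLs. It is not the code from the repository: the function names are hypothetical, the edit distance shown is plain Levenshtein distance (an assumption about the variant), and the input lists are assumed to have been extracted from the HTML pages already.

```python
def jaccard_index(results_a, results_b):
    """Size of the intersection over size of the union, ignoring order."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(results_a, results_b):
    """Levenshtein distance between two ordered result lists.

    Counts the insertions, deletions, and substitutions of whole
    results needed to turn results_a into results_b.
    """
    previous = list(range(len(results_b) + 1))
    for i, item_a in enumerate(results_a, start=1):
        current = [i]
        for j, item_b in enumerate(results_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

The Jaccard index captures whether the same results appear at all, while the edit distance is also sensitive to reordering, so the two are complementary when comparing a test page against a control page.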
Browse the Code