Measuring Personalization of Web Search

Want a brief overview of the project and a summary of the results? Check out the press page.

Interested in reading the paper? It can be found here.

You are welcome to use any of the code or data we collected. If you do, we ask that you cite us with [BibTeX].

Our code can be found here. Instructions on how to use each script are below.

  • Run python compare_pages.py page1.html page2.html to calculate the Jaccard index and edit distance between the two result lists.
  • Run python print_parse.py page.html to extract the results and categories from a Google results page and print them to the screen.
  • Run phantomjs login_search_no_click_scenario2.js to perform a series of searches for specified keywords without clicking on any results.
  • Run phantomjs login_search_click_scenario2.js to perform a series of searches for specified keywords and click on specified results.
  • Run phantomjs login_browse.js to browse a specified list of Web pages and click links within their domains.
  • google.py, tools.py, and jquery-1.7.2.js are necessary to run the scripts listed above but should not be run on their own.
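
To give a feel for the two metrics that compare_pages.py reports, here is a minimal sketch in Python. It is not the code from our repository: the function names jaccard_index and edit_distance and the example URLs are made up for illustration, and the edit distance shown is plain Levenshtein distance over lists, which is an assumption and may differ from the variant the script actually uses.

```python
def jaccard_index(a, b):
    """Jaccard index of two result lists, treated as sets.

    1.0 means both pages contain the same results (order ignored);
    0.0 means they share none.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty pages are considered identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a, b):
    """Levenshtein distance between two result lists.

    Counts the insertions, deletions, and substitutions needed to turn
    list a into list b, so it is sensitive to result order.
    """
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical result lists (URLs in ranked order) from two pages.
page1 = ["a.com", "b.com", "c.com", "d.com"]
page2 = ["a.com", "c.com", "b.com", "e.com"]

print(jaccard_index(page1, page2))  # 0.6
print(edit_distance(page1, page2))  # 3
```

The two numbers are complementary: the Jaccard index only asks whether the same results appear, while the edit distance also picks up reordering of results that both pages share.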

Our data can be found here.

The name of each folder indicates the specific feature (e.g. gender) that was tested in that experiment. Inside each folder you can find the data for each day the test was run and for each value of the given feature (e.g. male, female, other, control). Each HTML file contains the first page of Google Search results returned for the search term given in its filename.

The keywords chosen for these tests are from Google Zeitgeist and WebMD. They are described in more detail in the paper.

Please note that the data from the real-world AMT experiment is not released due to privacy considerations.