Trip Advisor Data (00040)

This dataset contains 675,069 reviews of 1,851 hotels across the world scraped from Trip Advisor. The data was scraped and donated by Hongning Wang.

One file contains the numerical aspect ratings provided by the users, along with other information about the hotel. The other three files contain the text of the users' reviews. The review text has been slightly modified: all excess spaces and tabs have been removed, and all commas have been changed to semicolons.

All four files use the dat extension but are actually CSV files. The first line of each file names the fields in that file. Some of the usernames contain Unicode characters, so please be careful when parsing the files!
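
The parsing advice above can be sketched in Python with the standard csv module. This is a minimal sketch, not an official loader: the UTF-8 encoding is an assumption (the description only says that some usernames are Unicode), the helper names read_rows and read_dat are hypothetical, and the field names in the sample are made up, since the real ones come from the first line of each file.

```python
import csv
import io


def read_rows(stream):
    """Yield one dict per data row, keyed by the header line of the stream."""
    # DictReader takes the field names from the first line, which is where
    # the dataset description says they are listed.
    reader = csv.DictReader(stream)
    for row in reader:
        yield row


def read_dat(path, encoding="utf-8"):
    """Read a whole .dat file into a list of dicts."""
    # ASSUMPTION: UTF-8. Pass another encoding if decoding fails.
    # newline="" is what the csv module recommends for files it reads.
    with open(path, encoding=encoding, newline="") as handle:
        return list(read_rows(handle))


# Tiny in-memory sample with made-up field names, standing in for a real file.
sample = io.StringIO("user,hotel_id,overall\nJosé,101,5\n李雷,102,3\n")
rows = list(read_rows(sample))
print(rows[0]["user"], rows[0]["overall"])
```

For a downloaded file, read_dat("00040-00000001.dat") would return the ratings as a list of dicts. All values come back as strings, so numeric ratings need an explicit conversion. Because commas in the review text were replaced with semicolons, the plain comma delimiter should not split a review across fields.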

Selected studies: Hongning Wang, Yue Lu and ChengXiang Zhai. Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach. The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010. | Hongning Wang, Yue Lu and ChengXiang Zhai. Latent Aspect Rating Analysis without Aspect Keyword Supervision. The 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2011. | Marco Costantini, Carla Groenland and Ulle Endriss. Judgment Aggregation under Issue Dependencies. The 30th AAAI Conference on Artificial Intelligence (AAAI), 2016.

Download the dataset [zip, 77.3 MB]

Details

Combinatorial

  • Number of files: 4
  • Total size: 221.9 MB
  • Data types: dat.
  • Publication date: Aug. 17, 2013
  • Last modification: April 10, 2024
Ratings — 00040-00000001.dat
Review Texts — 00040-00000002.dat
Review Texts — 00040-00000003.dat
Review Texts — 00040-00000004.dat