Introducing an Intelligent Web Scraping (Mining) Method for Analysing Online Reviews of Tehran Accommodation Users

Document Type : Research Article

Authors

1 Department of Human Geography and Planning, Faculty of Geography, University of Tehran, Tehran

2 Department of Tourism and Hospitality, University of Northampton, London, England

10.22059/jut.2025.378415.1221

Abstract

This research analyzes and evaluates online user reviews of accommodations in Tehran using automated web scraping. It presents a comprehensive smart web scraping method for collecting and analyzing user reviews from an online accommodation platform. The method comprises website selection, tool evaluation, data extraction, preprocessing, and review analysis. With this approach, accommodation owners, managers, and marketers can gain insight into customer preferences, satisfaction levels, and areas for improvement, while tourists and other users can draw on others' experiences when selecting accommodation. The research findings show that the method makes large review datasets analyzable and can support strategic decision-making in the tourism services sector. The findings are presented using descriptive statistics and analytical tests, including t-tests and ANOVA, to assess mean differences in user reviews across hotel categories. Average ratings for hotel amenities, room prices, room quality, hotel location, and health protocols are above the expected average, reflecting a generally positive perception of Tehran's hotels.
Extended Abstract
Introduction
In the age of digital information and online platforms, the tourism and hospitality industry has seen significant changes in how customers share their opinions and experiences. Informed decisions about travel, destinations, and accommodations now rely heavily on online reviews written by tourists, and these reviews have become essential for travelers seeking the best travel and accommodation experiences. Because these opinions carry so much weight in tourists’ decision-making and the volume of reviews is vast, manual analysis is impractical. As a result, smart web scraping methods have been applied to online tourism accommodation platforms. Web scraping means developing a computer program that automatically downloads, analyzes, and organizes data from web pages, which makes it practical for extracting data from many pages at once. The abundance of tourism data available online holds significant analytical potential, yet much of it remains unanalyzed and underutilized. Collecting and analyzing these untapped sources can bring significant improvements to the tourism sector of any destination.
Online customer reviews on tourism and hospitality websites and platforms are a highly significant topic that previous studies have emphasized. While the topic has been addressed extensively in international research, it has received far less attention in Iran, especially for accommodations and hotels. This study therefore describes data collection through web mining and reviews its importance and key process stages. To demonstrate the value of data collected via web scraping, it also calculates descriptive statistics, differences in the mean values of online user reviews, and significant differences in reviews across star-rated accommodations. Finally, it discusses the reasons for choosing the website “Eghamat 24” for web scraping and analysis of online user reviews.
 
Methodology
This study aims to uncover the opinions hidden in user reviews and feedback on accommodations in Tehran and to determine whether accommodations differ in their review scores. To that end, the study extracts user reviews from online review platforms. The extracted reviews are organized into documents, tables, and graphs. Further analysis focuses on uncovering hidden opinions within the reviews, and scores are calculated from the feedback. This paper examines the steps and processes involved in web scraping, which include:

Website analysis;
Web scraping;
Data extraction;
Organizing, processing, and storing the data.

The study uses Selenium for web scraping. Selenium is a widely used framework for automating web browsers. It is well suited to scraping dynamic web pages because it can interact with a page, for example by clicking buttons or filling out forms, before extracting data from it. The first step is installing Selenium and writing the scraping script in Python. The next step is identifying the profile URL of each accommodation, which is done through Selenium's WebDriver. The URLs are then opened and clicked through automatically. The next stage is locating the relevant tags and elements on each page with XPath expressions. Finally, once all the tags and classes in the website's HTML source code have been identified, a for loop extracts the corresponding elements across all pages. The Pandas library in Python is used to save the data to an Excel file.
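The workflow described above can be illustrated with a minimal sketch. It is not the authors' script: the XPath expressions, the column names, and the function names scrape_reviews, save_to_excel, and run are hypothetical and would have to be adapted to the HTML of the site being scraped. The sketch assumes Selenium 4, pandas, and openpyxl are installed (for example with pip install selenium pandas openpyxl).

```python
# Illustrative sketch only: open each accommodation page with Selenium,
# locate review elements by XPath, loop over all pages, save with pandas.
try:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:  # lets the extraction logic be tried without Selenium
    webdriver = None
    XPATH = "xpath"

# Hypothetical XPath expressions; the real ones depend on the site's HTML.
REVIEW_XPATH = "//div[@class='review']"
TEXT_XPATH = ".//p[@class='review-text']"
SCORE_XPATH = ".//span[@class='review-score']"


def scrape_reviews(driver, profile_urls):
    """Visit each profile URL and return one dict per review found."""
    rows = []
    for url in profile_urls:
        driver.get(url)  # open the accommodation profile page
        for review in driver.find_elements(XPATH, REVIEW_XPATH):
            text = review.find_element(XPATH, TEXT_XPATH).text.strip()
            score = review.find_element(XPATH, SCORE_XPATH).text.strip()
            rows.append({"url": url, "review": text, "score": score})
    return rows


def save_to_excel(rows, path="reviews.xlsx"):
    """Write the collected rows to an Excel file (needs pandas and openpyxl)."""
    import pandas as pd  # imported here so scraping works without pandas
    pd.DataFrame(rows).to_excel(path, index=False)


def run(profile_urls, path="reviews.xlsx"):
    """Start a Chrome session, scrape every URL, and save the result."""
    driver = webdriver.Chrome()
    try:
        save_to_excel(scrape_reviews(driver, profile_urls), path)
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling run with a list of accommodation profile URLs would produce one spreadsheet row per review. Scores are kept as the raw text shown on the page, since the rating format of the target site is not specified here.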
 
 
Results and discussion
The findings of the study, based on descriptive statistics and analytical tests, show that the average ratings for various aspects of hotels, such as services, room price, quality, location, and cleanliness, are generally above the expected average, indicating an overall positive perception of hotels in Tehran. The paper highlights the potential of web scraping as a powerful technique for automatically collecting data from websites, which can significantly contribute to enhancing tourism services and making strategic decisions in the industry. By studying and analyzing these reviews, businesses can better understand tourists' needs and preferences and implement necessary improvements in accommodations. Some of the impacts of using automated web scraping for businesses and users include:

Improving service quality;
Better decision-making by users;
Marketing strategy enhancements;
Boosting business trust.
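The mean comparisons mentioned in this section (t-tests and ANOVA on review scores) can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: the scores are made-up placeholder values, the three groups stand in for hotel categories, and the reference value of 3.0 used as the "expected average" is hypothetical. Only the test statistics are computed, using the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical review scores for three hotel categories (not study data).
three_star = [3.0, 3.5, 4.0]
four_star = [3.5, 4.0, 4.5]
five_star = [4.0, 4.5, 5.0]


def one_sample_t(sample, reference):
    """t statistic for testing whether the sample mean differs from reference."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - reference) / standard_error


def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


t_stat = one_sample_t(four_star, 3.0)  # 3.0 is an assumed reference value
f_stat, df_between, df_within = one_way_anova_f(
    [three_star, four_star, five_star]
)
print(f"t = {t_stat:.3f}, F({df_between}, {df_within}) = {f_stat:.3f}")
```

To obtain p-values as well, the same comparisons can be run with scipy.stats.ttest_1samp and scipy.stats.f_oneway, assuming SciPy is available.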

 
Conclusion
The impact of online reviews and customer feedback from tourists is becoming increasingly significant. Tourists trust online reviews when planning their trips, while businesses use them to create effective marketing strategies. However, analyzing individual reviews manually is impractical due to the sheer volume of available data. As a result, considerable efforts have been made in recent years to develop methods for automatic analysis and summarization of reviews. This research demonstrates how various analyses conducted on user feedback data from online platforms concerning accommodations in Tehran can help businesses and users, ultimately leading to improved business practices.
 
Funding
There is no funding support.
 
Authors’ Contribution
The authors contributed equally to the conceptualization and writing of the article. All authors approved the content of the manuscript and agreed on all aspects of the work. Declaration of competing interest: none.
 
Conflict of Interest
Authors declared no conflict of interest.
 
Acknowledgments
We are grateful to all the scientific consultants of this paper.
