Document Type : Research Article
Authors
1
Department of Human Geography and Planning, Faculty of Geography, University of Tehran, Tehran
2
Department of Tourism and Hospitality, University of Northampton, London, England
10.22059/jut.2025.378415.1221
Abstract
This research analyzes and evaluates online user reviews of accommodations in Tehran using automated web scraping. It presents a comprehensive smart web scraping method for collecting and analyzing user reviews on an online accommodation platform, covering website selection, tool evaluation, data extraction, preprocessing, and review analysis. With this approach, accommodation owners, managers, and marketers can gain insight into customer preferences, satisfaction levels, and areas for improvement, while tourists and other users can draw on others' experiences when selecting their accommodation. The research findings indicate that the method enables the analysis of large datasets and can support strategic decision-making in the tourism services sector. The findings are presented using descriptive statistics and analytical tests, including t-tests and ANOVA, which assess mean differences in user reviews across hotel categories. The average ratings for hotel amenities, room prices, room quality, hotel location, and health protocols are generally above the expected average, reflecting a broadly positive perception of Tehran's hotels.
Extended Abstract
Introduction
In the age of digital information and online platforms, the tourism and hospitality industry has witnessed significant changes in how customers share their opinions and experiences. Informed decisions about travel, destinations, and accommodations now rely heavily on online reviews written by tourists, and these reviews have become essential for travelers seeking the best travel and accommodation experiences. Because these opinions matter so much in tourists' decision-making and the volume of reviews is vast, manual analysis is impractical. As a result, smart web scraping methods have been applied to online platforms for tourism accommodations. Web scraping means developing a computer program that automatically downloads, analyzes, and organizes data from web pages, which makes it highly practical for extracting data from many pages at once. The abundance of general tourism data available online holds significant potential for analysis, yet much of it remains unanalyzed and underutilized. Collecting and analyzing these untapped data sources can lead to significant improvements in the tourism sector of any destination.
Online customer reviews on tourism and hospitality websites and platforms are highly significant and have been emphasized in previous studies. While the topic has been addressed extensively in international research, it has received far less attention in Iran, especially for accommodations and hotels. This study therefore describes data collection through web mining and reviews its importance and key process stages. To demonstrate the value of data collected via web scraping, it calculates descriptive statistics, differences in the mean values of online user reviews, and significant differences in reviews across star-rated accommodations. It also discusses the reasoning behind choosing the website “Eghamat 24” for web scraping and analysis of online user reviews.
Methodology
This study aims to uncover hidden opinions in user reviews and feedback on accommodations in Tehran and to determine whether accommodations differ in their review scores. To that end, user reviews are extracted from an online review platform, analyzed, and converted into documents, tables, and graphs. Further analysis focuses on uncovering hidden opinions within the reviews, and scores are calculated from the feedback. This paper examines the steps and processes involved in web scraping, which include:
Website analysis;
Web scraping;
Data extraction;
Organizing, processing, and storing the data.
The study utilizes Selenium for web scraping. Selenium is a powerful and popular framework for automating web browsers. It is well suited to scraping dynamic web pages because it can interact with a page, for example by clicking buttons or filling out forms, and then extract data from it. The first step is installing Selenium; the scraping code is written in Python. The next step is identifying the URLs of the profile pages for each accommodation, which is done with the WebDriver interface of the Selenium library. These URLs are then opened automatically. The next stage involves locating the tags and elements within each page using XPath expressions. Finally, after all the relevant tags and classes in the website's HTML source code have been identified, a for loop extracts those tags and elements across all the pages. The Pandas library in Python is used to save the data in an Excel file.
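The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function names, the XPath expressions, the class names such as `hotel-link`, and the output file name are all hypothetical, since the structure of the scraped site is not given here. It also opens each profile URL directly with `driver.get` rather than clicking links, which is a simplification.

```python
from urllib.parse import urljoin


def absolute_urls(base_url, hrefs):
    """Turn (possibly relative) links into absolute URLs, dropping blanks and repeats."""
    seen = []
    for href in hrefs:
        if not href:
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.append(url)
    return seen


def scrape_reviews(listing_url, output_path="reviews.xlsx"):
    """Collect review rows from every accommodation linked on a listing page."""
    # Imported here so absolute_urls() can be used without a browser driver installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    rows = []
    try:
        driver.get(listing_url)
        # Hypothetical XPath: links to individual accommodation profiles.
        links = driver.find_elements(By.XPATH, "//a[contains(@class, 'hotel-link')]")
        profile_urls = absolute_urls(
            listing_url, [link.get_attribute("href") for link in links]
        )

        for url in profile_urls:
            driver.get(url)
            # Hypothetical XPath: one element per user review on the profile page.
            reviews = driver.find_elements(By.XPATH, "//div[contains(@class, 'review')]")
            for review in reviews:
                score = review.find_element(By.XPATH, ".//span[contains(@class, 'score')]")
                body = review.find_element(By.XPATH, ".//p")
                rows.append({"hotel_url": url, "score": score.text, "text": body.text})
    finally:
        driver.quit()  # always close the browser, even if a lookup fails

    # Writing .xlsx files with pandas requires an Excel engine such as openpyxl.
    pd.DataFrame(rows).to_excel(output_path, index=False)
    return rows
```

A call such as `scrape_reviews("https://example.com/tehran-hotels")` (a placeholder URL) would open the listing page, visit each profile, and write one row per review to the Excel file. In practice the XPath expressions would need to be replaced with ones taken from the target site's HTML source.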
Results and discussion
The findings, based on descriptive statistics and analytical tests, show that the average ratings for various aspects of hotels, such as services, room price, quality, location, and cleanliness, are generally above the expected average, indicating an overall positive perception of hotels in Tehran. The paper highlights the potential of web scraping as a technique for automatically collecting data from websites, which can contribute to enhancing tourism services and to strategic decision-making in the industry. By studying and analyzing these reviews, businesses can better understand tourists' needs and preferences and make the necessary improvements to their accommodations. Some of the impacts of using automated web scraping for businesses and users include:
Improving service quality;
Better decision-making by users;
Marketing strategy enhancements;
Boosting business trust.
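The analytical tests behind these findings are described in the abstract as t-tests and ANOVA on mean review scores. As a hedged illustration of what those two statistics compute, the sketch below uses only the Python standard library. The function names and the sample scores are invented for illustration and are not the study's data; p-values are omitted and would normally come from a statistics package such as SciPy.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, expected_mean):
    """t statistic for testing whether the sample mean differs from expected_mean."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - expected_mean) / standard_error


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA comparing the means of several groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups (e.g. star categories)
    n_total = len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up scores for three hypothetical star categories (not the study's data).
three_star = [3, 4, 5]
four_star = [4, 5, 6]
five_star = [5, 6, 7]

t_stat = one_sample_t(four_star, expected_mean=3)
f_stat = one_way_anova_f([three_star, four_star, five_star])
print(f"t = {t_stat:.3f}, F = {f_stat:.3f}")
```

A large positive t statistic corresponds to ratings above the expected average, and a large F statistic suggests that mean scores differ between hotel categories. Whether either result is significant depends on the degrees of freedom and the chosen significance level.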
Conclusion
The impact of online reviews and customer feedback from tourists is becoming increasingly significant. Tourists trust online reviews when planning their trips, while businesses use them to create effective marketing strategies. However, analyzing individual reviews manually is impractical due to the sheer volume of available data. As a result, considerable efforts have been made in recent years to develop methods for automatic analysis and summarization of reviews. This research demonstrates how various analyses conducted on user feedback data from online platforms concerning accommodations in Tehran can help businesses and users, ultimately leading to improved business practices.
Funding
There is no funding support.
Authors’ Contribution
The authors contributed equally to the conceptualization and writing of the article. All of the authors approved the content of the manuscript and agreed on all aspects of the work. Declaration of competing interest: none.
Conflict of Interest
The authors declared no conflict of interest.
Acknowledgments
We are grateful to all the scientific consultants of this paper.
Keywords