A Large Scale Hotel Arabic-Reviews Dataset

Rehouma, Cherif; Loubiri, Mohammed Elhabib

Files

a large scale hotel arabic-reviews dataset.pdf (3.75 MB)

Date

2021-06-21

Authors

Rehouma, Cherif

Loubiri, Mohammed Elhabib

Publisher

University of El Oued (جامعة الوادي)

Abstract

"Natural language processing spans many areas of study and is considered a main driver of techniques for understanding human behavior. It can be used to solve problems such as plagiarism detection and the extraction of words and information from texts, and it is also used in machine translation and text classification. The Arabic language suffers from a lack of large datasets available for machine learning. In this work, we introduce LASHAR (A Large Scale Hotel Arabic-Reviews Dataset), the largest dataset of hotel reviews in Arabic for subjective sentiment analysis and machine language applications. LASHAR comprises 1,604,762 hotel reviews collected from the Booking.com website by web scraping. Each record contains positive or negative review text in Arabic, the reviewer's rating on a scale of 1 to 10 stars, and other attributes of the hotel and the reviewer. We used four well-known sentiment classifiers to examine the dataset's validity and efficiency, and we test the sentiment analyzers on polarity classification. Our primary commitment is to make this benchmark dataset available and open to the Arabic language research community."

Description

Master's thesis in computer science

Keywords

Natural Language Processing, sentiment analysis, web scraping, sentiment analyzers, web scrapy

URI

https://dspace.univ-eloued.dz/handle/123456789/9750

Collections

department of computer science_master
