Hi there!
E-commerce has become very popular and is capturing a large share of the
retail market. One of the reasons customers like e-commerce platforms is the
product review feature that sellers offer. Customers can review the products
they have purchased, which helps both the seller and other customers.
But because of the competition between sellers and manufacturers, we come
across many spam reviews. Paid reviewers post biased reviews, duplicate
reviews, or reviews that are irrelevant to the product.
Filtering out those reviews is a tedious process for the seller. So let us
try to find a solution for this using the R language.
Logic used:
In this example, I have taken reviews of an iPhone from the Amazon website.
Step 1:
We need a list of keywords. These are the keywords we expect to find in the
reviews. As we are analysing reviews of an iPhone, our keyword list will
contain keys like "camera", "battery", "life", "screen", "heat", etc.
Step 2:
In the second step, we find how many times these keys occur in each review
and build a matrix to store the counts.
If a review doesn’t contain any of the keywords, then that review is possibly
spam: it is not useful for the seller or the customer. If a review contains
many of the keywords, that review should be considered first, since the
reviewer might be talking about a serious issue with the product.
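The counting described in this step can be sketched in a few lines. This is only a minimal illustration, not the script from this post: the three sample reviews and the helper `count_keyword` are made up for the example, it uses base R's `gregexpr` rather than stringr's `str_count`, and it lower-cases the text before matching, which is an extra assumption.

```r
# Hypothetical sample data, not the reviews analysed in this post
keywords <- c("camera", "battery", "screen")
reviews <- c(
  "Great camera, and the battery lasts all day. The camera app is fast.",
  "Best deal ever, visit my page for discounts!",
  "The screen is sharp but the battery drains quickly."
)

# Hypothetical helper: number of literal occurrences of `key` in `text`
count_keyword <- function(text, key) {
  hits <- gregexpr(key, text, fixed = TRUE)[[1]]
  sum(hits > 0)  # gregexpr returns -1 when there is no match
}

# Build the matrix: one row per keyword, one column per review
counts <- vapply(
  reviews,
  function(r) vapply(keywords, function(k) count_keyword(tolower(r), k), integer(1)),
  integer(length(keywords)),
  USE.NAMES = FALSE
)
dimnames(counts) <- list(keywords, paste0("review", seq_along(reviews)))
print(counts)

# A review with no keyword at all is a spam candidate
possible_spam <- colSums(counts) == 0
print(possible_spam)
```

Here only the second sample review contains none of the keywords, so it is the only one flagged. Plain substring matching also counts a keyword inside a longer word, so a real keyword list may need word boundaries.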
Step 3:
Depending on the number of keywords found, calculate a relevance score for
each review and sort the reviews by that score.
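As a rough sketch of this step, the relevance score can be taken as the column sum of the keyword-count matrix, and the ranking as the order of those sums. The matrix below is hypothetical and typed in by hand for illustration. The names `counts`, `relevance` and `ranked` are not from the script in this post.

```r
# Hypothetical keyword-count matrix: rows are keywords, columns are reviews
counts <- matrix(
  c(2, 1, 0,   # review r1: camera, battery, screen
    0, 0, 0,   # review r2
    0, 1, 1),  # review r3
  nrow = 3,
  dimnames = list(c("camera", "battery", "screen"), c("r1", "r2", "r3"))
)

# Relevance score of a review = total number of keyword hits in it
relevance <- colSums(counts)

# Indices of the reviews, most relevant first
ranking <- order(relevance, decreasing = TRUE)

ranked <- data.frame(
  review = colnames(counts)[ranking],
  score = unname(relevance[ranking])
)
print(ranked)
```

A plain sum treats every keyword as equally important. Weighting some keywords more heavily would be a natural variation, but that goes beyond what this post does.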
Step 4:
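In the final step we need to find duplicate reviews so that they can be
eliminated. To compare the reviews, I have used a nested loop in the style
of selection sort, checking each review against every later one.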
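For comparison, base R's `duplicated()` can flag exact repeats without an explicit pairwise loop. The sketch below swaps in that function in place of the nested comparison used in this post. The sample reviews are made up, and the normalisation step (trimming whitespace and lower-casing) is an extra assumption, not something the script below does.

```r
# Hypothetical sample reviews
reviews <- c(
  "Great phone, love the camera.",
  "Battery heats up while charging.",
  "Great phone, love the camera.",
  "great phone, love the camera. "
)

# Exact duplicates: TRUE for every repeat of an earlier review
is_dup <- duplicated(reviews)

# Optional normalisation so that case and stray whitespace do not hide repeats
normalised <- tolower(trimws(reviews))
is_dup_norm <- duplicated(normalised)

# Keep only the first occurrence of each review
unique_reviews <- reviews[!is_dup_norm]
print(unique_reviews)
```

With exact matching only the third review is flagged. After normalisation the fourth is flagged as well, leaving two distinct reviews.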
Now let us get into coding.
Here is the R script.
(You can find the materials and scripts used in this post on my GitHub repo.)
#clean up the workspace (removes all objects except functions)
rm(list = setdiff(ls(), lsf.str()))
library(stringr)
#################################################################################
#read important keywords
#################################################################################
keywords = scan('KeyWords.txt',
what='character', comment.char=';',sep = "\n")
#################################################################################
#read the data to be evaluated; review.txt contains 11 reviews, each separated by a newline character
#################################################################################
reviews <- scan('review.txt',
what='character', comment.char=';',sep = "\n")
#################################################################################
#score it and compare
#################################################################################
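findScore <- function(review, k) {
  #use the keyword vector passed in as k rather than the global keywords
  matScore <- c()
  for (i in seq_along(k)) {
    #one row per keyword: the keyword and its number of occurrences in the review
    tDF <- c(k[i], sum(str_count(review, k[i])))
    matScore <- rbind(matScore, tDF)
  }
  return(matScore)
}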
reviewLength <- length(reviews)
#first column holds the keywords; each further column holds the counts for one review
score <- keywords
for (i in seq_len(reviewLength)) {
  tScore <- findScore(reviews[i], keywords)
  score <- cbind(score, tScore[,2])
}
View(score)
#################################################################################
#function to calculate relevance of reviews
#################################################################################
findRel <- function(reviewScore) {
  #counts are stored as character strings, so convert them before adding up
  return(sum(as.numeric(reviewScore)))
}
#find irrelevant reviews
totalScoreR <- c()
for (i in 2:dim(score)[2]) {
  #column 1 of score holds the keywords, so review i-1 is in column i
  totalScoreR[i-1] <- findRel(score[,i])
}
findIfRel <- function(totalScoreR) {
  for (i in seq_along(totalScoreR)) {
    if (as.numeric(totalScoreR[i]) == 0)
      cat("Review:",i,"is irrelevant\n")
    else
      cat("Number of keywords found in review:",i,"is ", totalScoreR[i],"\n")
  }
}
#call the above function to find relevance of reviews
findIfRel(totalScoreR)
#################################################################################
#sort reviews according to their importance
#################################################################################
#high value reviews
#pair each total score with its review index, then sort by score
highValueR <- data.frame(totalScoreR = totalScoreR, index = seq_along(totalScoreR))
highValueR <- highValueR[order(highValueR$totalScoreR, decreasing = TRUE),]
print("Reviews in decreasing order of importance:")
for (i in seq_len(nrow(highValueR))) {
  cat("Rank ",i,":\n")
  cat(reviews[highValueR[i,2]])
  cat("\n##########################################\n")
}
#################################################################################
#find duplicate (identical) reviews
#################################################################################
n <- length(reviews)
for (i in seq_len(n)) {
  for (j in i:n) {
    #compare each review only with the later ones and skip comparing it with itself
    if (i != j && identical(reviews[i], reviews[j]))
      cat("\nReview ",i," and review ",j," are the same\n")
  }
}
Thanks for visiting my blog. I always love to hear constructive feedback. Please give your feedback in the comment section below or write to me personally here.