ISSS608 Assignment - Louelle Teo

Louelle TEO Fengmin
03-30-2021

Introduction

Airbnb analytics tools currently on the market, such as AirDNA (https://www.airdna.co/), require a subscription. The underlying Airbnb data, though free to access, is not easy to use for people without data analytics experience.

Through this assignment, we therefore aim to create a Shiny application in which users can conduct Exploratory Data Analysis on their chosen location and variables, and so form a view of the current Airbnb market.

The Shiny app will also contain Cluster Analysis, Geospatial Analysis, Text Analysis and Multivariate Linear Regression, so that users can make an informed decision on how to price their property and write a listing description that attracts customers.


Sketch of Proposed Visualisation

Figure 1: Sketch for Proposed Visualisation

Figure 2: Sketch for Proposed Visualisation

Figure 3: Sketch for Proposed Visualisation

Figure 4: Sketch for Proposed Visualisation

Figure 5: Sketch for Proposed Visualisation

Figure 6: Sketch for Proposed Visualisation

Installing and launching R packages

The following packages are used for this assignment, which is run in RStudio.

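# tidyverse already attaches purrr, dplyr and forcats; they are listed explicitly here.
# MASS is attached after dplyr and masks select(), hence the dplyr::select() calls below.
packages <- c("tidyverse", "plotly", "purrr", "dplyr", "gdata", "corrplot", "MASS", "biganalytics", 
              "parallelPlot", "factoextra", "gridExtra", "cluster", "forcats")

# Install any package that is missing, then attach it
for (p in packages){
  if (!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}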

Datasource

The data set used in this assignment comes from Inside Airbnb (http://insideairbnb.com/get-the-data.html), a website that compiles data from publicly available information on the Airbnb site. For this assignment, we concentrate on Australia, and in particular on the state of Victoria.

Loading the Dataset into R

Both the Airbnb data set for Australia and Airbnb data set for the State of Victoria will be loaded.

airbnb <- read_csv("data/Airbnb.csv")
airbnb.vic <- read_csv("data/Airbnb_victoria.csv")
str(airbnb, nchar.max=20)
spec_tbl_df [153,914 x 65] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ X1                                          : num [1:153914] 1 2 3 4 5 6 7 8 9 10 ...
 $ id                                          : num [1:153914] 46142| __truncated__ ...
 $ name                                        : chr [1:153914] "The"| __truncated__ "'We"| __truncated__ "THE"| __truncated__ "Kin"| __truncated__ ...
 $ description                                 : chr [1:153914] "Thi"| __truncated__ "Tre"| __truncated__ "Bou"| __truncated__ "Mas"| __truncated__ ...
 $ neighborhood_overview                       : chr [1:153914] "Ora"| __truncated__ "'We"| __truncated__ NA "Very friendly." ...
 $ host_id                                     : num [1:153914] 5.32e| __truncated__ ...
 $ host_since                                  : Date[1:153914], format: "2016-01-04" ...
 $ host_location                               : chr [1:153914] "Ora"| __truncated__ "AU" "Par"| __truncated__ "Par"| __truncated__ ...
 $ host_about                                  : chr [1:153914] "Hel"| __truncated__ NA "\r\"| __truncated__ "Eas"| __truncated__ ...
 $ host_response_time                          : chr [1:153914] "within an hour" "within a day" "within an hour" "within an hour" ...
 $ host_response_rate                          : num [1:153914] 1 1 1| __truncated__ ...
 $ host_acceptance_rate                        : num [1:153914] 1 0.9| __truncated__ ...
 $ host_is_superhost                           : logi [1:153914] TRUE | __truncated__ ...
 $ host_listings_count                         : num [1:153914] 68 0 | __truncated__ ...
 $ host_total_listings_count                   : num [1:153914] 68 0 | __truncated__ ...
 $ host_verifications                          : chr [1:153914] "['e"| __truncated__ "['e"| __truncated__ "['e"| __truncated__ "['e"| __truncated__ ...
 $ host_has_profile_pic                        : logi [1:153914] TRUE | __truncated__ ...
 $ host_identity_verified                      : logi [1:153914] TRUE | __truncated__ ...
 $ neighbourhood                               : chr [1:153914] "Ora"| __truncated__ "Ora"| __truncated__ NA "Par"| __truncated__ ...
 $ latitude                                    : num [1:153914] -33.3| __truncated__ ...
 $ longitude                                   : num [1:153914] 149 149 148 151 151 ...
 $ property_type                               : chr [1:153914] "Entire house" "Entire house" "Entire house" "Sha"| __truncated__ ...
 $ room_type                                   : chr [1:153914] "Entire home/apt" "Entire home/apt" "Entire home/apt" "Shared room" ...
 $ accommodates                                : num [1:153914] 4 12 5 4 4 7 2 2 4 4 ...
 $ bathrooms_text                              : chr [1:153914] "1 bath" "3.5 baths" "2 baths" "1 shared bath" ...
 $ bedrooms                                    : num [1:153914] 2 5 3 1 3 4 1 1 2 2 ...
 $ beds                                        : num [1:153914] 2 8 3 2 3 6 1 1 2 2 ...
 $ amenities                                   : chr [1:153914] "[\""| __truncated__ "[\""| __truncated__ "[\""| __truncated__ "[\""| __truncated__ ...
 $ price                                       : num [1:153914] 235 7| __truncated__ ...
 $ minimum_nights                              : num [1:153914] 2 2 2| __truncated__ ...
 $ maximum_nights                              : num [1:153914] 365 1| __truncated__ ...
 $ minimum_minimum_nights                      : num [1:153914] 2 2 2| __truncated__ ...
 $ maximum_minimum_nights                      : num [1:153914] 5 2 2| __truncated__ ...
 $ minimum_maximum_nights                      : num [1:153914] 365 1| __truncated__ ...
 $ maximum_maximum_nights                      : num [1:153914] 365 1| __truncated__ ...
 $ minimum_nights_avg_ntm                      : num [1:153914] 2.6 2| __truncated__ ...
 $ maximum_nights_avg_ntm                      : num [1:153914] 365 1| __truncated__ ...
 $ has_availability                            : logi [1:153914] TRUE | __truncated__ ...
 $ availability_30                             : num [1:153914] 22 18| __truncated__ ...
 $ availability_60                             : num [1:153914] 51 40| __truncated__ ...
 $ availability_90                             : num [1:153914] 74 54| __truncated__ ...
 $ availability_365                            : num [1:153914] 349 8| __truncated__ ...
 $ number_of_reviews                           : num [1:153914] 15 15| __truncated__ ...
 $ number_of_reviews_ltm                       : num [1:153914] 15 15| __truncated__ ...
 $ number_of_reviews_l30d                      : num [1:153914] 7 1 2 1 0 0 0 0 0 0 ...
 $ first_review                                : Date[1:153914], format: "2020-11-07" ...
 $ last_review                                 : Date[1:153914], format: "2021-01-21" ...
 $ review_scores_rating                        : num [1:153914] 96 97| __truncated__ ...
 $ review_scores_accuracy                      : num [1:153914] 10 10| __truncated__ ...
 $ review_scores_cleanliness                   : num [1:153914] 10 10| __truncated__ ...
 $ review_scores_checkin                       : num [1:153914] 10 10| __truncated__ ...
 $ review_scores_communication                 : num [1:153914] 10 10| __truncated__ ...
 $ review_scores_location                      : num [1:153914] 10 10| __truncated__ ...
 $ review_scores_value                         : num [1:153914] 10 10| __truncated__ ...
 $ license                                     : chr [1:153914] NA NA NA NA ...
 $ instant_bookable                            : logi [1:153914] TRUE | __truncated__ ...
 $ calculated_host_listings_count              : num [1:153914] 68 1 | __truncated__ ...
 $ calculated_host_listings_count_entire_homes : num [1:153914] 68 1 | __truncated__ ...
 $ calculated_host_listings_count_private_rooms: num [1:153914] 0 0 0 1 0 0 1 1 0 0 ...
 $ calculated_host_listings_count_shared_rooms : num [1:153914] 0 0 0 1 0 0 0 0 0 0 ...
 $ region_id                                   : num [1:153914] 16150| __truncated__ ...
 $ region_name                                 : chr [1:153914] "Orange" "Orange" "Parkes" "Parramatta" ...
 $ region_parent_id                            : num [1:153914] 1 1 1 1 1 1 1 1 1 1 ...
 $ region_parent_name                          : chr [1:153914] "New South Wales" "New South Wales" "New South Wales" "New South Wales" ...
 $ reviews_per_month                           : num [1:153914] 5.29 | __truncated__ ...
 - attr(*, "spec")=
  .. cols(
  ..   X1 = col_double(),
  ..   id = col_double(),
  ..   name = col_character(),
  ..   description = col_character(),
  ..   neighborhood_overview = col_character(),
  ..   host_id = col_double(),
  ..   host_since = col_date(format = ""),
  ..   host_location = col_character(),
  ..   host_about = col_character(),
  ..   host_response_time = col_character(),
  ..   host_response_rate = col_double(),
  ..   host_acceptance_rate = col_double(),
  ..   host_is_superhost = col_logical(),
  ..   host_listings_count = col_double(),
  ..   host_total_listings_count = col_double(),
  ..   host_verifications = col_character(),
  ..   host_has_profile_pic = col_logical(),
  ..   host_identity_verified = col_logical(),
  ..   neighbourhood = col_character(),
  ..   latitude = col_double(),
  ..   longitude = col_double(),
  ..   property_type = col_character(),
  ..   room_type = col_character(),
  ..   accommodates = col_double(),
  ..   bathrooms_text = col_character(),
  ..   bedrooms = col_double(),
  ..   beds = col_double(),
  ..   amenities = col_character(),
  ..   price = col_double(),
  ..   minimum_nights = col_double(),
  ..   maximum_nights = col_double(),
  ..   minimum_minimum_nights = col_double(),
  ..   maximum_minimum_nights = col_double(),
  ..   minimum_maximum_nights = col_double(),
  ..   maximum_maximum_nights = col_double(),
  ..   minimum_nights_avg_ntm = col_double(),
  ..   maximum_nights_avg_ntm = col_double(),
  ..   has_availability = col_logical(),
  ..   availability_30 = col_double(),
  ..   availability_60 = col_double(),
  ..   availability_90 = col_double(),
  ..   availability_365 = col_double(),
  ..   number_of_reviews = col_double(),
  ..   number_of_reviews_ltm = col_double(),
  ..   number_of_reviews_l30d = col_double(),
  ..   first_review = col_date(format = ""),
  ..   last_review = col_date(format = ""),
  ..   review_scores_rating = col_double(),
  ..   review_scores_accuracy = col_double(),
  ..   review_scores_cleanliness = col_double(),
  ..   review_scores_checkin = col_double(),
  ..   review_scores_communication = col_double(),
  ..   review_scores_location = col_double(),
  ..   review_scores_value = col_double(),
  ..   license = col_character(),
  ..   instant_bookable = col_logical(),
  ..   calculated_host_listings_count = col_double(),
  ..   calculated_host_listings_count_entire_homes = col_double(),
  ..   calculated_host_listings_count_private_rooms = col_double(),
  ..   calculated_host_listings_count_shared_rooms = col_double(),
  ..   region_id = col_double(),
  ..   region_name = col_character(),
  ..   region_parent_id = col_double(),
  ..   region_parent_name = col_character(),
  ..   reviews_per_month = col_double()
  .. )

Exploratory Data Analysis

First, we will create some simple bar charts, boxplots, scatterplots and density plots to explore the different variables in the data set. These will allow us to formulate hypotheses and decide which statistical models to develop afterwards.

Below are a few plots to be incorporated into the Shiny App. Users will be able to toggle different variables for the different charts to explore the data.
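As a rough illustration of the variable toggle described above, the sketch below wires a drop-down to a density plot. It is a minimal sketch only: the toy data frame, the input id `var` and the output id `dens` are hypothetical stand-ins, not the layout of the final app.

```r
library(shiny)
library(ggplot2)

# Toy stand-in for the Airbnb listings (hypothetical values)
listings <- data.frame(
  price = c(120, 250, 90, 310, 150, 75),
  bedrooms = c(1, 3, 1, 4, 2, 1),
  review_scores_rating = c(95, 88, 99, 92, 97, 85)
)

ui <- fluidPage(
  # Drop-down listing every column of the toy data
  selectInput("var", "Variable", choices = names(listings)),
  plotOutput("dens")
)

server <- function(input, output, session) {
  output$dens <- renderPlot({
    # .data[[...]] looks the column up by the name chosen in the drop-down
    ggplot(listings, aes(x = .data[[input$var]])) +
      geom_density() +
      labs(x = input$var, y = "")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app interactively
```

The same pattern would extend to the other charts by adding one input per option the user may change.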

Bar charts allow us to visualise the top states and the top Local Government Areas (LGAs) by number of hosts and number of listings.

Number of Hosts per State

unique_host_id_aus <- airbnb %>%
  dplyr::select(host_id, region_parent_name) %>%
  unique() %>%
  mutate(hostcount=1)%>%
  dplyr::select(-host_id)%>%
  group_by(region_parent_name)%>%
  count(hostcount, name="number.of.host")%>%
  ungroup() %>% 
  dplyr::select(-hostcount)%>%
  arrange(desc(number.of.host))
  
b<-ggplot(unique_host_id_aus,aes(x=(reorder(region_parent_name,-number.of.host)),y=number.of.host))+
  geom_col() +
  labs(title= "Number of Hosts per State", x="State", y="Number of Hosts")+
  theme(legend.position="bottom", axis.text.x = element_text(angle = 90))

Number of Hosts in the top 10 Local Government Area (LGA)

unique_host_id <- airbnb.vic %>%
  dplyr::select(host_id, region_name) %>%
  unique() %>%
  mutate(hostcount=1)%>%
  dplyr::select(-host_id)%>%
  group_by(region_name)%>%
  count(hostcount, name="number.of.host")%>%
  ungroup() %>% 
  dplyr::select(-hostcount)%>%
  arrange(desc(number.of.host))%>%
  slice_head(n=10)
  
b1<-ggplot(unique_host_id,aes(x=(reorder(region_name,-number.of.host)),y=number.of.host))+
  geom_col() +
  labs(title= "Number of Hosts in Top 10 LGA", x="LGA", y="Number of Hosts")+
  theme(legend.position="bottom", axis.text.x = element_text(angle = 90))

Number of Airbnb Listings per State

unique_id_aus <- airbnb %>%
  dplyr::select(id, region_parent_name) %>%
  unique() %>%
  mutate(idcount=1)%>%
  dplyr::select(-id)%>%
  group_by(region_parent_name)%>%
  count(idcount, name="number.of.listings")%>%
  ungroup() %>% 
  dplyr::select(-idcount)%>%
  arrange(desc(number.of.listings))
  
b2<-ggplot(unique_id_aus,aes(x=(reorder(region_parent_name,-number.of.listings)),y=number.of.listings))+
  geom_col() +
  labs(title= "Number of Listings per State", x="State", y="Number of Airbnb Listings")+
  theme(legend.position="bottom", axis.text.x = element_text(angle = 90))

Top 10 LGA - Number of Airbnb Listings

unique_id <- airbnb.vic %>%
  dplyr::select(id, region_name) %>%
  unique() %>%
  mutate(idcount=1)%>%
  dplyr::select(-id)%>%
  group_by(region_name)%>%
  count(idcount, name="number.of.listings")%>%
  ungroup() %>% 
  dplyr::select(-idcount)%>%
  arrange(desc(number.of.listings))%>%
  slice_head(n=10)
  
b3<-ggplot(unique_id,aes(x=(reorder(region_name,-number.of.listings)),y=number.of.listings))+
  geom_col() +
  labs(title= "Number of Listings in the Top 10 LGA", x="LGA", y="Number of Airbnb Listings")+
  theme(legend.position="bottom", axis.text.x = element_text(angle = 90))

grid.arrange(b, b1, b2, b3, nrow = 2)
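The four counting pipelines above share one pattern: keep the distinct id and region pairs, count them per region, and optionally keep the largest groups. A possible way to factor this out is sketched below. The helper name `count_unique`, its arguments and the toy data are assumptions made for illustration and are not part of the original code.

```r
library(dplyr)

# Count distinct ids per group, sorted from largest to smallest.
# If top_n is given, keep only that many groups.
count_unique <- function(data, id_col, group_col, top_n = NULL) {
  out <- data %>%
    distinct({{ id_col }}, {{ group_col }}) %>%
    count({{ group_col }}, name = "n_unique", sort = TRUE)
  if (!is.null(top_n)) {
    out <- slice_head(out, n = top_n)
  }
  out
}

# Toy data: host 1 and host 3 each appear twice in the same region
toy <- tibble(
  host_id = c(1, 1, 2, 3, 3, 4),
  region_name = c("Melbourne", "Melbourne", "Melbourne",
                  "Yarra", "Yarra", "Surf Coast")
)

top_regions <- count_unique(toy, host_id, region_name, top_n = 2)
top_regions
```

With the real data, a call such as `count_unique(airbnb.vic, host_id, region_name, top_n = 10)` would play the role of `unique_host_id`, with the count column named `n_unique` instead of `number.of.host`.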

Boxplots of Prices in the Top 10 LGA (Airbnb Listings)

We plot boxplots to compare prices across the top 10 LGAs in Victoria by number of listings. We note that Mornington Peninsula has the highest prices among them.

Next, we plot boxplots for the state of Victoria that compare prices across property types within each LGA. We note that entire houses in Surf Coast are the priciest.

#boxplots

cities<-unique_id$region_name

eda_boxplot<-airbnb.vic %>%
  filter(region_name %in% cities)%>%
  dplyr::select(region_name,price)
  

ggplot(eda_boxplot, aes(x=region_name,y=price))+
  geom_boxplot()+
  theme(legend.position="bottom", axis.text.x = element_text(angle = 90))+
  labs(title= "Top 10 LGA - Boxplot of Prices", x="", y="Prices")

Top 10 LGA - Boxplot of Prices per Property Type

#boxplots of property types and prices

# note: the LGAs used here are the top 10 by number of hosts (unique_host_id)
cities<-unique_host_id$region_name

property.type.focus <- airbnb.vic%>%
  dplyr::select(property_type)%>%
  group_by(property_type)%>%
  count(property_type,name="total.no.property")%>%
  ungroup()%>%
  arrange(-total.no.property)%>%
  slice_head(n=5)

eda_boxplot_propertytype<-airbnb.vic %>%
  filter(region_name %in% cities)%>%
  filter(property_type %in% property.type.focus$property_type)%>%
  dplyr::select(region_name,price,property_type)
  

ggplot(eda_boxplot_propertytype, aes(x=property_type,y=price))+
  geom_boxplot()+
  theme(legend.position="bottom", axis.text.x = element_text(angle = 90))+
  labs(title= "Boxplot of Prices by Property Type", x="", y="Prices")+
  facet_wrap(~region_name)

Density and Bivariate Plots

A density plot is drawn to identify where most of the review scores lie. It shows the distribution of a numeric variable.

In the Shiny app, users will be able to toggle between variables such as price, beds and bedrooms.

A bivariate plot is then used to examine the relationship between two variables. We restrict it to review scores above 80, since the density plot indicates that most review scores fall in that range.

In the Shiny app, users will be able to adjust the range of the x-axis with a slider and choose which variables appear on the x and y axes.

Density Plot of Review Scores for Entire House

#density plot

unique_id2 <- airbnb.vic %>%
  dplyr::select(id, region_name) %>%
  unique() %>%
  mutate(idcount=1)%>%
  dplyr::select(-id)%>%
  group_by(region_name)%>%
  count(idcount, name="number.of.listings")%>%
  ungroup() %>% 
  dplyr::select(-idcount)%>%
  arrange(desc(number.of.listings))%>%
  slice_head(n=5)

cities2<-unique_id2$region_name

densityplot<- airbnb.vic%>%
  filter(region_name %in% cities2)%>%
  filter(property_type %in% c("Entire house","Entire apartment")) %>%
  dplyr::select(price,review_scores_rating,region_name,property_type)

d1<-ggplot(densityplot,aes(review_scores_rating))+
  geom_density()+
  labs(title= "Density Plot of Review Scores for Entire House", x="Review Scores", y="")

Bivariate Plot - Review Scores vs Prices for Entire House

#bivariate prices vs reviews

bivariateplot<- airbnb.vic%>%
  filter(region_name %in% cities2)%>%
  filter(property_type %in% c("Entire house","Entire apartment")) %>%
  dplyr::select(price,review_scores_rating,region_name,property_type)%>%
  filter(review_scores_rating>80)

d2<-ggplot(bivariateplot,aes(review_scores_rating,price,color=region_name))+
  geom_jitter()+
  labs(title= "Review Scores vs Prices for Entire House", x="Review Scores", y="Prices")+ 
  geom_smooth(method='lm', formula= y~x, se = FALSE)+ 
  theme(legend.position="bottom")

grid.arrange(d1, d2, nrow = 2)
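When filtering on more than one property type, `==` and `%in%` behave differently. `==` compares element by element and recycles the shorter vector along the column, so a row only matches if it happens to line up with the right position in the recycled vector. `%in%` tests each row for membership in the set. The small example below uses made-up values to show the difference.

```r
# Made-up property types, for illustration only
property_type <- c("Entire apartment", "Entire house", "Entire house",
                   "Entire apartment", "Private room", "Entire house")
wanted <- c("Entire house", "Entire apartment")

# == recycles `wanted` as house, apartment, house, apartment, ...
eq_match <- property_type == wanted

# %in% asks, for every row, whether its value is one of `wanted`
in_match <- property_type %in% wanted

sum(eq_match)  # rows kept by ==
sum(in_match)  # rows kept by %in%
```

Here `==` keeps only the rows that coincide with the recycled pattern, while `%in%` keeps every entire house and entire apartment, which is the behaviour wanted for the density and bivariate plots.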

Correlation Plot

A correlation plot is drawn to help choose variables with low correlation to one another for the multivariate linear regression. In this assignment, we use method = "ellipse" and order = "hclust".

In the Shiny app, users will be able to choose other methods and orders.

VISUALIZATION METHODS: There are seven visualization methods (parameter method) in corrplot package, named “circle”, “square”, “ellipse”, “number”, “shade”, “color”, “pie”.

REORDER A CORRELATION MATRIX: The correlation matrix can be reordered according to the correlation coefficient. This is important to identify the hidden structure and pattern in the matrix. There are four methods in corrplot (parameter order), named “AOE”, “FPC”, “hclust”, “alphabet”.

#correlation plot

airbnb.cor <- dplyr::select(airbnb.vic,c(11, 12, 15, 24, 26, 27, 29,30,31, 43, 48, 49, 50, 51, 52, 53, 54, 61,65))
airbnb.cor2 <- cor(airbnb.cor,use = "complete.obs")
corrplot(airbnb.cor2,
         method = "ellipse",
         type="lower",
         diag = FALSE,
         tl.col = "black",
         order = "hclust")
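The column positions passed to `dplyr::select()` above depend on the column order of the CSV file. A sketch of the same selection by name is given below, on a toy data frame with hypothetical values. The three names are taken from the variable list used in the regression section, and `all_of()` raises an error if any of them is missing.

```r
library(dplyr)

# Names of the numeric columns to correlate (a short subset for illustration)
cor_vars <- c("price", "accommodates", "bedrooms")

# Toy data with extra non-numeric columns that must be left out
toy <- tibble(
  id = 1:5,
  name = letters[1:5],
  price = c(100, 150, 200, 120, 300),
  accommodates = c(2, 4, 6, 2, 8),
  bedrooms = c(1, 2, 3, 1, 4)
)

# Select by name, then compute the correlation matrix on complete rows
toy_cor <- toy %>%
  dplyr::select(all_of(cor_vars)) %>%
  cor(use = "complete.obs")

toy_cor
```

The resulting matrix can be passed to `corrplot()` in the same way as `airbnb.cor2`.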

Multivariate Linear Regression

A Multivariate Linear Regression is fitted using backward, forward and bidirectional ("both") stepwise selection. Users will be able to choose the variables for the regression based on the correlation plot.

In this assignment, we run the full variable set as an example, but in the Shiny app users will be able to choose the variables that yield the best results. The target variables will be price and review_scores_rating.

airbnb.mlr <- dplyr::select(airbnb.vic,c(11, 12, 15, 24, 26, 27, 29,30,31, 43, 48, 49, 50, 51, 52, 53, 54, 61,65))
variablesxMlr<-c("price", "host_response_rate", "host_acceptance_rate", "host_total_listings_count", "accommodates", "bedrooms", "beds","minimum_nights","maximum_nights", "number_of_reviews", "review_scores_rating", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value", "region_id","reviews_per_month")

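# Keep the model variables and drop rows with missing values
Rdata <- airbnb.mlr[, variablesxMlr]
Rdata <- na.omit(Rdata)
# fit1: full model with every predictor; fit2: intercept-only model
fit1 <- lm(price ~ ., data = Rdata)
fit2 <- lm(price ~ 1, data = Rdata)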

MLR - Backward Stepwise

step<- stepAIC(fit1,direction = "backward")
Start:  AIC=218821.3
price ~ host_response_rate + host_acceptance_rate + host_total_listings_count + 
    accommodates + bedrooms + beds + minimum_nights + maximum_nights + 
    number_of_reviews + review_scores_rating + review_scores_accuracy + 
    review_scores_cleanliness + review_scores_checkin + review_scores_communication + 
    review_scores_location + review_scores_value + region_id + 
    reviews_per_month

                              Df Sum of Sq       RSS    AIC
- minimum_nights               1      7951 339937018 218820
<none>                                     339929067 218821
- review_scores_accuracy       1     46252 339975319 218822
- review_scores_checkin        1     96761 340025828 218826
- beds                         1     97120 340026187 218826
- host_response_rate           1    103138 340032204 218826
- reviews_per_month            1    116942 340046009 218827
- review_scores_communication  1    250031 340179098 218836
- maximum_nights               1    256378 340185445 218836
- host_acceptance_rate         1    342618 340271684 218842
- review_scores_cleanliness    1   1049041 340978108 218889
- review_scores_location       1   1112876 341041942 218894
- number_of_reviews            1   2151989 342081056 218963
- region_id                    1   2317804 342246871 218974
- host_total_listings_count    1   6121217 346050284 219226
- review_scores_rating         1   6448712 346377779 219247
- accommodates                 1   8420917 348349984 219376
- review_scores_value          1  13509722 353438789 219706
- bedrooms                     1  19337162 359266229 220079

Step:  AIC=218819.8
price ~ host_response_rate + host_acceptance_rate + host_total_listings_count + 
    accommodates + bedrooms + beds + maximum_nights + number_of_reviews + 
    review_scores_rating + review_scores_accuracy + review_scores_cleanliness + 
    review_scores_checkin + review_scores_communication + review_scores_location + 
    review_scores_value + region_id + reviews_per_month

                              Df Sum of Sq       RSS    AIC
<none>                                     339937018 218820
- review_scores_accuracy       1     46278 339983296 218821
- beds                         1     96408 340033426 218824
- review_scores_checkin        1     97209 340034227 218824
- host_response_rate           1    104641 340041658 218825
- reviews_per_month            1    115519 340052537 218826
- review_scores_communication  1    249813 340186831 218835
- maximum_nights               1    257573 340194591 218835
- host_acceptance_rate         1    342192 340279210 218841
- review_scores_cleanliness    1   1050825 340987843 218888
- review_scores_location       1   1113323 341050341 218892
- number_of_reviews            1   2156065 342093083 218962
- region_id                    1   2315221 342252239 218972
- host_total_listings_count    1   6121345 346058363 219224
- review_scores_rating         1   6448419 346385437 219246
- accommodates                 1   8416540 348353558 219375
- review_scores_value          1  13513474 353450492 219705
- bedrooms                     1  19341620 359278638 220077
summary(step)

Call:
lm(formula = price ~ host_response_rate + host_acceptance_rate + 
    host_total_listings_count + accommodates + bedrooms + beds + 
    maximum_nights + number_of_reviews + review_scores_rating + 
    review_scores_accuracy + review_scores_cleanliness + review_scores_checkin + 
    review_scores_communication + review_scores_location + review_scores_value + 
    region_id + reviews_per_month, data = Rdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-1157.61   -72.83   -17.84    51.80   746.72 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -2.545e+02  1.934e+01 -13.161  < 2e-16
host_response_rate           1.515e+01  5.724e+00   2.646  0.00815
host_acceptance_rate        -2.037e+01  4.258e+00  -4.785 1.72e-06
host_total_listings_count    6.280e-01  3.103e-02  20.238  < 2e-16
accommodates                 1.651e+01  6.955e-01  23.731  < 2e-16
bedrooms                     5.049e+01  1.404e+00  35.974  < 2e-16
beds                        -1.765e+00  6.948e-01  -2.540  0.01110
maximum_nights              -6.413e-03  1.545e-03  -4.151 3.32e-05
number_of_reviews           -1.986e-01  1.654e-02 -12.011  < 2e-16
review_scores_rating         4.430e+00  2.133e-01  20.772  < 2e-16
review_scores_accuracy       3.240e+00  1.841e+00   1.760  0.07848
review_scores_cleanliness    1.145e+01  1.366e+00   8.385  < 2e-16
review_scores_checkin       -4.763e+00  1.868e+00  -2.550  0.01077
review_scores_communication -7.742e+00  1.894e+00  -4.088 4.36e-05
review_scores_location       1.590e+01  1.842e+00   8.631  < 2e-16
review_scores_value         -4.259e+01  1.416e+00 -30.070  < 2e-16
region_id                    4.728e-03  3.799e-04  12.446  < 2e-16
reviews_per_month           -1.923e+00  6.917e-01  -2.780  0.00544
                               
(Intercept)                 ***
host_response_rate          ** 
host_acceptance_rate        ***
host_total_listings_count   ***
accommodates                ***
bedrooms                    ***
beds                        *  
maximum_nights              ***
number_of_reviews           ***
review_scores_rating        ***
review_scores_accuracy      .  
review_scores_cleanliness   ***
review_scores_checkin       *  
review_scores_communication ***
review_scores_location      ***
review_scores_value         ***
region_id                   ***
reviews_per_month           ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 122.3 on 22745 degrees of freedom
Multiple R-squared:  0.4963,    Adjusted R-squared:  0.4959 
F-statistic:  1318 on 17 and 22745 DF,  p-value: < 2.2e-16
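The AIC values in the stepwise tables can be checked by hand. For a linear model, `stepAIC()` relies on `extractAIC()`, which reports n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients including the intercept. The sketch below plugs in the RSS values printed above. The sample size n is not printed directly and is inferred here from the residual degrees of freedom (22745) plus the 18 coefficients of the final model, which is an assumption of this check.

```r
# Values read off the stepAIC() and summary() output
n <- 22745 + 18          # residual df of the final model + its 18 coefficients
rss_full <- 339929067    # RSS of the full model (<none> row of the first table)
rss_step <- 339937018    # RSS after dropping minimum_nights

# AIC as reported by extractAIC() for lm fits
aic_lm <- function(rss, n, k) n * log(rss / n) + 2 * k

aic_full <- aic_lm(rss_full, n, k = 19)  # 18 predictors + intercept
aic_step <- aic_lm(rss_step, n, k = 18)  # 17 predictors + intercept

round(c(full = aic_full, step = aic_step), 1)
```

The two values should land close to the reported 218821.3 and 218819.8. Dropping minimum_nights raises the RSS slightly, but the saving of one parameter (a penalty of 2) outweighs it, which is why the backward step removes that variable.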

MLR - Forward Stepwise

stepforward<- stepAIC(fit2,direction = "forward",scope = list(upper = fit1,lower = fit2))
Start:  AIC=234395.5
price ~ 1

                              Df Sum of Sq       RSS    AIC
+ bedrooms                     1 284126261 390735948 221958
+ accommodates                 1 273094565 401767644 222592
+ beds                         1 208303131 466559078 225995
+ host_total_listings_count    1  33441977 641420232 233241
+ number_of_reviews            1  23157895 651704314 233603
+ reviews_per_month            1  16271888 658590320 233842
+ review_scores_value          1   6986693 667875516 234161
+ review_scores_rating         1   4083182 670779027 234259
+ host_acceptance_rate         1   3759154 671103055 234270
+ review_scores_location       1   2111619 672750590 234326
+ review_scores_checkin        1   1224373 673637835 234356
+ maximum_nights               1   1004572 673857637 234364
+ review_scores_accuracy       1    961438 673900771 234365
+ review_scores_cleanliness    1    409167 674453042 234384
+ review_scores_communication  1     62074 674800135 234395
<none>                                     674862209 234396
+ region_id                    1     45125 674817084 234396
+ minimum_nights               1     40986 674821223 234396
+ host_response_rate           1       139 674862070 234398

Step:  AIC=221958.1
price ~ bedrooms

                              Df Sum of Sq       RSS    AIC
+ accommodates                 1  12120749 378615199 221243
+ host_total_listings_count    1  10967313 379768634 221312
+ number_of_reviews            1   6567538 384168410 221574
+ reviews_per_month            1   4355945 386380003 221705
+ review_scores_value          1   3550374 387185573 221752
+ beds                         1   2459251 388276697 221816
+ review_scores_rating         1   2283451 388452496 221827
+ region_id                    1   2110659 388625289 221837
+ review_scores_cleanliness    1   1686966 389048982 221862
+ review_scores_location       1   1003285 389732663 221902
+ review_scores_accuracy       1    554777 390181171 221928
+ host_acceptance_rate         1    509804 390226144 221930
+ maximum_nights               1    294192 390441756 221943
+ review_scores_checkin        1     73493 390662454 221956
<none>                                     390735948 221958
+ host_response_rate           1      7026 390728922 221960
+ minimum_nights               1      4632 390731316 221960
+ review_scores_communication  1       832 390735116 221960

Step:  AIC=221242.8
price ~ bedrooms + accommodates

                              Df Sum of Sq       RSS    AIC
+ host_total_listings_count    1   9626601 368988598 220659
+ number_of_reviews            1   6789543 371825656 220833
+ reviews_per_month            1   4442582 374172616 220976
+ review_scores_value          1   3102304 375512895 221057
+ review_scores_rating         1   2643961 375971238 221085
+ region_id                    1   2272069 376343130 221108
+ review_scores_cleanliness    1   2076065 376539134 221120
+ review_scores_location       1    808935 377806264 221196
+ host_acceptance_rate         1    671329 377943869 221204
+ review_scores_accuracy       1    645146 377970053 221206
+ maximum_nights               1    389795 378225404 221221
+ beds                         1    344402 378270796 221224
+ review_scores_checkin        1     84924 378530275 221240
<none>                                     378615199 221243
+ host_response_rate           1     18190 378597009 221244
+ minimum_nights               1      6020 378609179 221244
+ review_scores_communication  1      1987 378613212 221245

Step:  AIC=220658.5
price ~ bedrooms + accommodates + host_total_listings_count

                              Df Sum of Sq       RSS    AIC
+ review_scores_rating         1   4878359 364110239 220358
+ number_of_reviews            1   4559863 364428735 220377
+ review_scores_cleanliness    1   3821538 365167060 220424
+ region_id                    1   2420382 366568216 220511
+ reviews_per_month            1   2417563 366571035 220511
+ review_scores_accuracy       1   1626050 367362548 220560
+ review_scores_value          1   1257781 367730817 220583
+ review_scores_location       1   1217912 367770686 220585
+ host_acceptance_rate         1    718174 368270424 220616
+ review_scores_checkin        1    577175 368411423 220625
+ review_scores_communication  1    384535 368604063 220637
+ maximum_nights               1    268477 368720121 220644
+ beds                         1     91061 368897537 220655
+ host_response_rate           1     46852 368941746 220658
<none>                                     368988598 220659
+ minimum_nights               1      9327 368979271 220660

Step:  AIC=220357.6
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating

                              Df Sum of Sq       RSS    AIC
+ review_scores_value          1  13953322 350156917 219470
+ number_of_reviews            1   5555080 358555159 220010
+ reviews_per_month            1   3257034 360853205 220155
+ region_id                    1   2567461 361542778 220198
+ review_scores_communication  1   1164143 362946097 220287
+ host_acceptance_rate         1    936948 363173291 220301
+ review_scores_checkin        1    539186 363571053 220326
+ review_scores_accuracy       1    400422 363709817 220335
+ review_scores_cleanliness    1    278024 363832215 220342
+ maximum_nights               1    176134 363934106 220349
+ beds                         1     69146 364041094 220355
<none>                                     364110239 220358
+ host_response_rate           1     29218 364081021 220358
+ minimum_nights               1      6145 364104094 220359
+ review_scores_location       1        36 364110203 220360

Step:  AIC=219470.1
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value

                              Df Sum of Sq       RSS    AIC
+ number_of_reviews            1   4416609 345740309 219183
+ reviews_per_month            1   2581771 347575146 219304
+ region_id                    1   2427897 347729021 219314
+ review_scores_cleanliness    1    945163 349211755 219411
+ host_acceptance_rate         1    817044 349339873 219419
+ review_scores_location       1    696833 349460084 219427
+ review_scores_communication  1    264200 349892718 219455
+ maximum_nights               1    246133 349910785 219456
+ review_scores_checkin        1    104527 350052390 219465
+ review_scores_accuracy       1     67549 350089368 219468
+ beds                         1     63066 350093851 219468
<none>                                     350156917 219470
+ host_response_rate           1     13477 350143440 219471
+ minimum_nights               1      4605 350152312 219472

Step:  AIC=219183.1
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews

                              Df Sum of Sq       RSS    AIC
+ region_id                    1   2320256 343420053 219032
+ review_scores_cleanliness    1   1048631 344691678 219116
+ review_scores_location       1    953667 344786642 219122
+ host_acceptance_rate         1    354845 345385464 219162
+ maximum_nights               1    279197 345461111 219167
+ reviews_per_month            1    192518 345547791 219172
+ review_scores_communication  1    181919 345558390 219173
+ review_scores_accuracy       1    143571 345596738 219176
+ beds                         1     81524 345658785 219180
+ review_scores_checkin        1     60206 345680103 219181
<none>                                     345740309 219183
+ minimum_nights               1      5949 345734360 219185
+ host_response_rate           1      2506 345737803 219185

Step:  AIC=219031.9
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id

                              Df Sum of Sq       RSS    AIC
+ review_scores_cleanliness    1   1194813 342225240 218955
+ review_scores_location       1    952976 342467076 218971
+ maximum_nights               1    298126 343121927 219014
+ host_acceptance_rate         1    246708 343173344 219018
+ review_scores_accuracy       1    167349 343252703 219023
+ review_scores_communication  1    163316 343256736 219023
+ reviews_per_month            1    157193 343262860 219023
+ beds                         1     86397 343333656 219028
+ review_scores_checkin        1     40613 343379439 219031
<none>                                     343420053 219032
+ host_response_rate           1     14774 343405279 219033
+ minimum_nights               1      9400 343410653 219033

Step:  AIC=218954.5
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness

                              Df Sum of Sq       RSS    AIC
+ review_scores_location       1    853226 341372014 218900
+ host_acceptance_rate         1    299407 341925832 218937
+ maximum_nights               1    288940 341936300 218937
+ review_scores_communication  1    189753 342035487 218944
+ reviews_per_month            1    168575 342056665 218945
+ review_scores_checkin        1     77171 342148069 218951
+ beds                         1     72895 342152345 218952
+ review_scores_accuracy       1     49331 342175908 218953
<none>                                     342225240 218955
+ minimum_nights               1      7224 342218016 218956
+ host_response_rate           1      3728 342221512 218956

Step:  AIC=218899.7
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location

                              Df Sum of Sq       RSS    AIC
+ review_scores_communication  1    423436 340948578 218873
+ host_acceptance_rate         1    328519 341043495 218880
+ maximum_nights               1    277697 341094317 218883
+ review_scores_checkin        1    263293 341108721 218884
+ reviews_per_month            1    173866 341198148 218890
+ beds                         1     81067 341290947 218896
<none>                                     341372014 218900
+ minimum_nights               1      6823 341365191 218901
+ review_scores_accuracy       1      3640 341368374 218901
+ host_response_rate           1       807 341371207 218902

Step:  AIC=218873.5
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication

                         Df Sum of Sq       RSS    AIC
+ host_acceptance_rate    1    330737 340617841 218853
+ maximum_nights          1    267338 340681240 218858
+ reviews_per_month       1    178939 340769639 218864
+ beds                    1     71143 340877435 218871
+ review_scores_checkin   1     62637 340885942 218871
+ review_scores_accuracy  1     40462 340908117 218873
<none>                                340948578 218873
+ host_response_rate      1      9299 340939280 218875
+ minimum_nights          1      7003 340941575 218875

Step:  AIC=218853.4
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate

                         Df Sum of Sq       RSS    AIC
+ maximum_nights          1    254604 340363237 218838
+ reviews_per_month       1     98662 340519179 218849
+ host_response_rate      1     93981 340523860 218849
+ beds                    1     82713 340535128 218850
+ review_scores_checkin   1     75089 340542752 218850
+ review_scores_accuracy  1     39440 340578401 218853
<none>                                340617841 218853
+ minimum_nights          1      8841 340609000 218855

Step:  AIC=218838.4
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights

                         Df Sum of Sq       RSS    AIC
+ reviews_per_month       1    100777 340262460 218834
+ host_response_rate      1     89005 340274232 218834
+ beds                    1     85507 340277730 218835
+ review_scores_checkin   1     79604 340283632 218835
+ review_scores_accuracy  1     37767 340325470 218838
<none>                                340363237 218838
+ minimum_nights          1      7623 340355613 218840

Step:  AIC=218833.6
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month

                         Df Sum of Sq       RSS    AIC
+ beds                    1     94499 340167961 218829
+ host_response_rate      1     93965 340168495 218829
+ review_scores_checkin   1     82380 340180080 218830
+ review_scores_accuracy  1     35313 340227148 218833
<none>                                340262460 218834
+ minimum_nights          1      9061 340253399 218835

Step:  AIC=218829.3
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds

                         Df Sum of Sq       RSS    AIC
+ host_response_rate      1     99926 340068035 218825
+ review_scores_checkin   1     79060 340088901 218826
+ review_scores_accuracy  1     34981 340132979 218829
<none>                                340167961 218829
+ minimum_nights          1      9886 340158075 218831

Step:  AIC=218824.6
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate

                         Df Sum of Sq       RSS    AIC
+ review_scores_checkin   1     84739 339983296 218821
+ review_scores_accuracy  1     33808 340034227 218824
<none>                                340068035 218825
+ minimum_nights          1      8395 340059640 218826

Step:  AIC=218820.9
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate + review_scores_checkin

                         Df Sum of Sq       RSS    AIC
+ review_scores_accuracy  1     46278 339937018 218820
<none>                                339983296 218821
+ minimum_nights          1      7977 339975319 218822

Step:  AIC=218819.8
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate + review_scores_checkin + 
    review_scores_accuracy

                 Df Sum of Sq       RSS    AIC
<none>                        339937018 218820
+ minimum_nights  1    7951.2 339929067 218821

summary(stepforward)

Call:
lm(formula = price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate + review_scores_checkin + 
    review_scores_accuracy, data = Rdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-1157.61   -72.83   -17.84    51.80   746.72 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 -2.545e+02  1.934e+01 -13.161  < 2e-16 ***
bedrooms                     5.049e+01  1.404e+00  35.974  < 2e-16 ***
accommodates                 1.651e+01  6.955e-01  23.731  < 2e-16 ***
host_total_listings_count    6.280e-01  3.103e-02  20.238  < 2e-16 ***
review_scores_rating         4.430e+00  2.133e-01  20.772  < 2e-16 ***
review_scores_value         -4.259e+01  1.416e+00 -30.070  < 2e-16 ***
number_of_reviews           -1.986e-01  1.654e-02 -12.011  < 2e-16 ***
region_id                    4.728e-03  3.799e-04  12.446  < 2e-16 ***
review_scores_cleanliness    1.145e+01  1.366e+00   8.385  < 2e-16 ***
review_scores_location       1.590e+01  1.842e+00   8.631  < 2e-16 ***
review_scores_communication -7.742e+00  1.894e+00  -4.088 4.36e-05 ***
host_acceptance_rate        -2.037e+01  4.258e+00  -4.785 1.72e-06 ***
maximum_nights              -6.413e-03  1.545e-03  -4.151 3.32e-05 ***
reviews_per_month           -1.923e+00  6.917e-01  -2.780  0.00544 ** 
beds                        -1.765e+00  6.948e-01  -2.540  0.01110 *  
host_response_rate           1.515e+01  5.724e+00   2.646  0.00815 ** 
review_scores_checkin       -4.763e+00  1.868e+00  -2.550  0.01077 *  
review_scores_accuracy       3.240e+00  1.841e+00   1.760  0.07848 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 122.3 on 22745 degrees of freedom
Multiple R-squared:  0.4963,    Adjusted R-squared:  0.4959 
F-statistic:  1318 on 17 and 22745 DF,  p-value: < 2.2e-16
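
The AIC values in the forward selection trace can be checked by hand. For a linear model, stepAIC scores each candidate as n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients. This is the extractAIC convention, which differs from the value returned by AIC() by a constant. The sketch below is illustrative only: it does not refit the model, it simply plugs in the RSS and degrees of freedom printed above, so small rounding differences are to be expected. The variable names are chosen for this example and do not appear elsewhere in the assignment.

```r
# Values copied from the printed output above (rounded as displayed)
rss_final <- 339937018   # RSS of the selected 17-predictor model
k_final   <- 18          # intercept + 17 predictors
n         <- 22745 + 18  # residual degrees of freedom + estimated coefficients

# AIC as used by stepAIC for lm objects (extractAIC convention)
aic_final <- n * log(rss_final / n) + 2 * k_final

# Candidate model that additionally includes minimum_nights
rss_candidate <- 339929067
aic_candidate <- n * log(rss_candidate / n) + 2 * (k_final + 1)

round(aic_final, 1)      # compare with "Step:  AIC=218819.8"
round(aic_candidate)     # compare with the "+ minimum_nights" row
aic_candidate > aic_final
```

Adding minimum_nights lowers the RSS only slightly, which is not enough to offset the penalty of 2 for the extra coefficient. The candidate's AIC is therefore higher than that of the current model, and forward selection stops.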

MLR - Both Stepwise

stepboth <- stepAIC(fit2, direction = "both", scope = list(upper = fit1, lower = fit2))

Start:  AIC=234395.5
price ~ 1

                              Df Sum of Sq       RSS    AIC
+ bedrooms                     1 284126261 390735948 221958
+ accommodates                 1 273094565 401767644 222592
+ beds                         1 208303131 466559078 225995
+ host_total_listings_count    1  33441977 641420232 233241
+ number_of_reviews            1  23157895 651704314 233603
+ reviews_per_month            1  16271888 658590320 233842
+ review_scores_value          1   6986693 667875516 234161
+ review_scores_rating         1   4083182 670779027 234259
+ host_acceptance_rate         1   3759154 671103055 234270
+ review_scores_location       1   2111619 672750590 234326
+ review_scores_checkin        1   1224373 673637835 234356
+ maximum_nights               1   1004572 673857637 234364
+ review_scores_accuracy       1    961438 673900771 234365
+ review_scores_cleanliness    1    409167 674453042 234384
+ review_scores_communication  1     62074 674800135 234395
<none>                                     674862209 234396
+ region_id                    1     45125 674817084 234396
+ minimum_nights               1     40986 674821223 234396
+ host_response_rate           1       139 674862070 234398

Step:  AIC=221958.1
price ~ bedrooms

                              Df Sum of Sq       RSS    AIC
+ accommodates                 1  12120749 378615199 221243
+ host_total_listings_count    1  10967313 379768634 221312
+ number_of_reviews            1   6567538 384168410 221574
+ reviews_per_month            1   4355945 386380003 221705
+ review_scores_value          1   3550374 387185573 221752
+ beds                         1   2459251 388276697 221816
+ review_scores_rating         1   2283451 388452496 221827
+ region_id                    1   2110659 388625289 221837
+ review_scores_cleanliness    1   1686966 389048982 221862
+ review_scores_location       1   1003285 389732663 221902
+ review_scores_accuracy       1    554777 390181171 221928
+ host_acceptance_rate         1    509804 390226144 221930
+ maximum_nights               1    294192 390441756 221943
+ review_scores_checkin        1     73493 390662454 221956
<none>                                     390735948 221958
+ host_response_rate           1      7026 390728922 221960
+ minimum_nights               1      4632 390731316 221960
+ review_scores_communication  1       832 390735116 221960
- bedrooms                     1 284126261 674862209 234396

Step:  AIC=221242.8
price ~ bedrooms + accommodates

                              Df Sum of Sq       RSS    AIC
+ host_total_listings_count    1   9626601 368988598 220659
+ number_of_reviews            1   6789543 371825656 220833
+ reviews_per_month            1   4442582 374172616 220976
+ review_scores_value          1   3102304 375512895 221057
+ review_scores_rating         1   2643961 375971238 221085
+ region_id                    1   2272069 376343130 221108
+ review_scores_cleanliness    1   2076065 376539134 221120
+ review_scores_location       1    808935 377806264 221196
+ host_acceptance_rate         1    671329 377943869 221204
+ review_scores_accuracy       1    645146 377970053 221206
+ maximum_nights               1    389795 378225404 221221
+ beds                         1    344402 378270796 221224
+ review_scores_checkin        1     84924 378530275 221240
<none>                                     378615199 221243
+ host_response_rate           1     18190 378597009 221244
+ minimum_nights               1      6020 378609179 221244
+ review_scores_communication  1      1987 378613212 221245
- accommodates                 1  12120749 390735948 221958
- bedrooms                     1  23152445 401767644 222592

Step:  AIC=220658.5
price ~ bedrooms + accommodates + host_total_listings_count

                              Df Sum of Sq       RSS    AIC
+ review_scores_rating         1   4878359 364110239 220358
+ number_of_reviews            1   4559863 364428735 220377
+ review_scores_cleanliness    1   3821538 365167060 220424
+ region_id                    1   2420382 366568216 220511
+ reviews_per_month            1   2417563 366571035 220511
+ review_scores_accuracy       1   1626050 367362548 220560
+ review_scores_value          1   1257781 367730817 220583
+ review_scores_location       1   1217912 367770686 220585
+ host_acceptance_rate         1    718174 368270424 220616
+ review_scores_checkin        1    577175 368411423 220625
+ review_scores_communication  1    384535 368604063 220637
+ maximum_nights               1    268477 368720121 220644
+ beds                         1     91061 368897537 220655
+ host_response_rate           1     46852 368941746 220658
<none>                                     368988598 220659
+ minimum_nights               1      9327 368979271 220660
- host_total_listings_count    1   9626601 378615199 221243
- accommodates                 1  10780036 379768634 221312
- bedrooms                     1  22674737 391663335 222014

Step:  AIC=220357.6
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating

                              Df Sum of Sq       RSS    AIC
+ review_scores_value          1  13953322 350156917 219470
+ number_of_reviews            1   5555080 358555159 220010
+ reviews_per_month            1   3257034 360853205 220155
+ region_id                    1   2567461 361542778 220198
+ review_scores_communication  1   1164143 362946097 220287
+ host_acceptance_rate         1    936948 363173291 220301
+ review_scores_checkin        1    539186 363571053 220326
+ review_scores_accuracy       1    400422 363709817 220335
+ review_scores_cleanliness    1    278024 363832215 220342
+ maximum_nights               1    176134 363934106 220349
+ beds                         1     69146 364041094 220355
<none>                                     364110239 220358
+ host_response_rate           1     29218 364081021 220358
+ minimum_nights               1      6145 364104094 220359
+ review_scores_location       1        36 364110203 220360
- review_scores_rating         1   4878359 368988598 220659
- accommodates                 1  11098347 375208586 221039
- host_total_listings_count    1  11860998 375971238 221085
- bedrooms                     1  21656285 385766524 221671

Step:  AIC=219470.1
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value

                              Df Sum of Sq       RSS    AIC
+ number_of_reviews            1   4416609 345740309 219183
+ reviews_per_month            1   2581771 347575146 219304
+ region_id                    1   2427897 347729021 219314
+ review_scores_cleanliness    1    945163 349211755 219411
+ host_acceptance_rate         1    817044 349339873 219419
+ review_scores_location       1    696833 349460084 219427
+ review_scores_communication  1    264200 349892718 219455
+ maximum_nights               1    246133 349910785 219456
+ review_scores_checkin        1    104527 350052390 219465
+ review_scores_accuracy       1     67549 350089368 219468
+ beds                         1     63066 350093851 219468
<none>                                     350156917 219470
+ host_response_rate           1     13477 350143440 219471
+ minimum_nights               1      4605 350152312 219472
- host_total_listings_count    1   8611691 358768608 220021
- accommodates                 1  10842576 360999494 220162
- review_scores_value          1  13953322 364110239 220358
- review_scores_rating         1  17573900 367730817 220583
- bedrooms                     1  20747303 370904220 220778

Step:  AIC=219183.1
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews

                              Df Sum of Sq       RSS    AIC
+ region_id                    1   2320256 343420053 219032
+ review_scores_cleanliness    1   1048631 344691678 219116
+ review_scores_location       1    953667 344786642 219122
+ host_acceptance_rate         1    354845 345385464 219162
+ maximum_nights               1    279197 345461111 219167
+ reviews_per_month            1    192518 345547791 219172
+ review_scores_communication  1    181919 345558390 219173
+ review_scores_accuracy       1    143571 345596738 219176
+ beds                         1     81524 345658785 219180
+ review_scores_checkin        1     60206 345680103 219181
<none>                                     345740309 219183
+ minimum_nights               1      5949 345734360 219185
+ host_response_rate           1      2506 345737803 219185
- number_of_reviews            1   4416609 350156917 219470
- host_total_listings_count    1   6913930 352654239 219632
- accommodates                 1  11190529 356930838 219906
- review_scores_value          1  12814851 358555159 220010
- review_scores_rating         1  17915815 363656124 220331
- bedrooms                     1  19206785 364947094 220412

Step:  AIC=219031.9
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id

                              Df Sum of Sq       RSS    AIC
+ review_scores_cleanliness    1   1194813 342225240 218955
+ review_scores_location       1    952976 342467076 218971
+ maximum_nights               1    298126 343121927 219014
+ host_acceptance_rate         1    246708 343173344 219018
+ review_scores_accuracy       1    167349 343252703 219023
+ review_scores_communication  1    163316 343256736 219023
+ reviews_per_month            1    157193 343262860 219023
+ beds                         1     86397 343333656 219028
+ review_scores_checkin        1     40613 343379439 219031
<none>                                     343420053 219032
+ host_response_rate           1     14774 343405279 219033
+ minimum_nights               1      9400 343410653 219033
- region_id                    1   2320256 345740309 219183
- number_of_reviews            1   4308968 347729021 219314
- host_total_listings_count    1   7091090 350511143 219495
- accommodates                 1  11339191 354759244 219769
- review_scores_value          1  12697434 356117487 219856
- review_scores_rating         1  17996722 361416774 220193
- bedrooms                     1  19478403 362898456 220286

Step:  AIC=218954.5
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness

                              Df Sum of Sq       RSS    AIC
+ review_scores_location       1    853226 341372014 218900
+ host_acceptance_rate         1    299407 341925832 218937
+ maximum_nights               1    288940 341936300 218937
+ review_scores_communication  1    189753 342035487 218944
+ reviews_per_month            1    168575 342056665 218945
+ review_scores_checkin        1     77171 342148069 218951
+ beds                         1     72895 342152345 218952
+ review_scores_accuracy       1     49331 342175908 218953
<none>                                     342225240 218955
+ minimum_nights               1      7224 342218016 218956
+ host_response_rate           1      3728 342221512 218956
- review_scores_cleanliness    1   1194813 343420053 219032
- region_id                    1   2466438 344691678 219116
- number_of_reviews            1   4415142 346640382 219244
- host_total_listings_count    1   7237217 349462457 219429
- review_scores_rating         1   8612987 350838227 219518
- accommodates                 1  11496151 353721391 219705
- review_scores_value          1  13435360 355660600 219829
- bedrooms                     1  19621467 361846707 220222

Step:  AIC=218899.7
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location

                              Df Sum of Sq       RSS    AIC
+ review_scores_communication  1    423436 340948578 218873
+ host_acceptance_rate         1    328519 341043495 218880
+ maximum_nights               1    277697 341094317 218883
+ review_scores_checkin        1    263293 341108721 218884
+ reviews_per_month            1    173866 341198148 218890
+ beds                         1     81067 341290947 218896
<none>                                     341372014 218900
+ minimum_nights               1      6823 341365191 218901
+ review_scores_accuracy       1      3640 341368374 218901
+ host_response_rate           1       807 341371207 218902
- review_scores_location       1    853226 342225240 218955
- review_scores_cleanliness    1   1095062 342467076 218971
- region_id                    1   2459398 343831412 219061
- number_of_reviews            1   4653024 346025038 219206
- host_total_listings_count    1   6882768 348254782 219352
- review_scores_rating         1   7327836 348699850 219381
- accommodates                 1  11125075 352497089 219628
- review_scores_value          1  14266198 355638212 219830
- bedrooms                     1  19853625 361225639 220185

Step:  AIC=218873.5
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication

                              Df Sum of Sq       RSS    AIC
+ host_acceptance_rate         1    330737 340617841 218853
+ maximum_nights               1    267338 340681240 218858
+ reviews_per_month            1    178939 340769639 218864
+ beds                         1     71143 340877435 218871
+ review_scores_checkin        1     62637 340885942 218871
+ review_scores_accuracy       1     40462 340908117 218873
<none>                                     340948578 218873
+ host_response_rate           1      9299 340939280 218875
+ minimum_nights               1      7003 340941575 218875
- review_scores_communication  1    423436 341372014 218900
- review_scores_location       1   1086909 342035487 218944
- review_scores_cleanliness    1   1119947 342068525 218946
- region_id                    1   2429785 343378363 219033
- number_of_reviews            1   4573139 345521717 219175
- host_total_listings_count    1   6576643 347525222 219306
- review_scores_rating         1   7640595 348589174 219376
- accommodates                 1  11199609 352148188 219607
- review_scores_value          1  13628215 354576794 219764
- bedrooms                     1  19799053 360747631 220156

Step:  AIC=218853.4
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate

                              Df Sum of Sq       RSS    AIC
+ maximum_nights               1    254604 340363237 218838
+ reviews_per_month            1     98662 340519179 218849
+ host_response_rate           1     93981 340523860 218849
+ beds                         1     82713 340535128 218850
+ review_scores_checkin        1     75089 340542752 218850
+ review_scores_accuracy       1     39440 340578401 218853
<none>                                     340617841 218853
+ minimum_nights               1      8841 340609000 218855
- host_acceptance_rate         1    330737 340948578 218873
- review_scores_communication  1    425654 341043495 218880
- review_scores_location       1   1119729 341737570 218926
- review_scores_cleanliness    1   1172278 341790119 218930
- region_id                    1   2307096 342924937 219005
- number_of_reviews            1   4126861 344744702 219126
- host_total_listings_count    1   6680171 347298012 219293
- review_scores_rating         1   7611105 348228946 219354
- accommodates                 1  11292480 351910321 219594
- review_scores_value          1  13647630 354265471 219746
- bedrooms                     1  19480495 360098336 220117

Step:  AIC=218838.4
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights

                              Df Sum of Sq       RSS    AIC
+ reviews_per_month            1    100777 340262460 218834
+ host_response_rate           1     89005 340274232 218834
+ beds                         1     85507 340277730 218835
+ review_scores_checkin        1     79604 340283632 218835
+ review_scores_accuracy       1     37767 340325470 218838
<none>                                     340363237 218838
+ minimum_nights               1      7623 340355613 218840
- maximum_nights               1    254604 340617841 218853
- host_acceptance_rate         1    318003 340681240 218858
- review_scores_communication  1    415469 340778705 218864
- review_scores_location       1   1103381 341466617 218910
- review_scores_cleanliness    1   1162985 341526221 218914
- region_id                    1   2326390 342689627 218991
- number_of_reviews            1   4161785 344525022 219113
- host_total_listings_count    1   6552441 346915678 219270
- review_scores_rating         1   7575251 347938488 219337
- accommodates                 1  11368950 351732187 219584
- review_scores_value          1  13706499 354069736 219735
- bedrooms                     1  19321733 359684970 220093

Step:  AIC=218833.6
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month

                              Df Sum of Sq       RSS    AIC
+ beds                         1     94499 340167961 218829
+ host_response_rate           1     93965 340168495 218829
+ review_scores_checkin        1     82380 340180080 218830
+ review_scores_accuracy       1     35313 340227148 218833
<none>                                     340262460 218834
+ minimum_nights               1      9061 340253399 218835
- reviews_per_month            1    100777 340363237 218838
- host_acceptance_rate         1    238921 340501381 218848
- maximum_nights               1    256719 340519179 218849
- review_scores_communication  1    419020 340681480 218860
- review_scores_location       1   1105454 341367914 218905
- review_scores_cleanliness    1   1165581 341428042 218909
- number_of_reviews            1   2164791 342427251 218976
- region_id                    1   2311576 342574036 218986
- host_total_listings_count    1   6294890 346557351 219249
- review_scores_rating         1   7614146 347876607 219335
- accommodates                 1  11372517 351634977 219580
- review_scores_value          1  13672942 353935402 219728
- bedrooms                     1  19288584 359551044 220087

Step:  AIC=218829.3
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds

                              Df Sum of Sq       RSS    AIC
+ host_response_rate           1     99926 340068035 218825
+ review_scores_checkin        1     79060 340088901 218826
+ review_scores_accuracy       1     34981 340132979 218829
<none>                                     340167961 218829
+ minimum_nights               1      9886 340158075 218831
- beds                         1     94499 340262460 218834
- reviews_per_month            1    109769 340277730 218835
- host_acceptance_rate         1    246578 340414539 218844
- maximum_nights               1    259767 340427728 218845
- review_scores_communication  1    407836 340575796 218855
- review_scores_location       1   1112155 341280116 218902
- review_scores_cleanliness    1   1150774 341318735 218904
- number_of_reviews            1   2148776 342316736 218971
- region_id                    1   2313466 342481427 218982
- host_total_listings_count    1   6088982 346256942 219231
- review_scores_rating         1   7593199 347761160 219330
- accommodates                 1   8428792 348596752 219384
- review_scores_value          1  13670917 353838878 219724
- bedrooms                     1  19314162 359482123 220084

Step:  AIC=218824.6
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate

                              Df Sum of Sq       RSS    AIC
+ review_scores_checkin        1     84739 339983296 218821
+ review_scores_accuracy       1     33808 340034227 218824
<none>                                     340068035 218825
+ minimum_nights               1      8395 340059640 218826
- host_response_rate           1     99926 340167961 218829
- beds                         1    100460 340168495 218829
- reviews_per_month            1    115398 340183433 218830
- maximum_nights               1    254597 340322631 218840
- host_acceptance_rate         1    327636 340395671 218845
- review_scores_communication  1    448493 340516527 218853
- review_scores_location       1   1110560 341178595 218897
- review_scores_cleanliness    1   1122046 341190081 218898
- number_of_reviews            1   2152378 342220413 218966
- region_id                    1   2334141 342402176 218978
- host_total_listings_count    1   6171242 346239277 219232
- review_scores_rating         1   7552210 347620244 219323
- accommodates                 1   8441632 348509667 219381
- review_scores_value          1  13631120 353699155 219717
- bedrooms                     1  19299916 359367951 220079

Step:  AIC=218820.9
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate + review_scores_checkin

                              Df Sum of Sq       RSS    AIC
+ review_scores_accuracy       1     46278 339937018 218820
<none>                                     339983296 218821
+ minimum_nights               1      7977 339975319 218822
- review_scores_checkin        1     84739 340068035 218825
- beds                         1     97092 340080388 218825
- host_response_rate           1    105604 340088901 218826
- reviews_per_month            1    118395 340101691 218827
- review_scores_communication  1    224441 340207737 218834
- maximum_nights               1    259093 340242389 218836
- host_acceptance_rate         1    341900 340325196 218842
- review_scores_cleanliness    1   1154155 341137451 218896
- review_scores_location       1   1181072 341164368 218898
- number_of_reviews            1   2134291 342117587 218961
- region_id                    1   2310342 342293638 218973
- host_total_listings_count    1   6156371 346139667 219227
- review_scores_rating         1   7619108 347602404 219323
- accommodates                 1   8427373 348410669 219376
- review_scores_value          1  13619039 353602335 219713
- bedrooms                     1  19340285 359323581 220078

Step:  AIC=218819.8
price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate + review_scores_checkin + 
    review_scores_accuracy

                              Df Sum of Sq       RSS    AIC
<none>                                     339937018 218820
- review_scores_accuracy       1     46278 339983296 218821
+ minimum_nights               1      7951 339929067 218821
- beds                         1     96408 340033426 218824
- review_scores_checkin        1     97209 340034227 218824
- host_response_rate           1    104641 340041658 218825
- reviews_per_month            1    115519 340052537 218826
- review_scores_communication  1    249813 340186831 218835
- maximum_nights               1    257573 340194591 218835
- host_acceptance_rate         1    342192 340279210 218841
- review_scores_cleanliness    1   1050825 340987843 218888
- review_scores_location       1   1113323 341050341 218892
- number_of_reviews            1   2156065 342093083 218962
- region_id                    1   2315221 342252239 218972
- host_total_listings_count    1   6121345 346058363 219224
- review_scores_rating         1   6448419 346385437 219246
- accommodates                 1   8416540 348353558 219375
- review_scores_value          1  13513474 353450492 219705
- bedrooms                     1  19341620 359278638 220077
summary(stepboth)

Call:
lm(formula = price ~ bedrooms + accommodates + host_total_listings_count + 
    review_scores_rating + review_scores_value + number_of_reviews + 
    region_id + review_scores_cleanliness + review_scores_location + 
    review_scores_communication + host_acceptance_rate + maximum_nights + 
    reviews_per_month + beds + host_response_rate + review_scores_checkin + 
    review_scores_accuracy, data = Rdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-1157.61   -72.83   -17.84    51.80   746.72 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -2.545e+02  1.934e+01 -13.161  < 2e-16 ***
bedrooms                     5.049e+01  1.404e+00  35.974  < 2e-16 ***
accommodates                 1.651e+01  6.955e-01  23.731  < 2e-16 ***
host_total_listings_count    6.280e-01  3.103e-02  20.238  < 2e-16 ***
review_scores_rating         4.430e+00  2.133e-01  20.772  < 2e-16 ***
review_scores_value         -4.259e+01  1.416e+00 -30.070  < 2e-16 ***
number_of_reviews           -1.986e-01  1.654e-02 -12.011  < 2e-16 ***
region_id                    4.728e-03  3.799e-04  12.446  < 2e-16 ***
review_scores_cleanliness    1.145e+01  1.366e+00   8.385  < 2e-16 ***
review_scores_location       1.590e+01  1.842e+00   8.631  < 2e-16 ***
review_scores_communication -7.742e+00  1.894e+00  -4.088 4.36e-05 ***
host_acceptance_rate        -2.037e+01  4.258e+00  -4.785 1.72e-06 ***
maximum_nights              -6.413e-03  1.545e-03  -4.151 3.32e-05 ***
reviews_per_month           -1.923e+00  6.917e-01  -2.780  0.00544 **
beds                        -1.765e+00  6.948e-01  -2.540  0.01110 *
host_response_rate           1.515e+01  5.724e+00   2.646  0.00815 **
review_scores_checkin       -4.763e+00  1.868e+00  -2.550  0.01077 *
review_scores_accuracy       3.240e+00  1.841e+00   1.760  0.07848 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 122.3 on 22745 degrees of freedom
Multiple R-squared:  0.4963,    Adjusted R-squared:  0.4959 
F-statistic:  1318 on 17 and 22745 DF,  p-value: < 2.2e-16
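
The AIC reported at each step can be checked against the RSS column. The following is a minimal sketch, assuming the trace comes from a stepwise routine built on R's extractAIC(), which for a linear model reports n * log(RSS / n) + 2 * edf rather than the full log-likelihood AIC. All numbers are read off the output above; the variable names are illustrative only.

```r
# Values read off the final model summary and the last step of the trace
rss      <- 339937018   # RSS of the final model (the <none> row)
resid_df <- 22745       # residual degrees of freedom
n_params <- 17 + 1      # 17 predictors plus the intercept
n        <- resid_df + n_params

# extractAIC() convention for lm objects: n * log(RSS / n) + 2 * edf
aic <- n * log(rss / n) + 2 * n_params
round(aic, 1)

# The residual standard error is sqrt(RSS / residual df)
rse <- sqrt(rss / resid_df)
round(rse, 1)
```

Under that assumption the two results agree with the reported final-step AIC of 218819.8 and the residual standard error of 122.3, which shows that every row of the trace is ranking candidate models by RSS plus a small penalty for each added term.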

Clustering

We will perform k-means clustering with different numbers of clusters and determine the optimal number. In this assignment, we have chosen the variables price, bedrooms, beds and review_scores_rating for k-means clustering.

After conducting tests for the optimal number of clusters, we conclude that 4 clusters is the most suitable choice.

In the Shiny app, users will be able to choose which method to use to determine the optimal number of clusters: the Elbow Method, the Silhouette Method or the Gap Statistic Method.

airbnb.vic2 <- airbnb.vic %>%
  dplyr::select(id, price, bedrooms, beds, review_scores_rating)%>%
  na.omit()%>%
  column_to_rownames(var = "id")  

df <- airbnb.vic2
df <- scale(df)

# Fix the seed so that the random starting centres are reproducible
set.seed(123)

# Fit k-means for k = 2 to k = 7
k2 <- bigkmeans(df, centers = 2, iter.max = 99, nstart = 1, dist = "euclid")
k3 <- bigkmeans(df, centers = 3, iter.max = 99, nstart = 1, dist = "euclid")
k4 <- bigkmeans(df, centers = 4, iter.max = 99, nstart = 1, dist = "euclid")
k5 <- bigkmeans(df, centers = 5, iter.max = 99, nstart = 1, dist = "euclid")
k6 <- bigkmeans(df, centers = 6, iter.max = 99, nstart = 1, dist = "euclid")
k7 <- bigkmeans(df, centers = 7, iter.max = 99, nstart = 1, dist = "euclid")

# plots to compare
p1 <- fviz_cluster(k2, geom = "point", data = df) + ggtitle("k = 2")
p2 <- fviz_cluster(k3, geom = "point",  data = df) + ggtitle("k = 3")
p3 <- fviz_cluster(k4, geom = "point",  data = df) + ggtitle("k = 4")
p4 <- fviz_cluster(k5, geom = "point",  data = df) + ggtitle("k = 5")
p5 <- fviz_cluster(k6, geom = "point",  data = df) + ggtitle("k = 6")
p6 <- fviz_cluster(k7, geom = "point",  data = df) + ggtitle("k = 7")

grid.arrange(p1, p2, p3, p4, p5, p6, nrow = 2)

Determining Optimal Clusters - Elbow Method

# Determining Optimal Clusters - Elbow method

set.seed(123)

# Function to compute the total within-cluster sum of squares for k clusters
wss <- function(k) {
  kmeans(df, k, nstart = 10)$tot.withinss
}

# Compute and plot wss for k = 1 to k = 15
k.values1 <- 1:15

# Extract wss for 1-15 clusters
wss_values <- map_dbl(k.values1, wss)

plot(k.values1, wss_values,
     type="b", pch = 19, frame = FALSE, 
     xlab="Number of clusters K",
     ylab="Total within-clusters sum of squares")

Determining Optimal Clusters - Silhouette Method

# Determining Optimal Clusters - Silhouette method

# function to compute average silhouette for k clusters
avg_sil <- function(k) {
  km.res <- kmeans(df, centers = k, nstart = 25)
  ss <- silhouette(km.res$cluster, dist(df))
  mean(ss[, 3])
}

# Compute and plot the average silhouette width for k = 2 to k = 15
k.values2 <- 2:15

# extract avg silhouette for 2-15 clusters
avg_sil_values <- map_dbl(k.values2, avg_sil)

plot(k.values2, avg_sil_values,
     type = "b", pch = 19, frame = FALSE, 
     xlab = "Number of clusters K",
     ylab = "Average Silhouettes")

Determining Optimal Clusters - Gap Method

# Determining Optimal Clusters - Gap Method
# Compute the gap statistic
set.seed(123)
gap_stat <- clusGap(df, FUNcluster = kmeans, nstart = 25,
                    K.max = 6, B = 50)
# Print the result
print(gap_stat, method = "firstmax")
Clustering Gap statistic ["clusGap"] from call:
clusGap(x = df, FUNcluster = kmeans, K.max = 6, B = 50, nstart = 25)
B=50 simulated reference sets, k = 1..6; spaceH0="scaledPCA"
 --> Number of clusters (method 'firstmax'): 2
         logW   E.logW      gap      SE.sim
[1,] 9.788689 11.55030 1.761613 0.001521763
[2,] 9.492014 11.33159 1.839577 0.001657763
[3,] 9.393094 11.22903 1.835932 0.001215878
[4,] 9.286598 11.13682 1.850224 0.001303178
[5,] 9.235990 11.08915 1.853160 0.001336053
[6,] 9.144105 11.04705 1.902943 0.001277187
fviz_gap_stat(gap_stat)
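
The printed result depends on the selection rule passed as method. As a rough sketch, the gap values and standard errors from the table above can be fed directly to maxSE() from the cluster package to compare two of its rules. The vectors below are copied by hand from the printed output, and the object names are illustrative only.

```r
library(cluster)

# Gap values and simulation standard errors copied from the printed table
gap <- c(1.761613, 1.839577, 1.835932, 1.850224, 1.853160, 1.902943)
se  <- c(0.001521763, 0.001657763, 0.001215878,
         0.001303178, 0.001336053, 0.001277187)

# "firstmax": the first k at which the gap stops increasing
k_firstmax <- maxSE(gap, se, method = "firstmax")

# "globalmax": the k with the largest gap among those tried
k_globalmax <- maxSE(gap, se, method = "globalmax")

c(firstmax = k_firstmax, globalmax = k_globalmax)
```

For this table "firstmax" gives 2, matching the printed output, while "globalmax" gives 6, which is simply the upper limit K.max. The gap statistic alone therefore does not single out one answer here and is best read alongside the elbow and silhouette plots.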

Parallel Plot

We will create a parallel plot to visualise the characteristics of each cluster, using the 4-cluster solution chosen above.

# Add cluster to dataset
airbnb.vic2$cluster <- k4$cluster
airbnb.vic2$cluster <- as_factor(airbnb.vic2$cluster)

# Draw the parallel plot
parallelPlot(airbnb.vic2, refColumnDim = "cluster")
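
A numeric summary can complement the parallel plot when describing each cluster. The sketch below is one possible approach, not part of the original analysis. It uses a small simulated data frame (toy, a hypothetical stand-in for airbnb.vic2 with the same four variables) and base R's kmeans() so that it runs on its own.

```r
# Toy stand-in for airbnb.vic2: the values are simulated, not real listings
set.seed(123)
toy <- data.frame(
  price                = round(runif(200, 50, 500)),
  bedrooms             = sample(1:5, 200, replace = TRUE),
  beds                 = sample(1:6, 200, replace = TRUE),
  review_scores_rating = round(runif(200, 60, 100))
)

# Cluster on standardised values, as in the main analysis
km <- kmeans(scale(toy), centers = 4, nstart = 25)
toy$cluster <- factor(km$cluster)

# Mean of each variable within each cluster, on the original scale
profile <- aggregate(. ~ cluster, data = toy, FUN = mean)

# Number of listings in each cluster
sizes <- table(toy$cluster)

profile
sizes
```

Applied to the real data, the same aggregate() call on airbnb.vic2 would give a per-cluster table of average price, bedrooms, beds and rating that can be read next to the plot.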