BarbaraYam.com

Exploring Tidyverse and Dplyr

It looks as though I have had a bit of a hiatus from R programming. After the A to Z course by Kirill on Udemy, I went on to study a number of probability and statistics courses and to prepare for data science interviews. I also spent most of May working on creative visualisations, and with the circuit breaker coming to an end, it has been a very good three months of working from home.

We will continue to work from home for the time being, but this circuit breaker season has given me a long-needed reset. I got to declutter my shelves and cupboards, and I also got to revisit all the courses I had placed on the back burner and actually complete them.

I was first exposed to the Dplyr and Tidyverse packages in R through another course, the Wiley Certified Data Analyst. Here I finally got to put that head knowledge into practice.

Exploring Dplyr

Barbara Yam

17 Jun 2020

Tasks 1 to 3

# read the data, keeping the original column names (the years) intact
f1 = read.csv('f1.csv',check.names = FALSE)
pacman::p_load(dplyr,tidyverse)

# reshape from wide (one column per year) to long format
f1New = f1 %>%
  gather('year','count',2:ncol(f1),convert=TRUE) %>%
  rename('categories'= Variables) %>%
  arrange(categories)

head(f1New,10)
##                                            categories year    count
## 1                     Available Room-Nights (Number)  2010 11262019
## 2                     Available Room-Nights (Number)  2011 12377895
## 3                     Available Room-Nights (Number)  2012 12450851
## 4                     Available Room-Nights (Number)  2013 13118384
## 5                     Available Room-Nights (Number)  2014 14241499
## 6                     Available Room-Nights (Number)  2015 15130568
## 7                     Available Room-Nights (Number)  2016 16161862
## 8   Hotel Food & Beverage Revenue (Thousand Dollars)  2010  1052016
## 9   Hotel Food & Beverage Revenue (Thousand Dollars)  2011  1315098
## 10  Hotel Food & Beverage Revenue (Thousand Dollars)  2012  1309864
f1New %>% 
  subset(year == 2011)
##                                            categories year      count
## 2                     Available Room-Nights (Number)  2011 12377895.0
## 9   Hotel Food & Beverage Revenue (Thousand Dollars)  2011  1315097.6
## 16             Hotel Room Revenue (Thousand Dollars)  2011  2643538.8
## 23  Number Of Gazetted Hotels (At End Year) (Number)  2011       98.0
## 30        Standard Average Occupancy Rate (Per Cent)  2011       86.4
## 37               Standard Average Room Rate (Dollar)  2011      247.1
library(ggplot2)
f1New %>% 
  ggplot(aes(x=year,y=count,color=categories))+
  geom_line()

f1New %>%
  ggplot(aes(x=year,y=count)) + geom_line() + facet_wrap(~categories)

df <- data.frame(x=c(NA,"a.b","a.d","b.c"))
df %>% separate(x,c("A","B"))
##      A    B
## 1 <NA> <NA>
## 2    a    b
## 3    a    d
## 4    b    c
df <- data.frame(x=c("x:123","y:error:7"))
df %>% separate(x,c("key","value"))
## Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [2].
##   key value
## 1   x   123
## 2   y error
df <- data.frame(x=c("x:123","y:error:7"))
df %>% separate(x,c("key","value"),":",extra="merge")
##   key   value
## 1   x     123
## 2   y error:7
df <- data.frame(x=c("a","a b","a b c", NA))
df %>% separate(x,c("a","b"),extra="drop",fill="right")
##      a    b
## 1    a <NA>
## 2    a    b
## 3    a    b
## 4 <NA> <NA>
df %>% separate(x,c("a","b"),extra="merge",fill="left")
##      a    b
## 1 <NA>    a
## 2    a    b
## 3    a  b c
## 4 <NA> <NA>
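As an aside: in newer tidyr releases, separate() is superseded by the separate_wider_*() family. A minimal sketch of the key/value example above using separate_wider_delim(), where too_many = "merge" plays the role of extra = "merge" (assuming tidyr 1.3 or later):

```r
library(tidyr)
library(dplyr)

df <- data.frame(x = c("x:123", "y:error:7"))

# too_many = "merge" keeps any surplus pieces glued onto the last column
df %>%
  separate_wider_delim(x, delim = ":",
                       names = c("key", "value"),
                       too_many = "merge")
```

Unlike separate(), separate_wider_delim() errors by default when there are too many or too few pieces instead of just warning, which makes silent data loss harder to miss.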

Task 5 Part 1

pacman::p_load(dplyr,tidyverse)

f1New %>% 
  group_by(categories) %>%
  summarise(mean = round(mean(count),2),
            median = round(median(count),2),
            standardDeviation = round(sd(count),2),
            no.of.rows = n())
## # A tibble: 6 x 5
##   categories                            mean  median standardDeviati~ no.of.rows
##   <chr>                                <dbl>   <dbl>            <dbl>      <int>
## 1 " Available Room-Nights (Number)~   1.35e7  1.31e7       1722385.            7
## 2 " Hotel Food & Beverage Revenue ~   1.34e6  1.34e6        143975.            7
## 3 " Hotel Room Revenue (Thousand D~   2.86e6  2.92e6        397677.            7
## 4 " Number Of Gazetted Hotels (At ~   1.21e2  1.13e2            24.7           7
## 5 " Standard Average Occupancy Rat~   8.56e1  8.60e1             0.83          7
## 6 " Standard Average Room Rate (Do~   2.46e2  2.47e2            15.4           7

Task 5 Part 2

# data.frame() "checks" column names, so "2010" and "2012" become X2010 and X2012
messyData = data.frame(
  name = c('Tom','Bob','Merv'),
  "2010" = c("married","single","single"),
  "2012" = c("father","married","single")
)
library(dplyr)
library(tidyverse)

messyData %>% 
 gather(key='Year', value="Event",-name) %>%
 # split on the "X" that data.frame() prepended, then drop the empty piece
 separate(col= 'Year',into= c('X','Year'),sep="X") %>%
 select(-X)
##   name Year   Event
## 1  Tom 2010 married
## 2  Bob 2010  single
## 3 Merv 2010  single
## 4  Tom 2012  father
## 5  Bob 2012 married
## 6 Merv 2012  single
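For reference, gather() is superseded by pivot_longer(), which can strip the auto-added "X" prefix itself via names_prefix, so the separate()/select() pair is no longer needed. A sketch of the same tidy-up (assuming tidyr 1.0 or later):

```r
library(tidyr)
library(dplyr)

messyData <- data.frame(
  name = c("Tom", "Bob", "Merv"),
  "2010" = c("married", "single", "single"),
  "2012" = c("father", "married", "single")
)

# names_prefix drops the "X" that data.frame() prepended to the year columns
messyData %>%
  pivot_longer(cols = -name,
               names_to = "Year",
               names_prefix = "X",
               values_to = "Event")
```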

Task 6

f1New %>%
  group_by(categories) %>%
  filter(str_detect(categories,'Revenue')) %>%
  ggplot(aes(x=year,y=count))+
           facet_wrap(~categories) +
  geom_line()

Task 7

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
tidy_iris = iris %>%
  gather('Type','Value',1:4) %>%
  separate(col='Type',into=c("Part","Measure")) 

head(tidy_iris)
##   Species  Part Measure Value
## 1  setosa Sepal  Length   5.1
## 2  setosa Sepal  Length   4.9
## 3  setosa Sepal  Length   4.7
## 4  setosa Sepal  Length   4.6
## 5  setosa Sepal  Length   5.0
## 6  setosa Sepal  Length   5.4
ggplot(tidy_iris,aes(x=Measure,y=Value, color=Part))+
         geom_jitter()+facet_grid(.~Species)
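The gather() + separate() pair in Task 7 can likewise be collapsed into a single pivot_longer() call, using names_sep to split the measurement names on the dot (a sketch, assuming tidyr 1.0 or later):

```r
library(tidyr)
library(dplyr)

# names_sep = "\\." splits e.g. Sepal.Length into Part = "Sepal", Measure = "Length"
tidy_iris2 <- iris %>%
  pivot_longer(cols = 1:4,
               names_to = c("Part", "Measure"),
               names_sep = "\\.",
               values_to = "Value")

head(tidy_iris2)
```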

Disclaimer: 

This is a personal website. The opinions expressed here represent my own and not those of my employer. 

In addition, my thoughts and opinions change from time to time; I consider this a necessary consequence of having an open mind.

All rights reserved 2024 

Privacy Policy applies 

Terms and Conditions apply.