So I want to see with wich movies I have wasted the most amount of time.

To analyse all the movies throughout all the libraries, we first load the existing .csv files with the data from Plex. We then combine them into one giant data frame.

all_movie_libraries <- list.files(path = "data", pattern = "*.csv", full.names = TRUE) %>%
                              lapply(read.csv) %>%
                              lapply(\(x) mutate(x, across(Audience.Rating, as.double))) %>%
                              bind_rows()

We then change the Duration from the format HH:MM:SS to an amount in minutes. Also we discard a lot of columns we don’t need anymore.

all_movie_libraries <- all_movie_libraries %>%
                                separate(Duration, c('play_time_hours', 'play_time_minutes', 'play_time_seconds'), ':') %>%
                                mutate(play_time_hours = as.numeric(play_time_hours), play_time_minutes = as.numeric(play_time_minutes), play_time_seconds = as.numeric(play_time_seconds)) %>%
                                mutate(play_time_single_in_minutes = ceiling(play_time_hours*60 + play_time_minutes + play_time_seconds/60)) %>%
                                select(title = Title,view_count = View.Count, play_time_single_in_minutes)
                                
head(all_movie_libraries)
##                                    title view_count play_time_single_in_minutes
## 1                            Animal Farm        N/A                          73
## 2           Batman & Mr. Freeze: SubZero        N/A                          68
## 3                           Batman Ninja        N/A                          86
## 4 Batman vs Teenage Mutant Ninja Turtles        N/A                          88
## 5                 The Batman vs. Dracula          1                          85
## 6                    Batman vs. Two-Face          1                          73

Now let’s filter out all movies that haven’t been watched before.

all_movie_libraries = all_movie_libraries %>%
                          filter(view_count != 'N/A')

With this we reduced the rows in our data frame from 869 to 287.

After that we multiply the amount of minutes with the amount of times the movie has been played. This gives us the total amount of time played.

all_movie_libraries <- all_movie_libraries %>%
                        mutate(view_count = as.numeric(view_count)) %>%
                        mutate(play_time_total_in_minutes = play_time_single_in_minutes * view_count) %>%
                        select(title, play_time_total_in_minutes)

head(all_movie_libraries)
##                          title play_time_total_in_minutes
## 1       The Batman vs. Dracula                         85
## 2          Batman vs. Two-Face                         73
## 3    Batman: Assault on Arkham                        154
## 4            Batman: Bad Blood                        148
## 5                 Batman: Hush                         83
## 6 Batman: Mask of the Phantasm                         77

And now sort the data frame in a descending order, pick the top 10 rows and make a bar plot.

all_movie_libraries <- all_movie_libraries %>%
                        arrange(desc(play_time_total_in_minutes)) %>%
                        head(10)

all_movie_libraries %>% 
  ggplot(aes(x=title, y=play_time_total_in_minutes, color=play_time_total_in_minutes)) +
    geom_bar(stat="identity", fill="white") +
    scale_x_discrete(guide = guide_axis(angle = 90)) +
    geom_text(aes(label=play_time_total_in_minutes), position = position_dodge(1.9), size=3.5, vjust=1.6) +
    labs(title = 'My most watched movies by time spent watching', y = 'Time watched in minutes', x = 'Movie Titles') +
    theme_classic()