Find the last Monday of every month in a dataset

From time to time, problems arise in R that are simple in nature, but not obvious to those who are just starting their journey.

Imagine that our organization takes stock of its goods on the last Monday of every month. On those days there are no sales, and we would like to take this into account in our forecasts. The question is how to “catch” these Mondays in the data without writing a custom function.

Let’s see how it can be done.

First we need to load the tidyverse and lubridate packages.

Tidyverse is a set of packages for the R programming language developed by Hadley Wickham and colleagues. It contains several packages that make it easy to read, process, visualize and model data using a single framework and programming style. Key packages in Tidyverse are ggplot2 for data visualization, dplyr for data manipulation, tidyr for tidying and reshaping data (for example, between “long” and “wide” formats), readr for reading and writing data in CSV and other formats, and purrr for functional programming. Tidyverse strives to make R programs more understandable, concise and easy to read through a unified approach to data processing and visualization.

Lubridate is a package for the R programming language that makes it easy to work with dates and times. It provides a user-friendly interface for various tasks such as extracting dates, times and time intervals from strings, formatting dates and times, arithmetic operations on dates and times, converting between different date and time formats, and much more.

# load the tidyverse and lubridate packages
library(tidyverse)
library(lubridate)

# create a dataset with one row per day from 2023-01-01 to 2024-12-31
df <- seq(ymd("2023-01-01"), ymd("2024-12-31"), by = "day")
df <- as_tibble(df)

Let’s create additional columns: w_day (day of the week), m_th (month) and y_r (year):

df_wmy <- df %>% 
  mutate(
    w_day = wday(value, week_start = 1),
    m_th = month(value),
    y_r = year(value)
  )

The week_start argument of wday() specifies which day is treated as the first day of the week when the day of the week is calculated for a given date. By default, week_start is 7, which means the week starts on Sunday and ends on Saturday.

If you set week_start = 1, the week starts on Monday and ends on Sunday. In other words, with week_start = 1, wday() returns 1 for a date that falls on a Monday and 7 for a date that falls on a Sunday.

For example, the call wday("2023-02-13", week_start = 1) returns 1, since February 13, 2023 is a Monday. The call wday("2023-02-19", week_start = 1) returns 7, since February 19, 2023 is a Sunday.
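The effect of week_start can be checked directly on the two dates mentioned above. This is only a small sketch; the variable name dates is chosen here for illustration.

```r
library(lubridate)

# The two dates from the example: a Monday and a Sunday
dates <- ymd(c("2023-02-13", "2023-02-19"))

# Monday-based numbering: Monday = 1, Sunday = 7
wday(dates, week_start = 1)
# 1 7

# Sunday-based numbering (the default): Sunday = 1, Monday = 2
wday(dates, week_start = 7)
# 2 1

# label = TRUE returns day names instead of numbers (names depend on the locale)
wday(dates, week_start = 1, label = TRUE)
```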

The functions month() and year() return the month and the year as numbers.

Now, having additional columns, we can select the Mondays we need:

df_monday <- df_wmy %>%
  filter(w_day == 1) %>% 
  group_by(m_th, y_r) %>% 
  filter(row_number() == n()) %>%
  ungroup()

Here we first filter the data down to the day of the week we are interested in. Then we group the data by month and year, so that the expression filter(row_number() == n()) picks the last row in each group.
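One way to gain confidence in the result is a quick sanity check. The sketch below rebuilds the same data in a single pipeline and then tests two properties: there should be one row per month, and a Monday is the last one in its month exactly when the date seven days later falls in a different month. The checks themselves are an addition for illustration, not part of the original recipe.

```r
library(tidyverse)
library(lubridate)

# Rebuild the example: daily dates for 2023-2024, keep the last Monday per month
df_monday <- tibble(value = seq(ymd("2023-01-01"), ymd("2024-12-31"), by = "day")) %>%
  mutate(
    w_day = wday(value, week_start = 1),
    m_th = month(value),
    y_r = year(value)
  ) %>%
  filter(w_day == 1) %>%
  group_by(m_th, y_r) %>%
  filter(row_number() == n()) %>%
  ungroup()

# Two years of data should give 24 rows, one per month
nrow(df_monday)
# 24

# Seven days after each selected Monday we should already be in another month
all(month(df_monday$value + days(7)) != df_monday$m_th)
# TRUE
```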

The construct row_number() == n() is a logical expression used to filter data with the dplyr package.

Inside a grouped data frame, row_number() returns the position of each row within its group (1, 2, 3, ...), and n() returns the total number of rows in the current group.

Thus, row_number() == n() is TRUE only for the row whose position equals the size of the group, that is, for the last row of each group.

Using this expression in filter() therefore keeps exactly one row per group. Because the data are sorted by date, that row is the last Monday of the month.
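A small toy example can make the roles of row_number() and n() concrete. The tibble toy and its columns grp, val, pos, size and is_last are made up for illustration; mutate() is used instead of filter() so that the intermediate values stay visible.

```r
library(dplyr)

# Toy data: group "a" has three rows, group "b" has two
toy <- tibble(
  grp = c("a", "a", "a", "b", "b"),
  val = c(10, 20, 30, 40, 50)
)

result <- toy %>%
  group_by(grp) %>%
  mutate(
    pos = row_number(),     # position of the row within its group
    size = n(),             # number of rows in the group
    is_last = pos == size   # TRUE only for the last row of each group
  ) %>%
  ungroup()

result
```

In the output, pos runs 1, 2, 3 for group "a" and 1, 2 for group "b", size is constant within each group, and is_last is TRUE only for the rows with val equal to 30 and 50.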

In our case, we select the last row in each group. If we wanted, for example, to find the second Monday of each month, we would change the expression to:

df_wmy %>%
  filter(w_day == 1) %>% 
  group_by(m_th, y_r) %>% 
  filter(row_number() == 2) %>%
  ungroup()
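As an alternative sketch (not the approach used in this post), dplyr's slice helpers can express the same selections by position: slice_tail(n = 1) keeps the last row of each group and slice(2) keeps the second. The names mondays, last_monday and second_monday are chosen here for illustration. The results are sorted with arrange() so that the output is in date order regardless of how the groups are ordered.

```r
library(tidyverse)
library(lubridate)

# All Mondays of 2023-2024, grouped by month and year
mondays <- tibble(value = seq(ymd("2023-01-01"), ymd("2024-12-31"), by = "day")) %>%
  mutate(
    w_day = wday(value, week_start = 1),
    m_th = month(value),
    y_r = year(value)
  ) %>%
  filter(w_day == 1) %>%
  group_by(m_th, y_r)

# Last Monday of each month
last_monday <- mondays %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  arrange(value)

# Second Monday of each month
second_monday <- mondays %>%
  slice(2) %>%
  ungroup() %>%
  arrange(value)
```

The second Monday of a month always falls between the 8th and the 14th, which gives an easy way to check second_monday with day(second_monday$value).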
