Refactoring Shiny Applications



Frame from the film “Formula of Love”, 1984

In the life cycle of any software in production, there comes a phase when the accumulated change requests (CRs) become an unbearable burden on the original architecture, and the time for refactoring arrives. Many books have been written on the topic, and every language has its specifics. Below we touch only on aspects that may be useful for RStudio Shiny applications: a series of practical methods, tricks and nuances accumulated while refactoring, as a rule, someone else's Shiny code.

“Aliena nobis, nostra aliis” – if one man built it, another can always take it apart.

That is how it goes in the film; the original source is somewhat different. The phrase by Publilius Syrus, “Aliena nobis, nostra plus aliis placent”, translates as “What belongs to others pleases us; what is ours pleases others more.” But the blacksmith Stepan still speaks his own mind.

This is a continuation of a series of previous publications.

Making a beautiful and efficient Shiny application, like any application, is not at all easy. It helps to have the qualifications of an analyst-developer, but those are about as common as test pilots. When you pick up such an application, you usually see something built from sticks and glue. And out of this you need to make a hyperboloid tower (a “Shukhov” tower): the same sticks, but with bolts added, forming a light, reliable structure that fears no hurricane and is visible to everyone from afar.

Next, let’s go through the possible steps to put things in order.

A Shiny application has its own specifics, dictated by its purpose: analytics and interactive display of results. Functionally, a typical Shiny application runs through a set of blocks, from initialization and data loading to interactive display and export.

Logging

In principle, for any production system, logs are the alpha and omega. Especially in data-driven projects. The log is the only way to see the situation exactly as it happened: for a failure, it will be extremely difficult to reproduce everything, from the sequence of data flows to the entire external environment with which the application explicitly or implicitly interacts. In a Shiny application the log is, in fact, the only interface for technological control.

We begin covering the code with logs, moving through the functional blocks from initialization to export. The main task is to build a trace laid over the timeline, so as to identify use cases and rank the bottlenecks. The secondary task is to establish memory control: in every block where new variables appear (especially reactive ones), weigh their size in RAM and the time needed to obtain them. Memory in such applications leaks away readily; one typical case is given below.
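A minimal sketch of such tracing, assuming the logger and lobstr packages (the with_trace helper and the read_csv call are illustrative, not from the original app):

library(logger)
library(lobstr)

# wrap an expensive step: log how long it took and how big the result is
with_trace <- function(label, expr) {
  tic <- Sys.time()
  res <- force(expr)
  log_info("{label}: {round(difftime(Sys.time(), tic, units = 'secs'), 2)} s, size {fs::fs_bytes(obj_size(res))}")
  res
}

# usage inside server() or an init block, e.g.:
# df <- with_trace("load sales", readr::read_csv("sales.csv"))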

Looking inside

Classic breakpoints do not always work for diving into the code of a running application. So, rather than guess and suffer, we use a method that works 100% of the time: put browser() at the point of interest. Simple and unpretentious. Execution stops, and you get access to everything, including reactive expressions.
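For example (a sketch; the filtered_df() reactive is made up):

output$tbl <- renderTable({
  browser()             # execution stops here with the session fully alive:
  head(filtered_df())   # inspect input$..., call reactives, walk the stack
})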

There is also a nice trick for embedding a universal breakpoint, found here: “A little trick for debugging Shiny”. Add a button and hide it at startup: simple, elegant, and usable in production without changing the code (the trick is embedded in the app.R listing below).

Establishing links between interface elements and server functions

Of course, a careful reading of the code will always let you trace all the relationships, but there are also very useful aids that save a lot of time in a complex application.

Identifying elements on the screen

For quick orientation, it can be very useful to mark objects on the HTML page with their identifiers. The idea is simple and elegant, borrowed from here: “Display element ids for debugging Shiny apps”.

javascript:$("div[id]").each(function(){
    $(this).prepend("<span style='color:red'>" + $(this).attr("id") + "<br/></span>")}),
  $("input[id]").each(function(){
    $(this).before("<span style='color:red'>" + $(this).attr("id") + "<br/></span>")});

In fact, save this little piece of javascript as a bookmarklet (the highlighting span above is reconstructed; the exact snippet is in the post linked above), and every element on the page gets labeled with its id.

Tracing the connection between visual elements and code

Shiny has a built-in mechanism for visualizing which code is activated when you perform certain actions in the interface (or on a timer update). It is turned on with a single command: shiny::runApp(display.mode="showcase"). You can read more details here: “Display modes”.

Reconstructing reactive links between elements

If your application has one output and one input, a close look is enough. In a complex, convoluted application there may be many connections, reaching any depth of hierarchy, depending on the author's inventiveness. The reactlog mechanism lets you investigate and eliminate redundant links. It is easy to turn on: install the package of the same name, add options(shiny.reactlog=TRUE) to the code, and press Ctrl+F3 while the application is running. The whole picture of what goes on in the engine room is before your eyes. You can read the details here: https://rstudio.github.io/reactlog/.
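Enabling it looks like this (a sketch):

options(shiny.reactlog = TRUE)  # record the reactive dependency graph
shiny::runApp()                 # then press Ctrl+F3 in the running application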

We destroy unnecessary reactivity

A typical situation: an analyst has spent a long time figuring out what reactive computing is and what it is eaten with. And then, understanding arrives! And reactivity gets applied to literally every sneeze.

The app turns into a complete mess. Looking at the code, you have no idea what is being called from where, or which corner the rabbit will jump out of. Reactlog helps reconstruct the dynamics of the real connections, but do not forget about lazy evaluation: a reactive expression is not evaluated until it is actually needed. So unless you click through absolutely every nook and cranny, there is a chance something will be missed.

And there is a second factor that turns the application into a monster devouring all resources: reactive expressions keep a cache. That is fine when it is a single value obtained from a long calculation. But developers start skewering dataframes onto the spit, and those can measure in gigabytes. Snip-snip, and the RAM is gone.

Here is an example of a small application demonstrating such tricks. Right at startup it has gobbled up memory without having computed anything yet. Moreover, the leak is visible at the OS level, while R happily reports only about 300 MB. Reactivity at work.

To study the structure of objects in such cases, the lobstr package works best: you can dig all the way down to the pointers and compute the joint memory consumption of several objects.
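A quick sketch of what lobstr can tell you (sizes are approximate):

library(lobstr)
x <- runif(1e6)
y <- x             # no copy is made: both names point to one vector
obj_size(x)        # ~8 MB
obj_size(x, y)     # still ~8 MB: shared memory is counted once
ref(x, y)          # shows the addresses of the underlying objects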

If possible, one should try to achieve complete transparency of the calculations from the source to the receiver.

app.R

library(shiny)

# Define UI for application that draws a histogram
ui <- fluidPage(

  # A BROWSER ANYWHERE, ANYTIME
  # Add to your UI: 
  actionButton("browser", "browser"),
  tags$script("$('#browser').hide();"),
  tags$script("$('#browser').show();"), # shown right away here, just for the demo
  # And to show the button in your app, go 
  # to your web browser, open the JS console, 
  # And type: $('#browser').show();

  # Application title
  titlePanel("Old Faithful Geyser Data"),

  # Sidebar with a slider input for number of bins 
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins",
                  "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30),
      br(),
      actionButton("add50", "+ 500Mb")
    ),

    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("distPlot"),
      textOutput("info"),
      tags$style(type="text/css", "#info {white-space: pre-wrap;}")
    )
  )
)

df <- data.frame(
  a = stringi::stri_rand_strings(10000, 10, '[a-z]'),
  b = stringi::stri_rand_strings(10000, 12, '[A-Z]')
)

# Define server logic required to draw a histogram
server <- function(input, output) {

  output$distPlot <- renderPlot({
    # generate bins based on input$bins from ui.R
    x    <- faithful[, 2]
    bins <- seq(min(x), max(x), length.out = input$bins + 1)

    # draw the histogram with the specified number of bins
    hist(x, breaks = bins, col="darkgray", border="white")
  })

  output$info <- renderText({
    glue::glue("Counter = {rval$cnt}",
               "mem_used = {fs::fs_bytes(lobstr::mem_used())}",
               "react3_df = {fs::fs_bytes(lobstr::obj_size(react3_df()))}",
               "react4_df = {fs::fs_bytes(lobstr::obj_size(react4_df()))}",
               .sep = ", ")
  })

  # A BROWSER ANYWHERE, ANYTIME
  # Add to your server 
  observeEvent(input$browser,{
    browser()
  })

  react1_df <- reactive({
      dplyr::mutate(df, c = input$bins)
  })

  react2_df <- reactive({
      dplyr::mutate(react1_df(), d = input$bins * 2)
  })

  react3_df <- reactive({
    # runif(6.5555e8 * rval$cnt)
    runif(6.5555e6 * rval$cnt)  # ~52 MB of doubles per counter step
  })

  react4_df <- reactive({
    # take everything except the first element, a sort of "filtering"
    react3_df()[-1] 
  })

  rval <- reactiveValues(cnt = 1) # Defining & initializing the reactiveValues object

  observeEvent(input$add50, {
    rval$cnt <- rval$cnt + 1
  })

}

# Run the application 
shinyApp(ui = ui, server = server)

Rolling code up into functions

Properly, significant pieces of code should be rolled up into functions and moved out. They must be encapsulated, with no references to variables outside the function body. Dataframes, even gigabyte ones, are cheap to pass in as parameters, since only the list of references to the columns is passed.
Functions are good for debugging, documenting, profiling and handing off. You can snapshot their parameters for automated tests.
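A sketch of such an extraction for the histogram from app.R above (plot_hist is our name, not from the original code):

# fully encapsulated: no references to variables outside the function body
plot_hist <- function(df, col, bins) {
  stopifnot(is.data.frame(df), col %in% names(df))
  x <- df[[col]]
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  hist(x, breaks = breaks, col = "darkgray", border = "white")
}

# in server(); passing the dataframe is cheap, only column pointers move:
# output$distPlot <- renderPlot(plot_hist(faithful, "waiting", input$bins))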

Decide on the scope of variables

Based on the application logic, the number of users and the deployment scheme, decide what is stored where, and where and when it is loaded: global variables, session variables, a cross-session cache in the form of an in-memory DB. Good advice can be found here: “Scoping rules for Shiny apps”.
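Schematically (a sketch; the file name and variables are illustrative):

# top of app.R (or global.R): runs once, visible to ALL sessions
shared_df <- readRDS("prepared.rds")

server <- function(input, output, session) {
  # per-session: each connected user gets their own copy
  session_state <- reactiveValues(cnt = 0)

  output$info <- renderText({
    # per-call: locals vanish after the expression is evaluated
    paste("rows:", nrow(shared_df))
  })
}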

Decide on data sources and loading methods

The classic Shiny application is designed for calculations over preloaded data. Once we understand what is actually used, how much is loaded, and which filters the application applies, we can change the strategy of working with the sources. It is important to account for how often the external sources are updated; in most cases the work concerns already closed reporting periods.

We are looking for a compromise, with the minimum application response time to user actions as the target: ease of use comes first. Data that takes a long time to load, or whose external source is unreliable, is best cached locally; in a local database or in files, exactly how is a matter of taste. A strong contender for the silver bullet is Apache Arrow. You can filter quickly at the lower level and not drag garbage into memory, do primary aggregation down below, and avoid polluting R's string pool by staying at the Arrow table level. Wonderful!
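A sketch of the reading side with the arrow package (the dataset path and column names are invented):

library(arrow)
library(dplyr)

ds <- open_dataset("data/cache")          # parquet files, opened lazily
res <- ds %>%
  filter(year == 2020L, !is.na(type)) %>% # filtering is pushed down to Arrow
  group_by(week) %>%
  summarise(n = n()) %>%
  collect()                               # only the aggregate lands in R memory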

One possible option is to move all the rough work of background loading and preprocessing into a separate “ETL” layer: R scripts plus cron. The application should work only with fully prepared and optimized data. There is no need to steal users' time to solve private technical problems.
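The writing side of such an ETL layer could look like this (a sketch, run on a schedule by cron; the source URL, columns and paths are hypothetical):

# etl/prepare_data.R, scheduled outside the application
library(dplyr)

raw <- readr::read_csv("https://example.com/export.csv")   # slow, unreliable source
prepared <- raw %>%
  filter(!is.na(type)) %>%
  mutate(week = lubridate::isoweek(date))

arrow::write_parquet(prepared, "data/cache/prepared.parquet")  # the app reads only this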

Simplifying filtering

Most Shiny apps have filters of some kind. It would be fine if they only ever filtered by concrete values. But no: very often an “All” value appears, which may look like an empty field in the filter. If there are several such filters (3-5 or more) and they are applied across many pages/tabs, this can become a problem.

The drive to trace everything back to the sources and reduce the number of reactive objects conflicts with the need for flexible filtering. This is further aggravated when the filter on screen shows one dictionary while the data holds different values. For example, the expression dt[group == input$grp] will return nothing like what we want when grp == NULL but “everything” is implied.

There is a good trick for this case. The idea is extremely simple: first build a row index (a boolean vector) as the intersection of the various filters over different columns, and then, in one fell swoop, select the rows in data.table's i and immediately apply the function in j. The contradiction is resolved. Simply and easily.

The code might look something like this.

# function for filtering columns of a reactive data.table
smartFilter <- function(dt, val, col_name){
  # NULL from the UI means the "Все" ("All") option
  if(is.null(val)) val <- "Все"
  if(val != "Все") dt[val, on = col_name, which = TRUE] else dt[, .I]
}

# function to compute the intersection of all data.table filters
intersectFilters2 <- function(lst){
  # drop NULL entries, then intersect the remaining row-index vectors
  Reduce(intersect, Filter(Negate(is.null), lst))
}

# the filtering code itself
idx <- dt %>%
  {list(
    .[!is.na(`Тип`), which = TRUE],
    .[`Год` == as.integer(input$year_input), which = TRUE],
    .[`Неделя` == input$week, which = TRUE],
    smartFilter(., input$domain, "Домен"),
    smartFilter(., input$service, "Сервис"),
    smartFilter(., input$business_operation, "Бизнес-операция")
  )} %>% 
  intersectFilters2()

used_cols <- c(
  'Год', 'Неделя', 'Домен', 'Сервис', 
  'Бизнес-операция', 'Номер', 'Тип', 'Статус')

# schematic: take the filtered rows and needed columns, then compute in j
dt[idx, .SD, .SDcols = used_cols] %>%
  .[, ':='(a = sum(b), c = mean(d))]

Speeding things up further

To reduce the application's response time, additional technological steps have to be taken: parallelizing the loading and calculation processes, and caching calculation results, including graphics.
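For graphics, Shiny has a built-in plot cache; a sketch for the histogram from app.R above:

output$distPlot <- renderCachedPlot({
  x    <- faithful[, 2]
  bins <- seq(min(x), max(x), length.out = input$bins + 1)
  hist(x, breaks = bins, col = "darkgray", border = "white")
}, cacheKeyExpr = { input$bins })  # the same bins value reuses the cached image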

As a starting point, you can begin reading here:

Reading code from the masters helps a great deal. RStudio has prepared an excellent set of minimal examples; read them and play with the apps: “Collection of Shiny examples”. There are also excellent conference materials: «Shiny Developer Conference 2016 talks». Very often the basics have to be learned from the founding documents and archives; later there is no time left for it, everyone runs ahead and the foundations get buried under a pile of “obviously”.

A summary of the documents cited earlier, plus books.

Previous post – “Say a word about the poor bit.”
