Choosing the best visualization

Hi all!

I am Daria Kasyanenko, an expert at the Center for Continuing Education of the Faculty of Computer Science at the National Research University Higher School of Economics. Recently my colleague Daria Ogneva, an analyst in the BI reporting group at Okko and an instructor on the Data Analyst course, held a webinar for those who are just starting out in business analytics and want to understand data visualization.

Daria Ogneva

Analyst in the BI reporting group at Okko and instructor on the Data Analyst course

Two hundred twenty participants (88%) ranked sight as their most valuable sense. ©

How great it would be to glance at a chart and see the answer to your question in a split second, instead of sitting with a ruler and a spirit level, trying to hit the right cell at the intersection of a row and a column or to compare the heights of neighboring bars.

Especially in a presentation, where the screen is far away and you are holding a cup of coffee instead of a ruler. Is this an ideal, unattainable world of pink unicorns, or is everything in our hands?


A chart is a multi-parameter object that can be examined and optimized for hours. To simplify the experiment, let's focus on just one parameter: visual encoding. To make it simpler still, we'll limit ourselves to the five most popular types: bar chart, line chart, scatterplot, pie chart and table.

Level: no prior preparation required.

Experiments are not a rake: stepping on them yourself is the more productive way to learn.

Inspired by the article "Task-Based Effectiveness of Basic Visualizations" (Saket, Endert, Demiralp), we took several non-random datasets, assigned chart types (visual encodings) to them at random, and during the webinar tried to solve three tasks: simplified ranking (pick the 6th item in descending order of metric XXX), detecting anomalies, and detecting correlations.

Sample: active webinar listeners: https://cs.hse.ru/dpo/announcements/973735262.html

Tool: https://etc.ch/ *

* For questions that allow several answers, the tool calculates percentages oddly (the total is normalized to 100%), so we recalculated them afterwards from the absolute counts, which the tool lets you download.
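The difference between the two ways of counting can be sketched in a few lines of Python. This is only an illustration, not the tool's export format: the option names, the vote counts and the number of respondents are made-up values.

```python
def normalized_shares(votes):
    """Shares normalized so that all options sum to 100% (the multi-select behavior described above)."""
    total_votes = sum(votes.values())
    return {option: 100 * count / total_votes for option, count in votes.items()}


def respondent_shares(votes, n_respondents):
    """Share of respondents who picked each option; the shares may sum to more than 100%."""
    return {option: 100 * count / n_respondents for option, count in votes.items()}


# Hypothetical multi-select question answered by 20 respondents.
votes = {"A": 15, "B": 10, "C": 5}
n_respondents = 20

print(normalized_shares(votes))
print(respondent_shares(votes, n_respondents))
```

With several answers allowed, the number of votes exceeds the number of respondents, so only the second calculation answers the question "what proportion of people chose this option".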

Simplified ranking: number 6 in descending order of cost

Test pictures:

Simplified ranking / Results:

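| graph type  | accuracy (proportion of correct answers) | popularity                 |
|-------------|------------------------------------------|----------------------------|
| table       | 64%                                      | top 1                      |
| scatterplot | 50%                                      | top 1                      |
| bar chart   | 31%                                      | top 1                      |
| line chart  | 29.2%                                    | top 1                      |
| pie chart   | 18.5%                                    | top 4, minimum of non-zero |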

Conclusions: the table is the leader. For values that are close in magnitude, the surrounding context (bar chart vs scatterplot) significantly affects answer accuracy. In a table, the influence of context is reduced. The pie chart scored impressively low.

What else is interesting to see:

  1. Run experiments with different distributions of values in the dataset.

  2. Track how accuracy changes with sample size.

Because of its low accuracy, and to spare the respondents further trauma, the pie chart was dropped from the race.

Presence of anomalies

Presence of anomalies / Test pictures

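| graph type  | accuracy (proportion of correct answers) | correct answer |
|-------------|------------------------------------------|----------------|
| scatterplot | 95.5%                                    | Yes            |
| line chart  | 86.4%                                    | No             |
| bar chart   | 50.0%                                    | No             |
| table       | 22.7%                                    | Yes            |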

Conclusions: the table is not the leader here. The presence or absence of anomalies is clearly visible on line charts and scatterplots. Not all respondents fully understand what an anomaly is.

What else is interesting to see:

  1. Check for bias toward the answer "yes": maybe people in general tend to see anomalies where there are none.

  2. Look separately at respondents who are fluent in the concept of an anomaly and at those who only roughly understand the term.

  3. Increase the number of experiments to eliminate the influence of distribution specificity.
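A first rough look at a possible "yes" bias can be taken from the anomaly results above. The sketch below ASSUMES that every respondent answered strictly "Yes" or "No" (the article does not say whether other options or skips were possible), so the share of "Yes" answers can be derived from the accuracy and the correct answer.

```python
# Accuracy (%) and correct answer per chart type, taken from the anomaly results above.
results = {
    "scatterplot": (95.5, "Yes"),
    "linear": (86.4, "No"),
    "barchart": (50.0, "No"),
    "table": (22.7, "Yes"),
}


def yes_share(accuracy, correct_answer):
    """Share of 'Yes' answers, ASSUMING every respondent answered either Yes or No."""
    return accuracy if correct_answer == "Yes" else 100 - accuracy


yes_by_chart = {
    chart: round(yes_share(acc, ans), 1) for chart, (acc, ans) in results.items()
}

# False alarms: 'Yes' answers on charts where the correct answer is 'No'.
false_alarms = [s for chart, s in yes_by_chart.items() if results[chart][1] == "No"]
# Misses: 'No' answers on charts where the correct answer is 'Yes'.
misses = [round(100 - s, 1) for chart, s in yes_by_chart.items() if results[chart][1] == "Yes"]

print(yes_by_chart)
print("mean false-alarm rate:", sum(false_alarms) / len(false_alarms))
print("mean miss rate:", sum(misses) / len(misses))
```

A false-alarm rate that is consistently higher than the miss rate would point to a tendency to answer "yes". With only four charts this comparison is illustrative at best, which is one more argument for the additional experiments in point 3.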

Presence of correlations

Presence of correlations / Test pictures

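| graph type  | accuracy (proportion of correct answers) | correct answer |
|-------------|------------------------------------------|----------------|
| scatterplot | 92%                                      | Yes            |
| line chart  | 52%                                      | Yes            |
| table       | 16%                                      | Yes            |
| table       | 12%                                      | Yes            |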

Conclusions: the specifics of the bar chart data significantly biased the results. The concept of correlation was explained with scatterplot examples, and on scatterplots the respondents answered quite accurately; for all other chart types there were no examples of what the presence or absence of correlation looks like. I would like to repeat the experiment with a changed methodology.

What else is interesting to see: same as for anomalies

  1. Check for bias toward the answer "yes": maybe people in general tend to see correlations where there are none.

  2. Look separately at respondents who are fluent in the concept of correlation and at those who only roughly understand the term.

  3. Increase the number of experiments to eliminate the influence of distribution specificity.

After these experiments, I want even more experiments: to check the results obtained so far on larger samples and to smooth out one-off artifacts. Besides, we only examined response accuracy. Following the original article, it would be great to also consider task completion time and the user's subjective preference (what is more convenient or familiar to work with). It would also be interesting to consider other types of tasks in addition to the current three.
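Why larger samples matter can be illustrated with a confidence interval for an observed accuracy. The sketch below uses the Wilson score interval, which is my choice for illustration and not something used in the webinar. The 64% value is the table's accuracy in the ranking task; the sample sizes are hypothetical.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed on n answers (z=1.96 is about 95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# 64% is the table's accuracy in the ranking task; the sample sizes are hypothetical.
for n in (25, 100, 1000):
    low, high = wilson_interval(0.64, n)
    print(f"n={n:>4}: {low:.1%} .. {high:.1%}")
```

The interval narrows as the number of answers grows, so differences between chart types that look convincing on a couple of dozen respondents may or may not survive on a larger sample.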


Results of the article "Task-Based Effectiveness of Basic Visualizations" (Saket, Endert, Demiralp): statistically significant superiority of some chart types over others in terms of accuracy, speed and convenience, broken down by task type

However, even taking into account the specificity of the data and target audience, our small study confirmed the conclusions of the article and common sense:

Table is good at these tasks:

Bar chart is good at these tasks:

  • Component-by-component comparison

  • min, max

  • Anomaly detection

  • Distribution

Line chart is good at these tasks:

  • Dynamics

  • Correlation

Scatterplot is good at these tasks:

Pie chart is good at these tasks:


However, there are many more chart types than these (see, for example, https://datavizproject.com/), and choosing the right one is a non-trivial task both for a beginner and for an experienced user who keeps slipping back into bar charts.

Fortunately, there are flowcharts that help lost analysts get to the right chart: chart choosers. They work like a "which stick are you?" quiz, only to the point.

Chart choosers

** Please note that the last project contains articles on the most common dilemmas, as well as the advantages and nuances of using each chart type. It is also a source of inspiration and a chance to practice English.

Summary

In this article we looked at 5 popular types of visual encoding out of dozens. Visual encoding is just one of the attributes of a chart, and charts are part of the magical world of data visualization, drifting in the crazy universe of BI analytics.

Full webinar video
