Choosing the best visualization
Hi all!
I am Daria Kasyanenko, an expert at the Center for Continuing Education of the Faculty of Computer Science at the National Research University Higher School of Economics. Recently, my colleague Daria Ogneva, an analyst in the BI reporting group at Okko and a teacher of the Data Analyst course, held a webinar for those who are just starting out in business analytics and want to understand data visualization.
Daria Ogneva
Analyst in the BI reporting group at Okko and teacher of the Data Analyst course
Two hundred twenty participants (88%) ranked sight as their most valuable sense. ©
How great it would be to glance at a chart and see the answer to your question in a split second, instead of sitting with a ruler and a spirit level trying to land on the right cell at the intersection of a column and a row, or comparing the heights of neighboring bars.
Especially during a presentation: the screen is far away, and instead of a ruler you are holding a cup of coffee. Is this an ideal, unattainable world of pink unicorns, or is it all in our hands?
A chart is a multi-parameter object that can be examined and optimized for hours. To simplify the experiment, let's focus on just one parameter: visual encoding. Simpler still, we'll limit ourselves to the five most popular types: bar chart, line chart, scatter plot, pie chart, and table.
Level: no prior preparation required.
Unlike the proverbial rake, experiments are worth stepping on yourself: going through them on your own is more productive.
Inspired by the article “Task-Based Effectiveness of Basic Visualizations” by Saket, Endert, and Demiralp, we took several deliberately chosen datasets, randomly assigned each of them a chart type (visual encoding), and during the webinar tried to solve three tasks: simplified ranking (select the 6th item in descending order of metric XXX), detecting anomalies, and detecting correlations.
Sample: active webinar participants (https://cs.hse.ru/dpo/announcements/973735262.html)
Tool: https://etc.ch/ *
* For multiple-choice questions, the tool calculates percentages rather strangely (the total is normalized to 100%), so afterwards we recalculated them separately from the absolute numbers, which the tool lets you download.
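To make the footnote concrete, here is a minimal Python sketch of the two ways to turn multiple-choice counts into percentages. It assumes you have the absolute number of selections per option and the number of respondents (as in the downloaded export); the function names and the counts in the example are hypothetical, not part of the tool.

```python
from typing import Dict, Mapping


def share_of_selections(counts: Mapping[str, int]) -> Dict[str, float]:
    """Share of all selections per option; always totals 100% (the normalized view)."""
    total = sum(counts.values())
    return {option: 100 * c / total for option, c in counts.items()}


def share_of_respondents(counts: Mapping[str, int], n_respondents: int) -> Dict[str, float]:
    """Share of respondents who picked each option; with multiple selection
    these values can add up to more than 100%."""
    return {option: 100 * c / n_respondents for option, c in counts.items()}


# Hypothetical example: 10 respondents, each allowed to pick several options
counts = {"A": 8, "B": 6, "C": 2}
print(share_of_selections(counts))       # {'A': 50.0, 'B': 37.5, 'C': 12.5}
print(share_of_respondents(counts, 10))  # {'A': 80.0, 'B': 60.0, 'C': 20.0}
```

The per-respondent share answers "what fraction of people picked this option", which is usually what you want when scoring answers.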
Simplified ranking: the 6th item in descending order of cost
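For reference, the ground truth for this task is just a sort: order the items by cost, largest first, and take the one at position 6. A minimal Python sketch; the item labels and costs are hypothetical. With ties the task becomes ambiguous, and this sketch simply keeps the original order among equal values.

```python
def nth_in_descending_order(values: dict[str, float], n: int) -> str:
    """Label of the n-th item (1-based) when items are sorted by value, largest first."""
    if not 1 <= n <= len(values):
        raise ValueError(f"n must be between 1 and {len(values)}")
    # sorted() is stable even with reverse=True, so equal values keep their original order
    ranked = sorted(values, key=lambda label: values[label], reverse=True)
    return ranked[n - 1]


# Hypothetical costs per item
costs = {"A": 5, "B": 9, "C": 7, "D": 1, "E": 3, "F": 8, "G": 2}
print(nth_in_descending_order(costs, 6))  # G
```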
Simplified ranking / Results:
chart type | accuracy (share of correct answers) | popularity of the correct answer |
table | 64% | top 1 |
scatter plot | 50% | top 1 |
bar chart | 31% | top 1 |
line chart | 29.2% | top 1 |
pie chart | 18.5% | top 4 (least popular among options with non-zero votes) |
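Both columns can be computed from the raw vote counts per answer option. A minimal sketch, assuming single-choice questions and reading "popularity" as the rank of the correct option when options are sorted by number of votes (ties share the better rank); the function names and vote counts are hypothetical.

```python
def accuracy(votes: dict[str, int], correct: str) -> float:
    """Share of respondents who chose the correct option."""
    return votes.get(correct, 0) / sum(votes.values())


def popularity_rank(votes: dict[str, int], correct: str) -> int:
    """1-based rank of the correct option by number of votes; ties share the better rank."""
    correct_votes = votes.get(correct, 0)
    return 1 + sum(1 for v in votes.values() if v > correct_votes)


# Hypothetical single-choice question with four options; "B" is correct
votes = {"A": 3, "B": 16, "C": 4, "D": 2}
print(f"{accuracy(votes, 'B'):.0%}")  # 64%
print(popularity_rank(votes, "B"))    # 1
```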
Conclusions: the table is the leader. For values close in magnitude, the surrounding context (bar chart vs scatter plot) significantly affects answer accuracy; in a table, the influence of context is reduced. The pie chart scored impressively low.
What else would be interesting to look at:
Conduct experiments with different distributions of values in the dataset.
See how accuracy changes with sample size.
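On a webinar-sized sample, an accuracy figure like 64% comes with a wide margin of error, which is one reason to look at how accuracy behaves as the sample grows. One standard way to quantify this (our choice for illustration, not something used in the webinar) is the Wilson score interval for a proportion. The sketch keeps the observed accuracy fixed and varies a hypothetical sample size.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Same observed accuracy (64%) at different hypothetical sample sizes
for n in (25, 250, 2500):
    low, high = wilson_interval(round(0.64 * n), n)
    print(f"n={n}: {low:.0%} to {high:.0%}")
```

The interval narrows roughly in proportion to 1/sqrt(n), so a tenfold larger audience shrinks it about threefold.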
Because of its low accuracy, and to spare the respondents further suffering, the pie chart was dropped from the race.
Presence of anomalies
chart type | accuracy (share of correct answers) | correct answer (anomaly present?) |
scatter plot | 95.5% | Yes |
line chart | 86.4% | No |
bar chart | 50.0% | No |
table | 22.7% | Yes |
Conclusions: this time the table is not the leader. The presence or absence of anomalies is clearly visible on the line chart and the scatter plot. Not all respondents fully understand what an anomaly is.
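Since not every respondent had a precise notion of an anomaly, it may help to pin one down. The webinar did not use a formal definition; one common rule of thumb, shown here as our assumption, is Tukey's fences: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third. A minimal Python sketch using the standard library's `statistics.quantiles`; the data are hypothetical.

```python
import statistics


def tukey_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # quantiles() sorts internally and needs at least two values
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(tukey_outliers([10, 11, 12, 11, 10, 12, 11, 50]))  # [50]
print(tukey_outliers([1, 2, 3, 4, 5]))                   # []
```

Other definitions (z-scores, domain-specific thresholds) will flag different points, so it is worth stating the chosen definition to respondents explicitly.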
What else would be interesting to look at:
Check for a bias toward answering "yes": maybe people in general tend to see anomalies where there are none.
Separately consider respondents who are fluent in the concept of an anomaly and those who understand the term only roughly.
Increase the number of experiments to rule out the influence of a particular data distribution.
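The first idea can be checked with a crude test. A minimal sketch, under our own assumptions: take only the charts where the correct answer is "no", count how many respondents still answered "yes", and compute how likely at least that many "yes" answers would be if each respondent were guessing 50/50. A small p-value suggests a lean toward "yes". The counts in the example are hypothetical; SciPy users can get the same one-sided result from `scipy.stats.binomtest(k, n, alternative="greater")`.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical: 20 answers on "no anomaly" charts, 15 of them were "yes"
print(f"p = {p_at_least(15, 20):.3f}")  # p = 0.021
```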
Presence of correlations
chart type | accuracy (share of correct answers) | correct answer (correlation present?) |
scatter plot | 92% | Yes |
line chart | 52% | Yes |
table | 16% | Yes |
table | 12% | Yes |
Conclusions: the specifics of the bar chart data significantly biased the results. The examples illustrating the concept of correlation were shown on scatter plots, and respondents answered quite accurately on them; for the other chart types, there was no example of what the presence or absence of correlation looks like. I would like to repeat the experiment with a revised methodology.
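Whether a chart "shows a correlation" is a visual judgment, but the ground truth for a dataset can be fixed numerically, for example with the Pearson correlation coefficient r (our choice for illustration; the webinar did not state how its datasets were labeled). A minimal pure-Python sketch; on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 3))  # -1.0
```

Note that r measures only linear association; deciding where "weak correlation" ends and "no correlation" begins is still a threshold you have to choose.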
What else would be interesting to look at: the same as for anomalies.
Check for a bias toward answering "yes": maybe people in general tend to see correlations where there are none.
Separately consider respondents who are fluent in the concept of correlation and those who understand the term only roughly.
Increase the number of experiments to rule out the influence of a particular data distribution.
Based on the results, I want even more experiments: to check the results obtained so far on larger samples and to smooth out one-off artifacts. Moreover, we only examined response accuracy. Following the original article, it would be great to also consider task completion time and the user's subjective preference (which chart they find more convenient or familiar to work with). It would also be interesting to add other types of tasks to the current three.
Results of the article “Task-Based Effectiveness of Basic Visualizations” (Saket, Endert, Demiralp): statistically significant superiority of some chart types over others in terms of accuracy, speed, and user preference, broken down by task type.
Even so, despite the specifics of our data and target audience, our small study confirmed the conclusions of the article and common sense:
Table is good at:
Bar chart is good at:
Component-by-component comparison
Finding min and max
Anomaly detection
Distribution
Line chart is good at:
Dynamics
Correlation
Scatter plot is good at:
Pie chart is good at:
However, there are many more chart types (see, for example, https://datavizproject.com/), and choosing the right one is a non-trivial task both for a beginner and for an experienced user who keeps sliding back into bar charts.
Fortunately, there are flowcharts that guide lost analysts to the right chart: chart choosers.
Chart choosers
** Note that the last project in the list also contains articles on the most common dilemmas, as well as the advantages and caveats of each chart type. Plus inspiration, and a chance to practice your English.
Summary
In this article, we looked at five popular types of visual encoding out of dozens. Visual encoding is just one attribute of a chart, and charts are just one part of the magical world of data visualization, drifting through the crazy universe of BI analytics.