How to choose the right stat test for different metrics

Determining the data type and purpose of analysis

Data types

Quantitative data describe numerical values, which allow arithmetic operations. They are divided into two types: discrete (count values, such as the number of users of the application) and continuous (measured values, such as time spent in the application).

Qualitative (or categorical) data represent information that describes categories or groups. This data may be nominal (no natural order, e.g. device types) or ordinal (with a natural order, for example, level of satisfaction).

Goals of Analysis

Determining the purpose of the analysis requires a clear understanding of what needs to be learned from the data. The purpose can range from descriptive analysis, which aims to describe the characteristics of the data, to inferential analysis, which aims to test hypotheses and inferences about the population based on the sample.

Goal formulation includes:

  • Defining Research Questions: What do you want to know? What are the variables and how can they be related?

  • Hypothesis selection: Defining the null and alternative hypotheses you want to test.

  • Selecting an appropriate statistical method: Based on the type of data and research questions, you select the method that best suits the analysis.

The purpose of the analysis and the types of data directly influence the choice of statistical method. Let's move on to the types of statistical tests.

Types of stat tests

Parametric statistical tests

The T-test assumes that the data are quantitative and follow a normal distribution, which means the data are distributed around the mean in a bell shape.

There are three main types of T-test: one-sample, independent (for two independent samples) and paired T-test.

The one-sample T-test is used when it is necessary to compare the mean of one group with a known population mean.

The independent (two-sample) T-test is used to compare the means of two independent groups.

The paired T-test is used to compare means in one group before and after an intervention, or in two related groups.

The main assumptions on which the T-test is based are normality of the data distribution, equality of variances (for the independent T-test), and independence of observations.

The T-test involves calculating the t-statistic, which is then compared to the critical t-value from the Student distribution table. The T-statistic is calculated as the difference between group means divided by the standard error of the difference in means.

The main result of the T-test is the p-value, which shows the probability of obtaining a difference at least as large as the observed one, provided that the null hypothesis is true. If the p-value is less than the chosen significance level (usually 0.05), then the null hypothesis is rejected, indicating a statistically significant difference between the groups.
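
For illustration, here is a minimal sketch of the three T-test variants using SciPy; the sample values are hypothetical and serve only to show the call pattern.

```python
# Minimal sketch of the three T-test variants with SciPy.
# The sample values are hypothetical and only demonstrate the calls.
from scipy import stats

# Time spent in the app (minutes) for two independent user groups
group_a = [12.1, 14.3, 11.8, 13.5, 12.9, 14.0, 13.2]
group_b = [10.9, 12.2, 11.5, 10.7, 12.0, 11.1, 11.8]

# One-sample: compare group A's mean against a known population mean of 12.0
t, p = stats.ttest_1samp(group_a, popmean=12.0)
print(f"one-sample:  t = {t:.3f}, p = {p:.4f}")

# Independent (two-sample): compare the means of the two groups
t, p = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"independent: t = {t:.3f}, p = {p:.4f}")

# Paired: the same users measured before and after an intervention
before = [11.0, 12.4, 10.8, 13.1, 12.2, 11.7, 12.9]
after  = [11.6, 13.0, 11.1, 13.9, 12.5, 12.4, 13.4]
t, p = stats.ttest_rel(before, after)
print(f"paired:      t = {t:.3f}, p = {p:.4f}")

# If p < 0.05, the null hypothesis of equal means is rejected
```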

Analysis of Variance (ANOVA) is a statistical method used to compare the means of three or more groups. It allows you to determine whether the differences between group means are statistically significant, i.e. whether group membership affects the mean of the dependent variable. ANOVA is based on quantitative data and assumes normal distribution and homogeneity of variances between groups.

To apply ANOVA, data must meet certain criteria:

  • Data type: must be quantitative, i.e. measurable and expressed numerically.

  • Normality of distribution: The data in each group should be approximately normally distributed. This assumption can be verified using various tests, for example the Shapiro-Wilk test.

  • Homogeneity of variances: Variances among the groups should be equal. This condition can be tested using Levene's test (a sketch of both checks follows this list).
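
Both assumption checks are available in SciPy; here is a minimal sketch with hypothetical group data.

```python
# Minimal sketch of the normality and homogeneity-of-variance checks
# mentioned above, using SciPy; the group data are hypothetical.
from scipy import stats

group_1 = [23.1, 25.4, 22.8, 24.9, 23.7, 25.0]
group_2 = [26.2, 27.1, 25.8, 26.9, 27.4, 26.0]
group_3 = [24.5, 23.9, 25.1, 24.8, 23.6, 24.2]

# Shapiro-Wilk test for normality, run separately for each group
for name, g in [("group_1", group_1), ("group_2", group_2), ("group_3", group_3)]:
    stat, p = stats.shapiro(g)
    print(f"Shapiro-Wilk {name}: p = {p:.3f}")  # p > 0.05: no evidence against normality

# Levene's test for homogeneity of variances across all groups
stat, p = stats.levene(group_1, group_2, group_3)
print(f"Levene: p = {p:.3f}")  # p > 0.05: variances can be treated as equal
```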

ANOVA compares within-group and between-group variance to determine whether differences between groups are greater than what would be expected by chance. The main steps of the analysis include:

  1. Formulation of hypotheses:

    • Null hypothesis (H0): The means of all groups are equal.

    • Alternative hypothesis (H1): At least one group has a mean that is different from the other groups.

  2. Calculation of the F-statistic: The F-statistic is calculated as the ratio of between-group variance to within-group variance. A high F value indicates greater differences between groups compared to differences within groups.

  3. Determining p-value: The p-value is obtained from the F distribution and is used to determine the statistical significance of the results. If the p-value is less than the specified significance level, then the null hypothesis is rejected.

  4. Interpretation of results: Rejection of the null hypothesis indicates that there are statistically significant differences between the group means.

After ANOVA indicates that there are statistically significant differences between group means, post hoc tests are conducted to determine which specific pairs of groups differ. For example, Tukey's test or the Bonferroni correction can be used to compare pairs of groups after conducting ANOVA.
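
A minimal sketch of a one-way ANOVA followed by a Tukey post hoc comparison might look like this; scipy.stats.tukey_hsd is available in recent SciPy releases, and the group data are hypothetical.

```python
# Minimal sketch: one-way ANOVA followed by a Tukey HSD post hoc test.
# scipy.stats.tukey_hsd requires a recent SciPy release;
# the group data are hypothetical.
from scipy import stats

group_1 = [23.1, 25.4, 22.8, 24.9, 23.7, 25.0]
group_2 = [26.2, 27.1, 25.8, 26.9, 27.4, 26.0]
group_3 = [24.5, 23.9, 25.1, 24.8, 23.6, 24.2]

# One-way ANOVA: H0 says all group means are equal
f_stat, p = stats.f_oneway(group_1, group_2, group_3)
print(f"ANOVA: F = {f_stat:.3f}, p = {p:.4f}")

if p < 0.05:
    # Post hoc: pairwise comparisons to find which groups differ
    result = stats.tukey_hsd(group_1, group_2, group_3)
    print(result)  # prints pairwise mean differences, p-values and confidence intervals
```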

Nonparametric statistical tests

The Mann-Whitney test is used in situations where it is necessary to compare the average ranks of two groups. This test is useful when the data do not follow a normal distribution or when the sample is too small to reliably test normality.

The test is suitable for analyzing both quantitative and ordinal data. The basic requirement is that the data can be ordered from smallest to largest value. This allows you to assign a rank to each value and use those ranks to compare two groups.

The main purpose of the Mann-Whitney test is to test the hypothesis that two independent samples are drawn from the same distribution, that is, there is no significant difference between the sample distributions. The test procedure includes the following steps:

  1. Merging data from both groups and assigning ranks to all observations, starting with the smallest value.

  2. Summation of ranks for each group separately.

  3. Calculating the U statistic for each group using rank sums. The U statistic measures the number of pairs in which an element from one sample precedes an element from the other sample.

  4. Determination of the significance of differences between groups based on the U statistic, using Mann-Whitney U distribution tables or a normal approximation for large samples.
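
A minimal sketch of the test with SciPy, using hypothetical session-length data for two variants:

```python
# Minimal sketch of the Mann-Whitney U test with SciPy;
# the two samples are hypothetical (e.g. session lengths for two variants).
from scipy import stats

variant_a = [3.2, 4.1, 2.8, 5.0, 3.7, 4.4, 2.9, 3.5]
variant_b = [4.8, 5.6, 4.2, 6.1, 5.3, 4.9, 5.8, 4.5]

# Two-sided test of the hypothesis that both samples come from the same distribution
u_stat, p = stats.mannwhitneyu(variant_a, variant_b, alternative="two-sided")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p:.4f}")

# p < 0.05 would lead to rejecting the hypothesis of identical distributions
```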

The Kruskal-Wallis test allows you to compare the medians of three or more independent samples. This method is a nonparametric analogue of ANOVA and is used when the assumption of normally distributed data required by ANOVA is not met.

The Kruskal-Wallis test is used to analyze experimental data when one is interested in the effect of a categorical independent variable (with three or more levels) on a dependent variable measured on a quantitative or ordinal scale.

The test is suitable for analyzing both quantitative and ordinal data. Unlike parametric tests, the Kruskal-Wallis test does not require the assumption of normally distributed data in each group, but it does assume that all groups have the same distribution shape.

The main goal is to test the hypothesis that the medians of all groups are equal, that is, there are no statistically significant differences between the distributions of the dependent variable in different groups. If the test shows significant differences, this means that at least two groups have different medians.

The test process looks like this:

  1. Ranking all data: First, all observations from different groups are combined and ranked from smallest to largest value. Ranks start at 1 for the lowest value.

  2. Calculation of the sum of ranks for each group: For each group, the sum of the ranks of observations is calculated.

  3. Calculating the Kruskal-Wallis (H) statistic: Using the sum of the ranks of each group, the value of the H statistic is calculated, which reflects the degree of difference between the groups.

  4. Determining the significance of the H statistic: The H statistic is compared with critical values from chi-square distribution tables, or is used to calculate a p-value that determines the statistical significance of the observed differences.

If the resulting p-value is less than the selected significance level (for example, 0.05), then the null hypothesis of equality of medians is rejected, and it is concluded that there are significant differences between the groups.
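
A minimal sketch with SciPy, using hypothetical satisfaction scores for three segments:

```python
# Minimal sketch of the Kruskal-Wallis test with SciPy;
# the three samples are hypothetical (e.g. satisfaction scores per segment).
from scipy import stats

segment_1 = [3, 4, 2, 5, 4, 3, 4]
segment_2 = [5, 4, 5, 5, 4, 5, 4]
segment_3 = [2, 3, 2, 3, 4, 2, 3]

# H0: the distributions (medians) of all groups are equal
h_stat, p = stats.kruskal(segment_1, segment_2, segment_3)
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p:.4f}")

# If p < 0.05, at least one group differs; follow up with pairwise
# Mann-Whitney tests and a multiple-comparison correction if needed
```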

So, which test should we use for which purpose and data type?

For convenience, I prepared a table and included other methods that we did not consider above:

| Test | Purpose of the test | Data types | Note |
| --- | --- | --- | --- |
| T-test | Comparison of the means of two groups | Quantitative, normal distribution | Used to compare two independent or related samples |
| ANOVA | Comparison of the means of more than two groups | Quantitative, normal distribution, homogeneity of variances | Suitable for comparing three or more groups |
| Mann-Whitney test | Comparison of two independent samples | Quantitative or ordinal, no assumption of normal distribution | Nonparametric alternative to the T-test |
| Kruskal-Wallis test | Comparison of several independent samples | Quantitative or ordinal, no assumption of normal distribution | Nonparametric alternative to ANOVA |
| Chi-square test | Testing the independence of categorical variables | Qualitative (nominal) | Used for contingency tables |
| Correlation analysis | Assessing the relationship between two variables | Quantitative | Parametric and nonparametric methods |


In conclusion, I would like to recommend a free webinar that will introduce you to the basics of statistics: the normal distribution and the central limit theorem (CLT), which are key to data analysis and decision-making in product analytics.

