How I solved the problem of finding the Student t-test as a function of the confidence interval and the number of degrees of freedom
Remark:
This article presents the author's method of finding t-критерия
, developed in the process of solving a larger problem. This method does not claim to be effective or universal in various conditions of use, but represents a purely personal approach to solving a specific problem. The author shares his experience and methodology without asserting their generally accepted suitability or optimality in other contexts.
t-test:
t-test is a key tool in statistical analysis used to test hypotheses about the means of two samples or to evaluate the significance of differences between groups of data. This test is based on the Student distribution and allows you to determine how significant the differences between sample means are, given their standard errors. The t-test value is calculated as the ratio of the difference between the sample means to their standard error of the difference.
(t-distribution) is used in statistics to estimate confidence intervals and test hypotheses about means when the sample size is small or the population parameters are unknown. The probability density function of the t-distribution is given by the following formula:
Where t
– t-test, n
– number of degrees of freedom
The main factors on which the t-test depends:
Significance level
α
: The significance level determines the probability of a type I error (rejecting a true null hypothesis).Number of degrees of freedom
n
: This is a parameter that depends on the sample size and determines the shape of the Student distribution.
that is, it’s your turn t
– this can be considered (not in the strict sense) as a function of α
And n
:
In this article, I set my task to express this dependence programmatically in the form of the ta function.
//"интерфейс" этой функции
def ta(a: Double, n: Int): Double = ???
// где a и n доверительный интервал и число степеней свободы соответственно
to find the t-test I needed two algorithms ACM295 And ACM209
ACM209:
Algorithm ACM209 is a numerical approximation for calculating the Gaussian distribution function, also known as the normal distribution. This algorithm was proposed to quickly and accurately calculate the integral of the probability density function of a normal distribution. this algorithm comes up in this context since the t-distribution and Gaussian distributions are “connected” and in the limiting case when //ACM 209 реализация на scala def ACM209(z: Double): Double = { var y: Double = 0.0 var p: Double = 0.0 var w: Double = 0.0 if (z == 0.0) p = 0.0 else { y = Math.abs(z) / 2 if (y >= 3.0) { p = 1.0 } else if (y < 1.0) { w = y * y p = ((((((((0.000124818987 * w - 0.001075204047) * w + 0.005198775019) * w - 0.019198292004) * w + 0.059054035642) * w - 0.151968751364) * w + 0.319152932694) * w - 0.5319230073) * w + 0.797884560593) * y * 2.0 } else { y = y - 2.0 p = (((((((((((((-0.000045255659 * y + 0.00015252929) * y - 0.000019538132) * y - 0.000676904986) * y + 0.001390604284) * y - 0.00079462082) * y - 0.002034254874) * y + 0.006549791214) * y - 0.010557625006) * y + 0.011630447319) * y - 0.009279453341) * y + 0.005353579108) * y - 0.002141268741) * y + 0.000535310849) * y + 0.999936657524 } } if (z > 0.0) (p + 1.0) / 2 else (1.0 - p) / 2 }
The Scala implementation of the algorithm presented above involves several key steps.
First, the absolute value of the input parameter is calculated z
, and depending on its value, the algorithm selects one of several branches of calculations. If y
(half absolute value z
) greater than or equal to 3, probability р
is set equal to 1. In the case y
less than 1, polynomial approximation is used to calculate p
using a number of coefficients. For values y
between 1 and 3, a different polynomial approximation is used, involving a different set of coefficients. Finally, depending on the sign z
the resulting value p
adjusted to return a result that is the value of the Gaussian distribution function for a given z
.
ACM395:
Algorithm ACM395 implements a numerical approximation for calculating the values of the t-statistic distribution function. This function returns a parameter p
: level of significance or probability of a type I error. is a function of the number of degrees of freedom n
And t-критерия
:
//ACM 395 моя реализация на scala
def ACM395(t: Double, n: Int): Double = {
var a = 0.0
var b = 0.0
var y = 0.0
val tSq = t * t
y = tSq / n
b = y + 1.0
if (y > 1.0E-6) y = Math.log(b)
a = n - 0.5
b = 48.0 * a * a
y = a * y
y = (((((-0.4 * y - 3.3) * y - 24.0) * y - 85.5) /
(0.8 * y * y + 100.0 + b) + y + 3.0) / b + 1.0) *
Math.sqrt(y)
2.0 * ACM209(-y)
}
After calculation p
the ACM395 function returns which is the calculation of the value of the t-statistic distribution function using the ACM209 function to calculate the probability.
Confidence calculation
α
represents
// в общем тут нечего коментрировать
def alpfa(ta: Double, n: Int): Double = {
1 - ACM395(ta, n)
}
We calculate the t-test:
α
takes values from 0
before 1
and at n = const
Bolzeno-Cauchy theorem
and find
but for this you need to know the interval in which to be t-критерий
For α0
and I do this by searching through intervals of length 1 until I find the required one
//метод для нахождения интервала
def findInterval(fun: Double => Double): (Double, Double) = {
@tailrec
def finder(fun: Double => Double, min: Double, max: Double): (Double, Double) = {
fun(min)*fun(max) match {
case value if value < 0 => (min, max)
case value if value > 0 => finder(fun, min + 1, max + 1)
}
}
finder(fun, 0, 1)
}
val interval = findInterval(conAlpha)
when the interval is found, you can apply the half division method
//метод реализующий половинное деление
def bisection(tol: Double = 1e-10): (Double => Double, (Double, Double)) => Double = {
(f: Double => Double, interval: (Double, Double)) => {
@tailrec
def loop(interval: (Double, Double)): Double = {
val (a, b) = interval
val midpoint = (a + b) / 2
val fMid = f(midpoint)
if (fMid == 0 || (b - a) / 2 < tol) midpoint
else if (f(a) * fMid < 0) loop(a, midpoint)
else loop(midpoint, b)
}
loop(interval)
}
}
bisection()(conAlpha, interval)
and in total the method code looks like this
def ta(a0: Double, n: Int): Double = {
def alphaParallel(fun:(Double, Int) => Double, delta: Double): (Double, Int) => Double = {
(t: Double, n: Int) => {
fun(t, n) - delta
}
}
def alpfaConst(fun: (Double, Int) => Double, n: Int): Double => Double = {
t: Double => fun(t, n)
}
def findInterval(fun: Double => Double): (Double, Double) = {
@tailrec
def finder(fun: Double => Double, min: Double, max: Double): (Double, Double) = {
fun(min)*fun(max) match {
case value if value < 0 => (min, max)
case value if value > 0 => finder(fun, min + 1, max + 1)
}
}
finder(fun, 0, 1)
}
def bisection(tol: Double = 1e-10): (Double => Double, (Double, Double)) => Double = {
(f: Double => Double, interval: (Double, Double)) => {
@tailrec
def loop(interval: (Double, Double)): Double = {
val (a, b) = interval
val midpoint = (a + b) / 2
val fMid = f(midpoint)
if (fMid == 0 || (b - a) / 2 < tol) midpoint
else if (f(a) * fMid < 0) loop(a, midpoint)
else loop(midpoint, b)
}
loop(interval)
}
val parAlpha = alphaParallel(alpfa, a0)
val conAlpha = alpfaConst(parAlpha, n)
val interval = findInterval(conAlpha)
bisection()(conAlpha, interval)
}
Result:
Let's compare the table values and those generated by my method
| 0.95 | 0.99 | 0.999 |
1 | 12.70 | 63.65 | 636.61 |
2 | 4,303 | 9.925 | 31,602 |
3 | 3.182 | 5,841 | 12,923 |
4 | 2,776 | 4,604 | 8,610 |
5 | 2,571 | 4,032 | 6,869 |
6 | 2,447 | 3,707 | 5,959 |
7 | 2.365 | 3,499 | 5,408 |
8 | 2,306 | 3.355 | 5,041 |
9 | 2,262 | 3,250 | 4,781 |
10 | 2.228 | 3.169 | 4,587 |
and this is what my method produced:
note that the larger n and the smaller α0
the smaller the difference with the table value.
PS you can look at the code here