For computing log(cdf(x)), equation (9) in the paper "Some Inferential Problems from Log Student’s T-distribution and its Multivariate Extension" may be a helpful reference.
Equation (9) is about the "Log Student’s T-distribution", so it is not applicable to computing the logarithm of the cdf for a standard T-distribution.
But I think I have found the piece of the puzzle I was missing.
The formulas for cdf(x) on Wikipedia and elsewhere are usually stated for x > 0.
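The expectation is that for negative x one can use P(X ≤ x) = P(X ≥ -x) = 1 - P(X ≤ -x). However, since we are after log(P(X ≤ x)), we get stuck with log(1 - P(X ≤ -x)): for large negative x, P(X ≤ -x) is so close to 1 that the subtraction cancels catastrophically, and the difference can round to 0 (making the log -∞) long before the true probability underflows.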
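To see the cancellation concretely, here is a small Python sketch for the special case ν = 1 (the Cauchy distribution), chosen only because its CDF has the closed form 1/2 + atan(x)/π. The function names are hypothetical. For x < 0 the same CDF can be rewritten as atan(-1/x)/π, which involves no subtraction from 1:

```python
import math

def cauchy_cdf(x):
    """CDF of Student's t with nu = 1 (the Cauchy distribution)."""
    return 0.5 + math.atan(x) / math.pi

def naive_logcdf(x):
    # log P(X <= x) via the symmetry P(X <= x) = 1 - P(X <= -x)
    p = 1.0 - cauchy_cdf(-x)
    return math.log(p) if p > 0.0 else -math.inf

def tail_logcdf(x):
    # Valid for x < 0: P(X <= x) = atan(-1/x) / pi, computed without cancellation
    return math.log(math.atan(-1.0 / x) / math.pi)

x = -1e20
print(naive_logcdf(x))  # -inf, because cauchy_cdf(1e20) rounds to exactly 1.0
print(tail_logcdf(x))   # about -47.2
```

The naive route loses everything in the tail, while the rewritten formula stays accurate. The general case needs a formula of the second kind for every ν.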
Other sources, such as "Sampling Student’s T distribution – use of the inverse cumulative distribution function" by William T. Shaw, give a formula that shows how to handle x < 0.
Asking wolframscript for the CDF of a StudentTDistribution gives the same result as Shaw:
In[339]:= CDF[StudentTDistribution[v], x]
Out[339]= Piecewise[{{BetaRegularized[v/(v + x^2), v/2, 1/2]/2, x <= 0}}, (1 + BetaRegularized[x^2/(v + x^2), 1/2, v/2])/2]
So my plan is to use this formula; the math library already contains a specialized function for computing logarithms of the beta function(s).
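A minimal Python sketch of that plan, using only the standard library. It is not the Racket implementation: in place of the math library's log-beta function it evaluates the regularized incomplete beta function I_z(a, b) with the Numerical Recipes continued fraction (modified Lentz method), and the names `log_beta_reg` and `student_t_logcdf` are hypothetical. For x ≤ 0 the log of the tail, log(1/2) + log I_{ν/(ν+x²)}(ν/2, 1/2), is computed directly. For x > 0 it uses F(x) = 1 - (1/2) I_{ν/(ν+x²)}(ν/2, 1/2), which equals the second branch of the Mathematica result by the symmetry I_z(a, b) = 1 - I_{1-z}(b, a). Since F(x) ≥ 1/2 there, `log1p` keeps it accurate.

```python
import math

def _betacf(a, b, z, max_iter=300, eps=1e-15):
    """Continued fraction for I_z(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * z / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * z / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * z / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")

def log_beta_reg(a, b, z):
    """Log of the regularized incomplete beta function I_z(a, b)."""
    if z <= 0.0:
        return -math.inf
    if z >= 1.0:
        return 0.0
    # log of z^a (1-z)^b / B(a, b), computed entirely in log space
    log_front = (a * math.log(z) + b * math.log1p(-z)
                 + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    if z < (a + 1.0) / (a + b + 2.0):
        return log_front + math.log(_betacf(a, b, z)) - math.log(a)
    # Symmetry: I_z(a, b) = 1 - I_{1-z}(b, a), where the latter is small
    other = math.exp(log_front) * _betacf(b, a, 1.0 - z) / b
    return math.log1p(-other)

def student_t_logcdf(x, nu):
    """log P(X <= x) for Student's t with nu degrees of freedom."""
    z = nu / (nu + x * x)
    log_tail = math.log(0.5) + log_beta_reg(nu / 2.0, 0.5, z)  # log P(X <= -|x|)
    if x <= 0:
        return log_tail
    return math.log1p(-math.exp(log_tail))

print(student_t_logcdf(-1e20, 1.0))  # about -47.2, the far Cauchy tail
```

The far tail never forms 1 - P, so it stays finite where the naive symmetry route would return -∞.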
Concerning the testing of your implementation:
- I verified that the test functions in the files student-t-dist.rkt and impl/student-t.rkt all work fine. This is for the case of the [centered] t-Student distribution.
Thanks for testing the implementation.
- For the case of the non-centered t-Student distribution, which is implied when using a location parameter μ ≠ 0, further testing may be needed. For example, we can compare the result of the Wolfram Alpha command PDF[NoncentralStudentTDistribution[4,2],3.5] => 0.138586 with the result of ((make-student-t-pdf 4 1 2) 3.5) => 0.2962962962962963. Thank you, @soegaard, for verifying whether I am interpreting the parameters correctly.
While implementing this, I have been amazed by just how many different distributions are in use in statistics. The three-argument version of make-student-t-pdf does not compute the noncentral Student t-distribution, but the so-called location-scale t-distribution. This matches how Mathematica interprets StudentTDistribution[ν] and StudentTDistribution[μ,σ,ν].
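The location-scale t-distribution is just the standard t density shifted and rescaled: pdf(x) = (1/σ) f_ν((x - μ)/σ). A minimal Python sketch, computed in log space (the form a logpdf needs anyway). The function names are hypothetical, and the argument order μ, σ, ν is assumed to follow StudentTDistribution[μ,σ,ν]:

```python
import math

def student_t_logpdf(t, nu):
    """Log density of the standard Student t-distribution with nu degrees of freedom."""
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1.0) / 2.0 * math.log1p(t * t / nu))

def location_scale_t_logpdf(x, mu, sigma, nu):
    """Log density of the location-scale t-distribution: (1/sigma) f_nu((x - mu)/sigma)."""
    return student_t_logpdf((x - mu) / sigma, nu) - math.log(sigma)

def location_scale_t_pdf(x, mu, sigma, nu):
    return math.exp(location_scale_t_logpdf(x, mu, sigma, nu))

# Same parameters as StudentTDistribution[4,1,2] evaluated at 3.5
print(location_scale_t_pdf(3.5, 4.0, 1.0, 2.0))
```

With μ = 4, σ = 1, ν = 2 and x = 3.5, the density is exactly 8/27. That explains the repeating digits 0.296296... in the results for StudentTDistribution[4,1,2].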
In[340]:= PDF[StudentTDistribution[4,1,2],3.5]
Out[340]= 0.296296
In[343]:= N[PDF[StudentTDistribution[4,1,2],35/10],30]
Out[343]= 0.296296296296296296296296296296
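Note the need to use 35/10 instead of 3.5 to get full precision: 3.5 is a machine-precision number, while 35/10 is exact, so N can evaluate the exact result to 30 digits.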
If we concentrate on the [centered] t-Student distribution, everything in your code indicates that it is ready. Thank you for all your work on this and on your other great packages.
Great news, I'll ping you when the logcdf functionality works.
P.S. For reference, the file oneg-noncentral-ts-dist-en.rkt contains an experimental, basic, untyped Racket implementation of pdf, cdf, and inv-cdf for the non-central t-Student distribution.
This is a very good start. You will need implementations of logpdf and logcdf too for this to work with math/distributions.
When everything works, I'll write up the process of implementing student-t-dist in Typed Racket for math/distributions. I'll likely implement a chi-squared-dist, but I don't think I have the energy to implement a noncentral-student-t-dist as well.