Could the t-Student distribution be included in the math module?

Hello math module friends:

Could it be possible to include some functions for the t-Student distribution in the math module?

May be for the moment, starting with the centered version.
An example (very basic and possible limited) implementation of the pdf, cdf & inv-cdf functions for the centered t-Student distribution is available at oneg-t-student-en.rkt .

Thank you and congratulations for your great and very useful math package.
Kind regards,
ECB
P.S. The implementation of the non-central t-Student distribution, will be very welcome also, because of the many important applications that make use of it, in particular the computation of the power in hypothesis testing when the population variance is unknown. For more information on applications, you can explore the article by F.-W. Scholz entitled Applications of the Noncentral t-Distribution. Thank you in advance for your support.

1 Like

A great question; I asked about it a month or two ago in Discord, and it seems like the building blocks are there but need put together. I’ve been meaning to try my hand at it but this is not a domain I have enough expertise to judge well.

Thank you @benknoble for your comment. As it says in https://github.com/racket/racket/wiki/Math-Library-Features, for the case of chisqr-dist: "(easy wrapper around gamma-dist)". A similar comment (using the beta functions) can be made concerning the F-dist, and t-dist.

For example, my implementation of the ts-dist (for the pdf and cdf, respectively) in the file oneg-t-student-en.rkt just follows the equations (1) and (2) in the article by the Idaho National Lab: Student t-distribution. The code is short, thanks to the developers of the math module, because it already includes the beta and beta-inc special functions. For the inv-cdf function, I just used the secant method for finding the unique root of the required equation.

Concerning the validation of the computed values, we can use tables like the one at the NIST's Engineering Statistics Handbook section 1.3.6.7.2. Critical Values of the Student's t Distribution, or alternatively some other software packages.

It feels good to do (effectively) this kind of computations in Racket. I'm sure that including these classical distribution (and others) in the math module, will increase considerable the academic (and may be, industrial) interest and use of the great Racket language.

Kind regards,
ECB

Take a look at my t-test package in pkgs; if yours is better, let me know and I'll replace it. Also, if there's something I can do to aid in discoverability, I'd be interested in that, too.

discoverability

@soegaard posted a version on Discord (https://discord.com/channels/571040468092321801/1258477662813294763) that integrates directly with the math library, which is probably the ideal version of discoverable.

Short that, providing an extension of the math library (same name conventions, types, etc.,) would probably go a long way.

It now seems I'll have 3 implementations to try + Welch, so I'm at least grateful for all the effort folks have put into this.

I have implemented student-t-dist for inclusion in math/distributions.

The calls (student-t-dist μ) and (student-t-dist μ σ ν) return
an ordered-distribution representing a Student's t-distribution.

The parameters are as follows:

    μ   - location parameter
    σ   - scale parameter
    ν   - degrees of freedom

As with all distributions in the math library, you can use pdf, cdf, inv-cdf and sample on the distribution.

I would appreciate that someone besides me gave the implementation a test run.
Before the inclusion in math/distribution the tests need to move.
Until then, open one of the files in your editor, uncomment the test suite and run.

Clone the student-t branch from soegaard/math below.

The lowe-level functions are here:

They are packed in a student-t-dist struct here:

For now the flags log? and 1-p? are ignored.
Besides those everything is expected to work.

UPDATE

Now the flags log? and 1-p? are implemented.

The idea behind log? is to work in log-space. That is, instead of returning p one computes log(p). For small probabilities like (pdf X 40) ~0.0 where X is normal distributed with mean 0 and sigma 1, the log-probability gives a little extra precision.

However, I do not know how to compute log(p) without computing p first.
So for now, I simply compute p first and return log(p).

UPDATE

I figured out to compute log(pdf(x)) directly without computing pdf(x) first.

Does anyone know how to compute log(cdf(x)) directly without computing cdf(x) first?

1 Like

Thank you @jbclements for your t-test package, and especially for the illuminating comments at the start of its documentation.

Also thank you @benknoble, for pointing to the work of @soegaard concerning the t-Student distribution. There is much to learn from his implementation.

I will try to follow @soegaard's proposal to test his functions, particularly: pdf, cdf, & inv-cdf, even though I don't have any previous experience cloning branches of code. BTW, it's amazing how much work is required to be able to enjoy all the functionality in the math module: it is greatly appreciated.

Best regards.

Clone the whole repo. Then use git checkout student-t to switch to the student-t branch.

Thank you @soegaard for your advice concerning cloning your t-student distribution implementation.

For the computation of log(cdf(x)), may be equation (9) in the following PDF document: Some Inferential Problems from Log Student’s T-distribution and its Multivariate Extension, can be helpful, as a reference.

Also, the paper: Rethinking Generalized Beta Family of Distributions, may be useful, because in the above document it says that: the "log t distribution is a special case of the generalized beta distribution" (see paragraph just before equation (8)).

Concerning the testing of your implementation:

  1. I verified that your testing functions in the files: student-t-dist.rkt and impl/student-t.rkt all worked fine. This for the case of the [centered] t-Student distribution.

  2. For the case of non-centered t-Student distribution, that is implied when using location parameter μ<>0, further testing may be needed. For example, we can compare the Wolfram alpha command result of: PDF[NoncentralStudentTDistribution[4,2],3.5] => 0.138586, with the result of ((make-student-t-pdf 4 1 2) 3.5) => 0.2962962962962963. Thank you @soegaard, for verifying if I'm interpreting correctly the used parameters.

If we concentrate in the [centered] t-Student distribution, in your code, all indicates that it is ready. Thank you for all your work in this and also in other great packages.

Kind regards.

P. S. As a reference, the file oneg-noncentral-ts-dist-en.rkt, contains an experimental untyped Racket basic implementation of pdf, cdf, and inv-cdf for the non-central t-Student distribution.

1 Like

For the computation of log(cdf(x)) , may be equation (9) in the following PDF document: Some Inferential Problems from Log Student’s T-distribution and its Multivariate Extension, can be helpful, as a reference.

Equation (9) is about the "Log Student’s T-distribution", so it is not applicable to computing the logarithm of the cdf for a standard T-distribution.

But I think, I have found the piece of the puzzle, I was missing.
The formula for cdf(x) in Wikipedia and elsewhere are most often stated for x>0.
The expectation is that one can use P(X≤x) = P(X≥-x) = 1 - P(X≤-x) for negative x. However, since we are after log(P(X≤x)) we get stuck with log(1 - P(X≤-x)).

Other sources such as "Sampling Student’s T distribution – use of the inverse cumulative distribution function" by William T. Shaw features this formula:

which shows how to get a formula for x<0.

Asking wolframscript about the CDF for a StudentTDistribution, we get the same result as Shaw:

In[339]:= CDF[StudentTDistribution[v], x]                          

                                        v     v  1
                      BetaRegularized[------, -, -]
                                           2  2  2
                                      v + x
Out[339]= Piecewise[{{-----------------------------, x <= 0}}, 
                                    2
 
                            2
                           x     1  v
     1 + BetaRegularized[------, -, -]
                              2  2  2
                         v + x
>    ---------------------------------]
                     2

So my plan is to use this formula - and the math library already contains a specialized function for computing logarithms to the beta function(s).

Concerning the testing of your implementation:

  1. I verified that your testing functions in the files: student-t-dist.rkt and impl/student-t.rkt all worked fine. This for the case of the [centered] t-Student distribution.

Thanks for testing the implementation.

  1. For the case of non-centered t-Student distribution, that is implied when using location parameter μ<>0, further testing may be needed. For example, we can compare the Wolfram alpha command result of: PDF[NoncentralStudentTDistribution[4,2],3.5] => 0.138586, with the result of ((make-student-t-pdf 4 1 2) 3.5) => 0.2962962962962963`. Thank you @soegaard, for verifying if I'm interpreting correctly the used parameters.

While implementing this I have been amazed of just how many different distributions are in use in statistics. The three argument version of make-student-pdf does not compute the noncentral Student T-distribution, but the so-called location-scale T-distribution. This matches how Mathematica interprets
StudentTDistribution[ν] and StudentTDistribution[μ,σ,ν].

In[340]:= PDF[StudentTDistribution[4,1,2],3.5]                     

Out[340]= 0.296296

In[343]:= N[PDF[StudentTDistribution[4,1,2],35/10],30]             

Out[343]= 0.296296296296296296296296296296

Note the need to use 35/10 instead of 3.5 to get full precision.

If we concentrate in the [centered ] t-Student distribution , in your code, all indicates that it is ready . Thank you for all your work in this and also in other great packages .

Great news, I'll ping you when the logcdf functionality works.

P. S. As a reference, the file oneg-noncentral-ts-dist-en.rkt , contains an experimental untyped Racket basic implementation of pdf , cdf , and inv-cdf for the non-central t-Student distribution.

This is a very good start. You will need implementations of logpdf and logcdf too for this to work with math/distributions.

When everything works, I'll write up the process of implementing student-t-dist in Typed Racket for math/distributions. I'll likely implement a chi-squared-dist but I don't think I have energy to implement a noncentral-student-t-dist as well.

Thank you @soegaard for clarifying the context of the equation (9) in the reference above. My apologies for the misinterpretation.

Concerning expressions of the form log(1-x) it seems that the function log1p(x) (already available in Racket) where log1p(x)=log(1+x) can be use it in this context. That is:
log(1 - P(X≤-x))=log(1 + (-P(X≤-x)))
=log1p(-P(X≤-x)),
where -P(X≤-x)>-1.
(Update) I understand that the above function, is mainly used when we need to evaluate with high precision log(1-x), for very small values of x.

Thank you for calling our attention to the work of William T. Shaw in this context. There is much to learn from: Sampling Student's T distribution (e.g. 1st paragraph after eq'n (24), related to inv-cdf).

I appreciate very much your guidance and example to keep working in the implementation of logpdf and logcdf function, for the noncentral t-Student case.

Thank you @soegaard, for the previous technical information.
About the logcdf() function for the centered t-Student, please (if possible) you can check the following code, in order to see if it is what is required:

;; auxiliary function.
;; notice the selection of Re(x)
(define (log1p x)
  (fllog1p (real->double-flonum
            (real-part x))))

;; log(cdf(x)) for t-student
;; with ν degrees of freedom
(define (make-logcdf-ts ν)
  (λ(x)
    (define (Δlogsβ x) ; invariant with sgn(x)
      (- (log (beta-inc
               (/ ν 2)
               1/2
               (/ ν
                  (+ ν (sqr x)))))
         (log (beta
               (/ ν 2)
               1/2))))
    (cond
      ((< x 0)
       (+ (log 1/2)
          (Δlogsβ (- x))))
      ((= x 0.5)
       (log 0.5))
      (else
       (log1p
        (exp
         (+ (log -1/2) ;; complex
            (Δlogsβ x))))))))

I already tested with respect to log(cdf(x)) and it (seems to) works fine.
For example:

((make-logcdf-ts 4) -10) ; => -8.177149442537441
(log (cdf-ts 4 -10)) ; => -8.177149442537285

The basic idea is that, for (x>0), {B_x=Beta(\frac{\nu}{2},\frac{1}{2},\frac{\nu}{\nu+x^2})} and {B=Beta(\frac{\nu}{2},\frac{1}{2})}, we have:
{\mathsf{log} F(x;\nu)=\mathsf{log}(1+(-\frac{1}{2}\frac{B_x}{B}))}
{\mathsf{log} F(x;\nu)=\mathsf{log1p}(\mathsf{exp}(\mathsf{log}(-\frac{1}{2})+\mathsf{log} B_{x} - \mathsf{log}B))}.
For the case of (x<0) we have simply:
\mathsf{log} F(x;\nu)=\mathsf{log}(\frac{1}{2})+\mathsf{log} B_{-x}-\log B.
For (x=0) we have \mathsf{log} F(0;\nu)=\mathsf{log}(\frac{1}{2}).

Notice that for the log1p(x) function to work properly in this case, it was necessary to take only the real part of x.

Kind regards.

I checked out the current implementation of @soegaard's pdf for one parameter. I converted the fl implementation to a bf implementation to check fhe flulp-error:


Red: flulp-error > 200; white: flulp-error> 200 & result=0
grey lines are at nu=1 & x=1

also near nu=1; x=1 (1e-3 - 1e3):

I managed to improve some things, mainly rearanging and using flexpt+:


The graphs need to be taken with a grain of salt, since I'm not 100% confident in my bf implementation, especially for large nu / x.

I also noticed that for (beta 1/2 (/ nu 2)) the errors can become quite big when nu is an integer (observed 1000 ulp for nu < 5000). Since this is probably the main use case (integer nu), maybe it is better to use the fct below, but probably using a table somewhere to speed things up:

(: beta1/2 (-> Flonum Flonum))
(define (beta1/2 a)
  (if (and (< 1 a 10000) (integer? a))
      (if (fleven? a)
          (let ([a (- (exact-round a) 1)])
            (fl (* 2
                   (let lp : Real ([a : Integer a][c : Real 1])
                     (cond
                       [(< a 2) c]
                       [else (lp (- a 2) (* c (/ (- a 1) a)))])))))
          (let ([a (- (exact-round a) 1)])
            (fl (* pi
                   (let lp : Real ([a : Integer a][c : Real 1])
                     (cond
                       [(< a 2) c]
                       [else (lp (- a 2) (* c (/ (- a 1) a)))]))))))
      (beta 0.5 (/ a 2.))))

I don't know what is seen as acceptable accuracy for the distributions, so maybe this all is moot.

Thanks for the tip wrt. log1p and for pointing out the section in Shaw about inverse CDF.

@encomer Thanks for the code for logcdf.

We can use fllog-beta-inc and fllog-beta to compute (log (beta-inc ...))
and (log (beta ...)) respectively.

I wish there were a way to compute the case x>0 without using log1p combined with (exp ...). It just looks odd for me to see (log1p (exp ...)).

@bdeket Great idea to compare with a big float implementation.

For small, integer nu there are exact formulas for the probability density [1].
Should we try these?

I don't know what is seen as acceptable accuracy for the distributions, so maybe this all is moot.

I think, we should use your version. Reducing the error as much as possible must be
the right thing to do.

Will you make a PR to GitHub - soegaard/math at student-t ?

Do you have the big float version on your Github?

[1] From Student's t-distribution - Wikipedia

UPDATE

I got ChatGPT to rewrite the formulas in the table to Racket-expressions.

(define pdf-list
  (list
    ; ν = 1
    (/ 1 (* pi (+ 1 (expt t 2))))
    
    ; ν = 2
    (/ 1 (* 2 (sqrt 2) (expt (+ 1 (/ (expt t 2) 2)) (/ 3 2))))
    
    ; ν = 3
    (/ 2 (* pi (sqrt 3) (expt (+ 1 (/ (expt t 2) 3)) 2)))
    
    ; ν = 4
    (/ 3 (* 8 (expt (+ 1 (/ (expt t 2) 4)) (/ 5 2))))
    
    ; ν = 5
    (/ 8 (* 3 pi (sqrt 5) (expt (+ 1 (/ (expt t 2) 5)) 3)))
    
    ; ν = ∞
    (* (/ 1 (sqrt (* 2 pi))) (exp (/ (expt t 2) -2)))
  ))

and

(define cdf-list
  (list
    ; ν = 1
    (+ (/ 1 2) (/ 1 pi) (atan t))
    
    ; ν = 2
    (+ (/ 1 2) (/ t (* 2 (sqrt 2) (sqrt (+ 1 (/ (expt t 2) 2))))))
    
    ; ν = 3
    (+ (/ 1 2) (/ 1 pi) (+ (/ (/ t (sqrt 3)) (+ 1 (/ (expt t 2) 3))) (atan (/ t (sqrt 3)))))
    
    ; ν = 4
    (+ (/ 1 2) (/ 3 8) (* (/ t (sqrt (+ 1 (/ (expt t 2) 4)))) (- 1 (/ (expt t 2) (* 12 (+ 1 (/ (expt t 2) 4)))))))
    
    ; ν = 5
    (+ (/ 1 2) (/ 1 pi) (+ (/ (/ t (sqrt 5)) (* (+ 1 (/ (expt t 2) 5)) (+ 1 (/ 2 (* 3 (+ 1 (/ (expt t 2) 5))))))) (atan (/ t (sqrt 5)))))
    
    ; ν = ∞
    (* (/ 1 2) (+ 1 (erf (/ t (sqrt 2)))))
  ))

Hi @soegaard, I created a gist of the bf implementation: bf-student-t · GitHub

for now I only seriously used the make-pdf so I don't know how well the other functions work.
To be valid in the complete range of flonums bf-precision will be need set to at least 1024. This is especially true if nu > 1e30

The gist above now includes more testing code. I was trying to also improve the cdf, and although my current version is a lot better (no early 0 in a lot of the flonum space) it still has a significant region near x=0.1~100 and nu=1~1e16 were improvements would be welcome: