[STATISTICS-40] Better probability(x0, x1) function to use the survival probability - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Implemented
Affects Version/s: 1.0
Fix Version/s: 1.0
Component/s: distribution
Labels: None

Description

There is a default implementation for the probability within a range computed using:

p(x0, x1) = cdf(x1) - cdf(x0)

When the CDF computes values close to 1.0 then this probability range may be more accurately evaluated using the survival function:

p(x0, x1) = sf(x0) - sf(x1)
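
To illustrate the loss of accuracy, the following is a minimal sketch using a unit-rate exponential distribution. That distribution is an assumption chosen for this example because its CDF and survival function have simple closed forms; the class and method names are hypothetical and not part of the library.

```java
/** Illustration only: unit-rate exponential with closed-form CDF and SF. */
final class TailRangeDemo {
    /** CDF evaluated as 1 - exp(-x); rounds to exactly 1.0 far in the upper tail. */
    static double cdf(double x) {
        return 1.0 - Math.exp(-x);
    }

    /** Survival function exp(-x), computed directly rather than as 1 - cdf. */
    static double sf(double x) {
        return Math.exp(-x);
    }

    /** Range probability using the CDF difference. */
    static double rangeByCdf(double x0, double x1) {
        return cdf(x1) - cdf(x0);
    }

    /** Range probability using the survival function difference. */
    static double rangeBySf(double x0, double x1) {
        return sf(x0) - sf(x1);
    }

    public static void main(String[] args) {
        double x0 = 40;
        double x1 = 41;
        System.out.println("cdf difference: " + rangeByCdf(x0, x1));
        System.out.println("sf difference:  " + rangeBySf(x0, x1));
    }
}
```

For x0 = 40 and x1 = 41 both CDF values round to 1.0, so the CDF difference is 0.0, while the survival function difference is about 2.7e-18.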

The switch point would be where CDF(x0) exceeds 0.5, since beyond that accuracy may be lost versus a high-precision survival function. CDF(x0) exceeding 0.5 corresponds to x0 lying above the median, so a solution is to cache the median for the distribution:

double probability(double x0, double x1) {
    if (x0 > x1) {
        throw new DistributionException(
            DistributionException.INVALID_RANGE_LOW_GT_HIGH, x0, x1);
    }
    // Lower bound below the median: use the CDF difference.
    // Otherwise use the survival function, whose values are then at most 0.5.
    final double xm = getMedian();
    return x0 < xm ?
        cumulativeProbability(x1) - cumulativeProbability(x0) :
        survivalProbability(x0) - survivalProbability(x1);
}

The probability method can be placed in the abstract base class for the distribution, with getMedian() computing and caching the median on its first invocation (using the inverse CDF at 0.5). Implementations with a known median may choose to override getMedian() to return the median directly.
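
The arrangement described above might be sketched as follows. This is not the library's code: the class names, the NaN sentinel for "not yet computed", and the use of IllegalArgumentException in place of DistributionException are all assumptions made to keep the example self-contained. The sketch also ignores thread safety.

```java
/** Hypothetical base class that lazily caches the median. */
abstract class MedianCachingDistributionSketch {
    /** Cached median; NaN marks "not yet computed" (an assumption of this sketch). */
    private double median = Double.NaN;

    abstract double cumulativeProbability(double x);

    abstract double inverseCumulativeProbability(double p);

    /** Default survival function; subclasses may supply a high-precision version. */
    double survivalProbability(double x) {
        return 1.0 - cumulativeProbability(x);
    }

    /** Computes the median on first use; subclasses with a known median may override. */
    double getMedian() {
        double m = median;
        if (Double.isNaN(m)) {
            m = inverseCumulativeProbability(0.5);
            median = m;
        }
        return m;
    }

    double probability(double x0, double x1) {
        if (x0 > x1) {
            throw new IllegalArgumentException("Lower bound " + x0 + " > upper bound " + x1);
        }
        return x0 < getMedian() ?
            cumulativeProbability(x1) - cumulativeProbability(x0) :
            survivalProbability(x0) - survivalProbability(x1);
    }
}

/** Hypothetical unit-rate exponential with a dedicated survival function. */
final class UnitExponentialSketch extends MedianCachingDistributionSketch {
    @Override
    double cumulativeProbability(double x) {
        return x <= 0 ? 0.0 : -Math.expm1(-x);
    }

    @Override
    double inverseCumulativeProbability(double p) {
        return -Math.log1p(-p);
    }

    @Override
    double survivalProbability(double x) {
        return x <= 0 ? 1.0 : Math.exp(-x);
    }
}
```

With this sketch, `new UnitExponentialSketch().probability(40, 41)` takes the survival function branch and returns a small positive value rather than cancelling to zero.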

Note: When a distribution provides no dedicated survival function, the default implementation is used and the result reduces to the same value as using the CDF:

sf = 1 - cdf

sf(x0) - sf(x1) = (1 - cdf(x0)) - (1 - cdf(x1))
                = cdf(x1) - cdf(x0)

No precision is lost by using the default survival function: if cdf(x0) >= 0.5 then the floating-point subtraction 1 - cdf(x0) is exact (by the Sterbenz lemma), and likewise for cdf(x1).
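
The exactness claim can be checked empirically. The sketch below is a hypothetical helper, not part of the library. It compares the double result of 1.0 - c against the exact result computed with BigDecimal, relying on the fact that the BigDecimal(double) constructor converts a double without rounding.

```java
import java.math.BigDecimal;
import java.util.SplittableRandom;

/** Hypothetical check that 1.0 - c incurs no rounding for c in [0.5, 1]. */
final class ExactComplementCheck {
    /** Returns true if the double result of 1.0 - c equals the exact real-number result. */
    static boolean isExact(double c) {
        BigDecimal exact = BigDecimal.ONE.subtract(new BigDecimal(c));
        return exact.compareTo(new BigDecimal(1.0 - c)) == 0;
    }

    /** Counts samples drawn uniformly from [lo, hi) for which 1.0 - c is rounded. */
    static int countInexact(long seed, int samples, double lo, double hi) {
        SplittableRandom rng = new SplittableRandom(seed);
        int count = 0;
        for (int i = 0; i < samples; i++) {
            if (!isExact(rng.nextDouble(lo, hi))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("inexact in [0.5, 1): " + countInexact(1L, 10000, 0.5, 1.0));
        System.out.println("1 - 1e-20 exact? " + isExact(1e-20));
    }
}
```

Values in [0.5, 1) should report no inexact subtractions, whereas a very small value such as 1e-20 is rounded away entirely. That is the regime in which a dedicated survival function helps.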

People

Assignee: Alex Herbert

Reporter: Alex Herbert

Dates

Created: 07/Oct/21 18:38

Updated: 06/Dec/22 11:30

Resolved: 16/Oct/21 15:22