Details
Improvement
Status: Closed
Minor
Resolution: Implemented
1.0
None
Description
There is a default implementation for the probability within a range computed using:
p(x0, x1) = cdf(x1) - cdf(x0)
When the CDF computes values close to 1.0 then this probability range may be more accurately evaluated using the survival function:
p(x0, x1) = sf(x0) - sf(x1)
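The loss of accuracy can be shown with a minimal sketch. It assumes a unit-rate exponential distribution, chosen only because its CDF and survival function have simple closed forms. The class and method names are hypothetical and are not part of the library.

```java
/** Hypothetical demo: range probability in the upper tail of Exp(1). */
final class UpperTailDemo {
    /** CDF of the unit-rate exponential distribution (simple form). */
    static double cdf(double x) {
        return x <= 0 ? 0.0 : 1.0 - Math.exp(-x);
    }

    /** Survival function computed directly, not as 1 - cdf. */
    static double sf(double x) {
        return x <= 0 ? 1.0 : Math.exp(-x);
    }

    /** Range probability using the CDF difference. */
    static double byCdf(double x0, double x1) {
        return cdf(x1) - cdf(x0);
    }

    /** Range probability using the survival function difference. */
    static double bySf(double x0, double x1) {
        return sf(x0) - sf(x1);
    }

    public static void main(String[] args) {
        // Both CDF values round to 1.0, so the CDF difference is zero.
        System.out.println("cdf difference: " + byCdf(50, 51));
        // The survival function keeps the small tail probability.
        System.out.println("sf difference:  " + bySf(50, 51));
    }
}
```

For x0 = 50 and x1 = 51 the true probability is about 1.2e-22. The CDF difference returns zero because both CDF values round to 1.0, while the survival function difference retains the value.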
The switch point would be when cdf(x0) is above 0.5, i.e. x0 is above the median, as beyond this point the CDF difference may lose accuracy versus a high-precision survival function. A solution is to cache the median for the distribution:
double probability(double x0, double x1) {
    if (x0 > x1) {
        throw new DistributionException(
            DistributionException.INVALID_RANGE_LOW_GT_HIGH, x0, x1);
    }
    double xm = getMedian();
    return x0 < xm ?
        cumulativeProbability(x1) - cumulativeProbability(x0) :
        survivalProbability(x0) - survivalProbability(x1);
}
The method can be placed in the abstract base class for the distribution, which will compute and cache the median on the first invocation (using the inverse CDF at 0.5). Implementations with a known median may choose to override getMedian() to return the median directly.
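A rough sketch of how the cached median might look in a base class follows. The class names, the NaN sentinel, the use of IllegalArgumentException in place of DistributionException, and the inverseCalls counter are all assumptions made to keep the example self-contained. It is not the library's implementation and is not thread-safe.

```java
/** Hypothetical base class that lazily caches the median. */
abstract class MedianCachingDistribution {
    /** Cached median; NaN means not yet computed. */
    private double median = Double.NaN;

    abstract double cumulativeProbability(double x);

    abstract double inverseCumulativeProbability(double p);

    /** Default survival function; subclasses may supply a high-precision one. */
    double survivalProbability(double x) {
        return 1.0 - cumulativeProbability(x);
    }

    /** Computes the median on first use; override if known in closed form. */
    double getMedian() {
        double m = median;
        if (Double.isNaN(m)) {
            m = inverseCumulativeProbability(0.5);
            median = m;
        }
        return m;
    }

    /** Range probability, switching to the survival function above the median. */
    double probability(double x0, double x1) {
        if (x0 > x1) {
            throw new IllegalArgumentException("Low > high: " + x0 + " > " + x1);
        }
        return x0 < getMedian() ?
            cumulativeProbability(x1) - cumulativeProbability(x0) :
            survivalProbability(x0) - survivalProbability(x1);
    }
}

/** Hypothetical unit-rate exponential distribution used to exercise the base class. */
final class ExponentialSketch extends MedianCachingDistribution {
    /** Counts inverse CDF evaluations to show that the median is cached. */
    int inverseCalls;

    @Override
    double cumulativeProbability(double x) {
        return x <= 0 ? 0.0 : -Math.expm1(-x);
    }

    @Override
    double survivalProbability(double x) {
        return x <= 0 ? 1.0 : Math.exp(-x);
    }

    @Override
    double inverseCumulativeProbability(double p) {
        inverseCalls++;
        return -Math.log1p(-p);
    }
}
```

Repeated calls to probability on one ExponentialSketch instance evaluate the inverse CDF only once. A distribution with a closed-form median could override getMedian() and skip the inverse CDF entirely.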
Note: When there is no survival function this will reduce to the same as using the CDF:
sf = 1 - cdf
sf(x0) - sf(x1) = (1 - cdf(x0)) - (1 - cdf(x1)) = cdf(x1) - cdf(x0)
There is no loss of precision with the default survival function: if cdf(x0) > 0.5 then 1 - cdf(x0) is computed exactly in floating point (the Sterbenz lemma), and likewise for cdf(x1), since cdf(x1) >= cdf(x0).
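The exactness claim can be checked with a small sketch that compares the double subtraction against exact decimal arithmetic. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

/** Hypothetical check that 1 - c is computed without rounding. */
final class ExactComplementCheck {
    /** Returns true if the double result of 1 - c equals the exact result. */
    static boolean isExact(double c) {
        // new BigDecimal(double) converts the binary value exactly.
        BigDecimal exact = BigDecimal.ONE.subtract(new BigDecimal(c));
        return exact.compareTo(new BigDecimal(1.0 - c)) == 0;
    }

    public static void main(String[] args) {
        // Values above 0.5 subtract from 1 exactly.
        System.out.println(isExact(0.75));
        System.out.println(isExact(Math.nextUp(0.5)));
        // A value below 0.5 may require rounding.
        System.out.println(isExact(0.1));
    }
}
```

The subtraction is exact for values in [0.5, 1], which is the only region where the switch to the survival function applies.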