Kernel Density Estimation

Extreme Optimization Numerical Libraries for .NET Professional

Kernel density estimation (KDE) is a method for estimating the probability density function of a variable. The estimated distribution is taken to be the sum of appropriately scaled and positioned kernels. The bandwidth specifies how far out each observation affects the density estimate.
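
For reference, the description above corresponds to the usual textbook form of the estimator. The symbols are introduced here for illustration and say nothing about the library's internals: x_1, ..., x_n are the observations, K is the kernel, and h > 0 is the bandwidth.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```

Each observation contributes one copy of the kernel, centered at x_i and stretched by h, so a larger h spreads each observation's influence over a wider interval.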

Kernel density estimation is implemented by the KernelDensity class.

In the code examples, we will repeatedly use a sample generated from a mixture of two Gaussian distributions:

C#
var norm1 = new NormalDistribution(-1, 1);
var norm2 = new NormalDistribution(1, 0.3);
var X = Vector.Join(norm1.Sample(400), norm2.Sample(100));

VB
Dim norm1 = New NormalDistribution(-1, 1)
Dim norm2 = New NormalDistribution(1, 0.3)
Dim X = Vector.Join(norm1.Sample(400), norm2.Sample(100))

F#
let norm1 = NormalDistribution(-1.0, 1.0)
let norm2 = NormalDistribution(1.0, 0.3)
let X = Vector.Join(norm1.Sample(400), norm2.Sample(100))

Kernels

A kernel is a non-negative function with mean 0 and area 1. Kernels are implemented by the Kernel class. The KernelDensity class provides several fields that represent common kernels. The available kernels are listed below:

Field                 Kernel

UniformKernel         Uniform kernel
TriangularKernel      Triangular kernel
GaussianKernel        Gaussian kernel
EpanechnikovKernel    Epanechnikov kernel
BiweightKernel        Biweight kernel
TriweightKernel       Triweight kernel
TricubicKernel        Tricubic kernel
CosineKernel          Cosine kernel
CosineKernel2         Alternate cosine kernel
LogisticKernel        Logistic kernel

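
As a rough guide to what these kernels look like, the sketch below writes out the usual textbook definitions of three of them in plain C# and checks numerically that each has unit area. The function names and the Integrate helper are hypothetical and not part of the library. The library's own kernels may be scaled differently, so treat the formulas as an assumption.

```csharp
using System;

// Textbook kernel definitions (assumed; the library's scaling may differ).
double Gaussian(double u) => Math.Exp(-0.5 * u * u) / Math.Sqrt(2.0 * Math.PI);
double Epanechnikov(double u) => Math.Abs(u) <= 1.0 ? 0.75 * (1.0 - u * u) : 0.0;
double Triweight(double u) =>
    Math.Abs(u) <= 1.0 ? 35.0 / 32.0 * Math.Pow(1.0 - u * u, 3) : 0.0;

// Midpoint-rule integral of f over [a, b]; used only to check the unit-area property.
double Integrate(Func<double, double> f, double a, double b, int steps = 100_000)
{
    double width = (b - a) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; i++)
        sum += f(a + (i + 0.5) * width);
    return sum * width;
}

// Each value should be very close to 1.
Console.WriteLine($"Gaussian area:     {Integrate(Gaussian, -10, 10):F6}");
Console.WriteLine($"Epanechnikov area: {Integrate(Epanechnikov, -1, 1):F6}");
Console.WriteLine($"Triweight area:    {Integrate(Triweight, -1, 1):F6}");
```

All three functions are symmetric around 0, which gives them mean 0. The Gaussian kernel has unbounded support, while the other two vanish outside [-1, 1].
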
Bandwidth

The bandwidth is a parameter in the kernel density estimation that indicates how far the influence of each observation reaches in the density estimate. It is important to get a good value. When the bandwidth is too large, some important features of the true density may be missed. If the bandwidth is too small, the estimated density will be very noisy.

The bandwidth can be supplied directly to the kernel estimation method, or it can be estimated automatically. Three methods are available, enumerated by the KernelDensityBandwidthEstimator enumeration:

Value              Description

NormalReference    The bandwidth is chosen so it minimizes the integrated square error for normal data.
Silverman          Use Silverman's rule of thumb.
Scott              Use Scott's rule of thumb.

The EstimateBandwidth(Vector<Double>, Kernel, KernelDensityBandwidthEstimator) method returns an estimate of the bandwidth for the specified input. It takes three arguments. The first is a Vector<T> that specifies the data on which the density estimate will be based. The second argument is the kernel. The third argument is a KernelDensityBandwidthEstimator value that specifies the estimation method.

The KernelDensity class also has a dedicated method for each of the three techniques. The SilvermanBandwidth(Vector<Double>) and ScottBandwidth(Vector<Double>) methods return the bandwidth using Silverman's and Scott's rule of thumb, respectively. Each takes one argument: a Vector<T> that specifies the data on which the density estimate will be based. The NormalReferenceBandwidth(Vector<Double>, Kernel) method returns the normal reference bandwidth. It takes two arguments: the data vector and the kernel.

In the code below, we compute the normal reference bandwidth for our sample for a Gaussian kernel. We also compute the bandwidth using Silverman's rule of thumb:

C#
var bwRef = KernelDensity.NormalReferenceBandwidth(X, KernelDensity.GaussianKernel);
var bwSilverman = KernelDensity.EstimateBandwidth(X, KernelDensity.GaussianKernel,
    KernelDensityBandwidthEstimator.Silverman);

VB
Dim bwRef = KernelDensity.NormalReferenceBandwidth(X, KernelDensity.GaussianKernel)
Dim bwSilverman = KernelDensity.EstimateBandwidth(X, KernelDensity.GaussianKernel,
    KernelDensityBandwidthEstimator.Silverman)

F#
let bwRef = KernelDensity.NormalReferenceBandwidth(
              X, KernelDensity.GaussianKernel)
let bwSilverman = KernelDensity.EstimateBandwidth(X, KernelDensity.GaussianKernel,
                    KernelDensityBandwidthEstimator.Silverman)

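
To give a feel for what such rules compute, here is a library-independent sketch in plain C#. It uses one common textbook form of Silverman's rule, 0.9 min(s, IQR / 1.34) n^(-1/5), and the normal reference bandwidth for a Gaussian kernel, (4/3)^(1/5) s n^(-1/5), where s is the sample standard deviation. These formulas are assumptions: several variants circulate in the literature, and the constants the library uses may differ. All function names below are hypothetical.

```csharp
using System;
using System.Linq;

// Sample standard deviation (n - 1 in the denominator).
double StdDev(double[] x)
{
    double mean = x.Average();
    double sumSquares = x.Sum(v => (v - mean) * (v - mean));
    return Math.Sqrt(sumSquares / (x.Length - 1));
}

// Quantile of an already sorted array, using linear interpolation.
double Quantile(double[] sorted, double p)
{
    double pos = p * (sorted.Length - 1);
    int lo = (int)Math.Floor(pos);
    int hi = Math.Min(lo + 1, sorted.Length - 1);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

// One common form of Silverman's rule of thumb (assumed variant).
double SilvermanRule(double[] x)
{
    double[] sorted = x.OrderBy(v => v).ToArray();
    double iqr = Quantile(sorted, 0.75) - Quantile(sorted, 0.25);
    double spread = Math.Min(StdDev(x), iqr / 1.34);
    return 0.9 * spread * Math.Pow(x.Length, -0.2);
}

// Normal reference bandwidth for a Gaussian kernel (assumed textbook constant).
double GaussianNormalReference(double[] x) =>
    Math.Pow(4.0 / 3.0, 0.2) * StdDev(x) * Math.Pow(x.Length, -0.2);

double[] data = { -2.1, -1.4, -0.9, -0.3, 0.2, 0.8, 1.0, 1.1, 1.3 };
Console.WriteLine($"Silverman:        {SilvermanRule(data):F4}");
Console.WriteLine($"Normal reference: {GaussianNormalReference(data):F4}");
```

Both rules shrink the bandwidth at the rate n^(-1/5) as the sample grows. Silverman's rule also guards against heavy tails and outliers by using the interquartile range when it gives a smaller spread than the standard deviation.
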
Computing Kernel Density Estimates

The Estimate method computes the estimated density for one value or a range of values. This method takes up to six arguments. The first is a Vector<T> that contains the observations for which the density is to be estimated. The second argument is the kernel. The third argument is the value at which to evaluate the density. If a scalar is supplied, then the density at this value is returned. If a vector is supplied, then a vector of the densities at each value of the vector is returned.

The remaining arguments are all optional. The fourth argument is the bandwidth. If omitted, the bandwidth is estimated using the method specified by the KernelDensityBandwidthEstimator value passed as the fifth argument. The default is to use the normal reference bandwidth. The final argument is an adjustment factor for the bandwidth. This is useful when you want to specify the bandwidth as a fraction of an estimated bandwidth. Both the estimation method and the adjustment factor are ignored if the bandwidth is provided explicitly.

In the next example, we compute three different kernel density estimates. First, we use a Gaussian kernel with the Silverman bandwidth we found earlier. Then we use an Epanechnikov kernel with Scott's rule to get the bandwidth. Finally, we use a triweight kernel with half the normal reference bandwidth:

C#
var density1 = KernelDensity.Estimate(X, KernelDensity.GaussianKernel, bwSilverman);
var density2 = KernelDensity.Estimate(X, KernelDensity.EpanechnikovKernel,
        bandwidthEstimator: KernelDensityBandwidthEstimator.Scott);
var density3 = KernelDensity.Estimate(X, KernelDensity.TriweightKernel,
        bandwidthAdjustment: 0.5);

VB
Dim density1 = KernelDensity.Estimate(X, KernelDensity.GaussianKernel, bwSilverman)
Dim density2 = KernelDensity.Estimate(X, KernelDensity.EpanechnikovKernel,
        bandwidthEstimator:=KernelDensityBandwidthEstimator.Scott)
Dim density3 = KernelDensity.Estimate(X, KernelDensity.TriweightKernel,
        bandwidthAdjustment:=0.5)

F#
let density1 = KernelDensity.Estimate(X, KernelDensity.GaussianKernel, bwSilverman)
let density2 = KernelDensity.Estimate(X, KernelDensity.EpanechnikovKernel,
                bandwidthEstimator=KernelDensityBandwidthEstimator.Scott)
let density3 = KernelDensity.Estimate(X, KernelDensity.TriweightKernel,
                bandwidthAdjustment=0.5)
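
To make the computation behind a density estimate concrete, the following plain C# sketch evaluates the standard estimator, the sum of K((x - x_i) / h) over all observations divided by n h, at a single point. It does not use the library. EstimateDensity, the sample values, and the bandwidth of 0.5 are all hypothetical and chosen only for illustration.

```csharp
using System;
using System.Linq;

// Textbook Gaussian kernel (assumed; the library's kernel may be scaled differently).
double Gaussian(double u) => Math.Exp(-0.5 * u * u) / Math.Sqrt(2.0 * Math.PI);

// Density estimate at x: (1 / (n h)) * sum over i of K((x - x_i) / h).
double EstimateDensity(double[] observations, Func<double, double> kernel,
                       double x, double bandwidth)
{
    if (bandwidth <= 0.0)
        throw new ArgumentOutOfRangeException(nameof(bandwidth));
    double total = observations.Sum(xi => kernel((x - xi) / bandwidth));
    return total / (observations.Length * bandwidth);
}

double[] sample = { -2.1, -1.4, -0.9, -0.3, 0.2, 0.8, 1.0, 1.1, 1.3 };
double h = 0.5;           // illustrative bandwidth, not an estimated one
double adjustment = 0.5;  // plays the role of a bandwidth adjustment factor

double full = EstimateDensity(sample, Gaussian, 1.0, h);
double half = EstimateDensity(sample, Gaussian, 1.0, adjustment * h);
Console.WriteLine($"h = {h}: {full:F4}; h = {adjustment * h}: {half:F4}");
```

Multiplying the bandwidth by an adjustment factor below 1 narrows every kernel, so the estimate follows the individual observations more closely.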

The EstimateDistribution method performs the same operation, but returns a ContinuousDistribution whose probability density function (PDF) is equal to the estimated density. The arguments to this method are mostly the same as before. The first argument is once again a vector of observations. The second argument is the kernel. The third through fifth arguments concern the bandwidth: the explicit bandwidth, the estimation method, and the adjustment, respectively.

The sixth argument specifies the number of points to use in the approximation of the density. If omitted, the smaller of 200 and the number of inputs is used. The seventh argument specifies how far past the lowest and highest observation the approximation should be computed, in units of the bandwidth. The default is 3.

In our last example, we estimate the density of our sample using a Gaussian kernel. The estimate is returned as a probability distribution, which we can then use for other purposes, such as drawing more samples and computing expectation values:

C#
var dist1 = KernelDensity.EstimateDistribution(X, KernelDensity.GaussianKernel);
var moreSamples = dist1.Sample(100);
var expectationValue = dist1.GetExpectationValue(x => Math.Exp(x));

VB
Dim dist1 = KernelDensity.EstimateDistribution(X, KernelDensity.GaussianKernel)
Dim moreSamples = dist1.Sample(100)
Dim expectationValue = dist1.GetExpectationValue(Function(z) Math.Exp(z))

F#
let dist1 = KernelDensity.EstimateDistribution(X, KernelDensity.GaussianKernel)
let moreSamples = dist1.Sample(100)
let expectationValue = dist1.GetExpectationValue(fun x -> exp x)
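
The number of points and the extent described above suggest that the density is approximated on a grid. The sketch below shows one way such a grid could be laid out in plain C#, assuming evenly spaced points. How the library actually places its points is not stated here, and BuildGrid with its parameters is hypothetical.

```csharp
using System;
using System.Linq;

// Evenly spaced grid covering [min - extent * h, max + extent * h].
// The default point count mirrors the documented rule: the smaller of 200 and
// the number of observations. Even spacing is an assumption.
double[] BuildGrid(double[] observations, double bandwidth,
                   int? pointCount = null, double extent = 3.0)
{
    int n = pointCount ?? Math.Min(200, observations.Length);
    if (n < 2)
        throw new ArgumentException("At least two grid points are required.");
    double lower = observations.Min() - extent * bandwidth;
    double upper = observations.Max() + extent * bandwidth;
    double step = (upper - lower) / (n - 1);
    return Enumerable.Range(0, n).Select(i => lower + i * step).ToArray();
}

double[] sample = { -2.1, -1.4, -0.9, -0.3, 0.2, 0.8, 1.0, 1.1, 1.3 };
double[] grid = BuildGrid(sample, bandwidth: 0.5);
Console.WriteLine(
    $"{grid.Length} points from {grid[0]:F2} to {grid[grid.Length - 1]:F2}");
```

Extending the grid a few bandwidths past the extreme observations lets the tails of the outermost kernels fall inside the approximated range.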

Copyright (c) 2004-2021 ExoAnalytics Inc.
