bfit.fit

Fitting Algorithms.

Module Contents

Classes

KLDivergenceSCF

Kullback-Leibler Divergence Self-Consistent Fitting.

ScipyFit

Optimize least-squares or Kullback-Leibler of Gaussian functions using Scipy.optimize.

class bfit.fit.KLDivergenceSCF(grid, density, model, mask_value=0.0, integral_dens=None, spherical=False)[source]

Bases: _BaseFit

Kullback-Leibler Divergence Self-Consistent Fitting.

This class optimizes the following objective function using self-consistent fitting method

\[\min_{\{c_i\}, \{\alpha\}} \int f(x) \log \bigg(\frac{f(x)}{\sum c_i b_i(x, \alpha_i)} \bigg)dx + \lambda(N - \sum c_i)\]

where \(f\) is the true density to be fitted, \(\lambda\) is the Lagrange multiplier, \(c_i\) is the coefficient of the ith basis function that all sum to the integral of density function \(N= \int f(x)dx\), and \(\alpha_i\) is the exponent of the basis function \(b_i\).

Construct the KLDivergenceSCF class.

Parameters
  • grid ((_BaseRadialGrid, CubicGrid)) – Grid class that contains the grid points and integration methods on them.

  • density (ndarray) – The true density evaluated on the grid points.

  • model ((AtomicGaussianDensity, MolecularGaussianDensity)) – The Gaussian basis model density. Located in model.py.

  • mask_value (float, optional) – The elements less than or equal to this number are masked in a division.

  • integral_dens (float, optional) – If this is provided, then the model is constrained to integrate to this value. If not, then the model is constrained to the numerical integration of the density. Useful when one knows the actual integration value of the density.

  • spherical (bool) – Whether to perform spherical integration by adding \(4 \pi r^2\) term to the integrand. Only used when grid is one-dimensional and positive (radial grid).

property lagrange_multiplier(self)[source]

Lagrange multiplier of Kullback-Leibler optimization problem.

run(self, c0, e0, opt_coeffs=True, opt_expons=True, maxiter=500, c_threshold=1e-06, e_threshold=1e-06, d_threshold=1e-06, disp=False)[source]

Optimize the coefficients & exponents of Gaussian basis functions self-consistently.

Parameters
  • c0 (ndarray) – The initial coefficients of Gaussian basis functions.

  • e0 (ndarray) – The initial exponents of Gaussian basis functions.

  • opt_coeffs (bool, optional) – Whether to optimize coefficients of Gaussian basis functions. Default is true.

  • opt_expons (bool, optional) – Whether to optimize exponents of Gaussian basis functions. Default is true.

  • maxiter (int, optional) – Maximum number of iterations.

  • c_threshold (float) – The termination threshold for absolute change in coefficients. Default is 1e-6.

  • e_threshold (float) – The termination threshold for absolute change in exponents. Default is 1e-6.

  • d_threshold (float) – The termination threshold for absolute change in divergence value. Default is 1e-6.

  • disp (bool) – If true, then at each iteration various error measures will be printed.

Returns

The optimization results presented as a dictionary with keys:

”coeffs”ndarray

The optimized coefficients of the Gaussian model.

”exps”ndarray

The optimized exponents of the Gaussian model.

”success”: bool

Whether the optimization exited successfully.

”fun”ndarray

Values of the KL divergence (objective function) at each iteration.

”performance”ndarray

Values of various performance measures of modeled density at each iteration, as computed by goodness_of_fit() method.

”time”float

The time in seconds it took to complete the algorithm.

Return type

dict

property grid(self)[source]

Return grid object containing points and integration method.

property density(self)[source]

Return the true density evaluated on the grid points.

property model(self)[source]

Return the Gaussian basis model density.

property measure(self)[source]

Return the deviation measure between true density and model density.

property integral_dens(self)[source]

Return integration value of the density.

property spherical(self)[source]

Whether to perform spherical integration.

integrate(self, integrand)[source]

Integrate the integrand.

Parameters

integrand (ndarray(N,)) – The integrand \(f(x)\) being integrated defined on \(N\) points. If spherical attribute is true, then the radial component is integrated in spherical coordinates.

Returns

Returns the integration \(\int f(x) dx\) or if spherical is true then returns \(\int_0^\infty f(r) 4 \pi r^2 dr\).

Return type

float

goodness_of_fit(self, coeffs, expons)[source]

Compute various measures over the grid to determine the accuracy of the fitted model.

In particular, it computes the integral of the model, the \(L_1\) distance, the \(L_\infty\) distance and attribute measure distance between true and model functions.

Parameters
  • coeffs (ndarray) – The coefficients of Gaussian basis functions.

  • expons (ndarray) – The exponents of Gaussian basis functions.

Returns

  • integral (float) – Integral of approximate model density, i.e. norm of approximate model density.

  • l_1 (float) – Integral of absolute difference between density and approximate model density. This is defined to be \(L_(f, g) = \int |f(x) - g(x)| dx\).

  • l_infinity (float) – The maximum absolute difference between density and approximate model density. This is defined to be \(L_\infty(f, g) = \max |f(x) - g(x)|\).

  • least_squares (float) – Square of the \(L_2\) norm between density and approximate model density. This is defined to be \(L_2^2(f, g) = \int (f(x) - g(x))^2 dx\).

  • kullback_leibler (float) – Kullback-Leibler divergence between density and approximate model density. This is defined to be \(KL(f, g) = \int f(x) \log\bigg(\frac{f(x)}{g(x)}\bigg)dx\). If \(g\) is negative, then this returns infinity.

class bfit.fit.ScipyFit(grid, density, model, measure=KLDivergence, method='SLSQP', weights=None, integral_dens=None, spherical=False)[source]

Bases: _BaseFit

Optimize least-squares or Kullback-Leibler of Gaussian functions using Scipy.optimize.

Least-squares objective function w.r.t. Gaussian basis-functions is defined as

\[\min_{\{c_i\}, \{\alpha\}} \int \bigg(f(x) - \sum c_i b_k(x, \alpha_i) \bigg)^2 dx\]

The Kullback-Leibler divergence function is defined as

\[\min_{\{c_i\}, \{\alpha\}, \sum c_i = N} \int f(x) \log \bigg(\frac{f(x)}{\sum c_i b_i(x, \alpha_i)} \bigg)dx,\]

where \(f\) is the density to be fitted to, \(c_i, \alpha_i\) are the coefficeints and exponents of the Gaussian basis functions, and \(N = \int f(x)dx\) is the integral of the density function.

Notes

  • Note that the Kullback-Leibler between two functions \(f\) and \(g\) is positive if and only if the integrals of \(f\) and \(g\) are identical. The with_constraint attribute must be True for optimizing Kullback-Leibler.

Construct the ScipyFit object.

Parameters
  • grid ((_BaseRadialGrid, CubicGrid)) – The grid class.

  • density (ndarray(N,)) – The true density evaluated on the grid points.

  • model ((AtomicGaussianDensity, MolecularGaussianDensity)) – The Gaussian basis model density.

  • measure (bfit.measure.Measure) – The deviation measure between true density and model density. See bfit.measure.py for examples of measures to use.

  • method (str, optional) – The method used for optimizing parameters. Default is “slsqp”. See “scipy.optimize.minimize” for options.

  • weights (ndarray, optional) – The weights of objective function at each point. If None, 1.0 is used.

  • integral_dens (float, optional) – If this is provided, then the model is constrained to integrate to this value. If not, then the model is constrained to the numerical integration of the density. Useful when one knows the actual integration value of the density.

  • spherical (bool) – Whether to perform spherical integration by adding \(4 \pi r^2\) term to the integrand. Only used when grid is one-dimensional and positive (radial grid).

run(self, c0, e0, opt_coeffs=True, opt_expons=True, maxiter=1000, tol=1e-14, disp=False, with_constraint=True)[source]

Optimize coefficients and/or exponents of Gaussian basis functions with constraint.

Parameters
  • c0 (ndarray) – Initial guess for coefficients of Gaussian basis functions.

  • e0 (ndarray) – Initial guess for exponents of Gaussian basis functions.

  • opt_coeffs (bool, optional) – Whether to optimize coefficients of Gaussian basis functions.

  • opt_expons (bool, optional) – Whether to optimize exponents of Gaussian basis functions.

  • maxiter (int, optional) – Maximum number of iterations.

  • tol (float, optional) – For slsqp. precision goal for the value of objective function in the stopping criterion. For trust-constr, it is precision goal for the change in the variables.

  • disp (bool) – If True, then it will print the iteration errors, convergence messages from the optimizer and will print out various error measures at each iteration.

  • with_constraint (bool) – If true, then adds the constraint that the integration of the model density must be equal to the constraint of true density. The default is True.

Returns

The optimization results presented as a dictionary containing:

”coeffs”ndarray

The optimized coefficients of the Gaussian model.

”exps”ndarray

The optimized exponents of the Gaussian model.

”success”: bool

Whether the optimization exited successfully.

”message”str

Information about the cause of termination.

”fun”float

Values of KL divergence (objective function) at the final iteration.

”jacobian”: ndarray

The Jacobian of the objective function w.r.t. coefficients and exponents.

”performance”list

Values of various performance measures of modeled density at each iteration, as computed by _BaseFit.goodness_of_fit method.

”time”float

The time in seconds it took to optimize.

Return type

dict

Notes

  • The coefficients and exponents are bounded to be positive.

func(self, x, *args)[source]

Compute objective function and its derivative w.r.t. Gaussian basis parameters.

Parameters
  • x (ndarray) – The parameters of Gaussian basis which is being optimized. Contains both the coefficients and exponents together in a 1-D array.

  • args – Additional arguments to the model.

Returns

The objective function value and its derivative wrt to coefficients and exponents.

Return type

(float, ndarray)

const_norm(self, x, *args)[source]

Compute deviation in normalization constraint \(\sum c_i - \int f(x) dx\).

Parameters
  • x (ndarray) – The parameters of Gaussian basis-functions. Contains both the coefficients and exponents together in a 1-D array.

  • args – Additional parameters for the model.

Returns

The deviation of the integrla with the normalization constant.

Return type

float

evaluate_model(self, x, *args)[source]

Evaluate the model density & its derivative.

Parameters
  • x (ndarray) – The parameters of Gaussian basis-functions. Contains both the coefficients and exponents together in a 1-D array.

  • args – Additional parameters for the model.

Returns

Evaluates the model density & its derivative.

Return type

float, ndarray

property grid(self)[source]

Return grid object containing points and integration method.

property density(self)[source]

Return the true density evaluated on the grid points.

property model(self)[source]

Return the Gaussian basis model density.

property measure(self)[source]

Return the deviation measure between true density and model density.

property integral_dens(self)[source]

Return integration value of the density.

property spherical(self)[source]

Whether to perform spherical integration.

integrate(self, integrand)[source]

Integrate the integrand.

Parameters

integrand (ndarray(N,)) – The integrand \(f(x)\) being integrated defined on \(N\) points. If spherical attribute is true, then the radial component is integrated in spherical coordinates.

Returns

Returns the integration \(\int f(x) dx\) or if spherical is true then returns \(\int_0^\infty f(r) 4 \pi r^2 dr\).

Return type

float

goodness_of_fit(self, coeffs, expons)[source]

Compute various measures over the grid to determine the accuracy of the fitted model.

In particular, it computes the integral of the model, the \(L_1\) distance, the \(L_\infty\) distance and attribute measure distance between true and model functions.

Parameters
  • coeffs (ndarray) – The coefficients of Gaussian basis functions.

  • expons (ndarray) – The exponents of Gaussian basis functions.

Returns

  • integral (float) – Integral of approximate model density, i.e. norm of approximate model density.

  • l_1 (float) – Integral of absolute difference between density and approximate model density. This is defined to be \(L_(f, g) = \int |f(x) - g(x)| dx\).

  • l_infinity (float) – The maximum absolute difference between density and approximate model density. This is defined to be \(L_\infty(f, g) = \max |f(x) - g(x)|\).

  • least_squares (float) – Square of the \(L_2\) norm between density and approximate model density. This is defined to be \(L_2^2(f, g) = \int (f(x) - g(x))^2 dx\).

  • kullback_leibler (float) – Kullback-Leibler divergence between density and approximate model density. This is defined to be \(KL(f, g) = \int f(x) \log\bigg(\frac{f(x)}{g(x)}\bigg)dx\). If \(g\) is negative, then this returns infinity.