bfit.fit
¶
Fitting Algorithms.
Module Contents¶
Classes¶
Kullback-Leibler Divergence Self-Consistent Fitting. |
|
Optimize least-squares or Kullback-Leibler of Gaussian functions using Scipy.optimize. |
-
class
bfit.fit.
KLDivergenceSCF
(grid, density, model, mask_value=0.0, integral_dens=None, spherical=False)[source]¶ Bases:
_BaseFit
Kullback-Leibler Divergence Self-Consistent Fitting.
This class optimizes the following objective function using self-consistent fitting method
\[\min_{\{c_i\}, \{\alpha\}} \int f(x) \log \bigg(\frac{f(x)}{\sum c_i b_i(x, \alpha_i)} \bigg)dx + \lambda(N - \sum c_i)\]where \(f\) is the true density to be fitted, \(\lambda\) is the Lagrange multiplier, \(c_i\) is the coefficient of the ith basis function that all sum to the integral of density function \(N= \int f(x)dx\), and \(\alpha_i\) is the exponent of the basis function \(b_i\).
Construct the KLDivergenceSCF class.
- Parameters
grid ((_BaseRadialGrid, CubicGrid)) – Grid class that contains the grid points and integration methods on them.
density (ndarray) – The true density evaluated on the grid points.
model ((AtomicGaussianDensity, MolecularGaussianDensity)) – The Gaussian basis model density. Located in model.py.
mask_value (float, optional) – The elements less than or equal to this number are masked in a division.
integral_dens (float, optional) – If this is provided, then the model is constrained to integrate to this value. If not, then the model is constrained to the numerical integration of the density. Useful when one knows the actual integration value of the density.
spherical (bool) – Whether to perform spherical integration by adding \(4 \pi r^2\) term to the integrand. Only used when grid is one-dimensional and positive (radial grid).
-
property
lagrange_multiplier
(self)[source]¶ Lagrange multiplier of Kullback-Leibler optimization problem.
-
run
(self, c0, e0, opt_coeffs=True, opt_expons=True, maxiter=500, c_threshold=1e-06, e_threshold=1e-06, d_threshold=1e-06, disp=False)[source]¶ Optimize the coefficients & exponents of Gaussian basis functions self-consistently.
- Parameters
c0 (ndarray) – The initial coefficients of Gaussian basis functions.
e0 (ndarray) – The initial exponents of Gaussian basis functions.
opt_coeffs (bool, optional) – Whether to optimize coefficients of Gaussian basis functions. Default is true.
opt_expons (bool, optional) – Whether to optimize exponents of Gaussian basis functions. Default is true.
maxiter (int, optional) – Maximum number of iterations.
c_threshold (float) – The termination threshold for absolute change in coefficients. Default is 1e-6.
e_threshold (float) – The termination threshold for absolute change in exponents. Default is 1e-6.
d_threshold (float) – The termination threshold for absolute change in divergence value. Default is 1e-6.
disp (bool) – If true, then at each iteration various error measures will be printed.
- Returns
The optimization results presented as a dictionary with keys:
- ”coeffs”ndarray
The optimized coefficients of the Gaussian model.
- ”exps”ndarray
The optimized exponents of the Gaussian model.
- ”success”: bool
Whether the optimization exited successfully.
- ”fun”ndarray
Values of the KL divergence (objective function) at each iteration.
- ”performance”ndarray
Values of various performance measures of modeled density at each iteration, as computed by goodness_of_fit() method.
- ”time”float
The time in seconds it took to complete the algorithm.
- Return type
dict
-
property
measure
(self)[source]¶ Return the deviation measure between true density and model density.
-
integrate
(self, integrand)[source]¶ Integrate the integrand.
- Parameters
integrand (ndarray(N,)) – The integrand \(f(x)\) being integrated defined on \(N\) points. If spherical attribute is true, then the radial component is integrated in spherical coordinates.
- Returns
Returns the integration \(\int f(x) dx\) or if spherical is true then returns \(\int_0^\infty f(r) 4 \pi r^2 dr\).
- Return type
float
-
goodness_of_fit
(self, coeffs, expons)[source]¶ Compute various measures over the grid to determine the accuracy of the fitted model.
In particular, it computes the integral of the model, the \(L_1\) distance, the \(L_\infty\) distance and attribute measure distance between true and model functions.
- Parameters
coeffs (ndarray) – The coefficients of Gaussian basis functions.
expons (ndarray) – The exponents of Gaussian basis functions.
- Returns
integral (float) – Integral of approximate model density, i.e. norm of approximate model density.
l_1 (float) – Integral of absolute difference between density and approximate model density. This is defined to be \(L_(f, g) = \int |f(x) - g(x)| dx\).
l_infinity (float) – The maximum absolute difference between density and approximate model density. This is defined to be \(L_\infty(f, g) = \max |f(x) - g(x)|\).
least_squares (float) – Square of the \(L_2\) norm between density and approximate model density. This is defined to be \(L_2^2(f, g) = \int (f(x) - g(x))^2 dx\).
kullback_leibler (float) – Kullback-Leibler divergence between density and approximate model density. This is defined to be \(KL(f, g) = \int f(x) \log\bigg(\frac{f(x)}{g(x)}\bigg)dx\). If \(g\) is negative, then this returns infinity.
-
class
bfit.fit.
ScipyFit
(grid, density, model, measure=KLDivergence, method='SLSQP', weights=None, integral_dens=None, spherical=False)[source]¶ Bases:
_BaseFit
Optimize least-squares or Kullback-Leibler of Gaussian functions using Scipy.optimize.
Least-squares objective function w.r.t. Gaussian basis-functions is defined as
\[\min_{\{c_i\}, \{\alpha\}} \int \bigg(f(x) - \sum c_i b_k(x, \alpha_i) \bigg)^2 dx\]The Kullback-Leibler divergence function is defined as
\[\min_{\{c_i\}, \{\alpha\}, \sum c_i = N} \int f(x) \log \bigg(\frac{f(x)}{\sum c_i b_i(x, \alpha_i)} \bigg)dx,\]where \(f\) is the density to be fitted to, \(c_i, \alpha_i\) are the coefficeints and exponents of the Gaussian basis functions, and \(N = \int f(x)dx\) is the integral of the density function.
Notes
Note that the Kullback-Leibler between two functions \(f\) and \(g\) is positive if and only if the integrals of \(f\) and \(g\) are identical. The with_constraint attribute must be True for optimizing Kullback-Leibler.
Construct the ScipyFit object.
- Parameters
grid ((_BaseRadialGrid, CubicGrid)) – The grid class.
density (ndarray(N,)) – The true density evaluated on the grid points.
model ((AtomicGaussianDensity, MolecularGaussianDensity)) – The Gaussian basis model density.
measure (bfit.measure.Measure) – The deviation measure between true density and model density. See bfit.measure.py for examples of measures to use.
method (str, optional) – The method used for optimizing parameters. Default is “slsqp”. See “scipy.optimize.minimize” for options.
weights (ndarray, optional) – The weights of objective function at each point. If None, 1.0 is used.
integral_dens (float, optional) – If this is provided, then the model is constrained to integrate to this value. If not, then the model is constrained to the numerical integration of the density. Useful when one knows the actual integration value of the density.
spherical (bool) – Whether to perform spherical integration by adding \(4 \pi r^2\) term to the integrand. Only used when grid is one-dimensional and positive (radial grid).
-
run
(self, c0, e0, opt_coeffs=True, opt_expons=True, maxiter=1000, tol=1e-14, disp=False, with_constraint=True)[source]¶ Optimize coefficients and/or exponents of Gaussian basis functions with constraint.
- Parameters
c0 (ndarray) – Initial guess for coefficients of Gaussian basis functions.
e0 (ndarray) – Initial guess for exponents of Gaussian basis functions.
opt_coeffs (bool, optional) – Whether to optimize coefficients of Gaussian basis functions.
opt_expons (bool, optional) – Whether to optimize exponents of Gaussian basis functions.
maxiter (int, optional) – Maximum number of iterations.
tol (float, optional) – For slsqp. precision goal for the value of objective function in the stopping criterion. For trust-constr, it is precision goal for the change in the variables.
disp (bool) – If True, then it will print the iteration errors, convergence messages from the optimizer and will print out various error measures at each iteration.
with_constraint (bool) – If true, then adds the constraint that the integration of the model density must be equal to the constraint of true density. The default is True.
- Returns
The optimization results presented as a dictionary containing:
- ”coeffs”ndarray
The optimized coefficients of the Gaussian model.
- ”exps”ndarray
The optimized exponents of the Gaussian model.
- ”success”: bool
Whether the optimization exited successfully.
- ”message”str
Information about the cause of termination.
- ”fun”float
Values of KL divergence (objective function) at the final iteration.
- ”jacobian”: ndarray
The Jacobian of the objective function w.r.t. coefficients and exponents.
- ”performance”list
Values of various performance measures of modeled density at each iteration, as computed by _BaseFit.goodness_of_fit method.
- ”time”float
The time in seconds it took to optimize.
- Return type
dict
Notes
The coefficients and exponents are bounded to be positive.
-
func
(self, x, *args)[source]¶ Compute objective function and its derivative w.r.t. Gaussian basis parameters.
- Parameters
x (ndarray) – The parameters of Gaussian basis which is being optimized. Contains both the coefficients and exponents together in a 1-D array.
args – Additional arguments to the model.
- Returns
The objective function value and its derivative wrt to coefficients and exponents.
- Return type
(float, ndarray)
-
const_norm
(self, x, *args)[source]¶ Compute deviation in normalization constraint \(\sum c_i - \int f(x) dx\).
- Parameters
x (ndarray) – The parameters of Gaussian basis-functions. Contains both the coefficients and exponents together in a 1-D array.
args – Additional parameters for the model.
- Returns
The deviation of the integrla with the normalization constant.
- Return type
float
-
evaluate_model
(self, x, *args)[source]¶ Evaluate the model density & its derivative.
- Parameters
x (ndarray) – The parameters of Gaussian basis-functions. Contains both the coefficients and exponents together in a 1-D array.
args – Additional parameters for the model.
- Returns
Evaluates the model density & its derivative.
- Return type
float, ndarray
-
property
measure
(self)[source]¶ Return the deviation measure between true density and model density.
-
integrate
(self, integrand)[source]¶ Integrate the integrand.
- Parameters
integrand (ndarray(N,)) – The integrand \(f(x)\) being integrated defined on \(N\) points. If spherical attribute is true, then the radial component is integrated in spherical coordinates.
- Returns
Returns the integration \(\int f(x) dx\) or if spherical is true then returns \(\int_0^\infty f(r) 4 \pi r^2 dr\).
- Return type
float
-
goodness_of_fit
(self, coeffs, expons)[source]¶ Compute various measures over the grid to determine the accuracy of the fitted model.
In particular, it computes the integral of the model, the \(L_1\) distance, the \(L_\infty\) distance and attribute measure distance between true and model functions.
- Parameters
coeffs (ndarray) – The coefficients of Gaussian basis functions.
expons (ndarray) – The exponents of Gaussian basis functions.
- Returns
integral (float) – Integral of approximate model density, i.e. norm of approximate model density.
l_1 (float) – Integral of absolute difference between density and approximate model density. This is defined to be \(L_(f, g) = \int |f(x) - g(x)| dx\).
l_infinity (float) – The maximum absolute difference between density and approximate model density. This is defined to be \(L_\infty(f, g) = \max |f(x) - g(x)|\).
least_squares (float) – Square of the \(L_2\) norm between density and approximate model density. This is defined to be \(L_2^2(f, g) = \int (f(x) - g(x))^2 dx\).
kullback_leibler (float) – Kullback-Leibler divergence between density and approximate model density. This is defined to be \(KL(f, g) = \int f(x) \log\bigg(\frac{f(x)}{g(x)}\bigg)dx\). If \(g\) is negative, then this returns infinity.