The characterization of measurement noise enables a more realistic and detailed assessment of the reliability of models and derived results such as predictions.
However, we recognize that the relevance of an advanced reliability assessment is often quite limited in practice. One of the attendees of a 1995 sponsor meeting of the Center for Process Analytical Chemistry (CPAC) expressed his main concern as follows:
I don't want to know how bad my models are, I want better models!
Advanced reliability assessment could be termed passive use of the noise characteristics of the data. This page intends to illustrate that active use is possible too, and that it may in fact lead to significantly better models.
This page is organized as follows:
Chemometrics vs. statistics
According to the International Chemometrics Society (ICS),
Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods. (Our bold.)
However, it is widely recognized that chemometricians primarily focus on the signal in the data. By contrast, statisticians are (also) very much concerned about the detailed noise characteristics. This fundamental discrepancy has led to the introduction of many suboptimal modelling techniques in the past. Only recently have chemometricians started to develop multivariate and multiway calibration methods from the idea of exploiting knowledge of measurement noise. The active use of measurement noise can thus be seen as an attempt towards closing the gap between chemometrics and (traditional) statistics. The first work in this direction was concerned with the improvement of conventional principal component analysis (PCA), see:
 P.D. Wentzell, D.T. Andrews, D.C. Hamilton, N.M. Faber and B.R. Kowalski
Maximum likelihood principal component analysis
Journal of Chemometrics, 11 (1997) 339-366
It is of considerable practical significance that one does not require highly accurate noise estimates to obtain superior results using maximum likelihood (ML) methods.

Illustrative example


The following is excerpted from:
 S.K. Schreyer, M. Bidinosti and P.D. Wentzell
Application of maximum likelihood principal components regression to fluorescence spectra
Applied Spectroscopy, 56 (2002) 789-796
Fluorescence spectra were obtained from mixtures of three polycyclic aromatic hydrocarbons (PAHs):


Figure MXL 1: Representative fluorescence emission spectra from mixture solutions of acenaphthylene (ace), naphthalene (nap), and phenanthrene (phe).

Five replicate sets of spectra were measured for each of 27 mixtures, prepared according to a three-level, three-factor factorial design. For each mixture, the covariance matrix of the measurement noise was estimated from the spread in the replicates, both within and across channels. Averaging over the 27 mixtures results in a pooled estimate:
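The pooling step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name, the array layout (mixtures x replicates x channels) and the simulated spectra are assumptions made for the example.

```python
import numpy as np

def pooled_error_covariance(replicates):
    """Pool the measurement error covariance over mixtures.

    replicates: array of shape (n_mixtures, n_replicates, n_channels)
    (hypothetical layout chosen for this sketch).
    """
    replicates = np.asarray(replicates, dtype=float)
    n_rep = replicates.shape[1]
    # Deviation of each replicate from the mean spectrum of its own mixture
    dev = replicates - replicates.mean(axis=1, keepdims=True)
    # One covariance matrix per mixture, with n_rep - 1 degrees of freedom
    per_mixture = np.einsum("mri,mrj->mij", dev, dev) / (n_rep - 1)
    # Equal replicate counts, so a plain average pools the estimates
    return per_mixture.mean(axis=0)

# Simulated stand-in for the measured spectra: 27 mixtures, 5 replicates
rng = np.random.default_rng(0)
n_mix, n_rep, n_chan = 27, 5, 8
true_cov = 0.5 * np.ones((n_chan, n_chan)) + np.eye(n_chan)
means = rng.uniform(10, 20, size=(n_mix, 1, n_chan))
noise = rng.multivariate_normal(np.zeros(n_chan), true_cov, size=(n_mix, n_rep))
spectra = means + noise

pooled_cov = pooled_error_covariance(spectra)
pooled_sd = np.sqrt(np.diag(pooled_cov))  # pooled standard deviation per channel
```

The diagonal of `pooled_cov` holds the pooled variances at each channel, and the off-diagonal elements describe how strongly the noise in different channels is correlated.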


Figure MXL 2: Contour (left) and mesh plot (right) representations of the pooled measurement error covariance matrix for the fluorescence emission spectra.

A feature that is prominent in the mesh plot is the positive offset everywhere. This indicates that all channels have a significant positive correlation. Such behavior is anticipated, since this is the characteristic expected with blank noise or related effects, such as cell positioning errors.
The magnitudes of the variance (diagonal) and covariance (off-diagonal) elements are directly related to the corresponding spectral intensities. This is more apparent when plotting the pooled standard deviations at each channel (i.e., the square root of the pooled variances) with the mean spectrum:


Figure MXL 3: Comparison of pooled measurement standard deviations and mean fluorescence emission spectrum.

This type of behavior in the variance (and hence the covariance) is also not unexpected since shot noise exhibits a square-root dependence on the signal intensity and flicker noise is directly proportional to the signal intensity.
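The intensity dependence just described can be written as a simple noise model. The sketch below assumes that the blank, shot and flicker contributions are independent, so that their variances add. The function name and the coefficients are hypothetical and are not fitted to the fluorescence data.

```python
import numpy as np

def noise_sd(intensity, blank_sd=0.0, shot_coeff=0.0, flicker_coeff=0.0):
    """Standard deviation of the noise at a given signal intensity.

    Assumes independent noise sources, so variances add:
      blank   : constant variance, blank_sd**2
      shot    : variance proportional to intensity (sd ~ sqrt(intensity))
      flicker : sd proportional to intensity
    All coefficients are illustrative assumptions.
    """
    intensity = np.asarray(intensity, dtype=float)
    variance = (blank_sd ** 2
                + shot_coeff * intensity
                + (flicker_coeff * intensity) ** 2)
    return np.sqrt(variance)

intensity = np.array([100.0, 400.0, 1600.0])
shot_only = noise_sd(intensity, shot_coeff=1.0)        # grows with sqrt(intensity)
flicker_only = noise_sd(intensity, flicker_coeff=0.01)  # grows linearly with intensity
```

Quadrupling the intensity doubles the shot-noise standard deviation but quadruples the flicker-noise standard deviation, which is why both lead to heteroscedastic spectra.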
The fact that heteroscedasticity and correlated noise are both substantial contributors to the error structure is a fortunate outcome of the experiment, since it allows the impact of both features to be examined.
Four calibration methods were evaluated for these data: principal components regression (PCR), partial least squares (PLS) regression, maximum likelihood principal components regression (MLPCR) using only the diagonal of the error covariance matrix (MLPCRdiag) and MLPCR using the full error covariance matrix:

PCR and PLS give virtually identical results for all three analytes. This indicates that PLS, despite its often reported advantages over PCR, does not have any special ability to handle heteroscedastic and correlated measurement errors.
It is clear that MLPCR using the pooled error covariance matrix gives an improvement of about a factor of two in the prediction error over PCR and PLS. Interestingly, the use of MLPCR with only the diagonal of the covariance matrix to account for heteroscedasticity without correlation does not produce any improvement over PCR or PLS.
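To make the distinction between the full and the diagonal variant concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that every spectrum shares the same positive definite error covariance matrix (as with a pooled estimate), in which case the maximum likelihood subspace can be obtained by ordinary PCR on noise-whitened spectra. The function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_whitened_pcr(X, y, cov, n_components):
    """PCR on noise-whitened spectra (sketch for a shared error covariance)."""
    L = np.linalg.cholesky(cov)              # cov = L @ L.T, needs cov positive definite
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Whiten: each centred spectrum x becomes L^{-1} x, whose noise covariance is I
    Z = np.linalg.solve(L, (X - x_mean).T).T
    _, _, Wt = np.linalg.svd(Z, full_matrices=False)
    W = Wt[:n_components].T                  # loadings in the whitened space
    T = Z @ W                                # scores used for the regression
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return {"L": L, "W": W, "q": q, "x_mean": x_mean, "y_mean": y_mean}

def predict_whitened_pcr(model, X_new):
    """Apply the same centring, whitening and projection to new spectra."""
    Z = np.linalg.solve(model["L"], (X_new - model["x_mean"]).T).T
    return Z @ model["W"] @ model["q"] + model["y_mean"]

# Simulated three-component mixtures with correlated noise (illustration only)
rng = np.random.default_rng(1)
n_samples, n_chan = 27, 6
pure = rng.uniform(0.5, 2.0, size=(3, n_chan))       # hypothetical pure spectra
conc = rng.uniform(0.0, 1.0, size=(n_samples, 3))
cov = 0.02 * np.ones((n_chan, n_chan)) + 0.01 * np.eye(n_chan)
X = conc @ pure + rng.multivariate_normal(np.zeros(n_chan), cov, size=n_samples)
y = conc[:, 0]

full = fit_whitened_pcr(X, y, cov, 3)                       # full covariance
diag = fit_whitened_pcr(X, y, np.diag(np.diag(cov)), 3)     # diagonal only
plain = fit_whitened_pcr(X, y, np.eye(n_chan), 3)           # ordinary PCR
```

In this sketch the three variants differ only in the matrix passed as `cov`: the full estimate accounts for both heteroscedasticity and correlation, the diagonal keeps only the channel variances, and the identity matrix reduces the procedure to conventional PCR.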
A factor of two improvement is similar to that reported for visible absorbance spectroscopy in:
 P.D. Wentzell and M.T. Lohnes
Maximum likelihood principal component analysis with correlated measurement errors: theoretical and practical considerations
Chemometrics and Intelligent Laboratory Systems, 45 (1999) 65-85
Improvements by as much as a factor of four were described for a near-infrared reflectance application in:
 C.D. Brown, L. Vega-Montoto and P.D. Wentzell
Derivative preprocessing and optimal corrections for baseline drift in multivariate calibration
Applied Spectroscopy, 54 (2000) 1055-1068
Finally, successful process applications of maximum likelihood methods are described in:
 M.S. Reis and P.M. Saraiva
Integration of data uncertainty in linear regression and process optimization
AIChE Journal, 51 (2005) 3007-3019
 M.S. Reis and P.M. Saraiva
Heteroscedastic latent variable modelling with applications to multivariate statistical process control
Chemometrics and Intelligent Laboratory Systems, 80 (2006) 57-66

References & further information


