The Theory of Analytical Chemistry is of surprisingly recent date, see:
 K.S. Booksh and B.R. Kowalski
Theory of analytical chemistry
Analytical Chemistry, 66 (1994) 782A-791A
In essence, this theory provides a unifying framework for instruments of varying complexity, e.g. NIR, HPLC-UV and excitation-emission fluorescence. This unification is achieved by classifying the data according to tensor algebra.
Tensors and the order of data
For our purpose, it is convenient to view a tensor as a mathematical object that can hold data:


Figure TAC 1: Examples of tensors with their associated space(s), common designation, e.g. scalar, and order in brackets. The order of a tensor should not be confused with, among others, the order of the familiar polynomial model based on scalars (zeroth-order tensors) or the order of the derivative of spectra (first-order tensors).

Consider an instrument that yields a single UV absorbance. In other words, the data for a single chemical sample consist of a scalar. A scalar is a zeroth-order tensor, hence the UV instrument can be classified as a zeroth-order instrument.
A NIR spectrometer yields a spectrum:


Figure TAC 2: NIR spectra of gasolines containing varying levels of the oxygenate ethanol. Ethanol exhibits a broad characteristic absorption band between 7000 and 6000 cm^{-1} due to hydrogen-bonded OH stretch. The spectral overlap between ethanol, gasoline and the main oxygenate methyl tert-butyl ether precludes finding a wavelength sufficiently selective for constructing a univariate model. However, partial selectivity is sufficient to construct a valid multivariate model to determine oxygenate content.

Hence a single index suffices to organize the data and, consequently, the instrument is termed first-order.
An HPLC-UV instrument yields a matrix of data (rows × columns), which can be visualized as a two-dimensional surface or 'landscape':


Figure TAC 3: Simulated HPLC-UV landscape with overlap in the chromatographic and spectral domain. The partial selectivity of the spectra (right panel of Figure MWC 1) can, in principle, be used to mathematically resolve the chromatographic profiles (left panel of Figure MWC 1). The time-evolutionary character of the data is conveniently visualized in a score plot (see Figure PCA 2). Note that the possibilities for mathematical 'curve resolution' are very limited for first-order data: in general, it is not possible to extract, for example, the pure-component NIR spectra from mixture data.

Now two indices are required for labelling each datum, hence it is a second-order instrument.
In a loose sense, the order of the instrument is given by the minimum number of indices required to organize the data in a meaningful way. For example, a NIR spectrum can be rearranged ('folded') into a matrix, but this representation would be arbitrary because there are no useful relationships between the two indices.
The framework is general in that it allows for any number of indices, although in practice the complexity of most instruments will be second-order at most.
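The hierarchy of tensor orders maps directly onto array dimensionality in software. A minimal sketch in NumPy (the array sizes are invented for illustration; the number of dimensions, `ndim`, corresponds to the order of the data for a single sample):

```python
import numpy as np

# Zeroth-order: a single UV absorbance (a scalar) per sample
absorbance = np.float64(0.42)        # ndim == 0

# First-order: a NIR spectrum (a vector) per sample
spectrum = np.zeros(700)             # e.g. 700 wavenumbers, ndim == 1

# Second-order: an HPLC-UV landscape (a matrix) per sample
landscape = np.zeros((120, 40))      # e.g. 120 times x 40 wavelengths, ndim == 2

for name, data in [("UV", absorbance), ("NIR", spectrum), ("HPLC-UV", landscape)]:
    print(f"{name} instrument: order {np.ndim(data)}")
```

Folding the 700-point spectrum into, say, a 70 × 10 matrix is possible with `reshape`, but, as noted above, the resulting second index would carry no meaningful relationship.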

Order among analytical problems


An attractive feature of this framework is that it reveals a natural progression in analytical problems that can be solved with increasing complexity of the instrument:
 Zeroth-order instrument: when measuring a single UV absorbance, one can only reliably determine the analyte content if the signal of the interferences is constant so that it can be removed by a suitable background subtraction. Bias due to a varying interference cannot be detected.
 First-order instrument: when measuring a NIR spectrum, the signal of the interferences is allowed to vary in the prediction sample as long as it varies similarly in the samples used to construct the model. This is called the first-order advantage. Being able to model interferences explains, for example, the popularity of NIR spectroscopy in the food and agricultural industry. If the prediction sample contains unexpected interferences, however, the determination will be biased. Fortunately, this sample can be identified as an outlier because of the unusually large spectral residuals.
 Second-order instrument: when measuring a matrix of data for a single sample, e.g. through the hyphenation of chromatography and spectroscopy, the second-order advantage is obtained. Now it is possible to correctly determine an analyte in the presence of interferences that were not accounted for during calibration. In other words, a particularly nasty type of outlier does not exist! It is important to note that the second-order advantage is already obtained using a single calibration sample. This has far-reaching consequences. For example, a self-calibrating instrument can be developed on the basis of a single standard addition of the analyte.
Since the framework is based on algebra, i.e. pure numbers, the physical nature of the indices does not matter. This fact allows one to rationally select the most promising method for data analysis on the basis of analogies with seemingly unrelated instruments. For example, NIR imaging data obtained for a single sample is third-order (1 spectral and 2 spatial indices). Likewise, the data obtained for a calibration set using excitation-emission fluorescence (1 sample and 2 spectral indices) or HPLC-UV (1 sample, 1 time and 1 spectral index) is third-order. Consequently, theory predicts that NIR imaging data can be analyzed using techniques that have been developed for data of the latter kind. This has been nicely demonstrated in:
 F.W. Koehler IV, E. Lee, L.H. Kidder and E.N. Lewis
Near infrared spectroscopy: the practical imaging solution
Spectroscopy Europe, 14 (2002) 12-19
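The analogy between imaging and calibration-set data can be made concrete with array shapes. A minimal NumPy sketch with invented dimensions, where only the number of indices, not their physical meaning, determines how the data can be rearranged:

```python
import numpy as np

# Hypothetical NIR image of one sample: 2 spatial indices + 1 spectral index
image = np.random.rand(64, 64, 200)        # x, y, wavelength -> third-order

# Hypothetical excitation-emission fluorescence calibration set:
# 1 sample index + 2 spectral indices -> also third-order
eem_set = np.random.rand(30, 50, 60)       # sample, excitation, emission

# Because the algebra sees only indices, the image can be 'unfolded' to a
# matrix by collapsing the two spatial indices into a single pixel index,
# just as the EEM set can be unfolded along its sample index.
unfolded_image = image.reshape(64 * 64, 200)   # pixels x wavelengths
unfolded_eem = eem_set.reshape(30, 50 * 60)    # samples x (excitation x emission)

print(unfolded_image.shape)   # (4096, 200)
print(unfolded_eem.shape)     # (30, 3000)
```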

Research Topics


The increasing complexity of the data implies certain drawbacks too. The statistical methodology is inherently more complex, so current knowledge of higher-order calibration methods is relatively underdeveloped; there is certainly room for further research here. The following table, adapted from the Booksh & Kowalski paper, summarizes the pros and cons of increasing data complexity:

Table TAC 1: Characteristics of calibration of instruments of increasing complexity.


The theory of analytical chemistry also enables one to generalize already accepted methodology to the higher-order domain. Consider, for example, the error propagation equations presented by Rocco DiFoggio in a lucid feature article:

Their application has led to significantly improved NIR models by reducing the effect of noise and artifacts. To see how this generalization can be brought about, consider the NIR model-based determination in further detail. Ignoring a possible intercept, which can be accounted for by mean-centring the data, the model equation is:
content = B_{1} × S_{1} + ... + B_{p} × S_{p}
where B stands for parameter, S for signal and p is the number of wavelengths.
It is seen that the signal at each wavelength is multiplied by a model parameter and these products are subsequently summed. This operation amounts to the scalar product of two vectors, one holding the sample spectrum and the other the model parameters:


Figure TAC 4: Geometrical representation of the modeling of spectral interferences in p-dimensional space.
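The equivalence between the summed products of the model equation and the scalar product of two vectors is easy to verify numerically. A minimal sketch, with random numbers standing in for a real spectrum and model:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 700                                  # number of wavelengths (assumed)
b = rng.standard_normal(p)               # model parameters B_1 ... B_p
s = rng.standard_normal(p)               # mean-centred sample spectrum S_1 ... S_p

# The summed products of the model equation...
content_sum = sum(b[i] * s[i] for i in range(p))
# ...are exactly the scalar (dot) product of the two vectors:
content_dot = b @ s

assert np.isclose(content_sum, content_dot)
```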

The true model vector is orthogonal to the hyperplane spanned by the spectra of the interferences; otherwise, variations in the data due to the interferences will contribute to the determination. Noise in the data will lead to an estimated model vector that (on average) does not exactly decompose the sample spectrum into the part that has predictive value and its orthogonal companion, which is completely overlapped with the interferences' spectra. This leads to prediction bias, as detailed in:
 C.D. Brown
Discordance between net analyte signal theory and practical multivariate calibration
Analytical Chemistry, 76 (2004) 4364-4373
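The decomposition discussed above can be sketched with an orthogonal projection: the part of the sample spectrum lying in the interference subspace carries no predictive value, while its orthogonal companion does. A minimal illustration with made-up data, where the matrix K of interference spectra and the mixture spectrum s are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 100                                   # number of wavelengths (assumed)
K = rng.standard_normal((p, 3))           # spectra of 3 interferences (columns)
s = rng.standard_normal(p)                # mixture spectrum of one sample

# Orthogonal projector onto the interference subspace span(K)
P = K @ np.linalg.pinv(K)
s_interf = P @ s                          # part overlapped with the interferences
s_net = s - s_interf                      # orthogonal part with predictive value

# The two parts reassemble the spectrum, and the predictive part is
# orthogonal to every interference spectrum
assert np.allclose(s_interf + s_net, s)
assert np.allclose(K.T @ s_net, 0.0)
```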
The error propagation formulas developed by DiFoggio require the model vector to be orthogonal to artifacts such as a varying baseline or a wavelength shift. This requirement can be fulfilled by adding new base vectors to the hyperplane spanned by the spectral interferences and subsequently orthogonalizing the model vector to this extended hyperplane. DiFoggio aptly terms this process 'desensitizing' the model. For a rigorous treatment, see:
 R. DiFoggio
Desensitizing models using covariance matrix transforms or counterbalanced distortions
Journal of Chemometrics, 19 (2005) 203-215
The theory of analytical chemistry includes NIR calibration as a special case of first-order calibration. In higher-order calibration, one has to consider several subspaces, possibly in interaction through nested error structures, see e.g.:
 K.S. Booksh and B.R. Kowalski
Error analysis of the generalized rank annihilation method
Journal of Chemometrics, 8 (1994) 45-63
Now one has to 'desensitize' the model vector in each relevant subspace.

Conclusions


The Theory of Analytical Chemistry provides a sound basis for identifying developments that are of direct interest to instrument manufacturers, as well as areas where additional research is called for. Figures of merit and other reliability measures may be useful for quantifying the advantages obtained by increasing the complexity of instruments.

References & further information


For further information, please contact Karl Booksh:

