
Theory of Analytical Chemistry

The Theory of Analytical Chemistry is of surprisingly recent date, see:

  • K.S. Booksh and B.R. Kowalski
    Theory of analytical chemistry
    Analytical Chemistry, 66 (1994) 782A-791A


In essence, this theory provides a unifying framework for instruments of varying complexity, e.g. NIR, HPLC-UV and excitation emission fluorescence. This unification is achieved by classifying the data according to tensor algebra.

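This page is organized as follows:

  • Tensors and the order of data
  • Order among analytical problems
  • Research Topics
  • Conclusions
  • References & further information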

Tensors and the order of data

For our purpose, it is convenient to view a tensor as a mathematical object that can hold data:


Figure TAC 1: Examples of tensors with their associated space(s), common designation (e.g. scalar) and order (in brackets). The order of a tensor should not be confused with, for instance, the order of the familiar polynomial model based on scalars (zeroth-order tensors) or the order of the derivative of spectra (first-order tensors).


Consider an instrument that yields a single UV absorbance. In other words, the data for a single chemical sample consist of a scalar. A scalar is a zeroth-order tensor, hence the UV instrument can be classified as a zeroth-order instrument.

A NIR spectrometer yields a spectrum:


Figure TAC 2: NIR spectra of gasolines containing varying levels of the oxygenate ethanol. Ethanol exhibits a broad characteristic absorption band between 7000 and 6000 cm⁻¹ due to the hydrogen-bonded O-H stretch. The spectral overlap between ethanol, gasoline and the main oxygenate, methyl tert-butyl ether, precludes finding a wavelength selective enough to construct a univariate model. However, partial selectivity is sufficient to construct a valid multivariate model for determining the oxygenate content.


A spectrum is a vector: a single index (the wavelength) suffices to organize the data and, consequently, the instrument is termed first-order.

An HPLC-UV instrument yields a matrix of data (rows × columns), which can be visualized as a two-dimensional surface or 'landscape':


Figure TAC 3: Simulated HPLC-UV landscape with overlap in the chromatographic and spectral domains. The partial selectivity of the spectra (right panel of Figure MWC 1) can, in principle, be used to resolve the chromatographic profiles (left panel of Figure MWC 1) mathematically. The time-evolutionary character of the data is conveniently visualized in a score plot (see Figure PCA 2). Note that the possibilities for mathematical 'curve resolution' are very limited for first-order data: in general, it is not possible to extract, for example, the pure-component NIR spectra from mixture data.


Now two indices (elution time and wavelength) are required to label each datum; hence HPLC-UV is a second-order instrument.

In a loose sense, the order of an instrument is the minimum number of indices required to organize its data in a meaningful way. For example, a NIR spectrum can be rearranged ('folded') into a matrix, but this representation would be arbitrary because there is no useful relationship between the two indices.

The framework is general in that it allows for any number of indices, although in practice most instruments are at most second-order.
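To make the notion of order concrete, here is a minimal Python sketch under simplifying assumptions: nested lists stand in for tensors, the helper names `order` and `fold` are hypothetical, and all numbers are made up. The order is counted as the number of indices needed to address a single datum.

```python
def order(data):
    """Number of indices needed to address one datum.

    A scalar has order 0; each level of list nesting adds one index.
    Assumes regular (rectangular) nesting.
    """
    n = 0
    while isinstance(data, list):
        n += 1
        data = data[0]
    return n


def fold(spectrum, n_rows):
    """Rearrange a spectrum row by row into an n_rows x (p / n_rows) matrix."""
    n_cols = len(spectrum) // n_rows
    return [spectrum[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


uv_absorbance = 0.42                                  # one number per sample
nir_spectrum = [0.10, 0.12, 0.15, 0.11, 0.09, 0.08]   # indexed by wavelength
hplc_uv = [[0.0, 0.1, 0.0],                           # rows: elution time
           [0.2, 0.5, 0.1],                           # columns: wavelength
           [0.1, 0.2, 0.0]]
calibration_set = [hplc_uv, hplc_uv]                  # extra index: sample

for name, d in [("UV", uv_absorbance), ("NIR", nir_spectrum),
                ("HPLC-UV", hplc_uv), ("HPLC-UV calibration set", calibration_set)]:
    print(f"{name}: order {order(d)}")

# Folding adds a nesting level, but the new row index carries no physical
# meaning, so the NIR spectrometer remains a first-order instrument.
folded = fold(nir_spectrum, 2)
print(order(folded), folded)
```

The loop prints orders 0, 1, 2 and 3. The last line shows why counting indices alone cannot classify an instrument: `order` counts indices, not whether they are meaningful.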


Order among analytical problems



An attractive feature of this framework is that it reveals a natural progression in the analytical problems that can be solved as the complexity of the instrument increases:

  • Zeroth-order instrument: when measuring a single UV absorbance, one can only reliably determine the analyte content if the signal of the interferences is constant so that it can be removed by a suitable background subtraction. Bias due to a varying interference cannot be detected.
  • First-order instrument: when measuring a NIR spectrum, the signal of the interferences is allowed to vary in the prediction sample as long as it varies similarly in the samples used to construct the model. This is called the first-order advantage. The ability to model interferences explains, for example, the popularity of NIR spectroscopy in the food and agricultural industries. If the prediction sample contains unexpected interferences, however, the determination will be biased. Fortunately, such a sample can be flagged as an outlier because of its unusually large spectral residuals.
  • Second-order instrument: when measuring a matrix of data for a single sample, e.g. through the hyphenation of chromatography and spectroscopy, the second-order advantage is obtained. Now it is possible to determine an analyte correctly in the presence of interferences that were not accounted for during calibration. In other words, a particularly nasty type of outlier no longer exists! It is important to note that the second-order advantage is already obtained with a single calibration sample. This has far-reaching consequences: for example, a self-calibrating instrument can be developed on the basis of a single standard addition of the analyte.
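The contrast between the first two cases can be sketched numerically. The Python example below is a minimal illustration under stated assumptions: the pure spectra are hypothetical, the signal is assumed to be a noise-free linear mixture, and classical least squares (CLS) with the interferent spectrum included in the model stands in for first-order calibration in general (in practice, inverse models such as PLS are more common). The second-order advantage itself is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix(amounts, spectra):
    """Linear (Beer-Lambert-like) mixture: sum of amount x pure spectrum."""
    return [sum(c * s[w] for c, s in zip(amounts, spectra))
            for w in range(len(spectra[0]))]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def cls_predict(pure, x):
    """First-order: least-squares fit of spectrum x on the pure spectra.

    Returns the estimated amounts and the norm of the spectral residual.
    """
    gram = [[dot(p, q) for q in pure] for p in pure]
    amounts = solve(gram, [dot(p, x) for p in pure])
    fit = mix(amounts, pure)
    residual = sum((xi - fi) ** 2 for xi, fi in zip(x, fit)) ** 0.5
    return amounts, residual


def zeroth_order_predict(x, channel, sensitivity, background):
    """Zeroth-order: a single reading with a fixed background subtracted."""
    return (x[channel] - background) / sensitivity


analyte = [1.0, 0.5, 0.2, 0.0]       # hypothetical pure spectra, 4 wavelengths
interferent = [0.2, 0.6, 1.0, 0.4]
unexpected = [0.0, 0.3, 0.1, 1.0]    # absent from the calibration

# Calibration assumed the interferent at level 1.0: background 0.2 at channel 0.
background = 1.0 * interferent[0]

# Sample A: analyte 0.5, interferent varies to 2.0.
xa = mix([0.5, 2.0], [analyte, interferent])
zeroth_a = zeroth_order_predict(xa, 0, analyte[0], background)
first_a, resid_a = cls_predict([analyte, interferent], xa)

# Sample B: analyte 0.5, interferent 1.0, plus an unexpected component.
xb = mix([0.5, 1.0, 0.5], [analyte, interferent, unexpected])
first_b, resid_b = cls_predict([analyte, interferent], xb)

print(f"zeroth-order, sample A: {zeroth_a:.3f}")   # biased, nothing flags it
print(f"first-order,  sample A: {first_a[0]:.3f} (residual {resid_a:.1e})")
print(f"first-order,  sample B: {first_b[0]:.3f} (residual {resid_b:.2f})")
```

Sample A shows the zeroth-order bias (0.700 instead of 0.5) that no diagnostic reveals, while the first-order fit recovers 0.5 because the variation of the interferent is modelled. In sample B the first-order estimate is biased too, but its spectral residual is far larger than that of sample A, which is what allows the sample to be flagged as an outlier.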


Since the framework is based on algebra, i.e. pure numbers, the physical nature of the indices does not matter. This fact allows one to rationally select the most promising method for data analysis on the basis of analogies with seemingly unrelated instruments. For example, NIR imaging data obtained for a single sample is third-order (1 spectral and 2 spatial indices). Likewise, the data obtained for a calibration set using excitation emission fluorescence (1 sample and 2 spectral indices) or HPLC-UV (1 sample, 1 time and 1 spectral index) is third-order. Consequently, theory predicts that NIR imaging data can be analyzed using techniques that have been developed for data of the latter kind. This has been nicely demonstrated in:

  • F.W. Koehler IV, E. Lee, L.H. Kidder and E.N. Lewis
    Near infrared spectroscopy: the practical imaging solution
    Spectroscopy Europe, 14 (2002) 12-19
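Because only the number of indices matters, code written for one kind of third-order array runs unchanged on another. As a hedged illustration (hypothetical array sizes and placeholder values; `unfold` and the other helpers are our own, not library functions), the sketch below applies the same mode-1 unfolding, a common first step in multiway analysis, to an excitation-emission calibration set and to a NIR image stored as wavelength × x × y.

```python
def shape(data):
    """Size of each index of a regular nested-list array."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        data = data[0]
    return tuple(dims)


def unfold(cube):
    """Mode-1 unfolding of an I x J x K array into an I x (J*K) matrix.

    The function never asks what the indices mean physically.
    """
    return [[value for row in slab for value in row] for slab in cube]


def placeholder_cube(i_size, j_size, k_size):
    """Hypothetical data: each value encodes its own (i, j, k) position."""
    return [[[100 * i + 10 * j + k for k in range(k_size)]
             for j in range(j_size)] for i in range(i_size)]


# Calibration set of excitation-emission matrices: sample x excitation x emission.
eem_set = placeholder_cube(3, 4, 5)
# NIR image of one sample, stored as wavelength x (x pixel) x (y pixel).
nir_image = placeholder_cube(6, 2, 3)

for name, cube in [("EEM calibration set", eem_set), ("NIR image", nir_image)]:
    print(f"{name}: {shape(cube)} -> unfolded {shape(unfold(cube))}")
```

Unfolded this way, the EEM set becomes a samples × (excitation·emission) matrix and the image a wavelengths × pixels matrix, so both can be passed to the same downstream routine.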


Research Topics



The increasing complexity of the data has drawbacks too. The statistical methodology is inherently more complex, so current knowledge of higher-order calibration methods is relatively underdeveloped; there is certainly room for further research here. The following table, adapted from the Booksh & Kowalski paper, summarizes the pros and cons of increasing data complexity:

Table TAC 1: Characteristics of calibration of instruments of increasing complexity.



The theory of analytical chemistry also enables one to generalize already accepted methodology to the higher-order domain. Consider, for example, the error propagation equations presented by Rocco DiFoggio in a lucid feature article:

[Image: DiFoggio's error propagation equations]


Their application has led to significantly improved NIR models by reducing the effect of noise and artifacts. To see how they can be generalized, consider the NIR model-based determination in more detail. Ignoring a possible intercept, which can be accounted for by mean-centring the data, the model equation is:

content = B1 × S1 + B2 × S2 + ... + Bp × Sp


where Bi is the model parameter and Si the signal at the i-th wavelength, and p is the number of wavelengths.
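The model equation translates directly into code. In this minimal Python sketch the parameter values and spectra are hypothetical, and the intercept is handled by mean-centring, as assumed in the text: the prediction is the mean content of the calibration set plus the scalar product of the model vector with the mean-centred spectrum.

```python
def predict(b, s):
    """content = B1*S1 + B2*S2 + ... + Bp*Sp, i.e. the scalar product b . s."""
    if len(b) != len(s):
        raise ValueError("model vector and spectrum must cover the same p wavelengths")
    return sum(bi * si for bi, si in zip(b, s))


def predict_mean_centred(b, s, s_mean, y_mean):
    """Intercept via mean-centring: y_mean + b . (s - s_mean).

    s_mean and y_mean are the mean spectrum and mean content of the
    calibration set (hypothetical values below).
    """
    return y_mean + predict(b, [si - mi for si, mi in zip(s, s_mean)])


b = [0.5, -0.25, 2.0]        # hypothetical model parameters, p = 3
s = [0.5, 1.0, 0.25]         # hypothetical sample spectrum

print(predict(b, s))                                         # 0.5
print(predict_mean_centred(b, s, [0.25, 0.5, 0.25], 1.0))    # 1.0
```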

Thus, the signal at each wavelength is multiplied by a model parameter and the products are summed. This operation amounts to the scalar product of two vectors, one holding the sample spectrum and the other the model parameters:


Figure TAC 4: Geometrical representation of the modeling of spectral interferences in p-dimensional space.


The true model vector is orthogonal to the hyperplane spanned by the spectra of the interferences; otherwise, variations in the data due to the interferences would contribute to the determination. Noise in the data leads to an estimated model vector that (on average) does not exactly decompose the sample spectrum into the part that has predictive value and its orthogonal complement, which is completely overlapped by the interferences' spectra. This leads to prediction bias, as detailed in:

  • C.D. Brown
    Discordance between net analyte signal theory and practical multivariate calibration
    Analytical Chemistry, 76 (2004) 4364-4373
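The geometry of Figure TAC 4 can be sketched in a few lines of Python. Assuming noise-free linear mixtures and hypothetical pure spectra, the model vector below is obtained by removing from the analyte spectrum its projection onto the interferent hyperplane (via Gram-Schmidt) and scaling the remainder, the net analyte signal, so that b · analyte = 1. This is a standard net-analyte-signal construction, not necessarily the procedure of the cited paper. A slightly distorted interferent spectrum stands in for the effect of noise on the estimated model vector.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_span(v, basis):
    """Subtract from v its projection onto an orthonormal basis."""
    w = list(v)
    for q in basis:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt; (near-)dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = remove_span(v, basis)
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def model_vector(analyte, interferents):
    """Model vector orthogonal to the interferent hyperplane, with b . analyte = 1."""
    nas = remove_span(analyte, orthonormal_basis(interferents))  # net analyte signal
    scale = dot(nas, nas)
    return [x / scale for x in nas]


analyte = [1.0, 0.5, 0.2, 0.0]                 # hypothetical pure spectra
interferents = [[0.2, 0.6, 1.0, 0.4], [0.0, 0.1, 0.3, 1.0]]
b = model_vector(analyte, interferents)

# Model built from a slightly distorted first interferent (stand-in for noise).
b_noisy = model_vector(analyte, [[0.19, 0.62, 1.0, 0.39], interferents[1]])


def sample(level):
    """Analyte at 0.5, second interferent at 0.3, first interferent at `level`."""
    return [0.5 * a + level * i1 + 0.3 * i2
            for a, i1, i2 in zip(analyte, *interferents)]


for level in (0.0, 1.0, 2.0):
    print(f"level {level}: exact {dot(b, sample(level)):.4f}, "
          f"noisy {dot(b_noisy, sample(level)):.4f}")
```

With the exact model vector the prediction stays at 0.5000 whatever the interferent level; with the distorted one, part of the interferent variation leaks into the result.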


The error propagation formulas developed by DiFoggio require the model vector to be orthogonal to artifacts such as a varying baseline or a wavelength shift. This requirement can be fulfilled by adding new basis vectors to the hyperplane spanned by the spectral interferences and subsequently orthogonalizing the model vector to this extended hyperplane. DiFoggio aptly calls this process 'desensitizing' the model. For a rigorous treatment, see:

  • R. DiFoggio
    Desensitizing models using covariance matrix transforms or counter-balanced distortions
    Journal of Chemometrics, 19 (2005) 203-215
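A minimal sketch of the desensitizing step, assuming the simplest possible artifact vectors: a constant baseline offset and a linear baseline slope across the p channels (a wavelength shift could be added in the same way, e.g. as a derivative spectrum). It uses plain orthogonal projection on hypothetical spectra rather than the covariance matrix transforms of DiFoggio's paper, and it rescales the result so that the sensitivity to the analyte stays equal to 1; both choices are ours.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_span(v, basis):
    """Subtract from v its projection onto an orthonormal basis."""
    w = list(v)
    for q in basis:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt; (near-)dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = remove_span(v, basis)
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def desensitize(b, analyte, directions):
    """Orthogonalize b to span(directions), then restore b . analyte = 1."""
    w = remove_span(b, orthonormal_basis(directions))
    scale = dot(w, analyte)
    return [x / scale for x in w]


p = 5
analyte = [1.0, 0.8, 0.3, 0.1, 0.0]            # hypothetical pure spectra
interferent = [0.1, 0.3, 0.9, 0.6, 0.2]
offset = [1.0] * p                              # constant baseline shift
slope = [float(k) for k in range(p)]            # linear baseline tilt

# Starting from the analyte spectrum gives the net-analyte-signal model vector,
# orthogonal to the interferent only.
b = desensitize(analyte, analyte, [interferent])
# Desensitized: orthogonal to the extended hyperplane.
b_d = desensitize(b, analyte, [interferent, offset, slope])

# Sample: analyte 0.4, interferent 1.5, plus a baseline offset and tilt.
x = [0.4 * a + 1.5 * i + 0.05 * o + 0.02 * s
     for a, i, o, s in zip(analyte, interferent, offset, slope)]

print(f"model vector:        {dot(b, x):.4f}")
print(f"desensitized vector: {dot(b_d, x):.4f}")
print(f"norms: {dot(b, b) ** 0.5:.3f} -> {dot(b_d, b_d) ** 0.5:.3f}")
```

The desensitized vector returns the analyte level of 0.4 despite the baseline artifacts, whereas the original one does not. The printed norms show the price: removing more directions shrinks the net analyte signal, so the norm of the model vector, and with it the propagated spectral noise, increases.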


The theory of analytical chemistry includes NIR calibration as a special case of first-order calibration. In higher-order calibration, one has to consider several subspaces, possibly interacting through nested error structures; see, e.g.:

  • K.S. Booksh and B.R. Kowalski
    Error analysis of the generalized rank annihilation method
    Journal of Chemometrics, 8 (1994) 45-63


Now one has to 'desensitize' the model vector in each relevant subspace.
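What 'each relevant subspace' means can be pictured with a minimal sketch. Assume, purely for illustration, a linear second-order predictor content = Σ B_jk X_jk, where X is a time × wavelength data matrix and B a hypothetical model matrix; second-order methods such as the generalized rank annihilation method do not in general reduce to this form. Desensitizing then means projecting B onto the orthogonal complement of the artifact directions in the time mode and, separately, in the spectral mode.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projector_out(vectors, n):
    """n x n matrix I - Q Q^T that removes span(vectors); Q from Gram-Schmidt."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return [[(1.0 if i == j else 0.0) - sum(q[i] * q[j] for q in basis)
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


def predict(B, X):
    """Linear second-order prediction: sum over j, k of B[j][k] * X[j][k]."""
    return sum(dot(rb, rx) for rb, rx in zip(B, X))


B = [[0.2, -0.1, 0.4, 0.0],          # hypothetical model matrix:
     [0.5, 0.3, -0.2, 0.1],          # 3 elution times x 4 wavelengths
     [-0.1, 0.2, 0.1, 0.3]]

time_artifact = [1.0, 1.0, 1.0]         # hypothetical: constant throughout the run
flat_baseline = [1.0, 1.0, 1.0, 1.0]    # hypothetical: spectrally flat offset

P_time = projector_out([time_artifact], 3)
P_spec = projector_out([flat_baseline], 4)
B_d = matmul(matmul(P_time, B), P_spec)   # desensitize in both modes

# Artifacts with an arbitrary profile in the other mode.
X1 = outer(time_artifact, [0.3, 0.7, 0.2, 0.5])
X2 = outer([0.1, 0.8, 0.4], flat_baseline)
for name, X in [("time-mode artifact", X1), ("spectral-mode artifact", X2)]:
    print(f"{name}: {predict(B, X):+.3f} -> {predict(B_d, X):+.3f}")
```

The two projections act on different indices; applying only one of them would in general leave the model sensitive to the other artifact.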


Conclusions



The Theory of Analytical Chemistry provides a sound basis for identifying developments that are of direct interest to instrument manufacturers, as well as areas where additional research is called for. Figures of merit and other reliability measures may be useful for quantifying the advantages obtained by increasing the complexity of instruments.


References & further information




For further information, please contact Karl Booksh.