Calibration uncertainty in pharmaceutical analysis 

Replicate single-point calibration revisited


Elsewhere on this site (Modernity) I remark that the Chemistry, Manufacturing and Control (CMC) branch of pharmaceutical analysis seems to be disconnected in some respects from the wider world of analytical chemistry.

We discuss one aspect of the disconnect in a note that can be downloaded below: textbooks and guidelines on statistics for analytical chemistry don't cover every aspect of everyday standard practice with pharmaceutical assays. Such assays are of course critical for the safety and efficacy of medicines.

For practical and economic reasons, assay methods are designed whenever possible so that they can be relied upon to give an instrumental response that is strictly proportional to sample amount, without this having to be verified each time a method is applied. Once it has been validated, such a method can be used with single-point calibration, the statistical aspects of which are well covered, for example by the Eurachem guide. Basically, we make a measurement on an accurately known mass of the reference standard, and calculate a response factor that is applied to measurements on the substance being assayed. Sometimes (titrimetry), standards and samples are analysed directly; often (spectrophotometry, chromatography), they have first to be made up into solutions of accurately known concentration. If the standards are intercalated with the unknowns we can detect and correct for drift, an advantage that doesn't easily apply to the calibration curve approach.
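The single-point calculation described above can be sketched in a few lines. This is only an illustration: the function names, the purity correction and all the numbers are assumptions made for the example, not taken from any guideline.

```python
def response_factor(standard_response, standard_mass_mg, standard_purity=1.0):
    """Instrument response per mg of pure reference substance."""
    return standard_response / (standard_mass_mg * standard_purity)


def assay_percent(sample_response, sample_mass_mg, factor):
    """Content found in the sample, as a percentage of the mass weighed."""
    found_mg = sample_response / factor
    return 100.0 * found_mg / sample_mass_mg


# Hypothetical measurements (responses in arbitrary units, masses in mg).
rf = response_factor(standard_response=1502.0, standard_mass_mg=25.10,
                     standard_purity=0.998)
result = assay_percent(sample_response=1489.0, sample_mass_mg=25.02, factor=rf)
print(f"Assay: {result:.2f} %")
```

The whole calculation rests on the validated assumption that the response is strictly proportional to amount; nothing in the code checks it.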

There is nothing difficult about this, apart from a "pharmaceutical exception", as one would say in France: for regulatory analyses, measurements are nearly always done in duplicate at least. On this point even the Eurachem guide chickens out.

What's the problem with replicate standards?

There would be no difficulty if the duplicates (or replicates) could be made of identical amounts, but this is rarely achievable with current official practice. Using a spatula and a weighing vessel, it isn't usually possible to weigh out exactly the specified amount of a powder. As a result, the statistical treatment of what we would like to think of as a duplicate single-point calibration amounts to curve fitting, which is usually done by least-squares regression analysis. Whether or not we are aware of it, all of the formulae likely to be used for replicate calibrations do in fact amount to regression analyses with different, more or less appropriate, weightings.
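To make the point about weightings concrete, here is a minimal sketch of a weighted least-squares line through the origin. It assumes that the standard deviation of the response error varies as amount raised to a power gamma; the function name, the parameter gamma and the data are illustrative assumptions, and the formulae in the downloadable note are not reproduced here. With gamma = 0 the slope is sum(xy)/sum(x²), with gamma = 0.5 it is the ratio of sums, and with gamma = 1 it is the mean of the individual ratios.

```python
def weighted_response_factor(amounts, responses, gamma):
    """Slope of a weighted least-squares line through the origin.

    Assumes the standard deviation of the response error is proportional
    to amount**gamma, so each point gets the weight amount**(-2 * gamma).
    """
    weights = [x ** (-2.0 * gamma) for x in amounts]
    numerator = sum(w * x * y for w, x, y in zip(weights, amounts, responses))
    denominator = sum(w * x * x for w, x in zip(weights, amounts))
    return numerator / denominator


# Hypothetical duplicate standards: amounts in mg, responses in arbitrary units.
amounts = [24.0, 26.0]
responses = [2405.0, 2590.0]

for label, gamma in [("constant error", 0.0),
                     ("intermediate", 0.5),
                     ("error proportional to amount", 1.0)]:
    factor = weighted_response_factor(amounts, responses, gamma)
    print(f"{label}: {factor:.3f}")
```

When the two amounts are identical the weights cancel and all three values coincide, which is why the question only arises when the weighings differ.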

With regression analysis, to get the statistically most likely result in the presence of random error, we need to obtain information or make assumptions about the way measurement error depends on the sample amount. Very often, there isn't enough information to make this decision. While this point is sometimes considered in the literature (usually when discussing calibration curves), any resulting uncertainties in the analytical result are usually considered to be negligible. We will show that this conclusion is wrong for typical pharmaceutical assays. Furthermore, analysts may be led, unknowingly, by a laboratory data system to make the least appropriate assumption in this regard. The result of a duplicate calibration calculated using the wrong algorithm can deviate from the most likely value by 0.3 percent, which is a large chunk of the allowable uncertainty for a method.

An approach consistent with current practice would be to use a calibration algorithm that draws a compromise halfway between zero and total dependence of random error on amount. In cases where solutions of accurately known concentration have to be made, the uncertainties can be practically eliminated, simply by weighing in the solvent. With a modern balance, it's fairly easy (subject to safety precautions) to dispense a weight of liquid that gives nearly the specified concentration.
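The arithmetic of weighing in the solvent is simple enough to sketch. The example assumes that the target concentration is expressed by mass (mg of substance per g of solution); a concentration specified by volume would also need the solution density. Function names and numbers are hypothetical.

```python
def solvent_mass_g(substance_mass_mg, target_mg_per_g):
    """Mass of solvent to dispense so that the solution hits the target."""
    solution_mass_g = substance_mass_mg / target_mg_per_g
    return solution_mass_g - substance_mass_mg / 1000.0


def achieved_mg_per_g(substance_mass_mg, dispensed_solvent_g):
    """Concentration actually obtained from the two recorded weighings."""
    solution_mass_g = dispensed_solvent_g + substance_mass_mg / 1000.0
    return substance_mass_mg / solution_mass_g


# Hypothetical duplicate weighings aiming at 0.500 mg/g.
for mass_mg in (24.37, 25.81):
    solvent = solvent_mass_g(mass_mg, target_mg_per_g=0.500)
    print(f"{mass_mg:.2f} mg of substance needs {solvent:.3f} g of solvent")
```

Because the solvent mass follows the actual weighing, both replicates end up at practically the same concentration, and the choice of weighting no longer matters much.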

Unfortunately, in such situations, the pharmacopeial monograph specifies (implicitly or not) the use of volumetric flasks, which have a limited selection of capacities. Recently though, the USP (revised chapter <841>) has discovered that the volume of a quantity of liquid is proportional to its mass, so henceforth we will be allowed to prepare solutions gravimetrically. The USP doesn't seem to have noticed all the ways by which, if suitably implemented, this innovation could provide scope for tightening up analytical procedures and reducing analytical uncertainty.

Oddly enough, gravimetric sample preparation has already been normal pharmaceutical practice for more than a decade with robotic sample preparation stations, such as those introduced by Zymark. Recently, Mettler Toledo have introduced a balance with dispensers for powders and liquids that should be more accurate than an ordinary balance because it doesn't need to be opened between operations.

While this story provides another occasion to lampoon the pharmacopeial authorities, it also invites reflection on why practice was not adapted as soon as electronic balances were introduced more than three decades ago. Some things don't seem to have evolved much since Lavoisier lost his head.

Other difficulties with replicate standards

We mention in the download another source of uncertainty due to the inability to accurately adjust sample amounts. It is related to the practice of considering a slightly non-linear response to be linear if the deviation from linearity is within specified limits.

Analytical run sequences are (or ought to be) designed to detect and possibly compensate for response drift by intercalating standards and unknowns. Designing suitable sequences is not always straightforward with multiple standards.
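One simple way to compensate for drift, given here only as an assumed illustration, is to interpolate the response factor linearly between the standards that bracket each unknown in the run. Linear drift between standards is an assumption, and the positions and values are hypothetical.

```python
def interpolated_factor(position, pos_before, rf_before, pos_after, rf_after):
    """Response factor at an unknown's run position, assuming linear drift
    between the bracketing standards."""
    fraction = (position - pos_before) / (pos_after - pos_before)
    return rf_before + fraction * (rf_after - rf_before)


# Hypothetical run: standards at positions 1 and 4, unknowns at 2 and 3.
rf_start, rf_end = 100.0, 99.4
for position in (2, 3):
    factor = interpolated_factor(position, 1, rf_start, 4, rf_end)
    print(f"Position {position}: response factor {factor:.2f}")
```

The difference between the bracketing factors also serves as the drift check itself: if it exceeds a preset limit, the run would be rejected rather than corrected.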

Christopher R Lee

  Version 2 March 2013

CalibScedastPost2.pdf (211 kB)

Practical consequences for pharmaceutical analysts

To summarise the purpose of this page, we describe a source of analytical uncertainty that seems to have been overlooked. It is not fully covered by pharmacopeial prescriptions or recognised guidelines. The problem is that existing tolerances for sample amounts and concentrations do not ensure that uncertainties due to random measurement errors are always within acceptable limits. Possible errors arising from non-linear responses may also need to be re-evaluated within this context.

In this kind of situation and in the absence of official guidance, the rules of the quality system known as Good [pharmaceutical] Manufacturing Practice (GMP) presumably require that operating procedures be adapted in accordance with a scientific rationale that has to be written. Drafting a suitable rationale should not be too difficult, although members of Quality Assurance and the inspectorate must be able to follow the basic arguments, or at least think they do. Also, one would not wish to give any impression that previous practice was, perhaps, ever so slightly not quite right.

Procedural changes could be introduced elegantly and relatively painlessly by including them in recent initiatives on tolerances for weighing errors, and on the making up of solutions by weighing.

There seem to be three practical approaches that could be adopted, which are not mutually exclusive:
  1. Require a justification for the calibration algorithm to be applied, based on knowledge of the statistical properties of the analytical response. Revise tolerances for sample amounts accordingly. In practice, it may be sufficient to use the compromise algorithm of Equation 5 (see download), together with only a minor revision of current tolerances for sample amounts.
  2. Where samples have to be made up to an accurately known volume, weigh in the solvent so as to adjust this volume to match the sample weight.
  3. Bearing in mind that the manual weighing of some kinds of powders is often a source of difficulty, consider the widespread adoption of recently-introduced automatic equipment, if this is found in the long term to approach specified target weights more closely.

Is method validation the best approach?

Assay methods for pharmaceuticals and fine chemicals are validated, so that in principle only a single-point calibration and some basic performance checks are required each time a method is applied. Validation is a long and expensive procedure, the purpose of which is to ensure that a method is "under control". In other fields you might expect validation to provide also a means of estimating the likely uncertainty of each result that is reported. That aspect isn't much discussed in pharmaceutical guidelines, and indeed the statistical link between validation and routine practice may not be as strong as we would like.

Such a link must be established if validation results are to be considered predictive of future performance.  In practice, this kind of statistical inference can only be done reliably if measurement errors are normally distributed. Here, we tend to rely on textbook generalisations, because collecting enough repeatability data (perhaps for 100 measurements) is not practical, and we have no evidence that distributions don't vary between days (for example if a powder becomes difficult to weigh on a dry day). Depending on instrumental details and other conditions, we should not be surprised if the unit operations of weighing and liquid chromatographic sample injection were to show skewed distributions; I found no literature on this question.

The presentation on this page takes the argument further: we have shown that lack of knowledge of the dependence of measurement error variance on amount (heteroscedasticity) can lead to contributions to the uncertainty budget that are easily revealed by discrepancies between results obtained using different calibration algorithms.  If we can't be sure about error dispersions, we can't be sure at all about heteroscedasticity.

It's possible, however, to improve on current practice with regard to heteroscedasticity by adopting a compromise algorithm, as discussed for survey statistics by Knaub and others (see references in the article for downloading), and by narrowing the allowable working range.

Ultimately, though, this could be an occasion to propose that we abandon age-old analytical techniques and adopt ones that don't require the validation of a working range. Time and money could be saved by doing without this aspect of validation. A key to such a change seems to be the introduction of automated and adaptive weighing of powders and liquids.

If we can assume that the new weighing equipment lives up to expectations, it is to be hoped that users will be able to make a break with tradition and publish enough "real" (in-use) data to enable the statistical properties of the usual operations to be characterised in a way that has not been done in the past. This would not be too easy for weighing errors, because in the most interesting situations the only way to evaluate them is to carry out the complete assay procedure.

"I'm not very good at statistics"

The subject of this page may seem fairly obscure, though there's nothing that goes beyond the scope of guides and textbooks intended for scientists who are not statisticians. Surprisingly, as mentioned in one of the references to the article, even the United States NIST gets it wrong regarding the dependence of random error on amount.

Colleagues in pharmaceutical analytical development picked up the difficulty being discussed while installing a new chromatographic data acquisition and processing system; different formulae for calculating response factors gave slightly different results with real data. It took a little while to trace the source of the discrepancies.

Since people working in this field don't change jobs very often, it's difficult to know whether our level of knowledge of statistics was typical of this (CMC) branch of the pharmaceutical industry, or of practising analysts in general. Going back 50 years, a university course on statistics for scientists was likely to start with a rigorous derivation of the equation for the normal distribution [1]. Whether or not we students could do the advanced calculus, the knowledge wasn't too helpful in the analytical lab and would soon be forgotten.

In practice, individual scientists in industrial pharmaceutical development may not often be called upon to use their statistical skills. Procedures and algorithms are developed by a specialised statistics department, or by external organisations, as typified by the pharmacopeial procedure for determining uniformity of content. Analytical validation is heavily dependent on what I term somewhat facetiously "plug & play" protocols. These tend to place more emphasis on the statistical properties of the analytical signal than on the confidence limits of the analytical result. There seems to be little use, in this field, for elaborate "bottom-up" calculations of confidence limits as described in the Eurachem guides [2]. Perhaps, as is sometimes discussed, there isn't too much confidence in the value of such methods, in particular with respect to assumptions about error distributions.

Recent textbooks may skip most of the calculus, but launch straight into the basic laws and equations without very much information on where they came from. Practically-minded scientists may, like Einstein, feel a need for something they can visualise. Measurement errors can be considered as the sum of a large number of elementary or microscopic errors, each of which has an equal probability of occurring. This approach has been frowned upon as unrealistic [3], although it could work in analytical chemistry if large errors are thought of as the sums of numerous microscopic ones. It has the advantage of providing a calculus-free explanation of why this is a world of squares and square roots. Also, it becomes clear that the basic theory is, as it stands, fairly useless in our field: measurement errors are rarely additive; they nearly always have a multiplicative component. Given this information, the reader (though not the NIST) will understand that a sort of kludge called weighting is fundamental to a statistical analysis of analytical data, and is not an optional add-on. Although I've called it a kludge, people who understand the subject have established that appropriate weighting does give the statistically most likely result, which is what the analyst should be aiming for. More pragmatically, when we can't know what the most appropriate weighting would be, we need to know what algorithm provides the least unsuitable compromise.
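The elementary-error picture can be checked with nothing more than counting. The sketch below assumes that each elementary error is either +1 or -1 unit with equal probability (one reading of "equal probability of occurring") and that the errors are independent. It enumerates the exact distribution of their sum and prints its standard deviation beside the square root of the number of errors. Function names are illustrative.

```python
from math import comb, sqrt


def error_distribution(n, step=1.0):
    """Exact distribution of the sum of n elementary errors of +step or -step.

    Returns (value, probability) pairs; k of the n errors are positive.
    """
    total = 2 ** n
    return [((2 * k - n) * step, comb(n, k) / total) for k in range(n + 1)]


def standard_deviation(distribution):
    """Standard deviation of a discrete distribution of (value, probability)."""
    mean = sum(value * p for value, p in distribution)
    variance = sum(p * (value - mean) ** 2 for value, p in distribution)
    return sqrt(variance)


for n in (4, 16, 64):
    sd = standard_deviation(error_distribution(n))
    print(f"n = {n}: standard deviation {sd:.4f}, sqrt(n) = {sqrt(n):.4f}")
```

The spread grows with the square root of the number of elementary errors, not with the number itself, which is where the squares and square roots come from. These errors are purely additive, so the sketch says nothing about the multiplicative component that makes weighting necessary.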

To this end, I've been trying to outline a document that could form Chapter Zero of a book entitled something like 'Statistics for complete idiots'. I may get round to tidying some of it up for posting, if anyone might be interested.


1. Yardley Beers, Introduction to the theory of error. Addison-Wesley, Second Edition, 1962.

2. Eurachem (1998). The Fitness for Purpose of Analytical Methods: A Laboratory Guide to Method Validation and Related Topics. Eurachem/Citac.

Eurachem (2000). Quantifying uncertainty in analytical measurement. Second edition.

3. David J. Hand, Statistics, a very short introduction, Oxford 2008, p. 59.

Copyright © 2013. All rights reserved. 30 April 2013.