Goals and use

Expert elicitation in the context of uncertainty quantification aims to specify probabilistic information about uncertainty in a credible, traceable, structured and documented way. It is typically applied where empirical material is too scarce for a direct quantification of uncertainty, and where the results need to be scrutable and defensible (Hora, 1992).

Several elicitation protocols have been developed, of which the much-used Stanford/SRI protocol was the first (Spetzler and von Holstein, 1975; see also Merkhofer, 1987; Morgan and Henrion, 1990, chapters 6 and 7). Related expert elicitation protocols have been employed by Sandia National Laboratories for uncertainty quantification in the nuclear energy risk assessment field (Hora and Iman, 1989; Keeney and von Winterfeldt, 1991; Ortiz et al., 1991; Hora, 1992; NCRP, 1996). As an outcome of a joint project of the European Union and the US Nuclear Regulatory Commission, Cooke and Goossens (2000a,b) have developed a European guide for expert judgement on uncertainties of accident consequence models for nuclear power plants.

Below we discuss two specific elicitation protocols, briefly commenting on the steps involved.

[A] The first protocol is based largely on the Stanford/SRI protocol, but additionally provides an explicit assessment of the quality of the uncertainty information on the basis of a pedigree analysis (see Risbey et al., 2001; van der Sluijs et al., 2002; and the NUSAP entry in this tool-catalogue). The following steps are involved:

Identifying and selecting experts

It is important to assemble an expert panel representing all points of view.

Motivating the Subject

Establish rapport with the expert. Explain to the expert the nature of the problem at hand and the analysis being conducted. Give the expert insight into how his judgements will be used. Discuss the methodology and explain the further structure of the elicitation procedure. Discuss the issue of motivational biases and ask the expert to make explicit any motivational bias that may distort his judgement.


Structuring

The objective is to arrive at a clear and unambiguous definition of the quantity to be assessed. Choose a unit and scale that is familiar to the expert in order to characterize the selected variable. Underlying conditions and assumptions that the expert is making should be clearly identified.

Elicit extremes

Let the expert state the extreme minimum and maximum conceivable values for the variable.

Extreme assessment

Ask the expert to try to envision ways or situations in which the extremes might be broader than he stated. Ask the expert to describe such a situation if he can think of one, and allow revision of the extreme values accordingly in that event.

Assessment of knowledge level and selection of distribution

Before the expert specifies more detailed information about the distribution, make sure that the level of detail requested is consistent with the level of knowledge about the variable. In particular, we seek to avoid specifying more about the distribution shape than is actually known.
Risbey et al. (2001) have proposed a heuristic for this, making use of aggregated normalized pedigree scores (see NUSAP entry in this tool-catalogue) to guide selection of distribution shape: If the aggregated normalized pedigree grade for the parameter is less than 0.3, use a uniform distribution. If it is between 0.3 and 0.7, use a triangular distribution. If it is greater than 0.7, use a normal distribution or other distributions as appropriate.
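
The heuristic above can be written down as a small lookup. This is a minimal sketch: the function name is hypothetical, and the treatment of the boundary values 0.3 and 0.7 (both assigned to the triangular case here) is an assumption, since the text does not say which side they fall on.

```python
def select_distribution_shape(pedigree: float) -> str:
    """Map an aggregated normalized pedigree score in [0, 1] to a distribution shape.

    Thresholds follow the heuristic attributed to Risbey et al. (2001) in the text.
    Assumption: the boundary values 0.3 and 0.7 are treated as 'triangular'.
    """
    if not 0.0 <= pedigree <= 1.0:
        raise ValueError("pedigree score must lie in [0, 1]")
    if pedigree < 0.3:
        return "uniform"
    if pedigree <= 0.7:
        return "triangular"
    # Above 0.7 the text allows a normal or another appropriate distribution.
    return "normal"


for score in (0.1, 0.5, 0.9):
    print(score, select_distribution_shape(score))
```

The returned label only suggests a starting shape; the expert can still argue for a different one.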

Specification of distribution

If the expert selected a uniform distribution, no further values need to be elicited, since the extremes already define it. If the expert selected a triangular distribution, let him estimate the mode. If he chooses another shape for the distribution (e.g. normal), you have to elicit either parameters (e.g. mean and standard deviation for a normal distribution) or values of, for instance, the 5th, 50th and 95th percentiles. Let the expert briefly justify his choice of distribution.
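
As an illustration of how the elicited values translate into a usable distribution, the sketch below draws samples for the three shapes mentioned. The function names and argument layout are hypothetical. For the normal case it assumes that the 5th and 95th percentiles were elicited and that the distribution is symmetric, so the elicited 50th percentile can serve as a consistency check against the fitted mean.

```python
import random
from statistics import NormalDist


def normal_from_percentiles(p5: float, p95: float) -> NormalDist:
    """Normal distribution whose 5th and 95th percentiles match the elicited values."""
    if p95 <= p5:
        raise ValueError("p95 must be larger than p5")
    z95 = NormalDist().inv_cdf(0.95)          # standard normal 95th percentile
    mean = (p5 + p95) / 2.0                   # symmetric, so the mean is the midpoint
    sigma = (p95 - p5) / (2.0 * z95)
    return NormalDist(mean, sigma)


def draw(shape: str, rng: random.Random, *, low=None, high=None,
         mode=None, p5=None, p95=None) -> float:
    """Draw one value from the distribution implied by the elicited quantities."""
    if shape == "uniform":
        return rng.uniform(low, high)          # extremes only
    if shape == "triangular":
        return rng.triangular(low, high, mode) # extremes plus mode
    if shape == "normal":
        dist = normal_from_percentiles(p5, p95)
        return rng.gauss(dist.mean, dist.stdev)
    raise ValueError(f"unknown shape: {shape}")


rng = random.Random(42)
print(draw("uniform", rng, low=0.0, high=10.0))
print(draw("triangular", rng, low=0.0, high=10.0, mode=3.0))
print(draw("normal", rng, p5=10.0, p95=30.0))
```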


Verification

Verify the probability distribution constructed against the expert's beliefs, to make sure that the distribution correctly represents those beliefs.

Aggregating expert distributions

When multiple experts have assessed PDFs, there is no single best way to combine their findings. It is recommended to run the Monte Carlo simulations of the model under study separately for each expert's uncertainty specification, and to compare the results. If the differences between experts are large, one should analyse where and why they arise. A conservative choice is to select the broadest PDF among the experts and use that, unless there are good reasons not to. In communicating the results one should state explicitly that the experts disagree, and mention that the chosen distribution reflects the upper range of the spread among the experts.
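
A minimal sketch of this per-expert comparison is given below. Everything in it is illustrative: `toy_model` is a hypothetical stand-in for the model under study, the three triangular specifications are invented, and measuring "broadest" by the width of the central 90% interval of the input PDF is one possible choice, not one prescribed by the text.

```python
import random
from statistics import quantiles


def toy_model(x: float) -> float:
    """Hypothetical stand-in for the model under study."""
    return 2.0 * x + 1.0


# Invented triangular specifications (min, mode, max), one per expert.
expert_specs = {
    "expert_1": (0.0, 2.0, 4.0),
    "expert_2": (1.0, 2.0, 3.0),
    "expert_3": (0.0, 3.0, 8.0),
}


def interval_90(values):
    """Central 90% interval: the 5% and 95% cut points of the sample."""
    cuts = quantiles(values, n=20)   # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


def run_for_expert(spec, n=10_000, seed=0):
    """Run the Monte Carlo simulation for one expert's specification."""
    low, mode, high = spec
    rng = random.Random(seed)
    inputs = [rng.triangular(low, high, mode) for _ in range(n)]
    outputs = [toy_model(x) for x in inputs]
    return {"input_90": interval_90(inputs), "output_90": interval_90(outputs)}


def width(interval):
    return interval[1] - interval[0]


# One separate run per expert, then a side-by-side comparison.
results = {name: run_for_expert(spec) for name, spec in expert_specs.items()}
broadest = max(results, key=lambda name: width(results[name]["input_90"]))

for name, res in results.items():
    print(name, "output 90% interval:", res["output_90"])
print("broadest input PDF:", broadest)
```

Keeping the runs separate makes the disagreement between experts visible in the model output before any choice of a single distribution is made.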

[B] The second protocol that we present is the one by Cooke and Goossens (2000a,b), which was further adapted by Van der Fels-Klerx et al. (2002) for use in heterogeneous expert panels on broad or multidisciplinary issues. A major ground rule in the Cooke and Goossens set-up is that the experts should in principle only be questioned about (potentially) observable quantities within the area of their expertise[2]. Moreover, the protocol aims to assess the expert's performance explicitly by letting the expert assess so-called 'performance' or 'seed' variables, the values of which are unknown to the expert but known to the analyst. Furthermore, performance-based weights can be determined to aggregate the assessed PDFs of the individual experts into a combined assessment, which is supposed to reflect a kind of rational consensus on the PDF at hand. The steps of the Cooke and Goossens (2000a,b) protocol are as follows:

Preparation for elicitation:

  1. Definition of the 'case structure' document which clearly describes the field of interest for which expert judgements will be required; the document moreover discusses the aim of the elicitation, and provides background information on applied assumptions and on which issues are taken into account in the uncertainty assessment and which issues are excluded.
  2. Identification of the variables of interest or 'target' variables for the uncertainty elicitation. Typically a pre-selection has to take place to focus on the most important ones for expert elicitation, since the number of questions that can be put to the experts is limited.
  3. Identification of the 'query' or 'elicitation' variables: Target variables that can in principle be measured by a procedure with which experts are familiar can serve directly as query variables in an elicitation session. Target variables for which no such measurement procedure exists cannot be quantified by direct elicitation; for these, derived query variables (e.g. proxies) must be found, which ideally should be (potentially) observable quantities. Information on the uncertainty in these derived elicitation variables must then be translated back into the target variables (see step (14)).
  4. Identification of performance variables (or seed variables): These variables serve as a means to assess the expert's performance in rendering uncertainty information on the target variables. There must be experimental evidence on the seed variables, which is unknown to the experts, but known to the analyst, against which the expert's performance can be gauged somehow. Preferably the seed parameters are so-called 'domain variables', referring directly to the target variables. When this is not feasible, 'adjacent variables' may be used.
  5. Identification of experts: In this step a list of names of 'domain' experts, as large as possible, is collected.
  6. Selection of experts: In general, the largest possible number of experts should be used, but at least four. Selection of experts may take place on the basis of selection criteria (e.g. reputation in the field of interest, experimental experience, diversity in background, balance of views).
  7. Definition of an elicitation format document, which should contain clear questions, explanations, and remarks on what is to be included or excluded in the uncertainty assessments, as well as the specific format in which the experts should provide their assessments. The elicitation focuses principally on variables that are measurable, at least in a theoretical sense. The other target variables are deduced by probabilistic inversion (see steps (3) and (14)).
  8. Dry run exercise: Performing a try out of the elicitation serves to find out where ambiguities and flaws need to be repaired, and whether all relevant information and questions are provided.
  9. Training experts for the quantitative assessment task: Typically experts are asked to provide their subjective PDFs in terms of quantiles of the cumulative distribution, for instance the 5th, 50th and 95th percentiles. They need to be trained in providing subjective assessments in probabilistic terms, and in understanding issues related to subjective probability.


Elicitation:

  10. Expert elicitation session, where the experts are questioned individually by an analyst[3] to assess the PDFs of the query variables (including the seed variables) within their field of expertise. As an aid in this process Van der Fels-Klerx et al. (2002) recommend the use of the interactive software package ELI[4] (van Lenthe, 1993), which makes the process of eliciting continuous PDFs easier and less prone to errors and biases.
    In addition to the individual expert interviews, there will in some cases also be joint expert meetings, e.g. to discuss general starting points, or in an intermediate stage as a qualitative wrap-up reviewing of rationales behind the assessments, which can then be used as a shared information base for the next iteration in the individual expert elicitation.


Post-elicitation:

  11. Analysis of expert data, e.g. aggregating the individual experts' assessments into one combined probability density function for each of the query variables, for instance by weighting the experts according to their expertise as measured by their performance on the seed variables. Software for performing this task is Excalibur (
  12. Robustness and discrepancy analysis, e.g. by removing experts or seed variables from the data set one at a time, recalculating the combined PDF, and comparing it with the original one, which uses all information. Discrepancy analysis identifies items on which the uncertainty assessments of the experts differ most. These items should be reviewed to ascertain any avoidable causes of discrepancy.
  13. Feedback communication with the experts: In general results are treated anonymously, and each expert should have access to his/her assessment and performance weights and scores.
  14. Post-processing analyses (e.g. inverse probability mapping), which translate the uncertainties of the combined expert assessments (see step (11)) of the query variables (defined in step (3)) into uncertainties on the target variables from step (2). See e.g. Cooke and Kraan (2000).
  15. Documentation of the results: All relevant information and data, including the rationales of the elicitations, should be documented in formal reports, to be presented to the experts and to the decision makers.
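
The aggregation of expert assessments and the leave-one-out robustness check described above can be sketched as a weighted average of the experts' cumulative distributions. This is an illustration under stated assumptions, not the Excalibur implementation: the expert distributions are invented normal distributions, and the weights are placeholder numbers standing in for weights that would be derived from performance on the seed variables.

```python
from statistics import NormalDist

# Invented expert distributions for one query variable.
experts = {
    "A": NormalDist(10.0, 2.0),
    "B": NormalDist(12.0, 1.0),
    "C": NormalDist(9.0, 4.0),
}
# Placeholder performance-based weights (assumed, not computed here).
weights = {"A": 0.5, "B": 0.3, "C": 0.2}


def pooled_cdf(x, experts, weights):
    """Weighted average of the experts' CDFs, with weights renormalized."""
    total = sum(weights[name] for name in experts)
    return sum(weights[name] * dist.cdf(x) for name, dist in experts.items()) / total


def pooled_quantile(p, experts, weights, lo=-100.0, hi=100.0, tol=1e-9):
    """Invert the pooled CDF by bisection on the assumed bracket [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pooled_cdf(mid, experts, weights) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def leave_one_out(experts, weights, p=0.5):
    """Recompute a pooled quantile with each expert removed in turn."""
    out = {}
    for name in experts:
        rest = {k: v for k, v in experts.items() if k != name}
        out[name] = pooled_quantile(p, rest, weights)
    return out


print("pooled median:", round(pooled_quantile(0.5, experts, weights), 3))
for removed, median in leave_one_out(experts, weights).items():
    print(f"median without expert {removed}:", round(median, 3))
```

A large shift in the pooled quantiles when a single expert is removed indicates that the combined assessment leans heavily on that expert.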

The above-presented methods differ in a number of respects:

  1. In method [A] the qualification of the elicited uncertainty information has an explicit place and is done on the basis of a pedigree analysis, which invites the expert to indicate explicitly the quality and underpinnings of his uncertainty statements. In method [B] this qualification is done on a more empirical basis, by scoring the expert's performance on seed variables. If seed variables are not available, then in fact no explicit or systematic qualification of uncertainty information is undertaken. The best one can hope for is that the expert's elicitation rationale offers suitable information on the underpinnings of the uncertainty specifications, but this is not explicitly required by the protocol.
    Finally, we must realize that in both cases, [A] as well as [B], one is confronted with the question of how 'valid' the established qualifications or scores are: in [A] because the pedigree scoring is partly based on subjective judgement, and in [B] because one can rightfully ask to what extent the performance scores on the seed variables are representative of performance on the target variables. Moreover, the quality of the empirical information on the seed variables, which is ideally known only to the analysts, can also be a problematic factor in this context.
  2. The second major difference is that method [B] is stricter on the choice of elicitation variables: only those variables are explicitly elicited for which there is (in principle) empirical evidence with which the expert is acquainted (query variables). Information on other target variables is deduced indirectly from the elicited information on the query variables by applying formal mathematical techniques such as probabilistic inversion.
    Method [A] is less strict: an expert can, for example, be elicited directly on variables that have no or very low empirical support (i.e. a low score on the empirical or validation pedigree). Needless to say this can make the elicited PDF rather arbitrary or hard to test, unless there is a good proxy that can serve as a suitable benchmark. It is therefore important to ask the expert explicitly how he arrives at his inference on the PDF; the reasoning involved will typically be more heuristic and less traceable than with probabilistic inversion.
  3. Thirdly, the two methods differ in how the PDFs are specified. [A] typically uses PDFs of a specific and familiar form, while [B] does not primarily require an explicit distribution shape. It focuses instead on specifying a number of quantiles, e.g. the 5%, 50% and 95% quantiles (see for instance van Oorschot et al., 2003, where the 25% and 75% quantiles are additionally elicited; note that Van der Fels-Klerx et al., 2002, propose to use ELI in the elicitation process, which applies Beta distributions), and then uses information-theoretic arguments to process this information further. As such [B] uses only limited distributional information and supplements it with information-theoretic principles and assumptions. Potentially available information on the specific form of a distribution is not fully taken into account.
  4. Finally, the treatment of multiple experts in method [A] is more heuristic and less formalized than under [B], where an explicit weighted averaging is applied on the basis of performance on the seed variables.
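
To make the quantile-based specification of [B] more concrete, the sketch below turns three elicited quantiles into a full distribution by spreading probability uniformly between them. This is one simple reading of using no more shape information than the quantiles provide, and it is an assumption for illustration rather than a description of the Cooke and Goossens procedure. In particular, the 10% overshoot used to bound the support and all function names are hypothetical choices.

```python
import bisect
import random


def cdf_from_quantiles(q5, q50, q95, overshoot=0.10):
    """Piecewise-linear CDF through the elicited 5%, 50% and 95% quantiles.

    Assumption: the support is extended beyond q5 and q95 by a fraction
    `overshoot` of the elicited range, and probability is spread uniformly
    within each inter-quantile interval.
    """
    if not q5 < q50 < q95:
        raise ValueError("quantiles must be strictly increasing")
    margin = overshoot * (q95 - q5)
    xs = [q5 - margin, q5, q50, q95, q95 + margin]
    ps = [0.0, 0.05, 0.50, 0.95, 1.0]
    return xs, ps


def inverse_cdf(u, xs, ps):
    """Value x with CDF(x) = u, by linear interpolation between knots."""
    i = bisect.bisect_right(ps, u) - 1
    i = min(max(i, 0), len(ps) - 2)       # keep u = 1.0 inside the last segment
    frac = (u - ps[i]) / (ps[i + 1] - ps[i])
    return xs[i] + frac * (xs[i + 1] - xs[i])


xs, ps = cdf_from_quantiles(2.0, 5.0, 12.0)
rng = random.Random(1)
samples = [inverse_cdf(rng.random(), xs, ps) for _ in range(5)]
print(samples)
```

By construction the resulting distribution reproduces the elicited quantiles exactly, but it ignores any knowledge the expert may have about the shape between them, which is the limitation noted in point 3.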

It is difficult to say beforehand which method, [A] or [B], would be preferable, since both have their strengths and weaknesses. In practice we would recommend a judicious mix of both methods, depending on the availability and quality of data and information, and comparing the pros and cons mentioned in the foregoing.

Moreover in setting up a specific elicitation protocol for a particular case there will be additional points of attention to be dealt with. See the 'guidance on application'-entry listed below for a more comprehensive overview.

[2] Therefore they prefer, for example, to elicit concentrations rather than transfer function coefficients in compartmental models; the uncertainty information can then be translated back into uncertainty information on the coefficients (e.g. by probabilistic inversion, cf. Cooke and Kraan, 2000; Bedford and Cooke, 2001).
The reason for this rather 'empirical stance' on questioning about (potentially) observable quantities is that Cooke and Goossens view uncertainty, from a scientific and engineering viewpoint, essentially as 'that which is removed by observation'. Moreover, they put forward that not all experts may subscribe to the particular model choices that have been made, and that parameters may not necessarily correspond to the measurement material with which the experts are familiar. A further argument for their stance is that direct specification or elicitation of correlations between variables in abstract spaces can be rather problematic and arbitrary.

[3] In complex situations two analysts are recommended: a normative analyst (experienced in probability issues) and a substantive analyst (experienced in the expert's field of interest).

[4] Other examples of elicitation software are PROBES and HYPO, described in Lau and Leong (1999) and Li et al. (2002). Apparently these packages focus on the elicitation process for Bayesian networks.