Tool catalogue

1.	Sensitivity analysis

2.	Error propagation equation ("TIER 1")

3.	Monte Carlo analysis ("TIER 2")

5.	Expert elicitation for uncertainty quantification

6.	Scenario analysis

7.	PRIMA: A framework for perspective-based uncertainty management

8.	Checklist for model quality assistance

9.	A method for critical review of assumptions in model-based assessments

Guidance on application

A good primer on expert elicitation is Frey, 1998, available from http://courses.ncsu.edu/classes/ce456001/www/Background1.html. See also Baecher, 2002, available from http://www.glue.umd.edu/~gbaecher/papers.d/judge_prob.d/judge_prob.html.

For books dedicated to expert elicitation, see e.g. Meyer and Booker, 1991, and Ayyub, 2001. Cooke, 1991, addresses wider methodological and theoretical issues concerning expert judgement under uncertainty.

Besides the points mentioned under the heading 'Goals and Use', the following points deserve special attention when setting up and performing an elicitation:

1. Preliminary screening

The amount of work can be reduced by preliminary screening to select those variables whose uncertainty affects the outcomes of interest most. Expert elicitation can then focus first on these variables, while the other variables are assessed in a less thorough way.
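
Screening can be done in many ways; the text does not prescribe one. A minimal sketch of one crude option, one-at-a-time screening, is shown below: each input is moved from its low to its high value while the others stay at nominal values, and inputs are ranked by the resulting output swing. The model, names and ranges are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Inputs = Dict[str, float]


def oat_screening(
    model: Callable[[Inputs], float],
    nominal: Inputs,
    ranges: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float]]:
    """Rank inputs by the output swing obtained when one input at a time is
    moved from its low to its high value, all others held at nominal values."""
    swings = []
    for name, (low, high) in ranges.items():
        out_low = model({**nominal, name: low})
        out_high = model({**nominal, name: high})
        swings.append((name, abs(out_high - out_low)))
    # Largest swing first: candidates for a thorough expert elicitation.
    return sorted(swings, key=lambda item: item[1], reverse=True)


# Hypothetical model and ranges, for illustration only.
def emission(x: Inputs) -> float:
    return x["activity"] * x["factor"] + x["background"]


ranking = oat_screening(
    emission,
    nominal={"activity": 100.0, "factor": 0.5, "background": 2.0},
    ranges={"activity": (90.0, 110.0), "factor": (0.2, 0.8), "background": (1.0, 3.0)},
)
print(ranking)
```

One-at-a-time screening ignores interactions between inputs, so it is only a quick first pass for deciding where to spend elicitation effort.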

2. Importance of providing a rationale for choices, and of assessing quality (checks and balances)

The uncertainty assessments, as well as the underlying motivations, assumptions and information (measurements, models, reasoning, literature references etc.) used to provide them, should be well documented. Some discussion of the backing and quality of this material is also important, as is an indication of which uncertainty aspects have been left aside. A systematic analysis such as pedigree analysis can be helpful for this purpose.

3. Variability vs. lack-of-knowledge related uncertainty

Uncertainty can partly be seen as an intrinsic property of the system (variability and diversity, but also sampling error), and partly as a property of the analyst and the knowledge base (e.g. lack of good-quality data, lack of expert knowledge, lack of consensus, controversy [5]).

Although there is often a certain degree of arbitrariness [6] in distinguishing between this variability-induced uncertainty ('aleatory') and lack-of-knowledge-induced uncertainty ('epistemic') [7], depending on e.g. modelling purpose, choice of analysis and aggregation level, available information and tradition (see e.g. Baecher and Christian, 2001), we think it is important to treat this distinction [8] explicitly and with care when eliciting uncertainty. Hora, 1996, illustrates that a careful decomposition and treatment of aleatory and epistemic uncertainty, centred on the notion of conditional probabilities, is essential for the expert elicitation process and can strongly influence the further processing and assessment of uncertainties. See also the recent study of van Oorschot et al., 2003, which underlines these findings.
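
One common way to keep aleatory and epistemic uncertainty apart in the subsequent calculations is a two-dimensional (nested) Monte Carlo scheme. The sketch below is a generic illustration under hypothetical distributions, not the specific procedure of Hora (1996) or van Oorschot et al. (2003): the outer loop samples the uncertain population parameters, the inner loop samples variability between individuals conditional on them.

```python
import random
import statistics
from typing import List


def nested_monte_carlo(n_outer: int = 200, n_inner: int = 500, seed: int = 1) -> List[float]:
    """Two-dimensional Monte Carlo.

    Outer loop: epistemic (lack-of-knowledge) uncertainty about the parameters
    that describe the population. Inner loop: aleatory variability between
    individuals, conditional on one outer draw of those parameters.
    Returns the 95th percentile of the individuals for each outer draw.
    """
    rng = random.Random(seed)
    p95_per_outer_draw = []
    for _ in range(n_outer):
        # Epistemic: hypothetical uncertainty about the population mean and spread.
        mu = rng.gauss(10.0, 1.0)
        sigma = rng.uniform(1.0, 3.0)
        # Aleatory: variability between individuals, given (mu, sigma).
        individuals = [rng.gauss(mu, sigma) for _ in range(n_inner)]
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        p95_per_outer_draw.append(statistics.quantiles(individuals, n=20)[-1])
    return p95_per_outer_draw


p95 = nested_monte_carlo()
cuts = statistics.quantiles(p95, n=20)
print(f"95th percentile of the population: 90% epistemic interval "
      f"[{cuts[0]:.1f}, {cuts[-1]:.1f}]")
```

The result is a statement about variability (a population percentile) with its own epistemic uncertainty band, which is exactly the information that is lost when both kinds of uncertainty are sampled in a single loop.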

4. Heuristics and biases

In providing information by expert elicitation, one has to cope with judgemental 'heuristics' (mental rules of thumb) and the 'biases' they produce in expert judgement (see Kahneman et al., 1982; Griffin et al., 2002). These include biases due to cognitive processes as well as motivational, social and cultural biases (see the subsequent entry on pitfalls). One can try to reduce these biases by training the experts in elicitation and its pitfalls, by setting up the elicitation process adequately (using individual and/or controlled group settings, e.g. the Delphi technique or the nominal group technique; Benarie, 1988; Cooke, 1991; Ayyub, 2001), and by formulating the elicitation questions judiciously and unambiguously (e.g. by applying counterfactual reasoning, or by asking first for the extremes). However, debiasing expert judgements before and during elicitation remains difficult. Ex post calibration can have some value, but it requires a reasonable amount of data, and these data must be of adequate quality and representativeness (coverage). For example, Clemen and Lichtendahl (2002) present a Bayesian calibration method to correct expert overconfidence, which can also account for interrelationships between the judgements of different experts.
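
A very simple diagnostic for overconfidence, which is neither the Bayesian method of Clemen and Lichtendahl (2002) nor the calibration score of Cooke's classical model, is to count how often realized values of seed variables fall inside an expert's stated 5%-95% intervals. For a well-calibrated expert this should happen about 90% of the time; a binomial tail probability indicates how surprising a lower hit rate is. The seed data and function names below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def interval_hit_rate(intervals: Sequence[Tuple[float, float]],
                      realizations: Sequence[float]) -> float:
    """Fraction of realized seed values that fall inside the expert's
    stated 5%-95% intervals (ideally about 0.90)."""
    if len(intervals) != len(realizations) or not realizations:
        raise ValueError("need one interval per realization")
    hits = sum(low <= x <= high for (low, high), x in zip(intervals, realizations))
    return hits / len(realizations)


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial P(X <= k): chance of k or fewer hits if each interval
    really contained the true value with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical seed variables: the expert's intervals and the values observed later.
intervals = [(1, 2), (3, 5), (10, 12), (0, 1), (4, 6),
             (7, 9), (2, 3), (5, 8), (6, 7), (1, 4)]
observed = [1.5, 6, 11, 2, 5, 10, 2.5, 9, 6.5, 3]

rate = interval_hit_rate(intervals, observed)
hits = round(rate * len(observed))
print(rate, prob_at_most(hits, len(observed), 0.90))
```

With only ten seed variables the check is coarse, which illustrates why ex post calibration needs a reasonable amount of data.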

5. Selection of distributional form

When data are scarce, goodness-of-fit tests usually cannot discriminate between candidate distribution types (see Cullen and Frey, 1999; Morgan and Henrion, 1990; Seiler and Alvarez, 1996).

For situations with few data, a more robust approach is to select distributions in such a way that the uncertainty associated with the available information is maximized, i.e. without imposing extra assumptions that are not warranted by the data (minimal information). As a first approximation, the following distributions are suggested:

Available values	Shape of distribution to use
{min, max}	Uniform
{mean, standard deviation}	Normal
{min, max, mode}	Triangular
{min = 0, mean}	Exponential
{min, max, mean, standard deviation}	Beta
{min > 0, quantile}	Gamma
{min, max, mean}	Beta
{min = 0, quantile}	Exponential
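
The table can be turned into a small lookup that returns a sampler, as in the sketch below (Python standard library `random` only; the function name is hypothetical). Fitting the Beta row by the method of moments and the exponential row by matching the given quantile are our own choices, not prescribed by the text. The {min > 0, quantile} Gamma row and the {min, max, mean} Beta row are left out on purpose: a single constraint does not fix two shape parameters without an extra assumption.

```python
import math
import random
from typing import Callable, Optional, Tuple

Sampler = Callable[[random.Random], float]


def choose_distribution(lo: Optional[float] = None, hi: Optional[float] = None,
                        mode: Optional[float] = None, mean: Optional[float] = None,
                        sd: Optional[float] = None,
                        quantile: Optional[Tuple[float, float]] = None,
                        ) -> Tuple[str, Tuple[float, ...], Sampler]:
    """Pick a distribution from the available summary values, following the
    table. `quantile` is a (probability, value) pair, e.g. (0.95, 30.0).
    Returns (name, parameters, sampler)."""
    if None not in (lo, hi, mean, sd):
        # Beta on [lo, hi]; shape parameters by the method of moments.
        m = (mean - lo) / (hi - lo)
        v = (sd / (hi - lo)) ** 2
        common = m * (1.0 - m) / v - 1.0
        if common <= 0:
            raise ValueError("sd is too large for a beta distribution on [lo, hi]")
        a, b = m * common, (1.0 - m) * common
        return "beta", (a, b), lambda rng: lo + (hi - lo) * rng.betavariate(a, b)
    if None not in (lo, hi, mode):
        return "triangular", (lo, hi, mode), lambda rng: rng.triangular(lo, hi, mode)
    if None not in (lo, hi, mean):
        raise ValueError("{min, max, mean}: a beta needs an extra assumption here")
    if None not in (lo, hi):
        return "uniform", (lo, hi), lambda rng: rng.uniform(lo, hi)
    if lo == 0 and mean is not None:
        rate = 1.0 / mean
        return "exponential", (rate,), lambda rng: rng.expovariate(rate)
    if lo == 0 and quantile is not None:
        p, q = quantile
        rate = -math.log(1.0 - p) / q  # solves 1 - exp(-rate * q) = p
        return "exponential", (rate,), lambda rng: rng.expovariate(rate)
    if None not in (mean, sd):
        return "normal", (mean, sd), lambda rng: rng.gauss(mean, sd)
    raise ValueError("combination not covered by this sketch")


rng = random.Random(42)
name, params, draw = choose_distribution(lo=0.0, hi=10.0, mean=5.0, sd=2.0)
samples = [draw(rng) for _ in range(5)]
print(name, params, samples)
```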

The following rules of thumb can offer some guidance in this setting (an important issue not explicitly addressed here, but which should nevertheless be taken into account, is the quality, e.g. the accuracy, of the available information on min, max etc.):

If little or no relevant data exist, and information on min, max or most probable value is not available, it is recommended to carry out calculations with different PDFs for the parameter, reflecting whatever information is available. The range of the corresponding outcomes gives a rough indication of the lack of knowledge about the parameter.
If min and max are given, try a uniform distribution; in case of a large range (e.g. several orders of magnitude), try a loguniform distribution. If a mode is also given, try a triangular distribution, or a log-triangular one in case of a large range.
If some relevant data exist but cannot be represented by a standard statistical distribution, use a piecewise uniform (empirical) distribution.
If a substantial amount of data exists and can be reasonably well represented by a standard distribution, use estimation (e.g. maximum likelihood, method of moments, Bayesian estimation) to find the characteristic distribution parameters.
If the parameter can be expressed as a quotient or product of other parameters, its PDF can often be approximated by a lognormal distribution, since the logarithm of a product is a sum of logarithms (see also Vose, 2000; Cullen and Frey, 1999; Morgan and Henrion, 1990).

In the Cooke and Goossens protocol, the experts typically have to assess the 5%, 50% and 95% quantiles as a basis for specifying the distribution (other or additional quantiles can be chosen; cf. van Oorschot et al., 2003). This information is further processed using minimal-information (information-theoretic) arguments.
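
A minimal sketch of how three elicited quantiles might be turned into a full distribution: relative to a uniform background measure, the minimally informative distribution that reproduces the quantiles has a uniform density between consecutive quantiles, i.e. a piecewise-linear CDF. The outer bounds need an extra convention; here we assume the elicited range is widened by 10% of its width on each side. That overshoot value, the function names and the example quantiles are assumptions for illustration, not taken from the protocol description above.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, F(x))


def min_info_cdf(quantile_values: Sequence[float],
                 probs: Sequence[float] = (0.05, 0.50, 0.95),
                 overshoot: float = 0.10) -> List[Point]:
    """Breakpoints of a piecewise-linear CDF (piecewise-uniform density) that
    reproduces the elicited quantiles. The outer bounds extend the elicited
    range by `overshoot` times its width on each side (assumed convention)."""
    q = list(quantile_values)
    if len(q) != len(probs) or any(b <= a for a, b in zip(q, q[1:])):
        raise ValueError("need strictly increasing values, one per probability")
    width = q[-1] - q[0]
    lower, upper = q[0] - overshoot * width, q[-1] + overshoot * width
    return [(lower, 0.0)] + list(zip(q, probs)) + [(upper, 1.0)]


def cdf(points: List[Point], x: float) -> float:
    """Evaluate the piecewise-linear CDF at x."""
    if x <= points[0][0]:
        return 0.0
    for (x0, f0), (x1, f1) in zip(points, points[1:]):
        if x <= x1:
            return f0 + (f1 - f0) * (x - x0) / (x1 - x0)
    return 1.0


def inverse_cdf(points: List[Point], u: float) -> float:
    """Inverse-transform sampling: map u in [0, 1] back to x."""
    for (x0, f0), (x1, f1) in zip(points, points[1:]):
        if u <= f1:
            return x0 + (x1 - x0) * (u - f0) / (f1 - f0)
    return points[-1][0]


# Hypothetical elicited 5%, 50% and 95% quantiles of one expert.
points = min_info_cdf([2.0, 5.0, 12.0])
rng = random.Random(0)
sample = [inverse_cdf(points, rng.random()) for _ in range(10_000)]
print(points)
```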

Van der Fels-Klerx et al., 2002, recommend the graphical software tool ELI (van Lenthe, 1993) for eliciting PDFs of continuous variables. It uses the general beta distribution as a template, which makes elicitation easier and less prone to errors and biases.

Specific applications can require special forms of distributions (e.g. when modelling extreme events), so the above recommendations are not necessarily the optimal ones in every case.

6. Correlation or dependence specifications

The way in which dependence or correlation is taken into account and elicited can strongly influence the outcomes of the uncertainty assessment. Often little is known about specific dependencies, and eliciting them is not easy. In the Cooke and Goossens methodology, dependence is elicited using the notion of conditional probability: an expert is asked to 'specify what the probability is that the value of Z will lie above its median, in case Y was observed to lie above its median, in an experiment which involves both Z and Y'. This probability can readily be translated into a rank correlation between Z and Y (cf. e.g. Kraan, 2002, section 2.1.2).

To avoid an expert having to specify too many dependencies, which moreover can easily lead to incompatible correlation matrices, Cooke and his co-workers have proposed two parsimonious query procedures, both using copulas [9] as a basis for the dependence structure. The first employs a tree (i.e. an acyclic undirected graph) in which rank correlations are specified for a (limited) selection of all possible pairs of variables. Using minimal-information arguments and bivariate copulas (Meeuwissen and Cooke, 1994), a sample is constructed with the requested marginal distributions and a correlation structure compatible with the specified dependencies. The second approach generalizes the correlation tree method and uses a vine as the basic structure for specifying the desired rank correlations. A vine is a nested set of trees built on top of each other, where the edges of tree j are the nodes of tree j+1 (see e.g. Bedford and Cooke, 2001, section 17.2.5). By assigning partial correlations to the vine edges and using, e.g., the elliptical copula, a sample can be constructed which exhibits the desired marginals and a specific rank correlation matrix.

The advantage of the partial-correlation-based specification is that no conditions such as positive definiteness need to be satisfied, and that the associated sampling works 'on the fly', i.e. one sample vector is drawn at a time, without the need to store large numbers of samples in memory (see e.g. Kurowicka and Cooke, 2001). Some of these procedures have been implemented in UNICORN (Cooke, 1995).
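
Both ideas can be illustrated with a short sketch, under a simplifying assumption that is ours and not necessarily that of the Cooke and Goossens procedure or UNICORN: a normal (Gaussian) copula. For a bivariate normal with correlation rho, the probability that both variables exceed their medians is 1/4 + asin(rho)/(2*pi), so the elicited conditional probability is p = 1/2 + asin(rho)/pi, and the Spearman rank correlation is (6/pi)*asin(rho/2). The vine part uses the standard partial-correlation recursion for product-moment correlations of jointly normal variables, and shows that any partial correlation in (-1, 1) yields a valid (positive definite) matrix, whereas a direct specification need not. Function names are hypothetical.

```python
import math


def rank_corr_from_exceedance(p: float) -> float:
    """Turn an elicited P(Z > median(Z) | Y > median(Y)) into a Spearman rank
    correlation, ASSUMING a normal (Gaussian) copula."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    rho = math.sin(math.pi * (p - 0.5))          # product-moment correlation of the copula
    return 6.0 / math.pi * math.asin(rho / 2.0)  # Spearman correlation under that copula


def rho13_from_vine(rho12: float, rho23: float, partial13_given_2: float) -> float:
    """Correlation of variables 1 and 3 implied by a vine on three variables:
    edges 1-2 and 2-3 in the first tree, partial correlation 13|2 in the second."""
    return (partial13_given_2 * math.sqrt((1 - rho12**2) * (1 - rho23**2))
            + rho12 * rho23)


def is_valid_3x3(r12: float, r13: float, r23: float) -> bool:
    """Positive definiteness of a 3x3 correlation matrix (Sylvester's criterion)."""
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return 1 - r12**2 > 0 and det > 0


print(rank_corr_from_exceedance(0.75))
# Direct, unconstrained specification of three correlations: not a valid matrix.
print(is_valid_3x3(0.9, -0.9, 0.9))
# The same two first-tree correlations plus a partial correlation via the vine: valid.
print(is_valid_3x3(0.9, rho13_from_vine(0.9, 0.9, -0.9), 0.9))
```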

Apart from the way in which correlation and dependence are expressed mathematically, the structuring of the problem (decomposition, recomposition and aggregation) will to a large extent determine how dependence is encountered. It makes quite a difference, for instance, whether the basic level of analysis and elicitation is an individual, a subpopulation or a complete population. Moreover, (unexpected) dependencies and correlations can be introduced when both aleatory and epistemic uncertainties are present (Hora, 1996): e.g. when the parameters which describe the variability are not completely known [10], this epistemic uncertainty pervades all elements of the population in a similar way, inducing a dependence between the sampled individuals. The uncertainty introduced in this manner in fact reflects a systematic error, cf. also Ferson and 1996.

Some of these issues are illustrated by the recent study of van Oorschot et al., 2003, which reports an extensive uncertainty assessment of an emission inventory. This assessment consisted of scaling up an expert elicitation at the individual level to uncertainty statements at the level of the total population, taking due account of the presence of aleatory and epistemic uncertainty. Experience with this study made clear that the elicitation and analysis of aleatory and epistemic uncertainty, and of the associated dependencies, remain a challenging field of research.
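
The dependence induced by shared epistemic uncertainty (Hora, 1996) can be made concrete with a small simulation, sketched below under hypothetical numbers; it is not the method of van Oorschot et al. (2003). If every individual is X_i = mu + e_i, with an uncertain mean mu shared by all individuals and independent variability e_i, two individuals have correlation Var(mu) / (Var(mu) + Var(e)), and the variance of a population total of n individuals is n^2 Var(mu) + n Var(e). Treating individuals as independent when moving to the population level would therefore greatly understate the uncertainty of the total.

```python
import math
import random
import statistics
from typing import List, Tuple


def simulate_two_individuals(n_runs: int = 20_000, seed: int = 7,
                             sd_mu: float = 1.0, sd_ind: float = 2.0
                             ) -> Tuple[List[float], List[float]]:
    """Each run draws ONE uncertain population mean (epistemic, shared by all
    individuals) and then two individuals around it (aleatory variability)."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_runs):
        mu = rng.gauss(10.0, sd_mu)           # epistemic: same for everyone in this run
        first.append(rng.gauss(mu, sd_ind))   # aleatory: individual 1
        second.append(rng.gauss(mu, sd_ind))  # aleatory: individual 2
    return first, second


def pearson(x: List[float], y: List[float]) -> float:
    """Sample product-moment correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def total_sd(n: int, sd_mu: float, sd_ind: float) -> Tuple[float, float]:
    """SD of the population total X_1 + ... + X_n with X_i = mu + e_i:
    Var = n**2 * Var(mu) + n * Var(e). The second value wrongly treats the
    individuals as independent, each with variance Var(mu) + Var(e)."""
    shared = math.sqrt(n**2 * sd_mu**2 + n * sd_ind**2)
    independent = math.sqrt(n * (sd_mu**2 + sd_ind**2))
    return shared, independent


a, b = simulate_two_individuals()
# Theoretical correlation here: Var(mu) / (Var(mu) + Var(e)) = 1 / (1 + 4) = 0.2
print(pearson(a, b))
print(total_sd(100, 1.0, 2.0))
```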

7. Aggregating expert judgements

In practice, expert opinions can differ considerably on specific issues (cf. Morgan and Keith, 1995; Morgan et al., 2001; Morgan, 2003). There is no single agreed strategy for handling these situations:

One option is to combine the individual estimates into a kind of group PDF (see e.g. Clemen and Winkler, 1999), which is supposed to 'summarize' the group opinion. However, one has to be careful with such a summary: a single group PDF does not necessarily express a consensus, and it may obscure or suppress differences among experts and thus overstate the precision of the judgements (Hora, 1992). Even when the aim is a group PDF, it remains important to analyse and discuss the diversity in the individual PDFs in some detail: e.g. where it occurs, what its main causes are, and what its major effects are on the final results of interest (compare e.g. the robustness and discrepancy analysis of Cooke and Goossens, 2000a,b). Such an analysis provides relevant information for the decision maker on how to interpret and use the results, and on where to focus potential improvements in the uncertainty information.
Another option is not to combine the individual PDFs in case of considerable diversity, but to present and discuss the diversity in the individual PDFs separately and in its full scope, indicating its potential consequences for policy analysis. See e.g. Keith, 1996, who argues that diversity can serve as a warning flag to seek meaningful alternative modes of policy analysis, which may be highly relevant to the debate on the problem. He warns against adhering to one 'pseudo-truth' on the basis of an aggregated PDF, thereby masking diversity which can be due to e.g. disparate values and interests. Especially where scientific controversies exist, we recommend not combining expert judgements, because combining comes at the expense of the transparency of the analysis and loses insightful information on the level and nature of scientific controversy and expert disagreement. For policy problems that require a post-normal science approach to risk assessment, such information is crucial and should not be obscured by aggregation.

Practical considerations will, however, often force one to work with one PDF for each source of uncertainty (e.g. it is often infeasible to work through and present all multi-expert combinations for all sources of uncertainty). Given the above caveats, it is important to state clearly the assumptions and limitations involved in doing so, to prevent the results from being misinterpreted.

Finally, we discuss two main ways to aggregate expert opinions (see Clemen and Winkler, 1999):

Using behavioural approaches, which try to establish a consensus PDF through carefully structured joint group meetings (using e.g. the 'Delphi method', the 'nominal group technique' or 'decision conferencing' to exchange, discuss and process information and views). Avoiding social and cognitive traps in group discussion is an important issue, but no overall best method seems to exist. Often consensus cannot be reached despite repeated group meetings, and mathematical aggregation is then used as a rather arbitrary and artificial way of arriving at one PDF.
Using mathematical aggregation approaches, which range from simply averaging the individual probability assessments to a more sophisticated analysis of the information aggregation process that accounts for the performance of the experts (Goossens, Cooke and Kraan, 1998) and for the dependence among the experts' probabilities (Reichert and Keith, 2003). Clemen and Winkler, 1999, state that the more complex combination rules sometimes outperform the simple rules (e.g. simple averaging), but that they can be more sensitive, leading to poor performance in some instances (i.e. their robustness is low).
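
The simplest mathematical aggregation, a weighted average of the experts' CDFs (commonly called a linear opinion pool), can be sketched as follows. The expert distributions and the unequal weights are hypothetical; the weights are not derived from any performance score. The example also illustrates the caveat of Hora (1992): pooling two experts who disagree strongly puts the pooled median in a gap that neither expert considers possible.

```python
from typing import Callable, Sequence, Tuple

CDF = Callable[[float], float]


def piecewise_cdf(points: Sequence[Tuple[float, float]]) -> CDF:
    """CDF that interpolates linearly between (x, F(x)) breakpoints."""
    def F(x: float) -> float:
        if x <= points[0][0]:
            return 0.0
        for (x0, f0), (x1, f1) in zip(points, points[1:]):
            if x <= x1:
                return f0 + (f1 - f0) * (x - x0) / (x1 - x0)
        return 1.0
    return F


def linear_pool(cdfs: Sequence[CDF], weights: Sequence[float]) -> CDF:
    """Linear opinion pool: weighted average of the experts' CDFs."""
    if len(cdfs) != len(weights) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per expert")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return lambda x: sum(w * F(x) for w, F in zip(weights, cdfs))


# Two hypothetical experts with non-overlapping 5%, 50% and 95% quantiles.
expert_a = piecewise_cdf([(0, 0.0), (1, 0.05), (2, 0.5), (3, 0.95), (4, 1.0)])
expert_b = piecewise_cdf([(5, 0.0), (6, 0.05), (7, 0.5), (8, 0.95), (9, 1.0)])

equal = linear_pool([expert_a, expert_b], [0.5, 0.5])
weighted = linear_pool([expert_a, expert_b], [0.8, 0.2])  # hypothetical weights
print(equal(4.5), weighted(4.5))
```

With equal weights the pooled CDF is flat at 0.5 between 4 and 5, a range outside both experts' distributions, which is why the diversity behind a pooled PDF should always be reported alongside it.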

The empirical evidence in Clemen and Winkler, 1999, does not suggest a preference for either of these approaches, but suggests that a wise use of both will often be the best approach in practice. Further research appears to be needed.

[5] These latter aspects (lack of consensus, controversy) can, however, also partly be seen as system properties, reflecting variability and diversity rather than lack of knowledge.

[6] Some researchers even argue that at a fundamental level this distinction is questionable (see Winkler, 1996; Bedford and Cooke, 2001), but that from a practical viewpoint it can certainly be helpful to distinguish between aleatory and epistemic uncertainty, in order to decompose and analyse the underlying sources of uncertainty adequately.

[7] This distinction has been described under various names, e.g. stochastic, type A, irreducible, variability for aleatory, and subjective, type B, reducible, and state of knowledge for epistemic.

[8] See e.g. Hofer, 1996, for a clear discussion of the need for this distinction. See also Hoffman and Hammonds, 1994.

[9] A copula is a joint distribution on the unit square (more generally, the unit hypercube) with uniform marginals. It provides a suitable technique for modelling dependence that avoids some of the pitfalls of correlation (see e.g. Embrechts, McNeil and Straumann, 1999; Clemen and Reilly, 1997).

[10] I.e. there is epistemic uncertainty concerning the precise form of the variability.