
Analysis of Numerical Solution Error and Uncertainty Using Statistical Effect Screening

François M. Hemez,1 Marine Marcilhac2
Applied Physics Division (X-Division), Los Alamos National Laboratory, Los Alamos, New Mexico 87545

1 Technical Staff Member in X-Division; E-mail: hemez@lanl.gov; Phone: 505-667-4631; Mailing address: Los Alamos National Laboratory, X-1, Mail Stop B259, Los Alamos, New Mexico 87545.
2 Post-doctoral Research Assistant in X-Division; E-mail: marcilh@lanl.gov.
ABSTRACT: The goals of the Code Verification project at Los Alamos National Laboratory are to develop methodologies for code and solution verification, assess time-to-solution, and quantify solution uncertainty in support of programmatic deliverables. One exercise recently completed was to perform 12,256 simulation runs with an Advanced Scientific Computing multi-physics code to quantify the level of numerical uncertainty of discrete solutions. The algorithm investigated is a finite-volume, second-order Godunov solver for the compressible equations of hydro-dynamics in an Eulerian frame-of-reference. This publication reports on the analysis of six test problems (known as Noh, Sedov, Woodward-Colella waves, and three variants of the Sod shock tube) that, while restricted to 1D geometry, include smooth and shocked solutions, convergent and divergent flows, and various patterns of wave interaction. The datasets are managed using a toolbox whose capabilities include generating designs of computer experiments, writing and submitting multiple input decks for parametric study, post-processing the results, performing analysis-of-variance and effect screening, and analyzing solution accuracy as a function of refinement. A design of computer experiments is analyzed to vary the grid size, Courant condition, two time step controls, and four other options of the algorithm (flux limiter, interface treatment, type of hydro-algorithm, and amount of artificial viscosity). Since exact solutions of the continuous equations exist for some of these test problems, solution accuracy can be defined as the difference between exact and discrete solutions. The analysis of solution accuracy highlights that, first, the manner in which differences between exact and discrete solutions are defined is crucial to reach asymptotic convergence and, second, the numerical method is first-order accurate in the presence of discontinuities, as expected. Statistical analysis-of-variance and effect screening are proposed to quantify, with rigor, the influence that discretization and numerical options exercise on solution accuracy. (Approved for unlimited, public release, LA-UR-07-4223.)

1. INTRODUCTION
The goals of the Code Verification project at Los Alamos National Laboratory (LANL) are to develop methodologies for code and solution verification, assess time-to-solution, and quantify solution uncertainty in support of programmatic deliverables. Verifying the numerical accuracy of discrete solutions computed by simulation software is important because the partial differential equations that govern the equations of motion or laws-of-conservation in computational engineering or physics are discretized for resolution with finite-digit arithmetic. The challenge is
to understand the extent to which approximate solutions of the discretized equations converge to the (unknown) exact solution of the continuous equations. Verification is the first V of the Verification and Validation (V&V) of predictions obtained from numerical simulations. Activities of V&V, such as solution verification, test-analysis correlation, model calibration, and uncertainty quantification, establish the level of confidence that can be placed in a Modeling and Simulation (M&S) capability and the decisions it supports [1]. Solution verification can be defined as a scientifically rigorous and quantitative process for assessing the mathematical consistency between continuum and discrete variants of partial differential equations used to represent a reality of interest [2]. Verification involves comparing numerical solutions obtained from calculations performed on successively refined meshes or grids to a reference. The main difficulty is that the exact solution of the continuous equations is not always known and available to define such a reference.

One exercise recently completed at LANL was to perform a total of 12,256 simulation runs with an Advanced Scientific Computing (ASC) multi-physics code to quantify the level of numerical uncertainty of discrete solutions [3]. The algorithm investigated is a finite-volume, second-order Godunov solver for the compressible equations of hydro-dynamics in an Eulerian frame-of-reference [4]. Six test problems, known as Noh, Sedov, Woodward-Colella waves, and three variants of the Sod shock tube, are analyzed that, while restricted to 1D geometry, include smooth and shocked solutions, convergent and divergent flows, and various patterns of wave interaction. Running times needed to analyze each problem vary between 10 seconds and eight hours, using four processors of the QSC platform at LANL and depending on settings of the numerical method. The overall resource needed to compute and post-process solutions for the six problems represents an effort of about two months of equivalent, single-processor time.

The datasets are managed using a toolbox whose capabilities include generating designs of computer experiments, writing and submitting multiple input decks for parametric study, post-processing the results, performing the statistical analysis-of-variance and effect screening, and analyzing solution accuracy as a function of refinement. The toolbox for Fitting Error Ansatz in Space and Time (FEAST) is currently implemented in the MATLAB™ programming environment and briefly discussed in References [5-6].

This publication reports on some of the conclusions reached by applying the FEAST toolbox to the datasets of six hydro-dynamics test problems. The focus is on analyzing the asymptotic convergence of solution accuracy as a function of the level of mesh or grid refinement, and quantifying solution uncertainty. The discussion is specialized to results obtained with three variants of the Sod shock tube test problem. A design of computer experiments is analyzed to vary the grid size, Courant condition, two time step controls, and four other options of the algorithm (flux limiter, interface treatment, type of hydro-algorithm, and amount of artificial viscosity). Because the exact solution of the continuous equations exists for some of the test problems analyzed, solution accuracy can be defined as the difference between exact and discrete solutions.
Once these differences are computed, their L1, L2, or L∞ norms are analyzed to assess the extent to which solution accuracy converges asymptotically as a function of the level of mesh or grid refinement. The solver investigated here is designed to yield 2nd-order accurate truncation error in the case of regular solutions. Truncation error generally reduces to first-order in the presence of discontinuous solutions. Verifying that the observed rate-of-convergence matches the theoretical expectation is important not only to provide confidence in the implementation and performance of algorithms, but also to quantify the levels of solution uncertainty at given mesh or grid sizes. Analyzing the L2 norms of differences between exact and discrete solutions highlights two main findings. First, the manner in which differences between exact and discrete solutions are defined is crucial to make truncation error behave as expected in the regime of asymptotic convergence.

Not defining solution differences in a manner that is consistent with how the numerical method constructs the discrete solutions yields sub-optimal rates-of-convergence, which could lead the code developers or analysts to believe that something is wrong with the algorithms. Second, the numerical method is, as expected, first-order accurate in the presence of discontinuities. Further investigation with statistical analysis-of-variance is proposed to rigorously quantify the influence that discretization and numerical options exercise on solution accuracy. Statistical effect screening performed for the three Sod shock tube test problems indicates that solution accuracy is overwhelmingly controlled by the level of refinement used in the calculation. Hence the hydro-dynamics method offers an inherent level of robustness to the other numerical options, such as Courant condition, time step control, or type of flux limiter. The results obtained with other test problems, not presented here, further indicate that improving solution accuracy tends to come at the cost of increased sensitivity, or reduced robustness, to how the simulation runs are controlled by some of these other numerical options.

A brief overview of the hydro-dynamics computer code is given in section 2. Options that control the numerical method are defined in section 3. These numerical options, together with the level of refinement of the computational domain, are organized according to designs of computer experiments that set up runs of the code. These designs are also discussed in section 3. Section 4 presents the Sod shock tube test problem and its exact solution. Two approaches to define the difference between exact and discrete solutions are discussed in section 5 and the results obtained with each approach are presented. Section 6 addresses the asymptotic convergence of solution accuracy as a function of grid refinement. Finally, section 7 discusses the results of statistical analyses-of-variance that quantify the influence of numerical options on solution accuracy, and section 8 concludes the publication. All results presented are restricted to three variants of the Sod shock tube problem.

2. BRIEF OVERVIEW OF THE RAGE HYDRO-DYNAMICS CODE


The simulation results analyzed in this publication are obtained by running computer software known as RAGE. This code is developed at Los Alamos by the Advanced Scientific Computing (ASC) Crestone Code Project. The hydro-package of RAGE is a massively parallel, multi-material code that solves the compressible Euler equations of gas dynamics for various geometries in 1D, 2D, or 3D. The laws-of-conservation are written and solved in an Eulerian frame-of-reference, that is, the computational grid is kept fixed with materials flowing through cells of the spatial discretization. The code also implements an Adaptive Mesh Refinement (AMR) capability that is not exercised in this work, which means that all grids analyzed here are kept fixed and uniform. References [7-10] detail the code and its numerical implementation.

Applications of the hydro-dynamics package range from high-fidelity simulations of small-scale instabilities, such as Rayleigh-Taylor or Richtmyer-Meshkov, for material modeling to large-scale simulations for problems such as those encountered in the disciplines of astrophysics, global climate modeling, or studying the consequences of tsunami or asteroid collision. An illustration of impact simulation performed with RAGE is shown in Figure 1. The figure represents the impact on the ocean floor of a 10-km diameter asteroid at the velocity of 15 km.sec−1. Details of the simulation setup and results are available from Reference [11]. The strength of the impact generates a crater, shown on top of Figure 1. The bottom of Figure 1 illustrates that the impact projects large amounts of material debris up to heights of 50 km, approximately, which is well into the upper atmosphere.

(1-a) Simulation of crater formed by the impact of an asteroid.

(1-b) Detail of the density field of material debris projected in the atmosphere.

Figure 1. Simulation of asteroid impact with hydro-code RAGE. (Pictures courtesy of G. Gisler, R. Weaver, C. Mader, and M. Gittings, Reference [11].)

The governing equations of hydro-dynamics are written in terms of the conservation of mass, momentum, and total energy in the computational domain:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho\,U) = 0, \qquad \frac{\partial (\rho\,U)}{\partial t} + \nabla \cdot (\rho\,U \otimes U) + \nabla p = 0, \qquad \frac{\partial E}{\partial t} + \nabla \cdot \big( (E + p)\,U \big) = 0 \tag{1}$$

where the four state variables are density (ρ), flow velocity (U), thermodynamic pressure (p), and total energy (E). Total energy and Specific Internal Energy (SIE) are related by:
$$E = \rho \cdot e + \frac{1}{2}\,\rho \cdot U^2 \tag{2}$$

Counting the number of state variables of equations (1) gives (ρ; U; p; E). This is a total of five variables in 2D, where the velocity vector is U = (UX; UY), or six variables in 3D, where the velocity vector is U = (UX; UY; UZ). The number of equations (1) is equal to four equations in 2D and five equations in 3D. It means that one additional piece of information is needed to bring closure to this system of equations. It is provided by the equation-of-state. The polytropic gas equation-of-state is assumed here for simplicity, even though RAGE implements many other possibilities. The relationship between density, pressure, and energy of the gas is given by:

$$p = (\gamma - 1) \cdot \rho \cdot e \tag{3}$$

where γ is the constant, adiabatic exponent of ideal gas. The temperature is then simply given by T = p/ρ and entropy in the gas is calculated as S = p/(ρ^γ).
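For concreteness, the closure relations (2-3) amount to only a few lines of arithmetic. The sketch below is illustrative only, written in Python rather than in the codes discussed here, and assumes the non-dimensional units of the test problems:

```python
def polytropic_gas(rho, e, gamma=1.4):
    """Polytropic (ideal) gas closure of equation (3): returns pressure,
    temperature, and entropy from density rho and specific internal energy e."""
    p = (gamma - 1.0) * rho * e   # equation (3)
    T = p / rho                   # temperature, T = p / rho
    S = p / rho**gamma            # entropy function, S = p / rho^gamma
    return p, T, S
```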

As stated previously, equations (1) describe the conservation of mass, momentum, and total energy in an Eulerian frame-of-reference for the dynamics of a single, compressible, inviscid, and non-heat-conducting gas. The code verification test problems analyzed here are described by this system of equations (1-3). Although the demonstration is limited to 1D spherical or slab geometry, these equations can represent a variety of flows depending on the choice of boundary and initial conditions. They can produce many different patterns of wave interaction. The well-known Noh and Sedov test problems are examples of converging and diverging flows; they are described in References [12-13] and [14-15], respectively. Because of its non-linearity and hyperbolic nature, the system of equations (1-3) can produce discontinuous (shocked) solutions.

The numerical method investigated is a finite-volume, second-order Godunov scheme for the compressible equations of Eulerian hydro-dynamics [4]. The algorithm is 2nd-order accurate for smooth, that is, continuous and differentiable, solutions. It should also be capable of providing well-resolved, non-oscillatory discontinuities. The concept of a high-resolution numerical method is to rely on high-order numerical approximations and propose a modification of the method that increases the amount of numerical dissipation in the neighborhood of a discontinuity in the flow. Flux limiters are used to avoid producing spurious oscillations due to shocks, discontinuities, or sharp changes in the solution domain. Details of the Godunov numerical scheme can be obtained from References [4, 16-18]. Further describing the solver would distract from the main points that we are attempting to make in this publication. Instead the numerical options exercised during the simulation runs are described in the next section because understanding these settings is important to interpret the results of the statistical effect analysis discussed in section 7.

3. NUMERICAL OPTIONS OF THE RAGE HYDRO-DYNAMICS CODE


When running the RAGE hydro-code the user can define different numerical options, such as the type of interface treatment for multi-material flows, the numerical scheme implemented to compute the flux limiters, or the type of hydro-algorithm exercised in the calculation. The fact that various numerical settings need to be defined is by no means specific to hydro-dynamics or fluid dynamics. The finite element method, for example, implements artificial dissipation in the form of a bulk viscosity parameter for solid elements. Thresholds such as tolerances for iterative convergence, detecting a singularity, or calculating rigid body modes can also have a significant effect on the numerical quality of discrete solutions.

This is at the center of the question addressed here, in addition to assessing the asymptotic convergence of the Godunov solver as a function of grid refinement. The numerical simulation cannot be analyzed without, first, initializing these numerical options. But what is their effect on the quality of discrete solutions? Would some combination of, for example, flux limiter and cell size be detrimental to the numerical quality? Surely understanding the extent to which these settings influence the quality of discrete solutions matters to analysts who are responsible for producing the numerical simulations. It is to shed light on this issue that a systematic investigation is performed of the influence of numerical options on the quality of discrete solutions predicted by the hydro-code.

Our study is currently restricted to a few code verification test problems that admit exact solutions. The main advantage of working with problems that have exact solutions is that their veracity cannot be questioned. Discrete solutions predicted by the code must converge to these exact references. The price to pay is simplification. Exact solutions cannot, in general, be derived if the equations, boundary conditions, initial conditions, or forcing functions are complicated. An effort is made, however, to define a sequence of test problems that exercise a variety of flow patterns that are representative of what would need to be simulated in real applications.

The approach generally followed to verify asymptotic convergence and assess the numerical quality of discrete solutions is to perform a mesh refinement study [19-20]. That is what is done here, except that, instead of running the test problems on successively refined grids using commonly-accepted default settings, the numerical options are varied during mesh refinement. The goal is to estimate the influence that these user-selected settings and options may exercise on the solution and its accuracy. The null hypothesis we are formulating is that numerical options have less influence on solution quality than the discretization, or cell size, Δx. The goal of the analysis is to verify whether the null hypothesis is correct or not. If it is not, the implication is then that the selection of options that control the algorithm influences the accuracy of discrete solutions more so than the cell size. This would not be the indication of a robust algorithm.

Figure 2. Solution error ||yExact − y(Δx)||1 vs. size Δx for four hydro-dynamics problems.

Figure 2 illustrates a sample of results obtained by refining the computational grid and varying numerical options in the case of four test problems (Noh, Sedov, Sod with a single gas, Sod with two gases). The Noh and Sedov problems are analyzed in 1D spherical geometry. The Sod problems are analyzed in 1D slab geometry. All four admit exact solutions. The horizontal axis represents the cell size Δx and the vertical axis is the L1 norm of solution error ||yExact − y(Δx)||1 for the density field. Multiple error values at cell sizes Δx = constant result from running a problem with the same grid and varying numerical options that control the calculation.

The overall trend observed in Figure 2 is that discrete solutions converge to the exact solution as the mesh is refined. The spread of solution error values also indicates that accuracy can be sensitive to the selection of numerical options. The question asked above is, therefore, relevant: is the accuracy vulnerable to a potential loss-of-robustness of the algorithm? The figure illustrates the influence that discretization and numerical options have on accuracy. It does not, however, quantify the relative influence of these competing effects. A methodology is discussed in section 7 to quantify the influence of discretization and code options on solution accuracy. For simplicity it is applied to a single code verification test problem (the Sod shock tube problem).

The starting point of the analysis is to list options that control the numerical method. In the study reported here, the algorithm is a second-order Godunov scheme for the laws-of-conservation of hydro-dynamics and it is decided to investigate the mesh size, time control options, and a few other options. The options that seem important to study are obtained through discussion with experienced analysts and members of the code development team. They are described below.

The first parameter is, of course, the discretization size denoted by symbol Δx, or referred to as DXSET, which is the command of a RAGE input deck used to define a uniform grid. The second parameter is the user-defined time step control Δt or DTFORCE. Assigning a value, for example, DTFORCE = 1.0E-06 in the input deck overwrites the time step controls of the algorithm and forces the problem to run at the fixed value of Δt = 1 micro-second. The third parameter is the Courant-Friedrichs-Lewy (CFL) stability condition, CFL or CSTAB. It states that, when using a uniform grid size Δx (in 1D) and integrating the equations with a time step Δt, information cannot be propagated faster than the maximum speed-of-sound of the (compressible) material:

$$\mathrm{CFL} = c_{Max} \cdot \frac{\Delta t}{\Delta x} \le 1 \tag{4}$$

where c_Max denotes the speed-of-sound calculated by the hydro-code from the state variables (pressure, density, temperature) in every cell of the computational domain. The fourth option is an auxiliary time step stability factor, TSTAB. It limits the time step Δt that a cycle (or iteration) is allowed to take according to velocity constraints, such that:

$$\max\!\left( |U_X|\,\frac{\Delta t}{\Delta x} \;;\; |U_Y|\,\frac{\Delta t}{\Delta y} \;;\; |U_Z|\,\frac{\Delta t}{\Delta z} \right) \le \mathrm{TSTAB} \tag{5}$$

where U = (UX; UY; UZ) is the 3D velocity vector for each cell of the computational domain. A command TSTAB = 0.2, for example, limits the time step such that the transport of material across a cell does not exceed 20% of the cell size based on material velocity.
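To make the interplay of the two controls concrete, the sketch below, a simplified 1D illustration and not RAGE's internal logic, computes the largest time step allowed by equations (4) and (5):

```python
def time_step_limit(dx, c_max, u_max, cstab=0.9, tstab=0.2):
    """Largest 1D time step satisfying the CFL condition of equation (4),
    c_max * dt / dx <= cstab, and the velocity-based control of equation (5),
    |u_max| * dt / dx <= tstab."""
    dt_cfl = cstab * dx / c_max
    dt_velocity = tstab * dx / abs(u_max) if u_max != 0.0 else float("inf")
    return min(dt_cfl, dt_velocity)
```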

The four other parameters studied are defined as numerical options that control the method. Flag HYDRO_VERSION takes two values (2 or 50) and selects an algorithm that solves the laws-of-conservation of hydro-dynamics. The value HYDRO_VERSION = 2 selects a Direct Eulerian (DE) solution technique. Option 50 is the same DE hydro-algorithm except that it implements a Volume of Fluid (VoF) algorithm to treat the interface between multiple materials within cells of the computational domain.

Flag NUMRHO selects the flux limiters, also referred to as slope limiters, that modify the estimation of gradients, or slopes, of the state variables. Four settings can be selected. The value NUMRHO = 0 reduces the numerical scheme to a low-order method (first-order accurate). Option 1 is the min-mod algorithm. Option 2 is the same min-mod algorithm plus an additional iteration to improve the convergence of gradients. Option 4 implements the Van Leer flux limiters. These options for slope limiters are detailed in Reference [7] and the users' manuals [8-9].

Flag INTERFACE_OPTION selects the interface treatment implemented in the case of multiple materials. The code modifies the calculation of derivatives of fractional volumes within cells with the objective of limiting diffusion at material interfaces. The value INTERFACE_OPTION = 0 turns off the interface treatment. Option 1 activates a steepened contact algorithm, also termed Interface Preserver (IP). Option 3 selects the VoF algorithm and it can only be used in conjunction with option HYDRO_VERSION = 50.

Finally, flag HYDROBET controls the addition of artificial viscosity to the discrete solution. If the hydro-dynamics code is intended to solve problems containing shock waves, then special care is required to calculate the shock front. Otherwise post-shock oscillations destroy the accuracy. A standard strategy is shock-capturing through the introduction of an artificial viscosity term that is proportional to velocity [21]. Flag HYDROBET is a continuous variable that takes values in the range 0 ≤ HYDROBET ≤ 2.

Table 1. Definition of settings and numerical options of the RAGE hydro-code.

RAGE Option       | Description                          | Values
DXSET             | Cell size                            | 5-to-7 user-defined levels (Δx)
DTFORCE           | Time step                            | 4 user-defined levels (Δt)
CSTAB             | CFL condition                        | 0.3, 0.5, 0.7, or 0.9
TSTAB             | Velocity-based time step control     | 0, 0.1, 0.3, or 0.5
HYDRO_VERSION     | Type of hydro-algorithm              | 2 or 50
NUMRHO            | Type of flux limiters                | 0, 1, 2, or 4
INTERFACE_OPTION  | Type of interface treatment          | 0, 1, or 3
HYDROBET          | Velocity-based artificial viscosity  | 0, 0.4, 1.2, or 2

Table 1 summarizes the eight parameters selected for analysis. Note that the code implements other control options that are deemed unimportant and kept constant and equal to their default settings. Five-to-seven grids are analyzed for each code verification test problem. The fact that a minimum of five grids are analyzed to study convergence, combined with the other settings surveyed in Table 1, means that a relatively large number of runs are needed to explore all possible combinations of grid size Δx and numerical options. In the case of a single-material problem, for example, analyzed with HYDRO_VERSION = 2, the total number of runs needed to set up a full-factorial design of computer experiments is equal to 5 (DXSET) x 4 (DTFORCE) x 4 (CSTAB) x 4 (TSTAB) x 4 (NUMRHO) x 2 (INTERFACE_OPTION) x 4 (HYDROBET) = 10,240 runs. To analyze a multi-material problem with, again, a minimum of five grids, these runs need to be augmented with additional calculations to assess the VoF algorithm selected with commands HYDRO_VERSION = 50 and INTERFACE_OPTION = 3. It is an additional number of 5 (DXSET) x 4 (DTFORCE) x 4 (CSTAB) x 4 (TSTAB) x 4 (NUMRHO) x 4 (HYDROBET) = 5,120 runs.

Needless to say, handling large numbers of runs makes it necessary to develop a testing harness capable of automatically writing the input decks, submitting jobs, calculating the exact solutions, and organizing the datasets for analysis. The software used is an in-house toolbox of functions developed within the MATLAB™ programming environment and called Fitting Error Ansatz in Space and Time (FEAST) [5-6]. FEAST also integrates basic functionalities to design computer experiments and perform the statistical analyses of variance decomposition and effect screening discussed in section 7. The computing time and memory resources needed to run FEAST are insignificant compared to run times of some of the code verification test problems. Uploading the datasets in MATLAB™ memory and best-fitting a solution error Ansatz model or performing a statistical analysis only takes a few minutes. On the other hand, running times needed to analyze each test problem with the hydro-code RAGE vary between 10 seconds and eight hours with four processors of the 30-TeraFLOP QSC Platform at Los Alamos, depending on settings of the numerical solver.3

3 30-TeraFLOP means 30 x 10+12 floating point operations (additions, multiplications) per second.

The remainder of the discussion is specialized to the single-material Sod shock tube in 1D slab geometry. Numerical options that are relevant to this test problem are summarized in Table 2. Options not listed in Table 2 are kept constant and equal to their default settings.

Table 2. Settings and numerical options exercised with the Sod shock tube test problem.

RAGE Option       | Description                   | Values
DXSET             | Cell size                     | 10, 40, 160, 640, or 2,560 cells
DTFORCE           | Time step                     | 1, 5, 25, or 125 micro-sec.
CSTAB             | CFL condition                 | 0.3, 0.5, 0.7, or 0.9
NUMRHO            | Type of flux limiters         | 0, 1, 2, or 4
INTERFACE_OPTION  | Type of interface treatment   | 0 or 1

(Legend: the control parameters not included in the table are kept constant and equal to the following values: TSTAB = 0.2, HYDRO_VERSION = 2, HYDROBET = 0.25.)

Five grids are analyzed with uniformly-spaced cell sizes of Δx = 0.1 cm, 250 microns, 62.5 microns, 15.625 microns, and 3.90625 microns. These grids implement a constant refinement ratio of R = 4. The full-factorial analysis of settings defined in Table 2 yields a total number of runs equal to 5 (DXSET) x 4 (DTFORCE) x 4 (CSTAB) x 4 (NUMRHO) x 2 (INTERFACE_OPTION) = 640 runs. Note that option INTERFACE_OPTION, which selects different interface reconstructions in the case of multiple materials, is exercised even though the problem is single-material. (Varying this option should have absolutely no influence on the accuracy of discrete solutions.) It is done on purpose to verify that the statistical effect screening of section 7 reaches sensible results.
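FEAST writes and submits one input deck per combination of these settings. A minimal sketch of the enumeration, shown in Python rather than in FEAST's MATLAB™ implementation and with hypothetical variable names:

```python
from itertools import product

# Levels of Table 2 for the single-material Sod shock tube problem.
dxset   = [10, 40, 160, 640, 2560]        # number of cells
dtforce = [1e-6, 5e-6, 25e-6, 125e-6]     # time step (1, 5, 25, 125 micro-sec.)
cstab   = [0.3, 0.5, 0.7, 0.9]            # CFL condition
numrho  = [0, 1, 2, 4]                    # flux limiter
iface   = [0, 1]                          # interface treatment

# Full-factorial design: every combination defines one RAGE input deck.
runs = list(product(dxset, dtforce, cstab, numrho, iface))
assert len(runs) == 5 * 4 * 4 * 4 * 2 == 640
```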

4. THE SOD SHOCK TUBE PROBLEM IN 1D SLAB GEOMETRY


The results presented in sections 4-6 are restricted to the analysis of the Sod shock tube code verification test problem in 1D, slab geometry [22]. Three other test problems (Noh, Sedov, and the Woodward-Colella waves) are analyzed as part of this investigation but the results are not discussed here. The behavior of truncation error is assessed in sections 5-6 with the original Sod problem. Statistical analysis is performed in section 7 using three variants of the problem. These include the original problem, a second version initialized with a very severe discontinuity known as the LeBlanc shock tube test problem, and a two-material version of the original Sod shock tube. The original problem is described next.
Figure 3. Illustration of the Sod shock tube problem: a membrane located at x = ½ cm, and removed at time t = 0, separates the left initial condition (ρL; UL; pL) over 0 ≤ x ≤ ½ cm from the right initial condition (ρR; UR; pR) over ½ ≤ x ≤ 1 cm.



The Sod shock tube test problem is illustrated in Figure 3 and the initial conditions of density, velocity, and pressure are defined in Table 3. The problem was used by Sod to benchmark various numerical methods that solve the equations of compressible flow [22]. It consists of simulating a shock front moving into the region of lower pressure (right shock) and a rarefaction wave that expands into the region of higher pressure (left rarefaction).

Table 3. Definition of the Sod shock tube test problem.

Physical Quantity           | Left Domain         | Right Domain
Coordinates                 | 0 ≤ x ≤ ½ cm        | ½ ≤ x ≤ 1 cm
Density Initial Condition   | ρL = 1 gm.cm−3      | ρR = 0.125 gm.cm−3
Velocity Initial Condition  | UL = 0 cm.sec−1     | UR = 0 cm.sec−1
Pressure Initial Condition  | pL = 1 ergs.cm−3    | pR = 0.1 ergs.cm−3
Equation-of-state           | Ideal Gas           | Ideal Gas
Adiabatic Constant          | γ = 1.4             | γ = 1.4

The physical setup illustrated in Figure 3 is a tube filled with gas, initially divided in two sections by a membrane. The gas has higher pressure and density in one half of the tube than the other half. At time t = 0 sec. when the simulation starts, the membrane is removed and gas is allowed to free-flow. The structure of this flow involves three distinct waves separating regions in which the state variables are constant, as shown in Figure 4: (1) A shock wave propagates into the region of lower pressure, across which the density and pressure jump to higher values and all state variables are discontinuous; (2) It is followed by a contact discontinuity, across which the density is again discontinuous but the velocity and pressure are constant; and (3) The third wave moves in the opposite direction and all the state variables are continuous; this pattern is called a rarefaction wave because density decreases as the wave passes through.

Figure 4. Exact solution of the Sod shock tube problem.

Figure 4 illustrates the exact solution of the continuous equations for the density field. The exact solutions of this and other code verification test problems are computed by the independent, MATLAB™-based code FEAST. The implementation of exact solutions has been thoroughly verified against results published in the literature. Computing the exact solution is generally inexpensive compared to running the hydro-code for multiple grids of a refinement study, even in cases where the exact solution involves iterative solvers and numerical integration, such as in the case of the Sedov test problem. Because the solution pictured in Figure 4 is exact, it can be computed anywhere in the domain and at any time. For example, the exact solution field can be computed at the coordinates of cell centers for each element or cell of the computational domain. This is nevertheless not the recommended way to assess the asymptotic convergence of discrete solutions, as suggested rather unambiguously by the results obtained in section 6.

Even though the density solution illustrated in Figure 4 comes from an idealized and highly simplified test problem, it is important to understand that the solution is relevant to flow patterns encountered in Nature and, therefore, flows that need to be simulated as accurately as possible by the codes. Simplification is precisely what enables code verification for which, for example, representing the geometrical features of the computational domain with a high level of fidelity is beside the point. What is important is to define test problems, such as the Sod shock tube, that exercise the same algorithms, the same lines of code as those that will then be applied to the applications of interest.

Figure 5. Observation of the expanding shock front of a supernova. (Credit: NASA Jet Propulsion Laboratory, Pasadena, California.)

Figure 5 illustrates such an example from the discipline of astrophysics. It shows a shock wave propagating outwards due to the explosion of a supernova. The picture combines observations from several telescopes to unveil a bubble-shaped cloud of gas and dust. The particular cloud shown is 14 light-years wide and expanding at velocities of, approximately, 6 million kilometers per hour. Microscopic dust particles are heated by the supernova shock wave, then re-radiate the energy as infrared light. This explains the abundance of red color in Figure 5. This example indicates why it is important to verify the ability of the hydro-code to simulate shocked, multi-material flows using separate-effect test problems that do not, in the case of this application to astrophysics, integrate other phenomena such as the flow of radiation from a moving fluid.

Figure 6 compares the exact solution of the continuous laws-of-conservation, shown in Figure 4, to several discrete solutions obtained by running the hydro-code for the Sod problem in 1D. The calculations are performed with five uniform grids, the coarsest of which has a resolution of 10 cells or Δx = 0.1 cm. The discrete solutions compared in Figure 6 are obtained with the same numerical options, except for Δx that gets refined at the constant ratio of R = 4, as defined in Table 2. The Courant stability condition is set to 0.9 (CSTAB = 0.9), the time step controller (DTFORCE) is turned off, interface treatment is activated (INTERFACE_OPTION = 1), and the Van Leer flux limiter is implemented (NUMRHO = 4).

Figure 6. Comparison of density profiles for the exact and five discrete solutions.

Discrete solutions are illustrated in Figure 6 as step functions to emphasize the fact that, with a finite volume method, the value of a state variable for each cell of the discretization is constant over the cell volume (3D), area (2D), or length (1D). The solution obtained based on 10 cells is shown with 10 such steps (blue, solid line). Likewise the solution obtained with 40 cells is shown with 40 steps (pink, solid line), etc. The visual illustration of Figure 6 suggests that the entire profile of density values converges to the exact solution as Δx → 0. This is, however, a visual comparison that needs to be rigorously quantified.

5. CALCULATION OF SOLUTION ERROR FOR THE SOD 1D PROBLEM


Verifying the convergence of discrete solutions, as Δx → 0, hinges on the concept of asymptotic regime of convergence. Different choices of mesh or grid size induce various behaviors of the overall numerical error. By definition the asymptotic regime is the region of discretization where truncation error dominates the overall production of numerical error. If truncation error dominates, then the numerical error can be reduced, that is, solution accuracy can be improved, by performing the calculation with smaller element or cell sizes. We have just outlined the principle of conducting a mesh or grid refinement study.

Because truncation dominates within the asymptotic regime of convergence, the behavior of numerical error can be modeled mathematically using an equation such as:

$$\varepsilon(\Delta x) = \left\| y^{Exact} - y(\Delta x) \right\| = \beta \cdot \Delta x^{\,p} + O\!\left( \Delta x^{\,p+1} \right) \tag{6}$$

where ε(Δx) denotes the difference, estimated in the sense of a user-defined norm ||•||, between the exact solution yExact of the continuous partial differential equations and the discrete solution y(Δx) obtained with an element or cell size Δx. In equation (6) the pre-factor β represents a regression coefficient. The exponent p characterizes the rate at which the solution error ε(Δx) is reduced when the mesh or cell size decreases. If truncation dominates the overall production of solution error, then the observed value of exponent p should match the rate-of-convergence of the numerical method implemented in the code. For example a finite element calculation that uses linear shells should exhibit a rate equal to pTheory = 1. Likewise a second-order accurate Godunov scheme should exhibit a rate equal to pTheory = 2. Figure 7 illustrates the meaning of parameters (β; p) of equation (6). If solution error ε(Δx) is represented graphically versus Δx on a log-log scale, then the exponent p is the slope of a straight line that best-fits the data and log(β) is the value of the intercept at Δx = 0.

Figure 7. Illustration of solution error ε(Δx) = ||yExact − y(Δx)|| vs. size Δx, plotting log(ε(Δx)) against log(Δx) for coarse, medium, and fine cell sizes (ΔxC; ΔxM; ΔxF), with slope p and intercept log(β).

In the case of a code verification study, such as the one performed here with the Sod problem, where the exact solution yExact of the continuous equations is known analytically, the left-hand side ε(Δx) = ||yExact − y(Δx)|| can be computed given a discrete solution y(Δx) and an appropriate norm ||•||. Assuming that the higher-order terms of the expansion of truncation error can be neglected, the only remaining unknowns of equation (6) are the two parameters (β; p). Two discrete solutions, one obtained from a coarse grid of cell size ΔxC and another one obtained from a fine grid of cell size ΔxF, suffice to calculate the pair (β; p). The two equations are:

$$\varepsilon(\Delta x_C) = \left\| y^{Exact} - y(\Delta x_C) \right\| \approx \beta \cdot \Delta x_C^{\,p}, \qquad \varepsilon(\Delta x_F) = \left\| y^{Exact} - y(\Delta x_F) \right\| \approx \beta \cdot \Delta x_F^{\,p} \tag{7}$$

and the solution for parameters (β; p) is given by:

$$p = \frac{\log\!\left( \dfrac{\varepsilon(\Delta x_C)}{\varepsilon(\Delta x_F)} \right)}{\log(R)}, \qquad \beta = \frac{\varepsilon(\Delta x_C)}{\Delta x_C^{\,p}} = \frac{\varepsilon(\Delta x_F)}{\Delta x_F^{\,p}} \tag{8}$$

where R denotes the refinement ratio, that is, R = ΔxC/ΔxF > 1. When more than two calculations are available, such as the five grids used here, rates-of-convergence can be estimated from pairs of (coarse; fine) solutions as shown in equation (8). Another approach, which is the one adopted here, is to implement a numerical optimization solver to best-fit the two-parameter equation ε(Δx) = β·Δx^p to the solution error data. It provides a single pair of parameters (β; p) that best-fits the five grids analyzed for the Sod shock tube problem.
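The best-fit reduces to a linear regression on log-log axes, since log(ε) = log(β) + p·log(Δx). A minimal sketch with illustrative, made-up error norms (in practice the values come from equation (10)):

```python
import numpy as np

# Cell sizes of the five Sod grids (cm) and illustrative L2 error norms.
dx  = np.array([0.1, 0.025, 0.00625, 0.0015625, 0.000390625])
eps = np.array([2.1e-2, 5.9e-3, 1.6e-3, 4.4e-4, 1.2e-4])   # placeholder data

# Two-grid estimate of equation (8), using the coarsest and finest grids.
R = dx[0] / dx[-1]
p_two_grid = np.log(eps[0] / eps[-1]) / np.log(R)

# Best-fit of eps(dx) = beta * dx**p over all grids: the slope of the
# log-log regression is p and its intercept is log(beta), as in Figure 7.
p_fit, log_beta = np.polyfit(np.log(dx), np.log(eps), 1)
beta = np.exp(log_beta)
```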

The issue that remains to be discussed, prior to proceeding to the analysis of asymptotic convergence of discrete solutions, is: how should solution error ε(Δx) = ||yExact − y(Δx)|| be defined? This is generally not discussed in the disciplines of code or calculation verification. We will nevertheless illustrate that the definition of solution error is of paramount importance to interpret the results of a mesh refinement study. Most often, verification studies define solution error ε(Δx) as the L1, L2, or L∞ norm of point-wise differences between the exact yExact and discrete y(Δx) solutions. By point-wise we mean that a difference is taken between the value of a state variable at the center of a computational cell, or at an integration point in the case of finite element analysis, and the value of the exact solution at the same spatial location. The Lp norm used for analysis then accumulates, in a volume-weighted sense, these point-wise differences. Such a definition of solution error, however, poses a fundamental question: does it make sense to define an error vector whose size changes with the grid? In the case of the Sod test problem, for example, the coarsest-grid solution defines a vector over 10 computational cells, which means that y(Δx) ∈ R^10. Likewise the point-wise defined error vector yExact − y(Δx) belongs to R^10. The next level of refinement possesses 40 cells, which leads to a solution vector that belongs to a different mathematical space, y(Δx) ∈ R^40, etc. Does it make sense to compare solution errors defined in R^10, R^40, R^160, R^640, and R^2,560 as the computational grid is refined?

Figure 8. Graphical illustration of two approaches to define solution error, yExact − y(Δx): Option-I takes point-wise differences between the exact solution and the discrete solution y(Δx); Option-II takes differences between averages of the exact and discrete solutions over a reference grid.

Another approach, which remedies this potential inconsistency, is to define a solution error that belongs to a reference space no matter which discretization is used to compute the discrete solution y(Δx).

It means that the discrete solutions are transformed from the computational grid to a reference grid that is independent of the level of refinement. Two basic transforms can be implemented, based either on interpolation or averaging. Results are presented here where the discrete solutions are weighted-averaged over a reference grid. For consistency the same transformation is applied to the exact solution yExact.

Figure 8 illustrates the two approaches to define solution error. Option-I, on the left, is the point-wise definition of the error where the exact solution is evaluated at the center of each cell of the computational domain. It is our contention that the Option-I definition of error is inconsistent with how a finite volume method calculates a discrete solution because it compares cell-averaged values for y(Δx) to point-wise values for yExact. In addition, Option-I produces solution error vectors that depend on the size of the grid used to compute y(Δx). Option-II, on the right, consists of averaging both exact and discrete solutions over the cells of a reference grid. Differences are then calculated between these averages and accumulated to obtain the L1, L2, or L∞ norms of solution error ε(Δx). Option-II is completely consistent because exact and discrete solutions are averaged over the same volumes (3D), areas (2D), or lengths (1D) prior to calculating ε(Δx). Furthermore the volumes, areas, or lengths of the reference grid are independent of grids defined for refinement. The results discussed are obtained with a reference grid equal to the coarsest grid of the study, which implies that all error vectors are defined over the same space of R^10. The procedure isolates discretization error from other effects that could potentially influence numerical accuracy, such as inconsistencies in initial conditions, because the transformed solutions are independent of the level of grid refinement.

It is emphasized that, when discrete solutions are defined over elements or cells, averaging takes the form of a weighted-average for consistency with the laws-of-conservation being solved. Weighting is also required if elements or cells do not have a uniform size. Values ys(Δx) defined over cells of the computational domain, s = 1 … N, are converted into volume-averaged values ȳk(Δx) for cells of the reference grid:

$$\bar{y}_k(\Delta x) = \frac{\displaystyle \sum_{s = 1 \ldots N_k} V^{(s)} \cdot y_s(\Delta x)}{\displaystyle \sum_{s = 1 \ldots N_k} V^{(s)}} \tag{9}$$

where V^(s) is the volume of the s-th original cell and Nk is the number of cells averaged to define the k-th value ȳk(Δx). (Clearly V^(s) is a volume for a 3D problem, an area for a 2D problem, and a length for a 1D problem.) Because the reference grid is, here, defined as the coarsest grid, weighted-averaging is irrelevant if y(Δx) ∈ R^10. For the other discrete solutions that belong to spaces y(Δx) ∈ R^40, y(Δx) ∈ R^160, y(Δx) ∈ R^640, or y(Δx) ∈ R^2,560, averaging is performed over 4, 16, 64, or 256 cells, respectively. All weighted-averaged, discrete solutions ȳ(Δx) then belong to the same mathematical space, ȳ(Δx) ∈ R^10.

Figures 9 and 10 illustrate the density errors obtained with Option-I and Option-II, where the discrete solutions are those shown in Figure 6. Plots of Figure 9 clearly indicate that the point-wise defined error vectors of Option-I increase in length as the grid is refined. The largest errors occur at the contact discontinuity and shock front. It can be observed that refining the grid does not significantly improve solution accuracy at the contact or shock. Refining the mesh does, on the other hand, reduce the overall solution error behind the contact and in the rarefaction region. Figure 10 is obtained with the same datasets as those of Figure 9, with the difference that error is defined in terms of the weighted-averaging of Option-II. This explains why error vectors all have the same length in R^10. (Recall that the reference grid is, here, equal to the coarsest grid of the study.) The figure emphasizes that, due to averaging, the error is constant over each one of the reference cells.
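A minimal sketch of the Option-II transform of equation (9), assuming a uniform 1D grid that nests evenly inside the reference grid (it is not the FEAST implementation):

```python
import numpy as np

def volume_average(y_fine, v_fine, n_ref):
    """Equation (9): volume-weighted average of fine-grid cell values y_fine,
    with cell volumes (lengths in 1D) v_fine, onto n_ref reference cells."""
    y = np.asarray(y_fine, dtype=float)
    v = np.asarray(v_fine, dtype=float)
    n_k = y.size // n_ref                 # cells averaged per reference cell
    y_blocks = y.reshape(n_ref, n_k)
    v_blocks = v.reshape(n_ref, n_k)
    return (v_blocks * y_blocks).sum(axis=1) / v_blocks.sum(axis=1)

# Example: restrict a 40-cell solution onto the 10-cell reference grid (Nk = 4).
y_ref = volume_average(np.linspace(1.0, 0.125, 40), np.full(40, 0.025), 10)
```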

Figure 9. Point-wise differences between exact and discrete solutions (Option-I).

Figure 10. Weighted-averaged differences of exact and discrete solutions (Option-II).

It can be observed that Figure 10 depicts a different picture of the effect of mesh refinement, whereby solution accuracy is improved, as Δx → 0, for the entire flow in 0 ≤ x ≤ 1 cm. The reason is because averaging reduces the influence of a small number of cells that tend to contribute large errors, for example, at the contact discontinuity and shock front. Refining the mesh never helps in this regard because a few cells, maybe just one, always contribute large errors at the discontinuity irrespective of how small the cells become. This reasoning suggests that defining solution error from point-wise differences, as promoted by Option-I, may paint an erroneous picture of what mesh refinement is doing in terms of improving solution accuracy.

Once Option-I or Option-II has been selected to define a solution error vector, Lp norms of the differences between exact and discrete solutions are calculated using the definitions:

$$\left\| y^{Exact} - y(\Delta x) \right\|_1 = \frac{1}{V^{Total}} \sum_{s = 1 \ldots N} V^{(s)} \cdot \left| y_s^{Exact} - y_s(\Delta x) \right|$$

$$\left\| y^{Exact} - y(\Delta x) \right\|_2 = \sqrt{ \frac{1}{V^{Total}} \sum_{s = 1 \ldots N} V^{(s)} \cdot \left( y_s^{Exact} - y_s(\Delta x) \right)^2 } \tag{10}$$

$$\left\| y^{Exact} - y(\Delta x) \right\|_\infty = \max_{s = 1 \ldots N} \left| y_s^{Exact} - y_s(\Delta x) \right|$$

where ysExact denotes the s-th value of the (original or weighted-averaged) exact solution, ys(Δx) is the s-th value of the (original or weighted-averaged) discrete solution, and V^(s) represents the length (1D), area (2D), or volume (3D) of the s-th (original or reference) element or cell. Symbol VTotal is the total volume of the computational domain, equal to the accumulation of volumes V^(s) for all elements or cells, s = 1 … N. Definitions are given in equation (10) for the L1, L2, and L∞ norms, from top to bottom.
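A minimal sketch of equation (10); under Option-II both inputs are the weighted-averaged vectors of equation (9) and v holds the reference-cell volumes:

```python
import numpy as np

def error_norms(y_exact, y_disc, v):
    """L1, L2, and Linf norms of equation (10) for cell volumes (or lengths) v."""
    d = np.asarray(y_exact, dtype=float) - np.asarray(y_disc, dtype=float)
    v = np.asarray(v, dtype=float)
    v_total = v.sum()
    l1   = (v * np.abs(d)).sum() / v_total
    l2   = np.sqrt((v * d**2).sum() / v_total)
    linf = np.abs(d).max()
    return l1, l2, linf
```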

6. ANALYSIS OF ASYMPTOTIC CONVERGENCE FOR THE SOD 1D PROBLEM


In what follows, asymptotic convergence is analyzed in the sense of the L2 norm of solution error. Results obtained with the intuitive definition of solution error (Option-I) are compared to those obtained with the weighted-averaged solution error (Option-II). The goal of the analysis is to verify that the rate-of-convergence observed through mesh refinement for the Sod problem matches the theoretical, first-order accuracy of the Godunov finite volume method when the scheme is applied to discontinuous or shocked solutions.

Figure 11. Solution errors ||yExact − y(Δx)||2 calculated with Option-II vs. size Δx.

Figure 11 shows the L2 norms of density field errors ||yExact − y(Δx)||2 calculated with Option-II versus cell size Δx, where both exact and discrete solutions are averaged on the coarsest grid. A total of 320 runs are illustrated. They correspond to analyzing the Sod test problem with the spatial discretization Δx and numerical options defined in Table 2. (These runs have in common that the interface treatment is turned off, or INTERFACE_OPTION = 0.) It can be observed that solution errors cluster in two separate groups, which is emphasized in Figure 11 by using dot symbols of different colors. The red dots are solution errors obtained from runs performed without any flux limiter, or NUMRHO = 0. The blue dots are solution errors obtained from runs performed with one of the three flux limiting options, that is, NUMRHO = 1, 2, or 4. The discrete solutions calculated without flux limiting are systematically less accurate because omitting the limiter produces solutions that are too diffusive, as illustrated next.

Figure 12. Influence of flux limiting with solutions obtained at Δx = 15.625 microns.

Figure 12 compares a detail of the density field in the region 0.64 cm ≤ x ≤ 0.71 cm, which is just at the contact discontinuity, when the flux limiter option is varied. All other settings of these four runs are identical, with CFL = 0.9 and Δx = 15.625 microns (640 cells). The figure confirms the influence that the slope limiter has on the numerical quality of solutions and that excessively diffusive solutions are calculated when the limiter is turned off.

Because the numerical option NUMRHO = 0 clearly produces inaccurate solutions, it makes no sense to analyze simulation runs performed this way. Runs performed without flux limiting are removed from the datasets, meaning that the discrete solutions shown with red dots in Figure 11 are thrown out. What remains for analysis consists of 240 simulation runs, as specified in Table 2, where the interface treatment is turned off (INTERFACE_OPTION = 0) and flux limiting is turned on with one of the three options NUMRHO = 1, 2, or 4. Figures 13 and 14 show how the L2 norm of solution error ||yExact − y(Δx)||2 decreases as the grid is refined, Δx → 0. The results of Figure 13 are obtained when error is defined as the point-wise difference between values of the exact and discrete solutions (Option-I). Figure 14 illustrates the case of defining error as the difference between values of the exact and discrete solutions averaged over the coarsest mesh (Option-II).

Figure 13. Asymptotic convergence of Option-I errors ||yExact − y(Δx)||2 (with 240 runs).

Two observations are made by comparing Figures 13 and 14. First, it is clear that the overall rate-of-convergence changes significantly when going from Option-I to Option-II. Both figures seem to indicate that we are in the presence of asymptotic convergence, where truncation error dominates the behavior of numerical accuracy as the grid is refined. But the observed rates-of-convergence disagree. (The observed rates are calculated through numerical optimization of the two-parameter equation ||yExact − y(Δx)||2 = β·Δx^p, where 240 values of solution error calculated with either Option-I or Option-II are fed to the optimization solver.)

Figure 14. Asymptotic convergence of Option-II errors ||yExact − y(Δx)||2 (with 240 runs).

The Option-I definition of error in Figure 13 gives p = 0.48 while the Option-II definition of error in Figure 14 gives p = 0.94, which is very close to the theoretical first-order accuracy expected for the Sod shock tube problem. Without paying attention to the definition of solution error, one may conclude from Figure 13 that the numerical method is not performing as advertised or, worse, that there is a programming error in the code. It is our contention that not observing the expected rate-of-convergence comes, instead, from a fundamental inconsistency between how Option-I defines the error and how a finite volume method, such as the one investigated here, calculates the solution fields. Solution error cannot be defined in a point-wise sense when the numerical method defines and calculates volume-averaged quantities. Likewise the shape functions of a finite element model introduce interpolation properties that should not be ignored when calculating solution error over a computational domain.

Pushing the reasoning further, one could even question whether it makes sense to define solution error in terms of point-wise differences when the numerical method, such as finite volume, finite element, and some finite difference schemes, calculates a weak solution of the equations of motion or laws-of-conservation. This is because weak solution fields are always defined in the sense of a global norm [23-24]. Expecting that quantities defined locally or point-wise will exhibit asymptotic convergence, therefore, seems inconsistent with how weak solutions are defined and calculated. These results illustrate the importance of carefully defining solution error.

Another observation made from Figures 13 and 14 is that the numerical options exercised in the design of computer experiments seem to have little-to-no influence on solution accuracy. This statement originates from observing that the value of solution error ||yExact − y(Δx)||2 does not vary at Δx = constant, even though 48 runs are performed by changing options DTFORCE, CSTAB, and NUMRHO. Note that this assessment is simply visual. The statistical analyses discussed in section 7 are useful to rigorously quantify the influence of numerical options on solution accuracy.

7. ANALYSIS OF THE INFLUENCE ON ACCURACY OF NUMERICAL OPTIONS


In this section we propose to apply statistical techniques of variance decomposition and effect screening to the datasets generated by mesh refinement. These techniques assess how choices made for discretization and numerical options influence solution accuracy. Statistical analysis provides a level of rigor that is not apparent from graphics such as those of Figures 13 and 14. To introduce the concepts of variance decomposition and effect screening, it is convenient to adopt the language of statistical sciences. We start by taking a short detour to introduce these techniques, then show how they can be applied to our problem. The starting point is to consider a function of N random variables X1 … XN, denoted by:

$$y = F(X_1; X_2; \ldots; X_N) \tag{11}$$

where the function F(•) is completely arbitrary. In fact this function is not even known in our application: it can only be sampled by running the computer hydro-code. The variables Xk are referred to as input factors and they represent the parameters and options defined in Table 2. The statistical theorem known as decomposition of expectation states that function (11) can be expressed as a summation of conditional expectation functions, written as:

$$y = y^{Mean} + \sum_{k = 1 \ldots N} \tilde{F}_k(X_k) + \sum_{\substack{p = 1 \ldots N \\ q > p}} \tilde{F}_{p,q}(X_p; X_q) + \sum_{\substack{l = 1 \ldots N \\ m \ne l,\; n \ne m}} \tilde{F}_{l,m,n}(X_l; X_m; X_n) + \cdots \tag{12}$$

where the first term is simply equal to the mean value of the function, that is, yMean = E[y], and the conditional expectation functions are successively defined as:

$$\tilde{F}_k(X_k) = E[y \,|\, X_k], \qquad \tilde{F}_{p,q}(X_p; X_q) = E[y \,|\, (X_p; X_q)], \qquad \tilde{F}_{l,m,n}(X_l; X_m; X_n) = E[y \,|\, (X_l; X_m; X_n)], \quad \text{etc.} \tag{13}$$

The symbol E[y | Xk] of equation (13) denotes the calculation of a conditional expectation where the kth variable Xk is known. The function E[y | Xk] is defined by integrating equation (11) over all random variables except the kth one. It is, therefore, a function of Xk only. Likewise the function E[y | (Xp; Xq)] is defined by integrating equation (11) over all random variables except the pth and qth ones. It is, therefore, a function of the pair of variables (Xp; Xq). The same logic generalizes to higher-order conditional expectation functions. The decomposition (12-13) explains how the function F(•) changes when its inputs X1 … XN are varied simultaneously.

To understand which input(s) Xk or combination(s) of inputs control the spread of values y = F(X1; …; XN) obtained when these N factors vary simultaneously, the variance of function F(•) can be decomposed in a similar way. This is the basic concept that underlies the principle of variance decomposition. The total variance σY² of predictions y can be decomposed, and this is also a theorem, according to:

$$\sigma_Y^2 = \sum_{k = 1 \ldots N} V_k^2 + \sum_{\substack{p = 1 \ldots N \\ q > p}} V_{p,q}^2 + \sum_{\substack{l = 1 \ldots N \\ m \ne l,\; n \ne m}} V_{l,m,n}^2 + \cdots \tag{14}$$

where the conditional variance functions are successively defined as:


$$\sigma_Y^2 = \mathrm{Var}[y] = E\!\left[ \left( y - y^{Mean} \right)^2 \right]$$
$$V_k^2 = \mathrm{Var}\!\left[ \tilde{F}_k(X_k) \right] = E\!\left[ \left( E[y \,|\, X_k] - y^{Mean} \right)^2 \right]$$
$$V_{p,q}^2 = \mathrm{Var}\!\left[ \tilde{F}_{p,q}(X_p; X_q) \right] = E\!\left[ \left( E[y \,|\, (X_p; X_q)] - y^{Mean} \right)^2 \right] \tag{15}$$
$$V_{l,m,n}^2 = \mathrm{Var}\!\left[ \tilde{F}_{l,m,n}(X_l; X_m; X_n) \right] = E\!\left[ \left( E[y \,|\, (X_l; X_m; X_n)] - y^{Mean} \right)^2 \right], \quad \text{etc.}$$

The decomposition (14-15) is the basis for understanding which effects control the variation of function F(•). Simple techniques of variance decomposition, such as the Sobol method, perform multiple regressions by keeping constant one input factor Xk at-a-time [25]. A coefficient of correlation is then estimated between the response y and the kth input variable Xk. It is equivalent to performing multiple linear regression analyses and, therefore, appears to be most relevant when functions F(•) are, for the most part, linear, continuous, and monotonic. To avoid these potential limitations, we study what influences solution accuracy with an Analysis-of-Variance (ANOVA). ANOVA is based on the same concept of variance decomposition, with the difference that the approach is not necessarily performed one factor at-a-time.

Suppose that the factors X1 … XN of function (11) are partitioned in two subsets denoted by {XKnown} and {XUnknown}. The subsets can be restricted to a single variable Xk or include multiple variables or combinations of variables. The subsets are further defined with the premise that what is not included in {XKnown} goes to {XUnknown}, and vice-versa. Assume next that only the values of subset {XUnknown} vary because they are unknown. The question of understanding what controls the variability of function F(•) becomes: how does knowing {XKnown} reduce the overall variability?

The answer is obtained by decomposing, as before, the total variance of function F(·) into two contributions from subsets {X_Known} and {X_Unknown}. The main result is given below and it warrants some explanation:

$$ \sigma_{Y}^{2} \;=\; \sigma^{2}\big(E[\,y \mid \{X_{\mathrm{Known}}\}\,]\big) \;+\; E\big[\sigma^{2}(\,y \mid \{X_{\mathrm{Unknown}}\}\,)\big] \qquad (16) $$

where the symbols E[y | ·] and σ²(y | ·) denote the conditional expectation and conditional variance, respectively. Clearly the left-hand side is the total variance of function F(·). When the function is not known analytically and can only be sampled, which is the situation we are in when running a computer code, the variance is estimated from the population of runs available. The first term of the right-hand side is the variance of the conditional expectation function E[y | {X_Known}], for which the subset of variables {X_Known} is known; it measures how much of the total variability is explained by {X_Known}. The second term is the expected variance that remains when {X_Known} is prescribed and only the factors of subset {X_Unknown} vary. If the first term is small relative to the total variance, that is, σ²(E[y | {X_Known}]) << σ_Y², then knowing the factors {X_Known} does not help to reduce the total variability. In this case the factors that compose subset {X_Known} may as well be kept constant and equal to their nominal values. Equation (16) therefore implies that the partial variance σ²(E[y | {X_Known}]) is an importance factor that measures the influence of subset {X_Known} in terms of controlling the overall variability of function F(·) when the N input factors vary simultaneously. If the importance factor is small, the conclusion is that prescribing or ignoring the subset {X_Known} makes no difference to how the function varies. Viewed under a different angle, the question is to identify those subsets {X_Known} associated with the largest values of the importance factor. Instead of using a difference between the total variance σ_Y² and the partial variance σ²(E[y | {X_Known}]) to define the influence factor, it is more convenient to calculate a ratio between the two. A ratio naturally defines a scaling relative to total variance and offers the advantage that the influence of different subsets {X_Known} can be compared directly. A correlation ratio η² cannot, however, be calculated explicitly if the function is accessible only through sampling. The analysis consists of propagating a design of computer experiments to obtain values for y given combinations of input factors (X_1; …; X_N). Correlation ratios η² are then estimated by R² statistics, as outlined next. An ANOVA estimates correlation ratios η² = σ²(E[y | {X_Known}])/σ_Y² for different subsets {X_Known} and ranks them according to their relative contributions to total variance σ_Y². The most influential factors, combinations of factors, or effects are those that control the overall variability of function F(·) more so than others. (A numerical sketch of this grouping-based estimation follows Table 4.)

Table 4. Definition of input factors for the statistical analysis of three Sod problems.

Factor   Symbol   RAGE Command        Definition
1        ID       PNAME               Test problem identifier (unitless)
2        Δx       DXSET               Cell size (units = cm)
3        CFL      CSTAB               Courant number (unitless)
4        s        NUMRHO              Flux limiter (unitless)
5        IP       INTERFACE_OPTION    Interface treatment (unitless)

(Legend: the control parameters not included in the table are kept constant and equal to the following values: DTFORCE = turned off, TSTAB = 0.2, HYDRO_VERSION = 2, HYDROBET = 0.25.)
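To make equation (16) concrete, the short sketch below estimates the correlation ratio η² = σ²(E[y | {X_Known}])/σ_Y² from a population of sampled runs by grouping them on the levels of the known factor. It is an illustration only, written in Python with hypothetical factor names and a synthetic response; it is not the toolbox implementation used for this study, and it assumes a (nearly) balanced design such as a full-factorial one.

```python
import numpy as np

def correlation_ratio(levels, y):
    """Estimate eta^2 = Var(E[y | X_Known]) / Var(y) by grouping sampled
    responses y on the discrete levels of the known factor.
    Assumes a (nearly) balanced design, e.g. full-factorial."""
    y = np.asarray(y, dtype=float)
    total_var = y.var()  # sigma_Y^2, estimated from the population of runs
    # Conditional expectations E[y | level]: average the runs at each level.
    level_means = np.array([y[levels == lv].mean() for lv in np.unique(levels)])
    # Variance of the conditional expectation across the levels.
    return level_means.var() / total_var

# Synthetic illustration (hypothetical response, not hydro-code output):
# the response is dominated by factor A; factor B barely matters.
rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=480)  # five levels, as for the grid sizes
B = rng.integers(0, 4, size=480)  # four levels, as for the Courant number
y = 10.0 * A + 0.1 * B + rng.normal(scale=0.05, size=480)

print(correlation_ratio(A, y))  # close to 1: knowing A explains most variance
print(correlation_ratio(B, y))  # close to 0: B may be kept at its nominal value
```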

The ANOVA procedure is applied to screen the discretization and numerical settings defined in Table 4 for three variants of the Sod shock tube problem. The first problem is the original Sod problem defined in Figure 3 and Table 3, and analyzed in sections 5 and 6. The second one is similar, except that the discontinuous initial condition is very severe: values of the density and pressure jump by a factor of 10³ at the interface between the two half-domains. It tests the ability of the algorithm to handle numerical ill-conditioning and post-shock oscillations. The third Sod problem is, again, similar to the original version, except that the left and right regions are initialized with two different materials. It allows testing of the interface treatment that handles multi-material mixing within the cells.

The design of computer experiments is full-factorial, meaning that runs of the hydro-code are performed for all combinations of settings defined in Table 4. Runs performed when flux limiting is suppressed, that is, with option NUMRHO = 0, are included in the ANOVA even though the discrete solutions produced are too diffusive. This is done on purpose, to find out the extent to which the flux limiter influences the variability of solution accuracy. The full-factorial design gives a total number of runs equal to 3 (ID) × 5 (Δx) × 4 (CFL) × 4 (s) × 2 (IP) = 480 runs. The datasets fed to the ANOVA consist of 480 values of the L2 norm of solution error ||y_Exact − y(Δx)||₂, where differences between the exact and discrete solutions are defined in an averaged sense over the coarsest grid (Option-II of sections 5 and 6). The simplest possible definition of subsets {X_Known} is to investigate the influence of individual factors X_k, one at-a-time. It is referred to as main effect screening. The five main effects are defined from Table 4 as {X_Known} = ID, Δx, CFL, s, and IP. The influence that, for example, cell size Δx exercises on the value of solution error ||y_Exact − y(Δx)||₂ is estimated by the R² statistic calculated for the Δx-only main effect:
$$ R^{2} \;=\; 1 \;-\; \frac{\displaystyle \sum_{k=1}^{N_{\mathrm{Levels}}} \sum_{j=1}^{N_{\mathrm{Data}}^{(k)}} \big(y_{j}^{(k)} - \overline{y}^{(k)}\big)^{2}}{\displaystyle \sum_{j=1}^{N_{\mathrm{Data}}} \big(y_{j} - \overline{y}\big)^{2}} \qquad (17) $$

where symbols y_j and y_j^(k) denote the solution error, and ȳ and ȳ^(k) denote, respectively, the mean over the entire population and the mean of the runs at the kth level; N_Data is the total number of samples in the population, N_Data = 480 here; N_Levels is the number of levels for the effect considered, N_Levels = 5 because five grid sizes are analyzed; and N^(k)_Data is the number of y-values available at each level (N^(k)_Data is constant, because the design is full-factorial, and equal to 480/5 = 96 here). The denominator of equation (17) estimates the total variance of the population, σ_Y², modulo scaling by the population size (N_Data − 1). The inner summation of the numerator estimates the conditional variance σ²(y | {X_Unknown}), where solution errors y_j^(k) are restricted to runs performed at the kth level of grid size, that is, Δx = constant = 0.1 cm, 0.025 cm, 0.00625 cm, 0.0015625 cm, or 0.000390625 cm (successive refinements by a factor of four). The summation represents the discrete implementation of integration over the subset {X_Unknown} of unknown factors. The outer summation of the numerator is the expectation, or average, of the five conditional variances, where each one corresponds to a different level of cell size, Δx = constant. A large value of the R² statistic for the Δx-only main effect, relative to other main effects, would indicate that varying the cell size influences the variability of solution accuracy more so than varying the other factors one at-a-time. (A sketch illustrating this computation follows Table 5.)

Table 5. Main effect R² statistics obtained for three Sod problems (ANOVA).

Input Factor   Density R² (%)   Pressure R² (%)   Velocity R² (%)   SIE R² (%)
ID             11.82            17.93             4.48              10.66
Δx             87.48            81.00             95.45             87.81
CFL            0.04             0.02              0.03              0.05
s              0.66             1.05              0.04              1.48
IP             0.00             0.00              0.00              0.00
Total          100.00           100.00            100.00            100.00

(Legend: values of the main effect R² statistics listed in the table are scaled to add up to 100%.)
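As a concrete illustration of equation (17), the sketch below computes the main-effect R² statistic for one factor of a full-factorial dataset: the numerator accumulates the within-level (residual) sums of squares and the denominator is the total sum of squares. The function name and arrays are ours, for illustration only; for Table 5, factor_levels would hold the setting of ID, Δx, CFL, s, or IP for each of the 480 runs and y the corresponding L2 solution errors, with the five resulting values rescaled so they add up to 100%.

```python
import numpy as np

def main_effect_R2(factor_levels, y):
    """Equation (17): R^2 = 1 - within-level sum of squares / total sum
    of squares. A large value means that the factor explains most of the
    variability of the solution error."""
    y = np.asarray(y, dtype=float)
    factor_levels = np.asarray(factor_levels)
    total_ss = np.sum((y - y.mean()) ** 2)  # denominator of equation (17)
    within_ss = 0.0
    for level in np.unique(factor_levels):
        y_k = y[factor_levels == level]  # the N_Data^(k) runs at this level
        # Inner summation: scatter of y with the screened factor held fixed.
        within_ss += np.sum((y_k - y_k.mean()) ** 2)
    return 1.0 - within_ss / total_ss
```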

The results of the main effect ANOVA are tabulated in Table 5 and shown graphically in Figure 15 for the density, pressure, flow velocity, and Specific Internal Energy (SIE) solution fields. Values are scaled to add up to 100%, which facilitates their comparison. Two observations are made. First, the discretization size is, as expected, the most significant main effect of the five factors considered. As discussed previously, this is clearly visible from Figure 14 of section 6. The ANOVA reaches the same conclusion, with the benefit of adding rigor to the analysis.

Figure 15. Main effect ANOVA of solution accuracy for three Sod problems.

The second observation relates to the R² statistics obtained for factors ID (the test problem identifier) and IP (the interface treatment). It may be argued that it makes no sense to analyze factor ID because different test problems should not be compared with each other. Likewise, why include factor IP, since the interface treatment is irrelevant for a single-material problem? Factors ID and IP are included anyway to provide a sanity check of the results provided by the ANOVA. Because the test problem is expected to influence solution accuracy, the R² statistic obtained for the main effect ID can be used to establish a floor level above which other effects are, for sure, significant. Likewise, checking that the R² statistic obtained for the main effect IP is small provides confidence that the ANOVA is not biased. Table 5 and Figure 15 show, indeed, that the type of test problem (ID) influences solution accuracy while the interface treatment (IP) is irrelevant. Based on these thresholds, the Courant number (CFL) and flux limiter (s) are deemed to be insignificant in terms of influencing solution accuracy for these Sod test problems.

The next step is to assess the joint influence of pairs of factors (X_p; X_q) where p ≠ q. This is referred to as linear interaction screening. Based on the five factors of Table 4, one can define up to ten linear interactions, and the subsets {X_Known} become (ID; Δx), (ID; CFL), (ID; s), (ID; IP), …, (s; IP). The complete list is given in Table 6, which also tabulates the R² statistics of linear interactions for the accuracy of the density, pressure, flow velocity, and SIE solutions. The R² statistics are computed as shown in equation (17), the only difference being how to select the runs used to estimate the conditional variance σ²(y | {X_Unknown}) for each linear interaction. In the case, for example, of the linear interaction (ID; Δx) between the problem identifier and cell size, the full-factorial design provides N_Levels = 3 × 5 = 15 levels, because three Sod problems are analyzed with five grids. Likewise the number of solution error data points available for each level (ID; Δx) = constant is equal to N^(k)_Data = 480/15 = 32, which is the full-factorial combination of 4 levels for CFL (CSTAB), 4 levels for s (NUMRHO), and 2 levels for IP (INTERFACE_OPTION). In accordance with equation (17), these 32 values of solution error ||y_Exact − y(Δx)||₂ are used to estimate the conditional variance σ²(y | {X_Unknown}) for each one of the 15 levels of the linear interaction (ID; Δx) = constant. The expected (average) conditional variance is then calculated from the 15 values to obtain the R² statistic. (A sketch of this compound-level computation follows Table 6.)

Table 6. Linear interaction R² statistics obtained for three Sod problems (ANOVA).

Input Factor   Density R² (%)   Pressure R² (%)   Velocity R² (%)   SIE R² (%)
(ID; Δx)       13.49            13.74             13.25             13.61
(ID; CFL)      6.10             6.34              5.62              5.86
(ID; s)        8.65             9.02              8.56              8.72
(ID; IP)       9.52             9.81              9.04              9.49
(Δx; CFL)      11.70            11.15             12.32             11.62
(Δx; s)        11.95            11.38             12.54             11.91
(Δx; IP)       12.43            12.06             12.66             12.35
(CFL; s)       8.01             8.12              8.29              8.20
(CFL; IP)      9.06             9.17              8.86              9.09
(s; IP)        9.09             9.21              8.86              9.15
Total          100.00           100.00            100.00            100.00

(Legend: values of the linear interaction R² statistics listed in the table are scaled to add up to 100%.)
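The same computation extends to the linear interactions of Table 6 by fusing each pair of factors into a single compound factor: every distinct combination of levels, for example the 3 × 5 = 15 combinations of (ID; Δx), defines one level at which the conditional variance is estimated from the 32 runs available there. A minimal, self-contained sketch, again with hypothetical names:

```python
import numpy as np

def interaction_R2(levels_p, levels_q, y):
    """R^2 statistic of equation (17) for a linear interaction (X_p; X_q):
    each distinct pair of levels becomes one compound level before the
    within-level sums of squares are accumulated. For (ID; dx) in the
    full-factorial design of Table 4: 3 x 5 = 15 levels, 480/15 = 32 runs each."""
    y = np.asarray(y, dtype=float)
    # Fuse the two factors into one compound level identifier per run.
    compound = np.array([f"{p}|{q}" for p, q in zip(levels_p, levels_q)])
    total_ss = np.sum((y - y.mean()) ** 2)
    within_ss = sum(np.sum((y[compound == c] - y[compound == c].mean()) ** 2)
                    for c in np.unique(compound))
    return 1.0 - within_ss / total_ss
```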

It can be seen that the definition of subsets {X_Known} can then grow in complexity to include quadratic effects X_p², such as ID², Δx², CFL², s², and IP², or higher-order effects. The only practical difficulty is to work with a design of computer experiments that provides access to enough runs that the conditional variances σ²(y | {X_Unknown}) can be estimated accurately for the higher-order effects considered (see the run-counting sketch below). The results of ANOVA summarized in Table 6 do not point to any significant linear interaction. It is observed that linear interactions that involve the problem identifier, such as (ID; CFL), (ID; s), or (ID; IP), lead to R² statistics in the range of 6-to-10%. Because the problem identifier has no physical or numerical reason to interact with these other factors to control solution accuracy, it is concluded that R² ≈ 6-to-10% is the floor level below which an interaction is not significant. Somewhat larger values are obtained when the cell size Δx interacts with other factors, but this is believed to be an artifact of the overwhelming predominance of the Δx-only main effect. The high-level conclusion from the analysis of asymptotic convergence in section 6 and the ANOVA in section 7 is that the Godunov method implemented in the hydro-code not only behaves as expected in the presence of shocked solutions (because it is first-order accurate), but also that its numerical accuracy is robust to the user-selection of numerical options. This is generally considered good news because it means that, for example, problems can be analyzed with large Courant stability numbers, which speeds up the turn-around time.
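To illustrate the practical difficulty noted above, the arithmetic below counts how many runs remain available at each constant level of an effect as its order grows, using the 480-run full-factorial design of Table 4. The helper function is hypothetical, but the counts follow directly from the design.

```python
from math import prod

# Levels per factor in the full-factorial design of Table 4:
# 3 (ID) x 5 (dx) x 4 (CFL) x 4 (s) x 2 (IP) = 480 runs.
levels = {"ID": 3, "dx": 5, "CFL": 4, "s": 4, "IP": 2}
n_runs = prod(levels.values())  # 480

def samples_per_level(*factors):
    """Runs available at each constant level of the screened effect."""
    return n_runs // prod(levels[f] for f in factors)

print(samples_per_level("dx"))               # 96 runs per main-effect level
print(samples_per_level("ID", "dx"))         # 32 runs per interaction level
print(samples_per_level("ID", "dx", "CFL"))  # 8 runs: conditional variance
                                             # estimates degrade quickly
```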

8. CONCLUSION
In this work it is suggested to use statistical analysis-of-variance to quantify the influence that numerical options of computer codes may have on solution accuracy. This question applies to codes that solve partial differential equations by discretizing a computational domain, such as finite element methods in solid mechanics or finite volume codes in fluid dynamics. Asymptotic convergence, which defines solution accuracy, is first studied through mesh refinement. Statistical analysis is then proposed to quantify whether the algorithms of the code are robust to numerical settings such as the time step control, gradient estimation, or amount of artificial dissipation.

The method is illustrated with an assessment of solution accuracy for six test problems in hydro-dynamics. The code investigated is a second-order, Godunov finite volume method that solves the equations of compressible gas dynamics in an Eulerian frame-of-reference. This publication focuses on the analysis of three variants of the Sod shock tube problem for which an analytical, exact solution exists. This makes it easier to define accuracy as the Lp norm of differences between the exact and discrete solutions. The first finding is that the manner in which differences between the exact and discrete solutions are defined is crucial to make truncation error behave as expected in the regime of asymptotic convergence. Sub-optimal rates-of-convergence are observed when solution differences are not defined consistently with how the numerical method constructs its discrete solutions. Analysis of the Godunov finite volume method indicates the expected first-order accuracy in the presence of shocks when solution error is defined in a volume-averaged sense. The second finding is that the hydro-dynamics method offers an inherent level of robustness to numerical options, such as the Courant condition, time step control, type of flux limiter, and amount of artificial viscosity, at least in the case of the Sod shock tube problems investigated here. Statistical analysis demonstrates that the overall variance of solution accuracy is not controlled by these numerical options.

Advantages of the method proposed are its rigor and its ability to investigate many numerical options simultaneously. The main limitation resides in the rapid growth of the number of runs needed to analyze a full-factorial design of computer experiments. Future work will address this issue by using other designs of experiments, such as orthogonal arrays, to assess whether paying the price of aliasing is worth the benefit of a reduced number of runs. Another challenge is to analyze large-scale simulations that do not have exact solutions. The tasks of defining solution accuracy and assessing asymptotic convergence are not as straightforward when the exact solution is an unknown field. An approach that appears promising would be to combine the statistical decomposition of variance with the method proposed in Reference [26], where discrete solutions are expressed as linear combinations of empirical modes and solution verification is performed one mode at-a-time, without needing to know what the exact solution is equal to.

ACKNOWLEDGMENTS
This work is performed under the auspices of the Advanced Scientific Computing verification project at the Los Alamos National Laboratory (LANL). The authors are grateful to Jerry Brock, project leader, for his support and technical leadership. LANL is operated by the Los Alamos National Security, LLC for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396.

REFERENCES
[1] Hemez, F.M., Doebling, S.W., Anderson, M.C., A Brief Tutorial on Verification and Validation, 22nd SEM International Modal Analysis Conference, Dearborn, Michigan, January 26-29, 2004.
[2] Brock, J.S., ASC Level-2 Milestone Plan: Code Verification, Calculation Verification, Solution-error Analysis, and Test-problem Development for LANL Physics-simulation Codes, Technical Report of the ASC Code Verification Project, Los Alamos National Laboratory, Los Alamos, New Mexico, May 2005. LA-UR-05-4212.
[3] Marcilhac, M., Hemez, F.M., Analysis of a Large Dataset for the Verification of a Hydro-dynamics Code, 9th U.S. National Congress on Computational Mechanics, San Francisco, California, July 23-26, 2007. LA-UR-07-4221.
[4] Godunov, S.K., Reminiscences about Difference Schemes, Journal of Computational Physics, Vol. 153, 1999, pp. 6-25.
[5] Hemez, F.M., Non-linear Error Ansatz Models for Solution Verification in Computational Physics, Technical Report of the ASC Code Verification Project, Los Alamos National Laboratory, Los Alamos, New Mexico, October 2005. LA-UR-05-8228.
[6] Hemez, F.M., Brock, J.S., Kamm, J.R., Non-linear Error Ansatz Models in Space and Time for Solution Verification, 1st Non-deterministic Approaches (NDA) Conference and 47th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials (SDM) Conference, Newport, Rhode Island, May 1-4, 2006. LA-UR-06-3705.
[7] Clover, M., The New Hydro-dynamic Method in RAGE, Technical Report of the ASC Crestone Code Project, Los Alamos National Laboratory, Los Alamos, New Mexico, 2002. LA-UR-02-3802.
[8] RAGE Users Manual, Technical Report of the ASC Crestone Code Project, Los Alamos National Laboratory, Los Alamos, New Mexico, 2004. LA-CP-04-0423.
[9] SAGE Users Manual, Technical Report of the ASC Crestone Code Project, Los Alamos National Laboratory, Los Alamos, New Mexico, 2004. LA-UR-04-2959.
[10] Gittings, M., Weaver, R., Clover, M., Betlach, T., Byrne, N., Coker, R., Dendy, E., Hueckstaedt, R., New, K., Oakes, W.R., Ranta, D., Stefan, R., The RAGE Radiation-hydro-dynamic Code, Technical Report of the ASC Crestone Code Project, Los Alamos National Laboratory, Los Alamos, New Mexico, 2006. LA-UR-06-0027.
[11] Gisler, G., Weaver, R., Mader, C., Gittings, M., Three-dimensional Simulations of Asteroid Impacts, Colloquium on Wave Heights, Atmospheric Effects, and Thermal Radiation From Impact-generated Plumes, University of California Santa Cruz, Santa Cruz, California, January 31, 2003. LA-UR-02-1453.
[12] Noh, W.F., Errors for Calculations of Strong Shocks Using an Artificial Viscosity and an Artificial Heat-flux, Journal of Computational Physics, Vol. 72, No. 1, 1987, pp. 78-120.
[13] Rider, W.J., Revisiting Wall Heating, Journal of Computational Physics, Vol. 162, No. 2, 2000, pp. 395-410.
[14] Gisler, G., Two-dimensional Convergence Study of the Noh and Sedov Problems with RAGE: Uniform and Adaptive Grids, Technical Report of the ASC Code Verification Project, Los Alamos National Laboratory, Los Alamos, New Mexico, October 2005.
[15] Kamm, J.R., Rider, W.J., Brock, J.S., Combined Space and Time Convergence Analyses of a Compressible Flow Algorithm, 16th AIAA Computational Fluid Dynamics Conference, Orlando, Florida, July 2003. LA-UR-03-2628.
[16] Greenough, J.A., Rider, W.J., A Quantitative Comparison of Numerical Methods for the Compressible Euler Equations: Fifth-order WENO and Piecewise-linear Godunov, Journal of Computational Physics, Vol. 196, 2004, pp. 259-281.
[17] Rider, W.J., A Review of Approximate Riemann Solvers with Godunov's Method in Lagrangian Coordinates, Journal of Computers and Fluids, Vol. 23, No. 2, 1994, pp. 397-413.
[18] Rider, W.J., Drikakis, D., High-resolution Methods for Incompressible and Low-speed Flows, Springer, New York, 2005.
[19] Roache, P.J., Verification and Validation in Computational Science and Engineering, Hermosa Publishers, Albuquerque, New Mexico, 1998.
[20] Stern, F., Wilson, R., Shao, J., Quantitative V&V of Computational Fluid Dynamics (CFD) Simulations and Certification of CFD Codes with Examples, 2004 ICHMT International Symposium on Advances in Computational Heat Transfer, Norway, April 19-24, 2004.
[21] Richtmyer, R.D., Proposed Numerical Method for Calculation of Shocks, Technical Report LA-671, Los Alamos National Laboratory, Los Alamos, New Mexico, March 1948.
[22] Sod, G.A., A Survey of Several Finite Difference Methods for Systems of Non-linear Hyperbolic Conservation Laws, Journal of Computational Physics, Vol. 27, 1978, pp. 1-31.
[23] LeVeque, R.J., Numerical Methods for Conservation Laws, Birkhauser-Verlag, 1990.
[24] LeVeque, R.J., Finite Volume Methods for Hyperbolic Problems, Cambridge University Press, 2002.
[25] Saltelli, A., Chan, K., Scott, M., Sensitivity Analysis, John Wiley & Sons, 2000.
[26] Hemez, F.M., Functional Data Analysis of Solution Convergence, Technical Report of the ASC Code Verification Project, Los Alamos National Laboratory, Los Alamos, New Mexico, August 2007. LA-UR-07-5758.
