From 629aa51a1f2e890ee4ffd739991ee9a566fd28cb Mon Sep 17 00:00:00 2001
From: Tilman Metz
Date: Mon, 22 Feb 2016 18:01:20 +0100
Subject: [PATCH] Several changes in documentaion
section grid dispersion had mistakes in it.
removed outdated parts.

doc/guide_sofi3D/tex/guide_sofi3D.tex  240 +++++++++++
doc/guide_sofi3D/tex/thesis.bib  18 ++
2 files changed, 117 insertions(+), 141 deletions()
diff git a/doc/guide_sofi3D/tex/guide_sofi3D.tex b/doc/guide_sofi3D/tex/guide_sofi3D.tex
index a9bcd4c..a8ca7cc 100644
 a/doc/guide_sofi3D/tex/guide_sofi3D.tex
+++ b/doc/guide_sofi3D/tex/guide_sofi3D.tex
@@ 94,19 +94,15 @@ seismic modeling with finite differences \\
\section{Requirements}
\label{requirements}
SOFI3D can be generally run under Windows and Linux. You just have to make sure that an MPI implementation (one of the three listed below) is running and a CCompiler is available. However, the preferred platform is Linux, since most of the large scale cluster computer run under Linux. On our local workstation cluster we use Suse Linux and OpenMPI, most of the test are performed under this system. The following programs should be installed on you machine for a proper run and processing of modeling data.
+SOFI3D can be generally run under Windows and Linux. You just have to make sure that an MPI implementation is running and a CCompiler is available. However, the preferred platform is Linux, since most of the large scale cluster computer run under Linux. On our local workstation cluster we use Suse Linux and OpenMPI, most of the test are performed under this system. The following programs should be installed on you machine for a proper run and processing of modeling data.
\begin{center}
% use packages: array
\begin{small}
\begin{tabular}{lll}
Program & Description & Weblink \\
OpenMPI & MPI Implementation & \tiny{\url{http://www.mcs.anl.gov/research/projects/mpich2}} \\
+OpenMPI & MPI Implementation & \url{http://www.openmpi.org} \\
& (for the parallelization) & \\
MPICH2 & MPI Implementation & \url{http://www.openmpi.org} \\
& (for the parallelization) & \\
LAM/MPI & MPI Implementation & \url{http://www.lammpi.org} \\
& (for the parallelization) & \\
CCompiler & the whole code is written in C,& \\
& any CCompiler should be able & \\
& to compile it & \\
@@ 145,7 +141,7 @@ To run the program on 8 CPUs do
You may also use the shell script \lstinline{startSOFI3D.sh} located in \lstinline{sofi3D/par} that also writes the screen output to the file \lstinline{sofi3D/par/in_and_out/sofi3D.jout}.
The modeling parameters are specified in the sofi3D input file \lstinline{./in_and_out/sofi3D.json}. The old input file format (*.inp) is not supported anymore in the latest release. The parameters should be more or less selfexplanatory. The synthetic seismograms and snapshots are written to the files specified in the sofi3D input file.
+The modeling parameters are specified in the input file \lstinline{./in_and_out/sofi3D.json}. The old input file format (*.inp) is not supported anymore in the latest release. The parameters should be more or less selfexplanatory. The synthetic seismograms and snapshots are written to the files specified in the sofi3D input file.
In the current distribution, the model is generated on the fly by the function sofi3D/src/hh.c. This function generates a homogeneous medium with $v_p$=3500 m/s, $v_s$=2000 m/s, and $\rho$=2000 kg/$\mathrm{m}^3$. Instead, the function readmod.c can be used to read model info from external grid files. See readmod.c for the format of the external files. You can change the function which generates the model grid by switching the READMOD parameter in the sofi3D input file.
@@ 490,36 +486,41 @@ A comparison between the exponential damping and the PML boundary is shown in se
To avoid numerical artefacts and instabilities during a FD modeling run, a spatial and temporal sampling condition for the wavefield has to be satisfied. These will be discussed in the following two sections.
\subsection{Grid Dispersion}\label{griddispersion}
The first question when building a FD model is: What is the maximum spatial grid point distance dh, so that the wavefield is correctly sampled? To answer this question we take a look at this simple example: The particle velocity in xdirection is defined by a sine function:

\EQ{grid_disp:1}{v_x=\sin\biggl(2 \pi \frac{x}{\lambda}\biggr),}

where $\lambda$ denotes the wavelength. When calculating the derivation of this function analytically at x=0 and setting $\lambda=1\;m$ we get:

\EQ{grid_disp:2}{\frac{d v_x}{d x}\biggl_{x=0}=\frac{2 \pi}{\lambda} \cos\biggl(2 \pi \frac{x}{\lambda}\biggr)\biggl_{x=0}=2 \pi.}

In the next step, the derivation is approximated numerically by a 2nd order finitedifference operator:

\EQ{grid_disp:3}{\frac{d v_x}{d x}\biggl_{x=0} \approx \frac{v_x(x+\Delta x)v_x(x)}{\Delta x}\biggl_{x=0}=\frac{\sin \biggl(\frac{2 \pi \Delta
x}{\lambda} \biggr)}{\Delta x}.}

Using the Nyquistcriteria it should be sufficient to sample the wavefield with $\Delta x = \lambda/2$. In table \ref{grid_disp.1}, the numerical solutions of Equation \ER{grid_disp:3} and the analytical solution \ER{grid_disp:2} are compared for different sample intervals $\Delta x = \lambda /n$, where n is the number of gridpoints per wavelength. For the case n=2, which corresponds to the Nyquist criteria, the numerical solution is $\frac{d v_x}{d x}_{x=0}=0.0$, which is not equal with the analytical solution $2 \pi$. A refinement of the spatial sampling of the wavefield results in an improvement of the finite difference solution. For n=16 the numerical solution is accurate to one decimal place. The effect of a sparsly sampled pressure field is illustrated in \FIG{grid_disp_pics}. The dimensions of the FD grid are fixed and the central frequency of the source signal is increased systematically. When using a spatial sampling of 16 grid points per minimum wavelength (\FIG{grid_disp_pics}, top) the wavefronts are sharply defined. For n=4 grid points a slight numerical dispersion of the wave occurs (\FIG{grid_disp_pics}, center). This effect is obvious when using the Nyquist criteria (n=2) (\FIG{grid_disp_pics}, bottom). Since the numerical calculated wavefield seem to be dispersive this numerical artefact is called {\bf{grid dispersion}}. To avoid the occurrence of grid dispersion the following criteria for the spatial grid spacing dh has to be satisfied:

\EQ{grid_disp:4}{dh \le \frac{\lambda_{min}}{n} = \frac{V_{s,min}}{n\; f_{max}}.}

Here $\lambda_{min}$ denotes the minimum wavelength, $V_{s,min}$ the minimum Swave velocity in the model and $f_{max}$ is the maximum frequency of the source signal.Depending on the accuracy of the used FD operator the parameter n is different. In table \ref{grid_disp.2}, n is listed for different FD operator lengths and types (Taylor and Holberg operators). For short operators n should be chosen relatively large, so the spatial grid spacing is small, while for longer FD operators n is smaller and the grid spacing can be larger.

+The first question when building a FD model is: What is the maximum spatial grid point distance dh, for a correct sampling of the wavefield? To answer this question we take a look at this simple example: The particle displacement in xdirection is defined by a sine function:
+\EQ{grid_disp:1}{v_x=\sin\biggl(2 \pi \frac{x}{\rm{\lambda}}\biggr),}
+where $\rm{\lambda}$ denotes the wavelength. When calculating the derivation of this function analytically at $x=0$ and setting $\rm{\lambda}=1\;\rm{m}$ we get:
+\EQ{grid_disp:2}{\frac{d v_x}{d x}\biggl_{x=0}=\frac{2 \pi}{\rm{\lambda}} \cos\biggl(2 \pi \frac{x}{\rm{\lambda}}\biggr)\biggl_{x=0}=2 \pi.}
+In the next step the derivation is approximated numerically by a staggered 2nd order finitedifference operator:
+\EQ{grid_disp:3}{\frac{d v_x}{d x}\biggl_{x=0} \approx \frac{v_x(x+\frac{1}{2}\Delta x)v_x(x\frac{1}{2}\Delta x)}{\Delta x}\biggl_{x=0}=\frac{\sin \biggl(\frac{2 \pi (\frac{1}{2}{\rm dx})}{\rm{\lambda}} \biggr)\sin \biggl(\frac{2 \pi (\frac{1}{2}{\rm dx})}{\rm{\lambda}} \biggr)}{\Delta x}.}
+Using the NyquistShannon sampling theorem it should be sufficient to sample the wavefield with $\Delta x = \rm{\lambda}/2$. In table \ref{grid_disp.1} the
+numerical solutions of eq. \ER{grid_disp:3} and the analytical solution \ER{grid_disp:2} are compared for different sample intervals
+$\Delta x = \rm{\lambda} /n$, where n is the number of gridpoints per wavelength. For the case n=2, which corresponds to the NyquistShannon theorem, the
+numerical solution is $\frac{d v_x}{d x}_{x=0}=4.0$, which is not equal with the analytical solution $2 \pi$. A refinement of the spatial
+sampling of the wavefield results in an improvement of the finite difference solution. For $\rm n=16$ the numerical solution is accurate to the second
+decimal place. The effect of a sparsly sampled pressure field is illustrated in \FIG{grid_disp_pics} for a homogeneous block model with stress free surfaces. The dimensions of the FD grid are fixed
+and the central frequency of the source signal is increased systematically.
+When using a spatial sampling of 16 grid points per minimum wavelength (\FIG{grid_disp_pics}, top) the wavefronts are sharply defined. For $\rm n=4$
+grid points a slight numerical dispersion of the wave occurs (\FIG{grid_disp_pics}, center). This effect is obvious when using the Nyquist criterion ($\rm n=2$)
+(\FIG{grid_disp_pics}, bottom). Since the numerical calculated wavefield seem to be dispersive this numerical artefact is called {\bf{grid dispersion}}.
+To avoid the occurence of grid dispersion the following criteria for the spatial grid spacing dh has to be satisfied:
+\EQ{grid_disp:4}{{\rm dh \le \frac{\lambda_{min}}{n}} = \frac{{\rm V_{min}}}{{\rm n}\; f_{max}}.}
+Here $\rm{\lambda_{min}}$ denotes the minimum wavelength, $\rm{V_{min}}$ the minimum velocity in the model and $f_{max}$ is the maximum
+frequency of the source signal.
+Depending on the accuracy of the used FD operator the parameter n is different. In table \ref{grid_disp.2} n is listed for different FD operator lengths
+and types (Taylor and Holberg operators). The Holberg coefficients are calculated for a minimum dispersion error of $0.1\%$ at $3 f_{max}$. For short operators n should be choosen relatively large, so the spatial grid spacing is small, while for longer FD operators n is smaller and the grid spacing can be larger.
\begin{table}[hbt]
\begin{center}
\begin{tabular}{ccc}\hline \hline
n & $\Delta x\; [m]$ & $\frac{d v_x}{d x}_{x=0}\; []$ \\ \hline
analytical &  & $2\pi \approx 6.283$ \\
2 & $\lambda/2$ & 0.0 \\
4 & $\lambda/4$ & 4.0 \\
8 & $\lambda/8$ & 5.657 \\
16 & $\lambda/16$ & 6.123 \\ \hline \hline
+2 & $\rm{\lambda}/2$ & 4.0 \\
+4 & $\rm{\lambda}/4$ & 5.657 \\
+8 & $\rm{\lambda}/8$ & 6.123 \\
+16 & $\rm{\lambda}/16$ & 6.2429 \\
+32 & $\rm{\lambda}/32$ & 6.2731\\ \hline \hline
\end{tabular}
\caption{\label{grid_disp.1} Comparison of the analytical solution of Equation \ER{grid_disp:2} with the numerical solution (Equation \ER{grid_disp:3}) for different grid spacings $\Delta x = \lambda /n$}
+\caption{\label{grid_disp.1} Comparison of the analytical solution Eq. \ER{grid_disp:2} with the numerical solution Eq. \ER{grid_disp:3}
+for different grid spacings $\Delta x = \rm{\lambda /n}$.}
\end{center}
\end{table}
@@ 529,102 +530,79 @@ analytical &  & $2\pi \approx 6.283$ \\
FDORDER & n (Taylor) & n (Holberg) \\ \hline
2nd & 12 & 12 \\
4th & 8 & 8.32 \\
6th & 7 & 4.77 \\
8th & 6 & 3.69 \\
+6th & 6 & 4.77 \\
+8th & 5 & 3.69 \\
10th & 5 & 3.19 \\
12th & 4 & 2.91 \\
+12th & 4 & 2.91 \\
\hline \hline
\end{tabular}
\caption{\label{grid_disp.2} The number of grid points per minimum wavelength n for different orders (2nd12th) and types (Taylor and Holberg) of FD operators.}
+\caption{\label{grid_disp.2} The number of grid points per minimum wavelength n for different orders (2nd12th) and types (Taylor and
+Holberg) of FD operators. For the Holberg coefficients n is calculated for a minimum dispersion error of $0.1\%$ at $3 f_{max}$.}
\end{center}
\end{table}

\clearpage

\begin{figure}[ht]
\begin{center}
\epsfig{file=eps/homogenous_grid_n_16_5.pdf, width=7 cm}\epsfig{file=eps/homogenous_grid_n_16_10.pdf, width=7 cm}
\epsfig{file=eps/homogenous_grid_n_4_5.pdf, width=7 cm}\epsfig{file=eps/homogenous_grid_n_4_10.pdf, width=7 cm}
\epsfig{file=eps/homogenous_grid_n_2_5.pdf, width=7 cm}\epsfig{file=eps/homogenous_grid_n_2_10.pdf, width=7 cm}
\caption{\label{grid_disp_pics} The influence of grid dispersion in FD modeling: Spatial sampling of the wavefield using 16 (top), 4 (center) and 2 gridpoints (bottom).}
+\epsfig{file=eps/homogenous_grid_n_16_5.pdf, width=6 cm}\epsfig{file=eps/homogenous_grid_n_16_10.pdf, width=6 cm}
+\epsfig{file=eps/homogenous_grid_n_4_5.pdf, width=6 cm}\epsfig{file=eps/homogenous_grid_n_4_10.pdf, width=6 cm}
+\epsfig{file=eps/homogenous_grid_n_2_5.pdf, width=6 cm}\epsfig{file=eps/homogenous_grid_n_2_10.pdf, width=6 cm}
+\caption{\label{grid_disp_pics} The influence of grid dispersion in FD modeling: Spatial sampling of the wavefield using n=16 (top), n=4 (center) and n=2 gridpoints (bottom) per minimum wavelength $\rm{\lambda_{min}}$.}
\end{center}
\end{figure}
\clearpage

\subsection{The Courant Instability}\label{courandt}
In analogy to the spatial, the temporal discretization has to satisfy a sampling criterion to ensure the stability of the FD code. If a wave is crossing a discrete grid, then the timestep $dt$ must be less than the time for the wave to travel between two adjacent grid points with grid spacing $dh$. For a 3D grid this means mathematically:

\EQ{courandt:1}{dt \le \frac{dh}{r_{\text{max}} \cdot v_{p,max}}.}

Where $v_{p,max}$ is the maximum Pwave velocity in the model. The factor $r_{\text{max}}$ represents the CourantFriedrichsLewy (CFL) number. In table \ref{courant.1}, r is listed for different FD operator lengths and types (Taylor and Holberg operators). Criterion \ER{courandt:1} is called {\bf{CourantFriedrichsLewy criterion}}. \FIG{courandt_pics} shows the evolution of the pressure field when the Courant
criterion is violated. After a few time steps the amplitudes are growing to infinity and the calculation becomes unstable.\\
At the beginning of the simulation the CFL criterium will be checked automatically.

+\subsection{The Courant Instability}\label{courant}
+Beside the spatial, the temporal discretization has to satisfy a sampling criterion to ensure the stability of the FD code. If a
+wave is propagating on a discrete grid, then the timestep $dt$ has to be less than the time for the wave to travel between two adjacent grid
+points with grid spacing dh. For an elastic 2D grid this means mathematically:
+\EQ{courant:1}{\rm{dt \le \frac{dh}{h \sqrt{2} V_{max}},}}
+where $\rm{V_{max}}$ is the maximum velocity in the model. The factor h depends on the order of the FD operator and can easily calculated by summing over the weighting coefficients $\beta_i$
+\EQ{courant:1:1}{{\rm{h} = \sum_i \beta_i.}}
+In table \ref{courant.1} h is listed for different FD operator lengths and types (Taylor and Holberg operators). Criterion \ER{courant:1}
+is called {\bf{CourantFriedrichsLewy criterion}} (\cite{courant:28}, \cite{courant:67}). \FIG{courant_pics} shows the evolution of the pressure field when the Courant criterion is violated. After a few time steps the amplitudes are growing to infinity and the calculation becomes unstable.
\begin{table}[hbt]
\begin{center}
\begin{tabular}{ccccccc}\hline \hline
& \multicolumn{3}{c}{Taylor} & \multicolumn{3}{c}{Holberg}\\
FDORDER & M=2 & M=3 & M=4 & M=2 & M=3 & M=4\\ \hline
2nd & 0.577 & 0.494 & 0.384 &0.577&0.494&0.384 \\
4th & 0.494 & 0.424 & 0.329&0.487&0.418& 0.325 \\
6th & 0.464 & 0.398 & 0.309 &0.45&0.386&0.3
\\
8th & 0.448 & 0.384 & 0.299 &0.429&0.368&0.286\\
10th & 0.438 & 0.375 & 0.292 &0.416&0.357&0.277\\
12th & 0.431 & 0.369 & 0.287 &0.407&0.349&0.272\\
+\begin{tabular}{ccc}\hline \hline
+FDORDER & h (Taylor) & h (Holberg) \\ \hline
+2nd & 1.0 & 1.0 \\
+4th & 7/6 & 1.184614 \\
+6th & 149/120 & 1.283482 \\
+8th & 2161/1680 & 1.345927 \\
+10th & 53089/40320 & 1.387660 \\
+12th & 1187803/887040 & 1.417065 \\
\hline \hline
\end{tabular}
\caption{\label{courant.1} The factor r is the CourantFriedrichsLewy number for different orders FDORDER (2nd12th) and types (Taylor and Holberg) of FD operators. M represents the temporal order (FDORDER\_TIME).}
+\caption{\label{courant.1} The factor h in the Courant criterion for different orders (2nd12th) and types (Taylor and Holberg) of FD operators.}
\end{center}
\end{table}

\clearpage

\begin{figure}[ht]
\begin{center}
\epsfig{file=eps/courandt_1.pdf, width=7 cm}\epsfig{file=eps/courandt_2.pdf, width=7 cm}
\epsfig{file=eps/courandt_3.pdf, width=7 cm}\epsfig{file=eps/courandt_4.pdf, width=7 cm}
\caption{\label{courandt_pics} Temporal evolution of the Courant instability.}
+\caption{\label{courant_pics} Temporal evolution of the Courant instability. In the colored areas the wave amplitudes are extremly large.}
\end{center}
\end{figure}
\clearpage
\section{Seismic Modeling with SOFI3D}
The viscoelastic FD algorithm discussed in the previous section is implemented in the C program SOFI3D \cite{bohlen:02}. In the following sections, we give a short description of the different modeling parameters, options and how the program is used in a parallel MPI environment.

\subsection{Parallel Environment}
The parallelization employs functions of the Message Passing Interface (MPI). MPI has to be installed when compiling and running the SOFI3D software. At least two implementations exist for Unixbased networks: OpenMPI and MPICH2. The LAMMPI implementation is no longer supported by the developers. Currently all three implementation work with SOFI3D. OpenMPI and MPICH2 are MPI programming environments and development systems for heterogeneous computers on a network. With OpenMPI or MPICH2, a dedicated cluster or an existing network computing infrastructure can act as one parallel computer solving one problem. The latest version of OpenMPI can be obtained from \url{http://www.openmpi.org}. MPICH2 is available at \url{http://wwwunix.mcs.anl.gov/mpi/mpich}. LAMMPI can be still downloaded here: \url{http://www.lammpi.org}.

\subsection{LAM}
Even though outdated, LAMMPI will be used to illustrate the setting up of the MPI implementation. In order to reproduce the following instructions, you must first install the LAMMPI software.
On SUSE LINUX systems the package can be installed with yast. The software can also be downloaded from \url{http://www.lammpi.org}. A good documentation of LAM/MPI is available at this site. Before you can run the SOFI3D software on more than one node you must boot LAM. This requires that you can connect to all of your nodes with ssh without a password. To enable ssh connection without a password you must generate authentication keys on each node using the command sshkeygen t rsa. These are saved in the file \lstinline{$\sim$/.ssh/id_rsa.pub}. Copy all authentication keys into one file named \lstinline{authorized_keys} and copy this file on all nodes into the directory \lstinline{$\sim$/.ssh}.

Before you can boot LAM you must specify the IP addresses of the different processing elements in an ASCII file. An example file is par/lamhosts. Each line must contain one IP address. The first IP number corresponds to node 0, the second line to node 1 and so forth. Note that in older LAMMPI implementations the mpirun command to run the FD programs must always be executed on node 0 !,
i.e. you must log on node 0 (first line in the file par/lamhosts) and run the software on this node. In the last stable release of LAMMPI, the node 0 just has to be listed in the lamhosts file (the order does not matter). To boot lam just do \lstinline{lamboot v par/lamhosts}. To run the software in serial on a single PC just type \lstinline{lamboot} without specifying a lamhosts file. You can still specify more PEs in the FD software but all processes will be executed on the same CPU. Thus this only makes sense if you run the software on a multicore system. In this case, you should boot it without
a lamhosts file and specify a total number of 2 processing elements (PEs). To shut down LAM again (before logout) use the command \lstinline{lamclean v}. However, it is not necessary to shut down/restart LAM after each logout/login.

\section{Installation}
\label{installation}
After unpacking the software package (e.g. by \lstinline{tar zxvf sofi3D.tgz}) and changing to the directory sofi3D ( \lstinline{cd sofi3D}) you will find different subdirectories:

\textbf{.svn}\\
The software is updated using the version control system subversion (SVN). This directory contains internal information on recent software updates etc. You can ignore this directory, a clean release should not have any .svn directories at all. Each subdirectory described below also contains a subdirectory SVN which can be ignored as well.
+In the SOFI3D directory you will find different subdirectories:
\textbf{bin}\\
This directory contains all executable programs, generally sofi3D and snapmerge. These executables are generated using the command \lstinline{make $<$program$>$} (see below).
+This directory contains all executable programs, generally sofi3D and snapmerge. These executables are generated using the command \lstinline{make } (see below).
\textbf{doc}\\
This directory contains documentation on the software (this users guide) as well as some important papers in PDF format on which the software is based on. It also contains the full documentation of the SOFI3D overnight built testing suite (see \textbf{overnightbuilt}).
+This directory contains documentation on the software (this users guide).
\textbf{genmod}\\
Contains the model and benchmark files for SOFI3D.
\textbf{mfiles}\\
Here some Matlab routines (mfiles) are stored. These Matlab programs can be used to find optimal relaxation frequencies to approximate a constant Q (qapprox.m) or to plot Q as a function of frequency for
certain relaxation frequencies and value of tau (qplot.m). For further details on the theory behind these algorithms we refer to the Phd thesis of T. Bohlen \cite{bohlen:98} and to the paper in which the socalled taumethod is described \cite{blanch:95}. The parameter tau is used in this software to define the level of attenuation.
+Here some Matlab routines (mfiles) are stored.
\textbf{overnightbuilt}\\
This directory contains a complete bechmarking suite designed to test whether your installed release of SOFI3D works as intended. During the test run, several model scenarios will be simulated and ,afterwards, the created seismogram and snapshot files will be compared with reference values created by the SOFI3D developer team. This benchmarking test can be used as an overnight built check while developing the code, i.e. after modifications to the code, the test evaluates if standard modeling scenarios still result in the same simulation output. For further information we refer to the separate documentation \lstinline{sofi3D_overnight_documentation} located in the folder \textbf{doc}.
+This directory is part of the overnightbuilt version of SOFI2D.
+It contains a complete bechmarking suite designed to test whether your installed release of SOFI3D works as intended. During the test run, several model scenarios will be simulated and ,afterwards, the created seismogram and snapshot files will be compared with reference values created by the SOFI3D developer team. This benchmarking test can be used as an overnight built check while developing the code, i.e. after modifications to the code, the test evaluates if standard modeling scenarios still result in the same simulation output. For further information we refer to the separate documentation \lstinline{sofi3D_overnight_documentation} located in the folder \textbf{doc}.
\textbf{par}\\
Parameter files for SOFI3D modeling.
@@ 633,7 +611,7 @@ Parameter files for SOFI3D modeling.
Here, you will find examples of scriptfiles used to submit modeling jobs on clustercomputers. This directory also contains the shell script \lstinline{convert_silo_new.sh} to convert 3D snapshot data into the \textit{silo data format} for further visualization using VisIT (see section \ref{visual}). With the c++ code \lstinline{cut_plane.cxx} a xyplane in zdirection can be extrated and written into a separate file for further visualization.
\textbf{src}\\
This directory contains the complete source codes. The following programs are available and my be compiled using make $<$program$>$.
+This directory contains the complete source codes. The following programs are available and my be compiled using \lstinline{make }.
\begin{table}[hbt]
\begin{tabular}{ll}
@@ 653,7 +631,7 @@ In order to illustrate the usage of SOFI3D, we use a homogenous block model adap
\EQ{prob1:1}{dh \le \frac{V_{s,min}}{n\; f_{max}} = 54.2\;\mathrm{m}.}
The whole model grid has the dimensions 100 x 100 x 100 gridpoints. To avoid a violation of the Courant criterion the time step size DT is calculated according to \ER{courandt:1} using $v_{pmax}=3500\; m/s$, $h=1.345$ (see Table \ref{courant.1}) and $dh=54.0\; m$:
+The whole model grid has the dimensions 100 x 100 x 100 gridpoints. To avoid a violation of the Courant criterion the time step size DT is calculated according to \ER{courant:1} using $v_{pmax}=3500\; m/s$, $h=1.345$ (see Table \ref{courant.1}) and $dh=54.0\; m$:
\EQ{prob1:2}{DT \le \frac{dh}{h \sqrt{3} v_{pmax}} = 6.62\times 10^{3} s \approx 6.6\; \mathrm{ms}.}
@@ 703,7 +681,7 @@ NPROCZ : number of processors in zdirection\\
Parallelization is based on domain decomposition (see Figure \ref{fig_grid}), i.e each processing element (PE) updates the wavefield within his portion of the grid. The model is decomposed
by the program into sub grids. After decomposition each processing elements (PE) saves only his subvolume of the grid. NPROCX, NPROCY and NPROCZ specify the number of
processors in x, y and zdirection, respectively (Figure \ref{fig_grid}). The total number of processors thus is NP=NPROCX*NPROCY*NPROCZ. This value must be specified when starting the program with the mpirun command: \lstinline{mpirun np $<$NP$>$ ../bin/sofi3D ./in_and_out/sofi3D.json} (see section \ref{compexec1}). If the total number of processors in sofi3D.json
+processors in x, y and zdirection, respectively (Figure \ref{fig_grid}). The total number of processors thus is NP=NPROCX*NPROCY*NPROCZ. This value must be specified when starting the program with the mpirun command: \lstinline{mpirun np ../bin/sofi3D ./in_and_out/sofi3D.json} (see section \ref{compexec1}). If the total number of processors in sofi3D.json
and the command line differ, the program will terminate immediately with a corresponding error message (Try it !). Obviously, the total number of PEs (NPROCX*NPROCY*NPROCZ) used to decompose the model should be less equal than the total number of CPUs which are available on your parallel machine. If you use LAM and decompose your model in more domains than CPUs are available two or more domains will be updated on the same CPU (the program will not terminate and will produce the correct results). However, this is only efficient if more than one processor is available on each node. In order to reduce the amount of data that needs to be exchanged between PEs, you should decompose the model into more or less cubic sub grids. In our example, we use 2 PEs in each direction: NPROCX=NPROCY=NPROCZ=2. The total number of PEs used by the program is NPROC=NPROCX*NPROCY*NPROCZ=8.
\begin{figure}
\begin{center}
@@ 773,7 +751,7 @@ TIME : time of wave propagation\\
DT : timestep (in seconds)\\
The propagation time of seismic waves in the entire model is TIME. The time stepping interval (DT) has to fulfill the stability criterion \ER{courandt:1} in section \ref{courandt}.
+The propagation time of seismic waves in the entire model is TIME. The time stepping interval (DT) has to fulfill the stability criterion \ER{courant:1} in section \ref{courant}.
The program checks these criteria for the entire model, outputs a warning message if these are violated , stops the program and will output the time step interval for a stable model run.
@@ 958,7 +936,7 @@ If READMOD=1, the Pwave, Swave, and density model grids are read from external
In these files, each material parameter value must be saved as 32 bit (4 byte) native float. Velocities must be in meter/second, density values in $kg/m^3$. The fast dimension is the y direction. See src/readmod.c. The number of samples for the entire model in the xdirection is NX, the number of values in the ydirection is always NY and the number of values in the zdirection is always NZ. The file size of each model file thus must be NX*NY*NZ*4 bytes. You may check the model structure using the SU command ximage:
ximage n1=$<$NY$>$ n2=$<$NX$>$ $<$ model/test.vp .
+\lstinline {ximage n1= n2= < model/test.vp}.
If the variable TAU = 0.0 (see the next section \textit{Qapproximation}), it is also possible to read Qp, and Qs grid files to allow for spatial variable attenuation. In case of TAU > 0, constant damping is assumed with $Q_p=Q_s = \frac{2}{\mbox{TAU}}$. Please not that in order to ensure proper visualization using xmovie or ximage, the binary models are structured in terms of coordinate priority as follows : [Y,X,Z]. That means a 3D model consits of NZ planes of the size NY $\times$ NX. The fast dimension of each NYNX plane is the vertical Ydirection, i.e., first, columns of NY material parameters are written to file, in total NX columns of the length NY are written. To give an example of how to create a 3D model using Matlab, the MFile \lstinline{create_simple_tunnel_mod3D.m} is located in the folder \lstinline{mfiles} (see Chapter \ref{installation}).
@@ 1190,7 +1168,7 @@ OUT\_TIMESTEP\_INFO : every OUTNTIMESTEPINFOth time step information on update a
SOFI3D can output a lot of useful and possibly not useful information about the modeling parameters and the status of the modeling process etc. The major part of this information is output by PE 0.
If LOG=1, PE 0 writes this info to stdout, i.e. on the screen of your shell. This is generally recommended to monitor the modeling process. You may want to save this screen info to a output file by adding \lstinline{$>$ sofi3D.jout} or \lstinline{tee sofi3D.jout}. to your starting command. If LOG=1 all other processes with PE number (PEno) greater than zero will write their information to LOG\_FILE.PEno. If you specify LOG=2 also PE 0 will output his information to LOG\_FILE.0. As a consequence only little information is written directly to the screen of your shell. On supercomputers where you submit modeling jobs to a queuing system as batch jobs LOG=2 may be advantageous. In case of LOG=2, you may still watch the simulation by checking the content of LOG\_FILE.0 for example by using the Unix commands more or tail. After finishing the program the timing information is written to the ASCII file log/test.log.0.timings. This feature is useful to benchmark your local PC cluster or supercomputer (see section \ref{benchmark_sofi3D}). If LOG=0 no output from node 0 will be written, neither to stdout nor to an LOG file. There will be also no output of timing information to the ASCII file log/test.log.0.timings.
+If LOG=1, PE 0 writes this info to stdout, i.e. on the screen of your shell. This is generally recommended to monitor the modeling process. You may want to save this screen info to a output file by adding \lstinline{> sofi3D.jout} or \lstinline{tee sofi3D.jout}. to your starting command. If LOG=1 all other processes with PE number (PEno) greater than zero will write their information to LOG\_FILE.PEno. If you specify LOG=2 also PE 0 will output his information to LOG\_FILE.0. As a consequence only little information is written directly to the screen of your shell. On supercomputers where you submit modeling jobs to a queuing system as batch jobs LOG=2 may be advantageous. In case of LOG=2, you may still watch the simulation by checking the content of LOG\_FILE.0 for example by using the Unix commands more or tail. After finishing the program the timing information is written to the ASCII file log/test.log.0.timings. This feature is useful to benchmark your local PC cluster or supercomputer (see section \ref{benchmark_sofi3D}). If LOG=0 no output from node 0 will be written, neither to stdout nor to an LOG file. There will be also no output of timing information to the ASCII file log/test.log.0.timings.
By using the switch OUT\_SOURCE\_WAVELET you can decide to output the used source wavelet (yes=1/no=0). Every PE that includes a source locatation will than outputs the source signal time series as SU output. Due to the fact that the output of information on the update and exchange via fprint can be both slowing down the computation for very small models and producing large output files, you can choose by OUT\_TIMESTEP\_INFO after how many time steps such intermediate information are given.
@@ 1225,7 +1203,7 @@ As you can see from lines
the density model is written to file to be displayed as described in Chapter \ref{gen_of_mod}. As a base name for the model file, the variable MFILE from the input file is used that actually specifies the external velocity and density files. Note, that the thickness of the first layer \textit{h} is set to 10000 m (10km) which is larger than the vertical extension (NY*DH=5400 m). This way, we effectively created a homogeneous model while demonstrating a two layer case. Simple decrease the variable \textit{h} to 2700 and a real two layer model is created. Generally, you only have to modify the material property definitions in the upper part of the model file and code some if statements to construct simple structures such as planes, cubes, tubes, etc.
\subsection{Compilation of SOFI3D}\label{compexec}
The source codes are located in the directory src. To compile SOFI3D the name of the model function has to be entered in the MAKEFILE. Depending on your MPI environment (MPI distribution) you may need to modify the compiler options in src/Makefile. For a few typical platforms the compiler options are available in src/Makefile. It is often usefull to enable the highest level of optimization (typically 03 or 04). For example the optimization option 04 of the hcc LAM compiler leads to a speedup of SOFI3D of approximately 30 per cent. No other changes in the Makefile should be necessary.
+The source codes are located in the directory src. To compile SOFI3D the name of the model function has to be entered in the MAKEFILE. Depending on your MPI environment (MPI distribution) you may need to modify the compiler options in src/Makefile. For a few typical platforms the compiler options are available in src/Makefile. It is often usefull to enable the highest level of optimization (typically 03 or 04). No other changes in the Makefile should be necessary.
\begin{verbatim}
# Makefile for SOFI3D
@@ 1246,18 +1224,34 @@ MODEL_SRC_BENCH = benchmod.c
# Compiler (LAM: CC=hcc, CRAY T3E: CC=cc, SHARCNet: mpicc)
# ON Linux cluster running LAM
CC=hcc
LFLAGS=lm lmpi
CFLAGS=g Wall O4
+# On Linux cluster running LAM
+# CC=hcc
+# LFLAGS=lm lmpi
+# CFLAGS=g Wall O4
# On CRAY T3E
#CC=cc
+# CC=cc
+
+# On Linux Cluster using OpenMPI
+ CC=mpicc
+ LFLAGS=lm
+ CFLAGS=Wall O3
+
+# On InstitutsCluster2 using intelMPI AND ITAC (no optimization)
+# CC = mpicc
+# LFLAGS = tcollect
+# CFLAGS = tcollect trace
+
+# On InstitutsCluster2 using Open MPI and scalasca (no optimization)
+#CC = skin mpicc
+#LFLAGS =
+#CFLAGS =O3
+
+#On InstitutsCluster2 using Open MPI and DDT (no optimization)
+#CC = mpicc
+#LFLAGS = g
+#CFLAGS = g O0 i_dynamic
# On Linux Cluster running OpenMPI
#CC=mpicc
#LFLAGS=lm
#CFLAGS=Wall O4
# On JUROPA cluster
#CC=mpicc
@@ 1282,7 +1276,6 @@ CFLAGS=g Wall O4
# after this line, no further editing should be necessary
# 

\end{verbatim}
To compile the program SOFI3D you must change to the \lstinline{src} directory and execute:
@@ 1292,41 +1285,6 @@ or
\lstinline{make sofi3D}
The following (or a similar) output should occur:
\begin{verbatim}
icc c mp O3 ip0 sofi3D.c
icc c mp O3 ip0 comm_ini_s.c
icc c mp O3 ip0 checkfd_ssg.c
icc c mp O3 ip0 seismo_ssg.c
icc c mp O3 ip0 matcopy.c
icc c mp O3 ip0 surface_ssg.c
icc c mp O3 ip0 update_s_ssg.c
icc c mp O3 ip0 update_s_ssg_PML.c
icc c mp O3 ip0 update_v_ssg.c
icc c mp O3 ip0 update_v_ssg_PML.c
icc c mp O3 ip0 snap_ssg.c
[...]
icc c mp O3 ip0 sources.c
icc c mp O3 ip0 splitrec.c
icc c mp O3 ip0 splitsrc.c
icc c mp O3 ip0 timing.c
icc c mp O3 ip0 util.c
icc c mp O3 ip0 wavelet.c
icc c mp O3 ip0 writedsk.c
icc c mp O3 ip0 writemod.c
icc c mp O3 ip0 writepar.c
icc c mp O3 ip0 zero.c
icc c mp O3 ip0 zero_sofi3D.c
icc lmpi lm istatic sofi3D.o comm_ini_s.o checkfd_ssg.o seismo_ssg.o
matcopy.o surface_ssg.o update_s_ssg.o update_s_ssg_PML.o update_v_ssg.o
update_v_ssg_PML.o snap_ssg.o exchange_v.o exchange_s.o psource.o
hh.o absorb.o absorb_PML.o av_mat.o comm_ini.o info.o
initproc.o initsour.o merge.o mergemod.o note.o outseis.o PML_ini.o
rd_sour.o read_checkpoint.o readdsk.o read_par.o exchange_par.o receiver.o
save_checkpoint.o saveseis.o sources.o splitrec.o splitsrc.o timing.o
util.o wavelet.o writedsk.o writemod.o writepar.o zero.o zero_sofi3D.o
o ../bin/sofi3D
\end{verbatim}
The program snapmerge that is used to merge the snapshots (see below) can be compiled with ''make snapmerge''. Since this is not a MPI program (no MPI functions are called) the MPI libraries are not required and any standard compiler (like gcc and cc) can be used to compile this program. The executables sofi3D and snapmerge are located the directory bin.
\subsection{Running the program}\label{compexec1}
@@ 1335,7 +1293,7 @@ The command to start a simulation on 8 processor with the lowest priority of 19
\lstinline{mpirun np 8 nice 19 ../bin/sofi3D ./in_and_out/sofi3D.json }
If you use LAMMPI the command \lstinline{lamboot v lamhost} must be executed on node 0 which is the PE where ./par/lamhosts is the file containing IP addresses of all computing nodes. It if often useful to save the standard output of the program for later reference. The screen output may be saved to sofi3D.jout using
+It if often useful to save the standard output of the program for later reference. The screen output may be saved to sofi3D.jout using
\lstinline{mpirun np 8 nice 19 ../bin/sofi3D ./in_and_out/sofi3D.json > ./in_and_out/sofi3D.jout}
diff git a/doc/guide_sofi3D/tex/thesis.bib b/doc/guide_sofi3D/tex/thesis.bib
index bcdbdff..79badf4 100644
 a/doc/guide_sofi3D/tex/thesis.bib
+++ b/doc/guide_sofi3D/tex/thesis.bib
@@ 588,6 +588,24 @@ http://www.fzjuelich.de/ias/jsc/EN/Expertise/Supercomputers/JUROPA/JUROPA_node.
NUMBER = 1,
PAGES = {58} }
+@ARTICLE{courant:28,
+ author = {Courant, {R.} and Friedrichs, {K.} and Lewy, {H.}},
+ journal = {Mathematische Annalen},
+ title = {{\"U}ber die partiellen {D}ifferenzen\gleichungen der mathematischen {P}hysik},
+ volume = {100},
+ pages = {3274},
+ year ={1928}
+}
+
+@ARTICLE{courant:67,
+ author = {Courant, {R.} and Friedrichs, {K.} and Lewy, {H.}},
+ journal = {IBM Journal},
+ title = {On the partial difference equations of mathematical physics},
+ pages = {215234},
+ year ={March 1967}
+}
+
+
@article{coutant:95,
AUTHOR = {Coutant, {O.} and Virieux, {J.} and Zollo, {A.}},
TITLE = {Numerical Source Implementation in a 2D Finite Difference Scheme for

2.24.1