# 5 Numerical Approximation of DAEs

## 5.1 Notions and Notations

We will consider an equidistant time grid of a fixed time interval \([t_0,t_e]\): \[ t_0 < t_1 < t_2 < \dotsm < t_N=t_e \] with time step size \(h\), i.e. \(t_i = t_0 + ih\), for \(i=1,2,\dotsc,N\), or \(h=\frac{t_e - t_0}{N}\).

The restriction of equidistant grids is convenient for the analysis and does not mean a great loss of generality. Typically, in all the estimates that follow and in which the *constants* remain unspecified, a non equidistant grid can be accommodated by setting \(h\) to be the largest time step under consideration.

Throughout this chapter, we will assume that system under consideration has a unique solution \(x\) and we will use the notation \[ x_i \approx x(t_i) \] to express that \(x_i\) is defined as the numerically computed approximation to the solution \(x\) at time \(t_i\).

Generally, the approximants \(x_i\) are computed iteratively by a numerical scheme \(\phi\) like
\[
x_{i+1} = \phi(t_i,h,x_i,x_{i+1}). \tag{5.1}
\]
If \(x_{i+1}\) appears in the definition of the function \(\phi\), then the scheme is called *implicit*.
Otherwise, it is called an explicit scheme.
If the scheme bases on previous iterates like \(x_{i-1}\), \(x_{i-2}\), …, \(x_{i-k}\) with \(k\geq 1\), then the scheme is called a *multi-step scheme*.
Otherwise, it is called a *single-step scheme*.

Generally, the analysis of the schemes and their application to problem classes tries to establish convergence, e.g.,
\[
\|x_N - x(t_e)\| \to 0
\]
as \(h \to 0\). More precisely, tries to establish estimates like
\[
x_N - x(t_e) = \mathcal O(h^p)
\]
meaning that the error approaches \(0\) at least as fast as the convergence order \(p\). This \(p\) is then called the *order of convergence* (of the method \(\phi\)).

If the method \(\phi\) is stable^{6}, then an estimate on the (local) consistency error (cp. the definition of the iteration (5.1)) like
\[
x(t_{i+1}) - \phi(t_i, h, x(t_i), x(t_{i+1})) = \mathcal O(h^q),
\]
i.e. the *order of consistency* of \(\phi\), transfers to the global convergence order as \(p=q-1\).

We will start with one-step methods and in particular with Runge-Kutta methods (RKM) that represent the most commonly applied time discretization schemes. A Runge-Kutta method is defined by its number of stages \(s\), and by parameter vectors
\[
\beta = \{\beta_j\}_{j=1, \dotsc, s}, \quad \gamma=\{\gamma_j\}_{j=1,\dotsc,s}, \quad
\mathcal A = \{\alpha_{j\ell}\}_{j,\ell=1, \dotsc, s}
\]
that, in turn, define the increment \(x_{i+1}=\phi(t_i,h,x_i,x_{i+1})\) via
\[
x_{i+1}=x_i+h\sum_{j=1}^s\beta_j \dot X_{ij},
\]
where the *stage derivatives* \(\dot X_{ij}\) are connected with the *stage values* \(X_{ij}\) via the following possibly nonlinear (depending on the problem: \(\dot x(t) = f(t,x(t))\)) and possibly implicit (depending on the method) system of equations
\[\begin{align}
\dot X_{ij} &= f(t_i+\gamma_jh, X_{ij}), \\
X_{ij} &= x_i + h \sum_{\ell=1}^s \alpha_{j\ell} \dot X_{i\ell}.
\end{align}\]

The matrix \(\mathcal A\) and the vectors \(\beta\) and \(\gamma\) of parameters are conveniently written into the so-called *Butcher-tableau*
\[
\begin{array}{c|c}
\gamma & \mathcal A \\
\hline
& \beta^T \\
\end{array}
\]

It is well-known that one-step methods are stable (see also the first paragraph of Kunkel/Mehrmann Ch. 5.2). Accordingly, convergence can be derived directly from the consistency error. For a general Runge-Kutta method, the convergence conditions – if applied to an ODE \(\dot x = f(t,x)\) – can be expressed in terms of the coefficients:

**Theorem 5.1 (Butcher’s Theorem) **If the coefficients \(\beta_j\), \(\gamma_j\), and \(\alpha_{j\ell}\) fulfill the conditions

Condition | range of \(k\) | |
---|---|---|

B(p): | \(\sum_{j=1}^s\beta_j\gamma_j^{k-1} = \frac 1k\) | \(k=1,2,\cdots,p\) |

C(q): | \(\sum_{\ell=1}^s\alpha_{j\ell}\gamma_\ell^{k-1} = \frac 1k\gamma_j^k,\quad\) for \(j=1,\dotsc s\) | \(k=1,2,\cdots,q\) |

D(r): | \(\sum_{j=1}^s\beta_j\gamma_j^{k-1} \alpha_{j\ell}= \frac 1k\beta_\ell(1-\gamma_\ell^k),\quad\) for \(\ell=1,\dotsc s\) | \(k=1,2,\cdots,r\) |

with \[ p\leq q+r+1 \] and \[ p \leq 2q+2,\] then the Runge-Kutta method is convergent of order \(p\).

In particular for the analysis of the approximation of linear DAEs, the Kronecker product \(\otimes\) and two of its properties will be helpful:

**Definition 5.1**For two matrices \(U \in \mathbb C^{m,n}\) and \(V\in \mathbb C^{k,l}\), with \[ U = \begin{bmatrix} u_{11} & \dots & u_{1n} \\ \vdots & \ddots & \vdots \\ u_{m1} & \dots & u_{mn} \end{bmatrix} \] their

*Kronecker product*\(U \otimes V\) is defined as \[ U\otimes V = \begin{bmatrix} u_{11}V & \dots & u_{1n}V \\ \vdots & \ddots & \vdots \\ u_{m1}V & \dots & u_{mn}V \end{bmatrix} \in \mathbb C^{mk, nl}. \]

**Lemma 5.1**For given dimensions \(m\), \(n\), \(k\), \(l\), there exist permutations matrices \[ \Pi_1 \in \mathbb R^{mk, mk}, \quad \Pi_2 \in \mathbb C^{kl, kl}, \] so that for any matrices \(U \in \mathbb C^{m,n}\) and \(V\in \mathbb C^{k,l}\) it holds that \[ U\otimes V = \Pi_1^T (V\otimes U) \Pi_2. \] If \(n=m\) and \(k=l\), then \(\Pi_1=\Pi_2\).

Note that the inverse of a permutation matrix \(\Pi\) is its transpose \(\Pi^T\), so that for square matrices \(U\) and \(V\) with \(\Pi_1=\Pi_2=:\Pi\), it holds that \[ U\otimes V = \Pi^T(V\otimes U)\Pi \sim V\otimes U. \]

**Lemma 5.2**If the given matrices \(R\), \(S\), \(U\), \(V\) have compatible dimensions so that the products \(UR\) and \(VS\) exist, then \[ (U\otimes V)(R\otimes S) = UR \otimes VS. \]

## 5.2 Runge-Kutta Methods for Linear DAEs with Constant Coefficients

In this section, we will analyse the approximation error of the RKM \((\mathcal A, \beta, \gamma)\) applied to a regular linear DAE with constant coefficients

\[ E\dot x = Ax+f(t). \tag{5.2} \]

To motivate the following arguments and assumptions, we just try to apply the *explicit Euler* to (5.2) so that a single iteration step reads
\[
Ex_{i+1} = Ex_i + h(Ax_i + f(t_i)),
\]
which can not fully define \(x_{i+1}\) if \(E\) is not invertible.

On the other hand, if \((E,A)\) is regular, then one step of the *implicit Euler* scheme,
\[
(E-hA)x_{i+1} = Ex_i + hf(t_i),
\]
well defines the next iterate for any choice of \(h\) except a few.

In more generality, a complete iteration step of a general RKM applied to (5.2), can be expressed via the solution^{7} of the linear system

\[ (I_s \otimes E - h\mathcal A \otimes A) \dot X_i = Z_i \] with \[ \dot X_i = \begin{bmatrix} \dot X_{i1} \\ \vdots \\ \dot X_{is} \end{bmatrix}, \quad Z_i = \begin{bmatrix} Ax_{i}+f(t_i+\gamma_1h) \\ \vdots \\ Ax_{i}+f(t_i+\gamma_sh) \end{bmatrix} \] If \((E, A)\) is regular, and \(P\) and \(Q\) are those matrices that bring \((E,A)\) into Kronecker normal form, then, with Lemma 5.2, we can find that \[ (I_s\otimes P)(I_s \otimes E - h\mathcal A \otimes A)(I_s \otimes Q) = (I_s \otimes PEQ - h(\mathcal A \otimes PAQ)) \] which by Lemma 5.1 is similar to \[ PEQ\otimes I_s - PAQ \otimes h\mathcal A = \begin{bmatrix} I \otimes I_s & \\ & N\otimes I_s \end{bmatrix} - h \begin{bmatrix} J \otimes \mathcal A & \\ & I\otimes \mathcal A \end{bmatrix}, \] which means that for a regular DAE, one step of an RKM can be interpreted as one step of an RKM for the ODE and one step for the DAE part in the canonical form.

As for the invertibility of the coefficient matrix, we first observe that for \(h\) sufficiently small, the part \(I\otimes I_s - hJ\otimes \mathcal A\) is invertible. For the second part, we assume that \(N\) is in Jordan form and consists of only one block (otherwise the arguments can be formulated for each block separately) to find that \[ N\otimes I_s - I \otimes h \mathcal A = \begin{bmatrix} -h\mathcal A & I_s \\ & \ddots & \ddots \\ && -h\mathcal A & I_s \\ &&& -h\mathcal A \end{bmatrix} \] is invertible if, and only if, \(\mathcal A\) is invertible.

By the previous considerations, the following assumptions are well justified – at least for the theoretical analysis of the schemes.

- \((E,A)\) is regular \(\leftarrow\) the theory needs a unique solution
- \((E,A)\) is in Kronecker Canonical Form \(\leftarrow\) RKM are invariant under equivalence transformation
- \((E,A)=(N,I)\) \(\leftarrow\) as the other (the
*regular part*) can be treated by ODE theory - \(E=N=N_\nu\) consists of a single Jordan block \(\leftarrow\) otherwise consider each Jordan block separately.

Thus, we will consider the special DAE

\[\begin{equation} \begin{bmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{bmatrix} \dot x = x + f(t), \tag{5.3} \end{equation}\]

where

\[ x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_\nu(t) \end{bmatrix} \quad\text{and}\quad f(t) = \begin{bmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_\nu(t) \end{bmatrix} \]

In the following theorem, the consistency error of a general implicit RKM applied to the special nilpotent DAE is analysed.

**Theorem 5.2 **The local error of an RKM with \(\mathcal A\) invertible applied to (5.3) behaves like

\[ x(t_{i+1}) - x_{i+1} = \mathcal O(h^{\kappa_\nu - \nu + 2} + h^{\kappa_{\nu-1} - \nu + 3} + \cdots + h^{\kappa_1 +1}) \]

where \(\kappa_j\) is the maximum number such that

Condition | range of \(k\) | |
---|---|---|

a.) | \(\beta^T\mathcal A^{-k}e = \beta^T\mathcal A^{-j}\gamma^{j-k} / (j-k)!\) | \(k=1,2,\cdots,j-1\) |

b.) | \(\beta^T\mathcal A^{-j}\gamma^k = k! / (k-j+1)!\) | \(k=j,j+1,\cdots\) |

for all \(k\leq \kappa_j\) and for \(j=1, \cdots, \nu\) and where \(e\in \mathbb R^{\nu}\) is the vector of ones.

*Proof. * Since we consider the pure consistency error, we can assume that \(x_i=x(t_i)\). With that, with the Taylor expansion of the solution
\[
x(t_{i+1}) = x(t_i+h) = x(t_i) + \sum_{k\geq 1} \frac{h^k}{k!}x^{(k)}(t_i),
\]
and with the definition of the RKM, the error is given as
\[
\tau = x(t_{i+1}) - x_{i+1} = -h\sum_{j=1}^s \beta_j \dot X_{ij} + \sum_{k\geq 1} \frac{h^k}{k!}x^{(k)}(t_i).
\]

Because of the special structure of the DAE, we can concentrate on the first error component \(\tau_1\) \(\leftarrow\) the error component \(\tau_2\) is the *first* component of the problem of index \(\nu-1\). For \(\tau_1\) we have the formula

\[ \tau_1 = x_1(t_{i+1}) - x_{i+1,1} = h\beta^T\sum_{j=1}^\nu (h\mathcal A)^{-j}Z_{ij} + \sum_{k\geq 1} \frac{h^k}{k!}x_1^{(k)}(t_i). \]

One may confirm directly, or by means of the solution formula for \(N\dot x = x + f\), that the \(\ell\)-th component of \(x\) is defined as \[ x_\ell(t) = - \sum_{j=\ell}^{\nu}f_j^{(j-\ell)}(t). \]

The componentwise Taylor expansion of \(Z_{i,\ell}\) reads \[\begin{align*} Z_{i\ell} &= \begin{bmatrix} x_{i,\ell} + f_\ell(t_i+\gamma_1 h) \\ x_{i,\ell} + f_\ell(t_i+\gamma_2 h) \\ \vdots \\ x_{i,\ell} + f_\ell(t_i+\gamma_s h) \end{bmatrix} = \begin{bmatrix} x_{i,\ell} + f_\ell(t_i) + \sum_{k\geq 1}\frac{h^k}{k!}f_\ell^{(k)}(t_i)\gamma_1^k \\ x_{i,\ell} + f_\ell(t_i) + \sum_{k\geq 1}\frac{h^k}{k!}f_\ell^{(k)}(t_i)\gamma_2 ^k \\ \vdots \\ x_{i,\ell} + f_\ell(t_i) + \sum_{k\geq 1}\frac{h^k}{k!}f_\ell^{(k)}(t_i)\gamma_s ^k \end{bmatrix} \\ &= x_{i,\ell}e+\sum_{k\geq 0} \frac{h^k}{k!}f_\ell^{(k)}(t_i)\gamma^k \end{align*}\]

With that and with \(x_i=x(t_i)\), we expand the error \(\tau_1\) as follows: \[\begin{align*} \tau_1 &= h\beta^T\sum_{j=1}^\nu (h\mathcal A)^{-j}Z_{ij} + \sum_{k\geq 1} \frac{h^k}{k!}x_1^{(k)}(t_i)\\ &= \beta^T\sum_{j=1}^\nu h^{-j+1}\mathcal A^{-j}\Bigg[ x_{j}(t_i)e+\sum_{k\geq 0} \frac{h^k}{k!}f_j^{(k)}(t_i)\gamma^k\Bigg] \\&\quad\quad\quad\quad+ \sum_{k\geq 1} \frac{h^k}{k!}x_1^{(k)}(t_i)\\ &= \beta^T\sum_{j=1}^\nu h^{-j+1}\mathcal A^{-j}\Bigg[ -\sum_{k=j}^{\nu}f_k^{(k-j)}(t_i)e+\sum_{k\geq 0} \frac{h^k}{k!}f_j^{(k)}(t_i)\gamma^k\Bigg] \\&\quad\quad\quad\quad- \sum_{k\geq 1} \frac{h^k}{k!} \sum_{j=1}^{\nu}f_j^{(j-1+k)}(t_i)\\ &= -\sum_{j=1}^\nu \sum_{k=j}^{\nu} h^{-j+1}\beta^T\mathcal A^{-j} ef_k^{(k-j)}(t_i)+\sum_{j=1}^\nu \sum_{k\geq 0}\frac{h^{k-j+1}}{k!}\beta^T\mathcal A^{-j} \gamma^k f_j^{(k)}(t_i) \\&\quad\quad\quad\quad- \sum_{k\geq 1}\sum_{j=1}^{\nu} \frac{h^k}{k!} f_j^{(j-1+k)}(t_i), \end{align*}\]

which, with \(\sum_{j=1}^\nu \sum_{k=j}^\nu g(j,k) = \sum_{k=1}^\nu \sum_{j=1}^k g(j,k)= \sum_{j=1}^\nu \sum_{k=1}^j g(k,j)\), becomes

\[\begin{align*} \tau_1 &= \sum_{j=1}^{\nu} \Bigg[ -\sum_{k=1}^j h^{-k+1}\beta^T\mathcal A^{-k} ef_j^{(j-k)}(t_i)\\&\quad\quad\quad\quad+\sum_{k\geq 0}\frac{h^{k-j+1}}{k!}\beta^T\mathcal A^{-j} \gamma^k f_j^{(k)}(t_i) \\&\quad\quad\quad\quad- \sum_{k\geq 1}\frac{h^k}{k!} f_j^{(j-1+k)}(t_i) \Bigg] \\ &= \sum_{j=1}^{\nu} \Bigg[ -\sum_{k=1}^j h^{-k+1}\beta^T\mathcal A^{-k} ef_j^{(j-k)}(t_i)\\ &\quad\quad\quad\quad+\sum_{k=0}^{j-1}\frac{h^{k-j+1}}{k!}\beta^T\mathcal A^{-j} \gamma^k f_j^{(k)}(t_i)+\sum_{k\geq j}\frac{h^{k-j+1}}{k!}\beta^T\mathcal A^{-j} \gamma^k f_j^{(k)}(t_i) \\&\quad\quad\quad\quad- \sum_{k\geq 1}\frac{h^k}{k!} f_j^{(j-1+k)}(t_i) \Bigg]. \end{align*}\]

A shift of indices, \(\sum_{k=0}^{j-1}g(k)=\sum_{k=1}^j g(j-k)\) and \(\sum_{k\geq 1}g(k)=\sum_{k\geq j}g(k-j+1)\), then gives: \[\begin{align*} \tau_1 &= \sum_{j=1}^{\nu} \Bigg[ -\sum_{k=1}^j h^{-k+1}\beta^T\mathcal A^{-k} ef_j^{(j-k)}(t_i)+\sum_{k=1}^{j}\frac{h^{-k+1}}{(j-k)!}\beta^T\mathcal A^{-j} \gamma^{j-k} f_j^{(j-k)}(t_i)\\ &\quad\quad\quad\quad+\sum_{k\geq j}\frac{h^{k-j+1}}{k!}\beta^T\mathcal A^{-j} \gamma^k f_j^{(k)}(t_i) - \sum_{k\geq j}\frac{h^{k-j+1}}{(k-j+1)!} f_j^{(k)}(t_i) \Bigg]. \end{align*}\]

and a comparison of the coefficients for the same orders of \(h\) and derivatives of \(f\) leads to the conditions. Note the ranges of the sums that depend on \(j=1, \dots, \nu\).

*smaller*nilpotent blocks separately, the result can be formulated for the general case, as it is used in the theorem below.

To prove convergence of the approximations another stability condition is added.

**Theorem 5.3 (Kunkel/Mehrmann, Thm. 5.12)**Consider an implicit RKM with coefficients \(\mathcal A\), \(\beta\), and \(\gamma\) and a linear time invariant DAE \(E\dot x = Ax+f(t)\) with \((E,A)\) regular and of index \(\nu\). Let \(\kappa_j\), \(j=1,\dots,\nu\), be the quantities of \((\mathcal A,\beta, \gamma)\) as defined in Theorem 5.2 and assume that \[\begin{equation} | 1-\beta ^T\mathcal A^{-1}e| < 1. \tag{5.4} \end{equation}\] Then, the RKM is convergent of order \[\begin{equation} \min_{1\leq j \leq \nu}\{p, \kappa_j - \nu +2\}, \end{equation}\] where \(p\) is the order of the RKM when applied to ordinary differential equations.

*Proof.*See Kunkel/Mehrmann, Theorem 5.12.

Some remarks on the *stability condition* (5.4). As laid out in the proof of the theorem, the quantity \(\rho = 1-\beta ^T\mathcal Ae\) defines an amplification factor of the numerical errors. Accordingly, \(|\rho| < 1\) is one of the sufficient conditions for convergence. For \(\rho = 0\), utmost stability is reached which, as we will see below for the general nonlinear case, means that *index-1* (or *strangeness-free*) problems are integrated with the same order as ODEs.

For example, so-called *stiffly-implicit* schemes (we will consider them again at a later point in the lecture) come with the property that
\[
\alpha_{sj} = \beta_j, \quad j=1,2,\dotsc,s,
\]
i.e. the last row of the \(\mathcal A\) matrix equals the vector \(\beta\) or, in other terms,
\[
\beta^T = e_s^T\mathcal A, \quad e_s^T:=
\begin{bmatrix}
0& 0& \dots &1
\end{bmatrix},
\]
so that
\[
1-\beta^T\mathcal A^{-1}e= 1-e_s^T\mathcal A\mathcal A^{-1}e = 1-e_s^Te=1-1=0.
\]

If also \(\sum_{\ell=1}^s\alpha_{j\ell}=\gamma_j\) (which is the case for all RKM that treat the autonomous case \(\dot x=f(x)\) like the nonautonomous case \(\dot x=f(t,x)\)) and since, for every consistent RKM, one has that \(\sum_{j=1}^s\beta_j=1\) (cp. Butcher’s Theorem 5.1), we find that for such *stiffly accurate* RKM, necessarily
\[\begin{equation}
\gamma_s =\sum_{\ell=1}^s\alpha_{s\ell} = \sum_{j=1}^s\beta_j=1.
\end{equation}\]

This implies that condition b.) in Theorem 5.2 with \(j=1\) is fulfilled for any \(k\), as \[ \beta^T\mathcal A^{-1}\gamma^k = e_s^T\gamma^k = \gamma_s^k = 1 = \frac{k\,!}{k\,!}. \]

This means that \(\kappa_1=\infty\) and, thus, no order reduction for problems of index \(\nu=1\).

## 5.3 A Note on RKM for Time-Varying DAEs

For a general linear time-varying DAE
\[
E(t)\dot x = A(t)x +f(t),
\]
the application of *Implicit Euler* as a general Runge-Kutta method reads
\[
x_{i+1} = x_i + h\dot X_{i1}
\]
with the stage derivative \(\dot X_{i1}\) implicitly defined via
\[\begin{align}
E(t_{i+1})\dot X_{i1} &= A(t_{i+1})X_{i1} + f(t_{i+1}) \\
X_{i1} &= x_i + h \dot X_{i1},
\end{align}\]
which, with \(h \dot X_{i1}=X_{i1}-x_i\) and \(X_{i1}=x_{i+1}\) gives the system

\[ [E(t_{i+1})-hA(t_{i+1})]x_{i+1} = E(t_{i+1})x_i + hf(t_{i+1}). \] If we compare with the examples from the chapter on linear time-varying DAEs, we need to record that

- if \((E(t), A(t))\) is regular for all \(t\), then the RKM may return a unique solution despite the fact that there are infinitely many; cp. Example 4.1
- if \((E(t), A(t))\) is singular for all \(t\), then the RKM cannot determine an approximation despite the fact that the problem has a unique solution; cp. Example 4.2.