Tape Management

ADOL-C (and therefore ADOLC.jl) leverages a tape for its derivative calculations except for its tape-less forward-mode, which is applied to compute Jacobians for certain inputs. The tape's task is to save information of all operations in which ADOL-C's differentiable types are involved, and the connection to other operations. Thus, the tape represents a variant of the computational graph of the user-defined function f at a given point x. The stored information is used for univariate Taylor polynomial propagation, for computing higher-order derivatives, and for the application of the forward- and reverse-mode in first- and second-order calculations.

Reuse of the Tape

Since the tape stores only the calling sequence for the first given input point x, a derivative computation based on the tape could lead to wrong results for other input points. However, the tape's construction is expensive, making it beneficial to create the tape once and recreate it only if the computational graph of f changes. Those changes in the computational graph of the f might occur if, for example, the function is composed of several if-statements. In most cases, ADOL-C stops the program with an error whenever the tape has to be recreated for a new input. In ADOLC.jl, the derivative (derivative!) driver builds the tape automatically in every call. Users can set the reuse_tape flag to suppress this build process for saving the computational effort of the tape creation. To reuse an existing tape, set the flag reuse_tape to true and pass the tape_id of the existing tape to derivative. As demonstrated in the following example, the application is straightforward.

f(x) = 1x[1]*x[2]^2 - x[3]^3
x = [-1.5, 1.0, -1.5]
dir = [0.0, 0.0, -1.0]
mode = :hess_vec
tape_id = 0
num_iters = 99
res0 = derivative(f, x, mode, dir=dir, tape_id=tape_id)
# update x ....
for i in 1:num_iters
    res = derivative(f, x, mode, dir=dir, tape_id=tape_id, reuse_tape=true)
    # do computations ....
end

Marking Independent and Dependent Variables

If not all derivatives are required, i.e., only the derivatives of subset of the output with respect to a subset of the input variables is needed, ADOLC.jl provides suitable options. To tape a certain variables for differentiation, the create_independent method should be used. For differentiable output variables use the dependent method. In the following example we compute the derivative of the $(1, 1)$ entry of the two cholesky factorization matrizes with respect to a single variable, while the remaining entries are non-differentiable. At first, the problem setup is defined.

using LinearAlgebra

n = 3
x = 1.0
A = ones(n, n)
A = A + diagm(0 => Adouble{TbAlloc}([10, 10, 10]))

tape_id = 1

The type Adouble{TbAlloc} wraps the tape-based differentiable type of ADOL-C. Using the constructor of the type as above however, creates an instance of Matrix{Adouble{TbAlloc}} with non-differentiable entries. This is only used to obtain the correct type of the matrix.
Afterwards, the tape is generated by ADOLC.trace_on. Then, all calls to create_independent and dependent will write the corresponding information on the active tape. As stated above,create_independent generates differentiable Adouble{TbAlloc}s. In our example, we mark x as independent. By multiplying A[1, 1] with adoub and storing it again in A[1, 1] the entry gets differentiable with respect to the independent variable adoub. This dependence is further passed through the cholesky factorization.

ADOLC.trace_on(tape_id)
adoub = create_independent(x)
A[1, 1] = A[1, 1] * adoub
fact = cholesky(A)

LinearAlgebra.Cholesky{Adouble{ADOLC.TbadoubleModule.TbadoubleCxxAllocated}, Matrix{Adouble{ADOLC.TbadoubleModule.TbadoubleCxxAllocated}}}
U factor:
3×3 LinearAlgebra.UpperTriangular{Adouble{ADOLC.TbadoubleModule.TbadoubleCxxAllocated}, Matrix{Adouble{ADOLC.TbadoubleModule.TbadoubleCxxAllocated}}}:
 Adouble{TbadoubleCxxAllocated}(TbadoubleCxxAllocated(Ptr{Nothing} @0x0000000002be83c0))  …  Adouble{TbadoubleCxxAllocated}(TbadoubleCxxAllocated(Ptr{Nothing} @0x000000000278e8f0))
                  ⋅                                                                          Adouble{TbadoubleCxxAllocated}(TbadoubleCxxAllocated(Ptr{Nothing} @0x0000000003c36e50))
                  ⋅                                                                          Adouble{TbadoubleCxxAllocated}(TbadoubleCxxAllocated(Ptr{Nothing} @0x0000000003ca3b20))

We then close the tape with ADOLC.trace_off() and the $(1, 1)$ entries of the factors and, for verification reasons, the $(1, 1)$ entry of A are marked as dependent variables. Therefore, fact.U[1, 1], fact.L[1, 1] and A[1, 1] are differentiable with respect to x.

dependent(fact.L[1, 1])
dependent(fact.U[1, 1])
dependent(A[1, 1])
ADOLC.trace_off()

To compute the derivatives, the fos_forward method of ADOL-C is leveraged:

result = CxxVector(n)
jac = CxxVector(n)
dir = CxxVector([1.0])

ADOLC.TbadoubleModule.fos_forward(tape_id, 3, 1, 0, [x], dir.data, result.data, jac.data)
jac

3-element CxxVector:
  1.6583123951777
  1.6583123951777
 11.0

We check our result to see if everything ran as expected.

using Test
@test jac[1] * fact.U[1, 1] + jac[2] * fact.L[1, 1] == jac[3]

Test Passed

It is also possible to mark certain inputs as parameters, an example can be found in the docs of derivative or derivative!.

Derivatives of Mutating Functions

Often it is more performant to leverage mutating functions, which stores the results in a pre-allocated container. In ADOLC.jl derivatives of such methods are computable by generating a tape with the methods of the example above. The example below solves the linear equation system $Ax=b$ leveraging ldiv!, which stores the result in $b$. We aim to compute the derivative of $A^{-1}b$ with respect to $b$. We select $A = \left(\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}\right)$ and $b=[1.0, 3.0]$. The Jacobian is given by $A$ itself and $x=[3.0, 1.0]$. ADOLC.jl can compute the Jacobian easly using the previously introduced methods. The function zos_forward is used to store all computational values on the tape required for the reverse-mode call. The call to fov_reverse computes the matrix-Jacobian product $W^TDf$, where $W$ is given by the weights and Df denotes the Jacobian.

using LinearAlgebra

b = [1.0, 3.0]
A = [[0.0, 1.0] [1.0, 0.0]]

tape_id = 1
num_dep = 2
num_indep = 2

ADOLC.TbadoubleModule.trace_on(tape_id)
adoubs = ADOLC.create_independent(b)
fact = lu(A)
ldiv!(adoubs, fact, adoubs)
ADOLC.dependent(adoubs)
ADOLC.TbadoubleModule.trace_off()

result = CxxVector(num_dep)
jac = CxxMatrix(num_dep, num_indep)
weights = CxxMatrix([[1.0, 0.0] [0.0, 1.0]])

ADOLC.TbadoubleModule.zos_forward(tape_id, num_dep, num_indep, 1, b, result.data)
ADOLC.TbadoubleModule.fov_reverse(tape_id, num_dep, num_indep, num_dep, weights.data, jac.data)
result, jac

# output

([3.0, 1.0], [0.0 1.0; 1.0 0.0])