
Automatic Differentiation

Cornell University

Derivative

Gradients

Suppose I have a function $f$ from $\mathbb{R}^d$ to $\mathbb{R}$. The gradient $\nabla f$ is the vector of partial derivatives of the function. Mathematically, it is a function from $\mathbb{R}^d$ to $\mathbb{R}^d$ such that

$$\left(\nabla f(w) \right)_i = \frac{\partial}{\partial w_i} f(w) = \lim_{\delta \rightarrow 0} \frac{f(w + \delta e_i) - f(w)}{\delta}$$

Another equivalent definition is that $\nabla f(w)^T$ is the linear map such that for any $u \in \mathbb{R}^d$,

$$\nabla f(w)^T u = \lim_{\delta \rightarrow 0} \frac{f(w + \delta u) - f(w)}{\delta}$$

More informally, the gradient uniquely characterizes $f$ near a point $w_0$ in the following way:

$$f(w) \approx f(w_0) + (w - w_0)^T \nabla f(w_0)$$
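
As a quick sanity check (a minimal sketch, not part of the original notes), we can compare the directional-derivative definition against a hand-computed gradient. For $f(w) = w_1^2 + 3 w_2$ the gradient is $(2 w_1, 3)$:

import numpy as np

def f(w):
    return w[0]**2 + 3*w[1]

def grad_f(w):
    # hand-computed gradient of f
    return np.array([2*w[0], 3.0])

w = np.array([1.0, 2.0])
u = np.array([0.5, -1.0])
delta = 1e-6

# (f(w + delta*u) - f(w)) / delta should approach grad_f(w)^T u as delta -> 0
print((f(w + delta*u) - f(w)) / delta)  # close to -2.0
print(grad_f(w) @ u)                    # exactly 2*1*0.5 + 3*(-1) = -2.0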

Gradient Operator

Here we introduce something much more general.

For a function $F$ from one vector space $U$ to another vector space $V$, the derivative of $F$ is a function $DF$ which maps $U$ to $\mathcal{L}(U,V)$, where $\mathcal{L}(U,V)$ denotes the set of linear maps from $U$ to $V$. This means $DF$ takes in an element $x \in U$ and returns the derivative of $F$ at this point $x$. This derivative $DF(x)$ takes in a direction vector $\Delta \in U$ and outputs the directional derivative of $F$ at point $x$ in direction $\Delta$.

The derivative is defined as the unique function such that for any $x$ and $\Delta$ in $U$,

$$\lim_{\alpha \rightarrow 0} \frac{F(x + \alpha \Delta) - F(x)}{\alpha} = (DF(x)) \Delta$$

As a special case, note that any linear map from $\mathbb{R}^n$ to $\mathbb{R}$ can be written in the form $\Delta \mapsto b^T \Delta$ for some $b \in \mathbb{R}^n$. Therefore, if $F: \mathbb{R}^n \rightarrow \mathbb{R}$, we can always write $DF(x)\Delta$ in the form $b^T \Delta$; this $b$ is exactly the gradient $\nabla F(x)$.
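
As a concrete example (not from the original notes) of why the extra generality matters, take $F(X) = X^2$ on the vector space of $n \times n$ matrices. Expanding $F(X + \alpha \Delta) = X^2 + \alpha (X\Delta + \Delta X) + \alpha^2 \Delta^2$ and taking the limit in the definition gives $(DF(X))\Delta = X\Delta + \Delta X$: a perfectly good linear map in $\Delta$, even though (since $X\Delta \ne \Delta X$ in general) it cannot be summarized by a single gradient vector.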

Symbolic differentiation

  1. Write your function as a single mathematical expression.
  2. Apply the chain rule, product rule, ..., to differentiate that expression.
  3. Execute the expression as code (a short sketch follows below).
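
As a minimal sketch of these three steps (the notes don't prescribe a tool; SymPy is one common choice), for $f(x) = 2x^2 - 1$:

import sympy as sp

x = sp.symbols('x')
expr = 2*x*x - 1              # step 1: the function as a single expression
dexpr = sp.diff(expr, x)      # step 2: apply the differentiation rules symbolically
dfdx = sp.lambdify(x, dexpr)  # step 3: turn the derivative expression into code

print(dexpr)      # 4*x
print(dfdx(3.0))  # 12.0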

Problems

The derivative expression can grow much faster than the original expression ("expression swell"), and the whole computation must first be written as one closed-form expression, which rules out ordinary program constructs like loops and branches.

Numerical Differentiation

Just take a small enough value (like $10^{-8}$) and use it in place of the infinitely small $\epsilon$:

$$\frac{\partial}{\partial x} f(x) \approx \frac{f(x + \epsilon) - f(x)}{\epsilon}$$
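
A minimal sketch of how sensitive this is to the choice of step size (the exact digits depend on floating-point details):

def forward_difference(f, x, eps):
    # one-sided finite difference approximation to df/dx
    return (f(x + eps) - f(x)) / eps

def f(x):
    return 2*x*x - 1  # exact derivative at x = 3 is 12.0

print(forward_difference(f, 3.0, 1e-8))   # close to 12, small error
print(forward_difference(f, 3.0, 1e-1))   # off by ~0.2: truncation error dominates
print(forward_difference(f, 3.0, 1e-12))  # noticeably off: round-off error dominates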

Problems

Choosing $\epsilon$ is a delicate tradeoff, as the sketch above illustrates: too large and truncation error dominates; too small and floating-point round-off dominates. Either way we lose some precision, and estimating a full gradient over $\mathbb{R}^d$ takes $d + 1$ function evaluations.

Automatic Differentiation (Forward Mode)

Automatic differentiation allows us to compute derivatives automatically, without the expression swell of symbolic differentiation or the loss of precision of numerical differentiation. There are two rough classes of methods: forward mode and reverse mode. We introduce forward mode here.

Forward mode fixes one input variable $x$ over $\mathbb{R}$. At each step of the computation, as we're computing some value $y$, we also compute $\frac{\partial y}{\partial x}$. We can do this with a dual-numbers approach: each number $y$ is replaced with a pair $(y, \frac{\partial y}{\partial x})$. Each arithmetic operation then updates both components; for example, multiplication follows the product rule: $(y_1, y_1') \cdot (y_2, y_2') = (y_1 y_2,\; y_1' y_2 + y_1 y_2')$.

Demo

def to_dualnumber(x):
    if isinstance(x, DualNumber):
        return x
    elif isinstance(x, float):
        return DualNumber(x)
    elif isinstance(x, int):
        return DualNumber(float(x))
    else:
        raise Exception("couldn't convert {} to a dual number".format(x))

class DualNumber(object):
    def __init__(self, y, dydx=0.0):
        super().__init__()
        self.y = y
        self.dydx = dydx
        
    def __repr__(self):
        return "(y = {}, dydx = {})".format(self.y, self.dydx)

    # operator overloading
    def __add__(self, other):
        other = to_dualnumber(other)
        return DualNumber(self.y + other.y, self.dydx + other.dydx)
    def __sub__(self, other):
        other = to_dualnumber(other)
        return DualNumber(self.y - other.y, self.dydx - other.dydx)
    def __mul__(self, other):
        other = to_dualnumber(other)
        return DualNumber(self.y * other.y, self.dydx * other.y + self.y * other.dydx)
    def __truediv__(self, other):
        other = to_dualnumber(other)  # convert plain numbers, as the other operators do
        # quotient rule: d(u/v) = du/v - u dv / v^2
        return DualNumber(self.y / other.y, self.dydx / other.y - self.y * other.dydx / (other.y * other.y))
    
    def __radd__(self, other):
        return to_dualnumber(other).__add__(self)
    def __rsub__(self, other):
        return to_dualnumber(other).__sub__(self)
    def __rmul__(self, other):
        return to_dualnumber(other).__mul__(self)
    def __rtruediv__(self, other):
        return to_dualnumber(other).__truediv__(self)
    
def forward_mode_diff(f, xv):
    """
    Computes df/dx at x = xv.
    f is a function that uses +, -, *, / or other operators we have overloaded;
    xv is the point at which we want to calculate the derivative.
    """
    # x is a variable that has value xv; dx/dx = 1.0
    x = DualNumber(xv, 1.0)
    # f(x) is a DualNumber: x is a DualNumber, and every operation f applies
    # (like + or *) is overloaded for DualNumber, so the result carries the
    # derivative along with the value and we can read it off directly.
    return f(x).dydx

def f(x):
    return 2*x*x - 1  # df/dx = 4x

def dfdx(x):
    return 4*x  # the exact derivative, computed by hand

def numerical_derivative(f, x, eps=1e-5):
    # central finite difference, for comparison
    return (f(x + eps) - f(x - eps)) / (2*eps)

print(dfdx(3.0)) # 12.0
print(numerical_derivative(f, 3.0)) # 12.000000000078613
print(forward_mode_diff(f, 3.0)) # 12.0

Benefits

Simple in-place operations; easy to extend to compute higher-order derivatives (a sketch follows below).
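
For instance (a sketch, not from the notes), the DualNumber class above happens to nest: its y and dydx slots can themselves hold DualNumbers, so differentiating the derivative yields a second derivative:

def second_derivative(f, xv):
    # run forward mode on the function "derivative of f at x", tracking x itself
    return forward_mode_diff(lambda x: forward_mode_diff(f, x), xv)

print(second_derivative(f, 3.0))  # 4.0, since d^2/dx^2 (2x^2 - 1) = 4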

Problem

Each forward pass differentiates with respect to a single scalar input. For a function of a vector in $\mathbb{R}^d$, recovering the full gradient takes $d$ separate passes, one per coordinate, which gets expensive when $d$ is large; see the sketch below.
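
A minimal sketch of the one-pass-per-coordinate workaround (forward_mode_grad is a hypothetical helper, not from the notes), reusing the DualNumber class above:

def forward_mode_grad(f, xv):
    # seed dydx = 1.0 for one coordinate at a time: d passes in total
    grad = []
    for i in range(len(xv)):
        xs = [DualNumber(v, 1.0 if j == i else 0.0) for j, v in enumerate(xv)]
        grad.append(f(xs).dydx)
    return grad

# example: f(w) = w0*w1 + w1, so the gradient at (2, 3) is (w1, w0 + 1) = (3, 3)
print(forward_mode_grad(lambda w: w[0]*w[1] + w[1], [2.0, 3.0]))  # [3.0, 3.0]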