Automatic Differentiation
Derivative¶
Gradients¶
Suppose I have a function $f$ from $\mathbb{R}^n$ to $\mathbb{R}$. The gradient $\nabla f$ is the vector of partial derivatives of the function. Mathematically, it is a function from $\mathbb{R}^n$ to $\mathbb{R}^n$ such that

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \right).$$

Another equivalent definition is that $\nabla f(x)$ is the linear map such that for any direction $v \in \mathbb{R}^n$,

$$\nabla f(x) \cdot v = \lim_{\delta \to 0} \frac{f(x + \delta v) - f(x)}{\delta}.$$

More informally, we say it uniquely defines $f$ nearby $x$ in the following way:

$$f(x + \delta) \approx f(x) + \nabla f(x) \cdot \delta.$$
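As a concrete check of these definitions (my own example, not from the notes): take $f(x_1, x_2) = x_1^2 + 3 x_2$. Then

$$\nabla f(x) = \begin{pmatrix} 2 x_1 \\ 3 \end{pmatrix}, \qquad f(x + \delta) \approx f(x) + 2 x_1 \delta_1 + 3 \delta_2.$$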
Gradient Operator¶
Here we introduce something much more general.
For a function $f$ from one vector space $\mathcal{X}$ to another vector space $\mathcal{Y}$, the derivative of $f$ is a function $Df$ which maps $\mathcal{X}$ to $L(\mathcal{X}, \mathcal{Y})$, where $L(\mathcal{X}, \mathcal{Y})$ denotes the set of linear maps from $\mathcal{X}$ to $\mathcal{Y}$. This means $Df$ takes in an element $x \in \mathcal{X}$ and returns the derivative of $f$ at this point $x$. This derivative $Df(x)$ will take in another directional vector $v \in \mathcal{X}$ and output the directional derivative of $f$ at point $x$ in direction $v$.

The derivative is defined as the unique function such that for any $x$ and $v$ in $\mathcal{X}$,

$$Df(x)(v) = \lim_{\delta \to 0} \frac{f(x + \delta v) - f(x)}{\delta}.$$

As a special case, note for a function $f : \mathbb{R} \to \mathbb{R}$, we can always write $Df(x)$ in the form of $Df(x)(v) = f'(x)\, v$. Therefore, if $f : \mathbb{R}^n \to \mathbb{R}$, we can always write $Df(x)$ in the form of $Df(x)(v) = \nabla f(x) \cdot v$.
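A worked special case (my own illustration): for a linear function $f(x) = A x$ with $A$ an $m \times n$ matrix, the difference quotient is exact, so the derivative at every point is $A$ itself:

$$Df(x)(v) = \lim_{\delta \to 0} \frac{A(x + \delta v) - A x}{\delta} = A v.$$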
Symbolic differentiation¶
- Write your function as a single mathematical expression.
- Apply the chain rule, product rule, ..., to differentiate that expression.
- Execute the expression as code.
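The three steps can be sketched on tiny expression trees (my own illustration, not from the notes; `diff` and `evaluate` are hypothetical helpers, and only `+` and `*` are handled):

```python
def diff(e, var):
    """Step 2: differentiate expression e (a nested tuple) with respect to var."""
    if isinstance(e, (int, float)):
        return 0
    if isinstance(e, str):              # a variable
        return 1 if e == var else 0
    op, a, b = e
    if op == '+':
        return ('+', diff(a, var), diff(b, var))
    if op == '*':                       # product rule
        return ('+', ('*', diff(a, var), b), ('*', a, diff(b, var)))
    raise ValueError("unknown operator: " + op)

def evaluate(e, env):
    """Step 3: execute the (differentiated) expression as code."""
    if isinstance(e, (int, float)):
        return e
    if isinstance(e, str):
        return env[e]
    op, a, b = e
    if op == '+':
        return evaluate(a, env) + evaluate(b, env)
    return evaluate(a, env) * evaluate(b, env)

# Step 1: f(x) = 2*x*x - 1, written as the single expression 2*(x*x) + (-1)
f_expr = ('+', ('*', 2, ('*', 'x', 'x')), -1)
df_expr = diff(f_expr, 'x')
print(df_expr)                          # large and unsimplified
print(evaluate(df_expr, {'x': 3.0}))    # 12.0
```

Note how `df_expr` comes out much larger than the input expression even for this tiny function, which previews the second problem below.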
Problems¶
- Converting code into a single mathematical expression is not trivial; a human typically has to do it.
- The differentiated expression can become very large and complicated, especially once the chain rule is applied repeatedly.
Numerical Differentiation¶
Just take a small enough $\varepsilon$ (like $10^{-8}$) and use it as the infinitely small value:

$$f'(x) \approx \frac{f(x + \varepsilon) - f(x)}{\varepsilon}.$$
Problems¶
- suffers from numerical imprecision
- can have problems if the function we’re differentiating is not smooth
- we aren't sure what value to use for $\varepsilon$
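A quick sketch of the step-size dilemma (my own illustration): for $f(x) = x^3$ at $x = 2$ the true derivative is 12. Too large an $\varepsilon$ gives truncation error; too small gives floating-point rounding error.

```python
def central_diff(f, x, eps):
    # symmetric difference quotient, a common variant of the formula above
    return (f(x + eps) - f(x - eps)) / (2 * eps)

f = lambda x: x ** 3            # true derivative at x = 2 is 12
for eps in (1e-2, 1e-5, 1e-8, 1e-13):
    print(eps, abs(central_diff(f, 2.0, eps) - 12.0))
```

The error typically shrinks as $\varepsilon$ decreases and then grows again once rounding error dominates, with no universally best choice.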
Automatic Differentiation (Forward Mode)¶
Automatic differentiation allows us to compute derivatives of code automatically, to machine precision and with only a constant-factor overhead. There are two rough classes of methods: forward mode and reverse mode. We introduce forward mode here.
It fixes one input variable $x$ and differentiates everything with respect to $x$. At each step of the computation, as we're computing some value $y$, we also compute $\frac{dy}{dx}$. We can do this with a dual-numbers approach: each number $y$ is replaced with a pair $(y, \frac{dy}{dx})$.
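Concretely, arithmetic on these pairs follows the usual differentiation rules, e.g. the sum and product rules become

$$(a, a') + (b, b') = (a + b,\; a' + b'), \qquad (a, a') \times (b, b') = (a b,\; a' b + a b').$$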
Demo¶
def to_dualnumber(x):
    if isinstance(x, DualNumber):
        return x
    elif isinstance(x, float):
        return DualNumber(x)
    elif isinstance(x, int):
        return DualNumber(float(x))
    else:
        raise Exception("couldn't convert {} to a dual number".format(x))
class DualNumber(object):
    def __init__(self, y, dydx=0.0):
        super().__init__()
        self.y = y
        self.dydx = dydx

    def __repr__(self):
        return "(y = {}, dydx = {})".format(self.y, self.dydx)

    # operator overloading
    def __add__(self, other):
        other = to_dualnumber(other)
        return DualNumber(self.y + other.y, self.dydx + other.dydx)

    def __sub__(self, other):
        other = to_dualnumber(other)
        return DualNumber(self.y - other.y, self.dydx - other.dydx)

    def __mul__(self, other):
        other = to_dualnumber(other)
        # product rule: d(uv) = du*v + u*dv
        return DualNumber(self.y * other.y, self.dydx * other.y + self.y * other.dydx)

    def __truediv__(self, other):
        other = to_dualnumber(other)  # was missing: dividing by a plain int/float would crash
        # quotient rule: d(u/v) = du/v - u*dv/v^2
        return DualNumber(self.y / other.y, self.dydx / other.y - self.y * other.dydx / (other.y * other.y))

    def __radd__(self, other):
        return to_dualnumber(other).__add__(self)

    def __rsub__(self, other):
        return to_dualnumber(other).__sub__(self)

    def __rmul__(self, other):
        return to_dualnumber(other).__mul__(self)

    def __rtruediv__(self, other):
        return to_dualnumber(other).__truediv__(self)
def forward_mode_diff(f, xv):
    """
    Computes df/dx at x = xv.
    f is a function that uses +, -, *, / or other operators we have overloaded;
    xv is the point where we want the derivative.
    """
    # x is a variable with value xv; dx/dx = 1.0
    x = DualNumber(xv, 1.0)
    # f(x) is a DualNumber: x is a DualNumber and every operation in f
    # (+, *, ...) is overloaded for DualNumber, so the result carries
    # the derivative along with the value.
    return f(x).dydx
def f(x):
    return 2*x*x - 1

def dfdx(x):
    return 4*x

def numerical_derivative(f, x, eps=1e-5):
    return (f(x + eps) - f(x - eps)) / (2*eps)

print(dfdx(3.0))                      # 12.0
print(numerical_derivative(f, 3.0))   # 12.000000000078613
print(forward_mode_diff(f, 3.0))      # 12.0
Benefits¶
- Simple, in-place operations
- Easy to extend to compute higher-order derivatives
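As a sketch of the higher-order claim (my own illustration, not from the notes; `Dual2` and `second_derivative` are hypothetical names): extend each number to a triple carrying the value and its first two derivatives.

```python
def to_dual2(x):
    return x if isinstance(x, Dual2) else Dual2(float(x))

class Dual2(object):
    """Carries (y, dy, d2y): a value plus its first and second derivative w.r.t. x."""
    def __init__(self, y, dy=0.0, d2y=0.0):
        self.y, self.dy, self.d2y = y, dy, d2y

    def __add__(self, other):
        other = to_dual2(other)
        return Dual2(self.y + other.y, self.dy + other.dy, self.d2y + other.d2y)

    def __sub__(self, other):
        other = to_dual2(other)
        return Dual2(self.y - other.y, self.dy - other.dy, self.d2y - other.d2y)

    def __mul__(self, other):
        other = to_dual2(other)
        # second-order product rule: (uv)'' = u''v + 2u'v' + uv''
        return Dual2(self.y * other.y,
                     self.dy * other.y + self.y * other.dy,
                     self.d2y * other.y + 2 * self.dy * other.dy + self.y * other.d2y)

    __radd__ = __add__
    __rmul__ = __mul__

def second_derivative(f, xv):
    # seed dx/dx = 1 and d2x/dx2 = 0
    r = f(Dual2(xv, 1.0, 0.0))
    return r.dy, r.d2y

print(second_derivative(lambda x: 2*x*x - 1, 3.0))  # (12.0, 4.0)
```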
Problem¶
We can only differentiate with respect to one scalar input per pass. For a function with $n$ inputs, we need $n$ forward passes to obtain the full gradient, which gets expensive when $n$ is large.
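The per-coordinate workaround can be sketched as follows (my own illustration; `forward_gradient`, `dadd`, `dmul` are hypothetical helpers operating on (value, derivative) pairs):

```python
def dadd(a, b):
    # sum rule on (value, derivative) pairs
    return (a[0] + b[0], a[1] + b[1])

def dmul(a, b):
    # product rule on (value, derivative) pairs
    return (a[0] * b[0], a[1] * b[0] + a[0] * b[1])

def forward_gradient(f, xs):
    """Gradient of f: R^n -> R via n forward passes, one per coordinate."""
    grad = []
    for i in range(len(xs)):
        # seed dx_i/dx_i = 1 and dx_j/dx_i = 0 for j != i
        duals = [(x, 1.0 if j == i else 0.0) for j, x in enumerate(xs)]
        grad.append(f(duals)[1])
    return grad

def f(v):                          # f(x, y) = x*y + x
    x, y = v
    return dadd(dmul(x, y), x)

print(forward_gradient(f, [3.0, 5.0]))  # [6.0, 3.0]
```

Each pass costs roughly as much as one evaluation of `f`, so the full gradient costs about $n$ evaluations; reverse mode avoids this blow-up.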