§ Differentiating through sampling from a random normal distribution
- Credits to Edward Eriksson for teaching me this.
- The key idea is that a sample from a normal distribution with mean μ and standard deviation σ can be written as a deterministic function of a sample from the standard normal distribution, so the randomness itself can be treated as a fixed input.
- Taking μ = 0 for simplicity and applying an objective f to the sample: y = f(σz) where z ∼ N(0,1).
- Then, treating z as a constant, the chain rule gives dy/dσ = f′(σz)⋅z.
- That is, we hold z "constant" and use this derivative to minimize over σ by gradient descent.
- My belief in this remains open until I can read a textbook, but I have it on good authority that this is correct; a finite-difference sanity check (sketched after the demo below) supports it.
- How does this relate to the VAE optimisation? It's the same trick (the reparameterization trick), where we claim that sample(N(0,1)) can be held constant during backprop, as if the internal structure of the sample function did not matter. Amazing. A PyTorch sketch of this appears at the end.
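Below is a minimal numpy demo of the derivative above: each iteration samples a fresh z, computes the pathwise gradient f′(σz)⋅z, and takes a small gradient descent step on σ.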
import numpy as np

sigma = 1.0

def f(x): return np.sin(x + 0.1)       # toy objective applied to the sample
def fprime(x): return np.cos(x + 0.1)  # its derivative

for i in range(1000):
    # Reparameterize: a sample from N(0, sigma^2) is sigma * z with z ~ N(0, 1).
    z = np.random.normal(0, 1)
    sz = sigma * z
    fx = f(sz)
    gradfx = fprime(sz)
    # Chain rule with z held constant: d f(sigma * z) / d sigma = f'(sigma * z) * z.
    dsigma = gradfx * z
    print("z = %5.2f | f = %6.2f | df = %6.2f | sigma = %6.2f | dsigma = %6.2f" %
          (z, fx, gradfx, sigma, dsigma))
    # Stochastic gradient descent step on sigma.
    sigma = sigma - 0.01 * dsigma
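To back up the "treat z as constant" claim numerically, here is a sketch of my own sanity check (not from the original derivation): it compares the average pathwise gradient E[f′(σz)⋅z] against a central finite difference of the sample average of f(σz) with respect to σ, reusing the same z draws so the two estimates are directly comparable.

import numpy as np

rng = np.random.default_rng(0)

def f(x): return np.sin(x + 0.1)
def fprime(x): return np.cos(x + 0.1)

sigma, eps, n = 1.0, 1e-4, 1_000_000
z = rng.standard_normal(n)

# Pathwise (reparameterized) estimate of d/dsigma E[f(sigma * z)].
pathwise = np.mean(fprime(sigma * z) * z)

# Central finite difference of the sample mean of f(sigma * z), same z draws.
fd = (np.mean(f((sigma + eps) * z)) - np.mean(f((sigma - eps) * z))) / (2 * eps)

print("pathwise estimate: %.6f | finite difference: %.6f" % (pathwise, fd))

Because both estimates use the same z, they agree to within the O(eps^2) error of the finite difference, which is exactly what the chain-rule argument predicts.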
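And here is a sketch of how the same trick shows up in a VAE-style setting, assuming PyTorch is available (the names mu, sigma, z and the toy loss are my own illustration, not from the original note): autograd happily differentiates through mu + sigma * z, because the random draw z enters the graph as a plain constant.

import torch

mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(1.0, requires_grad=True)

z = torch.randn(())          # drawn outside the graph; a constant to autograd
sample = mu + sigma * z      # reparameterized sample from N(mu, sigma^2)

loss = torch.sin(sample + 0.1)   # same toy objective as the demo above
loss.backward()

# Autograd reproduces the hand-derived gradients:
#   d loss / d mu    = f'(mu + sigma * z)
#   d loss / d sigma = f'(mu + sigma * z) * z
print("d/dmu    =", mu.grad.item())
print("d/dsigma =", sigma.grad.item(),
      " f'(sample)*z =", (torch.cos(sample + 0.1) * z).item())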