§ Bounding chains: uniformly sample colorings
We wish to uniformly sample k-colorings of a graph G with maximum degree Δ; for this, we require k ≥ Δ+1. To perform this sampling, we use MCMC to sample from a Markov chain whose states are k-colorings of G, and whose stationary distribution is the uniform distribution over valid colorings.
The issue with MCMC techniques is that we never know when our chain has reached its stationary distribution. To ensure we receive uniformly distributed samples, we build a "bounding chain" W which has the following properties:
- States of W cover states of X [ie, each state of W represents a set of states of X].
- W's convergence to a stationary state can be checked.
- Upon convergence of W, the state of W determines the state of X.
We will in fact run the W chain, and prove that running the W chain is equivalent to running a 'shadow' of the X chain, and that stationary states of the W chain correspond to stationary states of the X chain.
Let X be the chain whose states are valid k-colorings of G. In one step of the chain X, we choose a vertex v uniformly at random; we then choose a color c′ for v uniformly at random from the colors that keep the coloring proper. The vertex v is changed to this new color c′. This is a symmetric proposal distribution; hence the stationary distribution of this chain is the uniform distribution over valid k-colorings.
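For concreteness, here is a minimal sketch of one step of X on an adjacency-list graph; the names (step_X, adj) are my own, not from the original exposition:

import random

def step_X(coloring, adj, k):
    v = random.choice(list(adj))                        # vertex chosen uniformly
    blocked = {coloring[u] for u in adj[v]}             # colors used by v's neighbours
    legal = [c for c in range(k) if c not in blocked]   # nonempty since k >= deg(v)+1
    coloring[v] = random.choice(legal)                  # proper color chosen uniformly
    return coloring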
Sampling exactly from this chain is hard: constructing an initial state X0 amounts to finding some valid k-coloring, which in itself might be challenging. Worse, we do not know whether the chain X has reached its stationary distribution or not.
§ Bounding Chain
We construct a new chain W (the bounding chain of X), whose states are sets of colors for vertices in G. Formally, the states of W are functions Vert(G)→2^C, where Vert(G) denotes the vertices of G and C the set of colors. The transition will be to pick a vertex v uniformly at random, and then pick a new set of legal colors C′v for v, such that:
- It is guaranteed that if X were transitioning on v, the color c′v that X would pick for v is a member of C′v. [state is covered]
- The set C′v is, whenever possible, smaller than the current set Cv. [convergence]
We describe the transition function next. But first, we need an alternate
lens on the transitions of X that is amenable to massaging.
§ Equivalent description of the transitions of X:
- (1) Choosing a color uniformly at random from the set of valid colors for a vertex.
- (2) Choosing colors from C without replacement until we get a valid color.
We claim that (1) and (2) have the same probability distribution.
Abstracting slightly, we state:
- (1) Choosing an element t ∈ T uniformly from T ⊆ S. A fixed t is chosen with probability 1/|T|.
- (2) Choosing a particular element t ∈ T by picking elements from S without replacement until we get some element of T.
Claim: (1) and (2) have the same probability distribution.
§ Proof by induction:
Process (2) in code is the following:
import numpy as np

def process(S, T):
    assert T <= S                    # T must be a subset of S
    s = np.random.choice(list(S))    # uniform draw from S
    if s in T:
        return s
    else:
        return process(S - {s}, T)   # discard s and retry on the smaller set
We claim that the probability that process(S, T) = t0, for a fixed t0 ∈ T, is 1/|T|. We create a new function indicator to express this:

def indicator(t0, S, T):
    assert t0 in T
    assert T <= S
    return t0 == process(S, T)

Let's push t0 == into the definition of process, after inlining process.
This gives us:
def indicator(t0, S, T):
    assert t0 in T
    assert T <= S
    s = np.random.choice(list(S))
    if s in T:
        return t0 == s
    else:
        # t0 == process(S - {s}, T) is exactly indicator(t0, S - {s}, T)
        return indicator(t0, S - {s}, T)
Now, we write down the recurrence for the probability that we are trying to compute: P(t0, S, T) is the probability that indicator(t0, S, T) returns True; equivalently, the probability that process(S, T) returns t0.

P(t0, S, T) = prob. that process(S, T) = t0
P(t0, T, T) = 1/|T|                                       [base case: S = T, the first draw lands in T]
P(t0, S, T) = 1/|S| + (1 - |T|/|S|) * P(t0, S \ {s}, T)   [induction: pick t0 now, or pick some s ∉ T and recurse]

We assume for induction that P(t0, S \ {s}, T) = 1/|T| (the recursive call runs on a set of size |S| − 1). On substitution into [induction], we get:

P(t0, S, T)
  = 1/|S| + (1 - |T|/|S|) * P(t0, S \ {s}, T)
  = 1/|S| + (1 - |T|/|S|) * 1/|T|
  = 1/|S| + (1/|T| - 1/|S|)
  = 1/|T|
This is indeed the same probability as (1): choosing an element uniformly from the subset T, which happens with probability 1/|T|.
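As a sanity check (my own addition, not part of the original argument), we can estimate the distribution of process empirically and confirm it is uniform over T:

from collections import Counter

S = set(range(10))
T = {2, 5, 7}
trials = 100_000
counts = Counter(int(process(S, T)) for _ in range(trials))
for t in sorted(T):
    print(t, counts[t] / trials)   # each frequency should be close to 1/|T| = 1/3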
§ Proof by program analysis, Version 1
def process(S, T):
    s = np.random.choice(list(S))
    if s in T:
        return s
    else:
        return process(S - {s}, T)

Notice that the last return is a tail call. This program can be rewritten as:

def process(S, T):
    while True:
        s = np.random.choice(list(S))
        if s in T:
            return s
        S = S - {s}
As previously, we create an indicator function and study its behaviour:

def indicator(t0, S, T):
    assert t0 in T
    assert T <= S
    while True:
        s = np.random.choice(list(S))
        if s in T:
            return t0 == s
        S = S - {s}
We know that this program only returns a value from the line:

if s in T: return t0 == s

We now compute P(process(S, T) == t0). Whatever the return value of indicator, we know that it was produced inside the if condition. We can use this to compute:

P(indicator(t0, S, T) = True)
  = P(t0 == s | s in T)    [the program only returns in this case]
  = 1/|T|
§ Proof by program analysis, Version 2
Alternatively, we can analyze this as we did in the first proof, using the rule that

def f():
    return cond2 if cond1 else cond3

returns True with probability:

P(cond1) * P(cond2|cond1) + P(not cond1) * P(cond3|not cond1)

Using this, we can analyze indicator as:

P(indicator(t0, S, T) = True)
  = P(s in T) * P(t0 == s | s in T)
    + P(s not in T) * P(indicator(t0, S \ {s}, T) | s not in T)
  = |T|/|S| * 1/|T| + (|S| - |T|)/|S| * P(indicator(t0, S \ {s}, T))
  = 1/|S| + (|S| - |T|)/|S| * 1/|T|     [by induction]
  = 1/|S| + 1/|T| - 1/|S|
  = 1/|T|
§ Sampling using the above definition
Recall that State(X) ≡ V→C and State(W) ≡ V→2^C. Let x[·], w[·] be the states of some run of the Markov chains. We start by taking x[0] to be any valid k-coloring of G, and w[0] to be the state where every vertex has all possible colors: w[0] ≡ _ ↦ C. Clearly, x[0] ∈ w[0].
By induction, we assume that x[n−1] ∈ w[n−1]. We must now calculate w[n], x[n] such that (1) x[n] ∈ w[n], (2) x[n]'s proposal is symmetric, and (3) w[n]'s proposal is symmetric.
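For initialisation (an illustrative sketch; the names are mine): a greedy pass always produces a valid x[0] when k ≥ Δ+1, and w[0] grants every vertex the full color set:

def greedy_coloring(adj, k):
    x = {}
    for v in adj:
        used = {x[u] for u in adj[v] if u in x}           # colors of already-colored neighbours
        x[v] = next(c for c in range(k) if c not in used) # exists since k >= deg(v)+1
    return x

def initial_bound(adj, k):
    return {v: set(range(k)) for v in adj}                # w[0](v) = C for every v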
§ Occluded set O
Define O ⊆ C (for occluded) to be the set of colors that might possibly be blocked for v from our view of w. Note that this is an over-approximation: there may be colors that are not actually blocked for v, but which we believe to be blocked from w's point of view.
O ≡ { c ∈ C : (v,α) ∈ E, c ∈ w[n−1](α) }
§ Allowed set A
Define A ⊆ C (for allowed) to be C − O. Note that A is an under-approximation, since O was an over-approximation. That is:
- Any color in A is definitely a legal color for v in x[n].
- There may be colors that are legal for v in x[n] but are not in A.
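In code (a sketch with names of my own choosing, for a state w represented as a dict from vertex to set of colors):

def occluded(w, adj, v):
    # O: every color some neighbour of v might be using, from w's view
    return {c for u in adj[v] for c in w[u]}

def allowed(w, adj, v, colors):
    # A = C - O: colors that are definitely legal for v, whatever x ∈ w is
    return set(colors) - occluded(w, adj, v)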
§ S: the sequence of states for transition
Now, we pick elements of C in sequence, without replacement, until we get an element of A; call this sequence S. Since we stop as soon as we draw an element of A, we pick at most |O| + 1 elements for S. In particular, X needs at most Δ + 1 draws before it sees a truly legal color, since at most Δ colors are blocked for v in x, Δ being the max degree of the graph.
§ Transition
Let i be the first index in S holding a color that is truly legal for v in x[n]. Such an index always exists: we pick elements into S until we get an element of A, and elements of A are always legal. However, there can be elements that are not in A and are still legal for v in x[n], since A is an under-approximation.
- We assign x[n](v) = S[i]. So, x only cares about the prefix S[:i+1].
- We assign w[n](v) = set(S), the set of all colors drawn. So, W cares about the entire sequence.
By the lemma proven above, this process of picking colors from C in a sequence until we get a color that is legal for v, and taking the first legal color, is the same as picking uniformly at random from the set of colors that are legal for v.
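Putting this together, here is a sketch of one coupled step of X and W; the identifiers are mine, it reuses occluded from above, and x, w are dicts over the same vertex set:

import random

def step(x, w, adj, colors):
    v = random.choice(list(adj))                  # vertex chosen uniformly
    O = occluded(w, adj, v)                       # possibly blocked colors
    truly_blocked = {x[u] for u in adj[v]}        # colors x's neighbours use (⊆ O)
    pool = list(colors)
    random.shuffle(pool)                          # = drawing without replacement
    seq, first_legal = [], None
    for c in pool:
        seq.append(c)
        if first_legal is None and c not in truly_blocked:
            first_legal = c                       # X accepts the first truly legal color
        if c not in O:                            # c ∈ A: W may stop
            break
    # if A is empty, the whole pool is drawn and w[v] stays C
    x[v] = first_legal
    w[v] = set(seq)                               # any drawn color might be x[v]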
§ An example
For example, with C = {1, 2, 3, 4, 5}, we could have:

X | p:2 --- q:4
W | p:{1, 2} --- q:{4, 5}

Suppose we are resampling q:

O = {1, 2}
A = {3, 4, 5}
S = [2, 1, 3]

X | p:2 -- q:1
W | p:{1, 2} -- q:{1, 2, 3}

If we analyze S = [2, 1, 3], we notice that:

2: invalid for W (p:{1, 2}), invalid for X (p:2)
1: invalid for W, valid for X
3: valid for W, valid for X
So, what we are really sampling is:
- A prefix sequence SX = [2, 1] (for X's transition), ending at the first color that is valid for X.
- A leftover sequence SW = [3] (for W's transition).
To transition X, we can safely drop SW. However, to transition W correctly, we keep generating elements in the sequence until we hit a "safe" element: one that is valid from W's point of view.
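We can replay this example in code, reusing the logic of the step sketch above but fixing the draw order to S = [2, 1, 3] instead of shuffling (purely illustrative):

x = {'p': 2, 'q': 4}
w = {'p': {1, 2}, 'q': {4, 5}}
adj = {'p': ['q'], 'q': ['p']}
O = {c for u in adj['q'] for c in w[u]}     # occluded for q: {1, 2}
truly_blocked = {x[u] for u in adj['q']}    # surely blocked for q in x: {2}
seq, first_legal = [], None
for c in [2, 1, 3]:                         # the fixed sequence S
    seq.append(c)
    if first_legal is None and c not in truly_blocked:
        first_legal = c                     # X stops here: the color 1
    if c not in O:                          # 3 ∈ A, so W stops here
        break
x['q'], w['q'] = first_legal, set(seq)
print(x['q'], w['q'])                       # 1 {1, 2, 3}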
§ An optimisation on picking colors: why we need a blocking set
Define the set B ⊆ C (for blocked), which governs which values x[n](v) surely cannot take, from our view of w[n−1]. Note that B is an under-approximation: v might have more blocked colors than w sees.
B ≡ { c ∈ C : (v,α) ∈ E, w[n−1](α) = {c} }
Rather than sampling colors from C until we get an element of A, we can sample colors from C ∖ B. We know that the colors in B can never be used by X, since the colors in B are those that we know are blocked for sure. This is used in the theoretical analysis of the paper.
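In code (a sketch; blocked mirrors the definition above, with my own names):

def blocked(w, adj, v):
    # B: colors pinned down by a neighbour whose bound-list is a singleton
    return {next(iter(w[u])) for u in adj[v] if len(w[u]) == 1}

# draw the sequence S from C \ B rather than from all of C:
# pool = [c for c in colors if c not in blocked(w, adj, v)]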
§ Termination
We terminate when W has "trapped" X: that is, |w[n](v)| = 1 for all v ∈ V. In such a case, the state of W determines the state of X. This is a coalescence (as it occurs in coupling from the past). From coupling-from-the-past theory, we know that we have reached a stationary state of X when this happens.
§ Pseudocode
cs :: [C]; cs = ...
vs :: [V]; vs = ...
es :: [(V, V)]; es = ...

-- colors surely blocked for v0: a neighbour's bound-list is a singleton
blocked :: (V -> [C]) -> (V -> [C])
blocked v2cs v0 = concat [v2cs w | (v, w) <- es, v == v0, length (v2cs w) == 1]

-- colors possibly blocked for v0: anything a neighbour's bound-list holds
occluded :: (V -> [C]) -> (V -> [C])
occluded v2cs v0 = concat [v2cs w | (v, w) <- es, v == v0]

allowed :: (V -> [C]) -> (V -> [C])
allowed v2cs v0 = cs \\ occluded v2cs v0

-- simplified proposal: collapse the chosen vertex to one allowed color
perturb :: (V -> [C]) -> Rand (V -> [C])
perturb v2cs = do
  randv <- uniform_rand vs
  randc_for_v <- uniform_rand (allowed v2cs randv)
  return $ \v -> if v == randv then [randc_for_v] else v2cs v

terminated :: (V -> [C]) -> Bool
terminated v2cs = and [length (v2cs v) == 1 | v <- vs]

chain :: (V -> [C]) -> Rand [V -> [C]]
chain f =
  if terminated f
  then return [f]
  else do
    f' <- perturb f
    fs <- chain f'
    return (f:fs)

sample :: (V -> [C]) -> Rand (V -> [C])
sample = fmap last . chain