§ Bounding chains: uniformly sample colorings

We wish to uniformly sample k colorings of a graph GG with maximum degree Δ\Delta. Hence, we require kΔ+1k \geq \Delta + 1. To perform this sampling, we use MCMC to sample from a markov chain whose states are kk-colorings of GG, whose stationary distribution is the uniform distribution over valid colorings. The issue with MCMC techniques is that we never know when our chain has reached the stationary state. To ensure we receive uniformly distributed samples, we built a "bounding chain" WW which has the following properties:
  • States of WW cover states of XX [ie, state space of WW are subsets of the state space of XX].
  • WW's convergence to a stationary state can be checked.
  • Upon convergence of WW, state of WW = state of XX.
We will in fact run the WW chain, and prove that running the WW chain is equivalent to running a 'shadow' of the XX chain, and that stationary states of the WW chain correspond to stationary sates of the XX chain. Let XX be the chain whose states are valid kk colorings of GG. In one step of the chain XX, we choose a vertex vv uniformly at random; we then choose a color cvc'_v uniformly at random for vv that makes it a proper coloring. The vertex vv is changed to this new color cc'. This is a symmetric proposal distribution, Hence this chain has the uniform distribution over kk-colorings to be the stationary state. Sampling exactly from this chain is hard: construct an initial state X0X_0 amounts to finding some valid kk coloring which in itself might be challenging. Worse, we do not know whether the chain XX has reached a stationary state or not.

§ Bounding Chain

We construct a new chain WW (the bounding chain of XX), whose states are sets of colors for vertices in GG. Formally, the states of WW are functions Vert(G)2CVert(G) \rightarrow 2^C where Vert(G)Vert(G) denotes the vertices of GG; CC the set of colors. The transition will be to pick a vertex vv uniformly at random. Then, pick a new set of legal colors CvC'_v for vv, such that:
  • It is guaranteed that if XX were transitioning on vv, the color cvc'_v that would be picked by XX for vv is a member of CvC'_v. [state is covered ]
  • The size of the set CvC'_v attempts to be smaller than the current set of colorings CvC_v. [convergence ]
We describe the transition function next. But first, we need an alternate lens on the transitions of XX that is amenable to massaging.

§ Equivalent description of the transitions of XX:

  1. Choosing a color uniformly at random from the set of valid colors for a vertex.
  2. Choosing colors from CC without replacement until we get a color that is a valid color.
We claim that (1) and (2) have the same probability distribution. Abstracting slightly, we state:
  1. Probability of choosing an element tTt \in T uniformly from TST \subseteq S. This has probability 1/T1/|T|.
  2. Probability of choosing a particular element tTt \in T, by picking elements from SS without replacement until we get some element in TT.
(1) and (2) have the same probability distribution.

§ Proof by induction:

Process (2) in code is the following:
def process(S, T):
  assert(issubset(T, S)) # precondition
  s = np.random.choice(S) # uniform choice over S.
  if s in T: return s # |T|/|S| prob. to enter `if`.
  else:
    # (1 - |T|/|S|) to enter `else`
    Snext = S.remove(s); return process(Snext, T)
We claim that the probability that process(S, T) = t0 for a fixed t0 in T is 1/T1/|T|. We create a new function indicator to express this:
def indicator(t0, S, T):
  assert(t0 in T) # precondition
  assert(issubset(T, S)) # precondition

  return t0 == process(S, T)
Let's push in t0 == into the definiton of process after inling process. This gives us:
def indicator(t0, S, T):
  assert(t0 in T) # precondition
  assert(issubset(T, S)) # precondition

  s = np.random.choice(S)
  if s in T:
    # T/S prob. for x in T
    return t0 == s # 1/|T| prob. for t0 == x
  else:
    #  (1 - |T|/|S|) to reach else branch.
    Snext = S.remove(s); return process(Snext, T)

Now, we write down the recurrence for the probability that we are trying to compute: P(t0,S,T)P(t0, S, T) is the probability that indicator(t0, S, T) returns True. Alternatively, it's the probability that process(S, T) returns t0.
P(t0, S, T) = prob. that process(S, T) = t0
P(t0, T, T) = T/|T| * 1/|T| = 1/|T| [base case]
P(t0, S, T) = 1/|S| + (1 - |T|/|S|) * P(t0, |S|-1, T) [induction]
We assume for induction that P(t0, |S|-1, T) = 1/|T|. On substitution into [induction], we get:
P(t0, S, T)
= 1/|S| + (1 - |T|/|S|) * P(t0, |S|-1, T) [induction]
= 1/|S| + (1 - |T|/|S|) * 1/|T|
= 1/|S| + (1/|T| - 1/|S|)s
= 1/|T|
Which is indeed the same probability as (1):
1. Choosing an element uniformly from a subset TT = 1/|T|.

§ Proof by program analysis, Version 1

def process(S, T):
  s = np.random.choice(S)
  if s in T: return s
  else return process(S.remove(s), T)
Notice that the last return is tail-call. This program can be rewritten as:
def process(S, T):
  while True:
    s = np.random.choice(S)
    if s in T: return s
    S = S.remove(s)
As previously, we create an indicator function and study its behaviour:
def indicator(t0, S, T):
  assert(t0 in T) # precondition
  assert(issubset(T, S)) # precondition

  while True:
    s = np.random.choice(S)
    if s in T: return t0 == s
    S = S.remove(s)
We know that this programs only returns a value from the line:
  • if s in T: return t0 == s
We now compute P(process(S, T) == t0). Whatever the return value of indicator, we can assume that it occured within the if condition. We can use this to compute:
P(indicator(t0, S, T) = True)
 = P(t0 == s | s in T) [program only returns in this case]
 = 1/|T|

§ Proof by program analysis, Version 2

Alternatively, we can also analyze this as we did in the first proof, using the rule:
def f():
  return if cond1 then cond2 else cond3
will have probability:
P(cond1) * P(cond2|cond1) + P(not cond1) * P(cond3|not cond1)
Using this, we can analyze indicator as:
P(indicator(t0, S, T) = True)
 = P(s in T) * P(t0 == s |s in T) +
    P(s not in T) * P(indicator(t0, S.remove(s), T) | s not in T)
 = |T|/|S| * 1/|T| +
    (|S|-|T|)/|S| * P(indicator(t0, S.remove(s), T))
 = 1/|S| + (|S|-|T|)/|S| * 1/|T| [by induction]
 = 1/|T|

§ Sampling using the above definition

Recall that State(X)VCState(X) \equiv V \rightarrow C, State(W)V2CState(W) \equiv V \rightarrow 2^C. Let x[ ],w[ ]x[~], w[~] be the states of some run in the markov chains. We start by having x[0]x[0] to be any valid k-coloring of GG, and w[0]w[0] to be the state where all vertices have all possible colors; w[0]_Cw[0] \equiv \_ \mapsto C. Clearly, x[0]w[0]x[0] \in w[0]. By induction we assume that x[n1]w[n1]x[n-1] \in w[n-1]. We must now calculate a w[n],x[n]w[n], x[n] such that (1) x[n]w[n]x[n] \in w[n], (2) x[n]x[n]'s proposal is symmetric. (3) w[n]w[n]'s proposal is symmetric.

§ Occluded set OO

Define OCO \subseteq C (for occluded) be the set colors that might possibly be blocked for vv from our view of w. Note that this is an over-approxmation : that is, there may be colors that are not blocked for v, which we believe to be blocked from w's point of view.
O{cC:(v,α)E,cw[n1](α)}O \equiv \{ c \in C : (v, \alpha) \in E, c \in w[n-1](\alpha) \}

§ Allowed set AA

Define ACA \subset C (for allowed) to be COC - O. Note that AA is an under-approxmation , since O was an over-approximation . That is:
  • Any color in A is definitely a legal color for v in x[n].
  • There are colors which are legal for v in x[n] that is not in A.

§ S: the sequence of states for transition

Now, we pick elements of CC in sequence till we get an element of A. call this sequence SS. We will at worst pick Δ+1\Delta + 1 elements for SS, since the max-degree of the graph is Δ\Delta.

§ Transition

Let ii be the first index in SS where we get a color that is truly legal for vv in x[n]x[n]. Note that such an index will always exist: We pick elements into SS till we get an element in AA, and elements of AA are always legal. However, there can be elements which are not in AA that are still legal for vv in x[n]x[n], since AA is an under-approximation.
  • We assign x[n](v)=ix[n](v) = i. So, x only cares about S[:i].
  • We assign w[n](v)=Aw[n](v) = A. So, W cares about the entire sequence.
By the lemma proven, we know that this process of picking colors C in a sequence till we get a color that is legal for vv at index ii is the same as picking uniformly at random from the set of colors that are legal for vv.

§ An example

For example, we could have:
X | p:2 --- q:4
W | p:{1, 2} --- q:{4, 5}

we are sampling q:

O = {1, 2}
A = {3, 4, 5}
S = [2, 1, 3]

X | p:1 -- q:2
W | p:{1, 2} -- q:{1, 2, 3}
If we analyze S = [2, 1, 3] we notice that:
2: invalid for W(p:[1, 2]), invalid for X(p:2)
1: invalid for W, valid for X
3: valid for W, valid for X
So, what we are really sampling ix:
  • A prefix sequence SX = [1] (for Xs transition)
  • A leftover sequence SW = [2, 3] (for Ws transition)
To transition X, we can safely drop SW. However, to transition W correctly, we generate more elements in the sequence, till we hit a "safe" element.

§ An optimisation on picking colors: why we need a blocking set

Define the set BCB \subseteq C (for blocked) which governs which values x[n]x[n] surely cannot take from our view of w[n1]w[n-1]. Note that BB is an under-approximation . v might have more colors that are blocked than what w sees.
B{cC:(v,α)E,w[n1](α)={c}}B \equiv \{ c \in C : (v, \alpha) \in E, w[n-1](\alpha) = \{c\} \}
Rather than sampling colors from C till we get an element of A, we can sample colors from C/B. We know that the colors in B can never be used by XX, since the colors in B are those that we know are blocked for sure . This is used in the theoretical analysis of the paper.

§ Termination

We terminate when WW has "trapped" XX. That is, w[n](v)=1|w[n](v)| = 1 forall vVv \in V. In such a case, the states of WW is equal to states of XX. This is a coalescence (as it occurs in coupling from the past). From the coupling from the past theory, we know that we have reached a stationary state of AA when this happens.

§ Pseudocode

-- | colors, vertices, edges
cs :: [C]; cs = ...
vs :: [V]; vs = ...
es :: [(V, V); es = ...


-- | definitely not allowed
blocked :: (V -> [C]) -> (V -> [C])
blocked v2cs v0 = concat [v2cs w | (v, w) <- es, v == v0, length (v2cs w) == 1]


-- | maybe not allowed
occluded :: (V -> [C]) -> (V -> [C])
occluded v2cs v0 = concat [v2cs w | (v, w) <- es, v == v0]

-- | definitely allowed from Ws point of view
allowed :: (V -> [C]) -> (V -> [C])
allowed v2cs = cs \\ occluded v2cs

-- | perturb the color function to another function
perturb :: (V -> [C]) -> Rand (V -> [C])
perturb v2cs = do $
  randv <- uniform_rand vs
  randc_for_v <- uniform_rand (allowed v2cs v)
  return $ \v -> if v == randv then randc_for_v else v2cs v


-- | check if we are done
terminated :: (V -> [C]) -> Bool
terminated v2cs = all [length (v2cs v) == 1 | v <- vs]

-- | generate chain
chain :: (V -> [C]) -> Rand [V -> [C]]
chain f = do
  if terminated f
  then [] else do
    f;' <- perturb f; fs <- chain f'; return (f:fs)

-- | return a sample
sample :: (V -> [C]) -> Rand (V -> [C])
sample = last . chain

§ References