§ Precision, Recall, and all that.
 Setting: we have some theorem goal $g$, a dataset of mathematical lemmas $D$, a set of actually useful lemmas $A$, and a set of predicted lemmas $P$.
We want to provide a good measure of how "good" $P$ is with respect to the ground truth $A$.
We begin by defining precision, which is the fraction of $P$ that was correct: $|P \cap A| / |P|$. Probabilistically, this can be seen as $\Pr(\text{actual} \mid \text{predicted})$. Similarly, we define recall as the fraction of $A$ that was correctly predicted: $|P \cap A| / |A|$. Probabilistically, this is $\Pr(\text{predicted} \mid \text{actual})$.
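As a concrete sketch (Python, with hypothetical lemma names), the two set definitions compute directly:

```python
# Hypothetical ground-truth useful lemmas A and predicted lemmas P.
A = {"add_comm", "mul_assoc", "zero_add"}
P = {"add_comm", "mul_assoc", "succ_ne_zero", "one_mul"}

# precision = |P ∩ A| / |P|: fraction of predictions that were correct.
precision = len(P & A) / len(P)
# recall = |P ∩ A| / |A|: fraction of useful lemmas that were predicted.
recall = len(P & A) / len(A)

print(precision, recall)  # 0.5 and 2/3
```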
Let us now change the setting a little: we swap the set $P$ for a \emph{sequence} of predictions over the full universe of lemmas $D$. We can get a set by truncating $P$ at some threshold. So we will define precision@k to be the precision of the set $P[:k]$. We note that recall as a function of $k$, recall@k, is nondecreasing: as we see more predictions, we can only accumulate more actually useful lemmas, and recall has a fixed denominator (the number of actually useful lemmas). Since recall@k is nondecreasing, we can build an inverse, k@recall(r): for a given level of recall $r$, it maps to the smallest $k$ (the smallest number of items we need to take from the ranking) that achieves that level of recall. This lets us define a precision@recall function, where precision@recall(r) = precision@k(k@recall(r)).
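A minimal Python sketch of these definitions, assuming the ranking is a list and the ground truth a set (names are illustrative):

```python
def precision_at_k(ranking, actual, k):
    """Precision of the truncated set ranking[:k]."""
    hits = sum(1 for x in ranking[:k] if x in actual)
    return hits / k

def recall_at_k(ranking, actual, k):
    """Recall of ranking[:k]; the denominator is fixed at |actual|."""
    hits = sum(1 for x in ranking[:k] if x in actual)
    return hits / len(actual)

def k_at_recall(ranking, actual, r):
    """Smallest k achieving recall level r (inverse of nondecreasing recall@k)."""
    for k in range(1, len(ranking) + 1):
        if recall_at_k(ranking, actual, k) >= r:
            return k
    raise ValueError("recall level never reached")

def precision_at_recall(ranking, actual, r):
    """precision@recall(r) = precision@k(k@recall(r))."""
    return precision_at_k(ranking, actual, k_at_recall(ranking, actual, r))
```

For example, on the ranking used in the next section, k@recall(2/3) is 4 (we need the top 4 items to recover two of the three useful lemmas), so precision@recall(2/3) is 2/4.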
§ Precision, Recall formulae
Suppose for a given goal $g$, we have 3 correct premises $a, b, c$. The universe has premises $a, b, c, x, y, z$. Our model predicts premises in the ranking $x, a, y, b, c, z$. Then we summarize this as follows:

Rank  Val  Prec  Rec
1     x    0/1   0/3
2     a    1/2   1/3
3     y    1/3   1/3
4     b    2/4   2/3
5     c    3/5   3/3
6     z    3/6   3/3
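A small sketch that walks a ranking and prints precision and recall at each rank:

```python
ranking = ["x", "a", "y", "b", "c", "z"]
actual = {"a", "b", "c"}

rows = []
hits = 0
for rank, val in enumerate(ranking, start=1):
    hits += val in actual
    # precision = correct-so-far / rank; recall = correct-so-far / |actual|
    rows.append((rank, val, f"{hits}/{rank}", f"{hits}/{len(actual)}"))

for row in rows:
    print(*row)
```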
Note that recall is monotonically nondecreasing, while precision both increases (e.g. $0 < 1/2$ going from rank 1 to rank 2) and decreases (e.g. $1/2 > 1/3$ going from rank 2 to rank 3). We introduce an auxiliary indicator function $\delta(i) \equiv [\text{the lemma at rank } i \text{ is correct}]$. This lets us write the above quantities as follows:
Let $s(n) \equiv \sum_{i=1}^n \delta(i)$.
The total number of correct elements is $s(N)$, where $N$ is the total number of predictions (the size of the dataset).
The precision at $k$ is given by $p(k) \equiv s(k)/k$. The recall at $k$ is given by $r(k) \equiv s(k)/s(N)$.
Now note that the discrete difference $dr(k) = r(k) - r(k-1)$ equals $(s(k) - s(k-1))/s(N)$, which is $\delta(k)/s(N)$.
§ Mean Average Precision
The best the precision@recall function can be is a flat line at precision $= 1$ for all levels of recall. Deviation from this tells us how much worse our model is than the best model. So, let's compute the area under the precision@recall curve. This is the average precision, $ap \equiv \int_{r=0}^1 p(r)\,dr$. Recall that the "best" precision always has $p(r) = 1$. Thus the theoretical maximum of this value is $ap = \int_{r=0}^1 1\,dr = 1$. This gives us a good scale: $0 \leq ap \leq 1$. Using recall as the variable of integration standardizes the value across dataset sizes.
Let's change variables to $k$; that is, rewrite $r$ as $r(k)$.
This changes the bounds of integration.
The lower limit is given by $0 = r(l)$, which solves to $l = 0$.
The upper limit is $1 = r(u)$, which solves to $u = N$ (the size of the dataset).
This also changes $dr$ to $\frac{dr}{dk}\,dk$.
In the discrete case, we set $dk = 1$, and $dr(k)$ becomes $r(k) - r(k-1)$. This is $\sum_{i=1}^{k} \delta(i)/s(N) - \sum_{i=1}^{k-1} \delta(i)/s(N)$, which evaluates to $\delta(k)/s(N)$.
This gives us the discrete calculation of $ap$: $ap \equiv \sum_{k=1}^N p(k)\,\delta(k)/s(N)$.
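As a sanity check, here is a sketch of the discrete $ap$ formula on the running example (ranking $x, a, y, b, c, z$ with correct set $\{a, b, c\}$):

```python
ranking = ["x", "a", "y", "b", "c", "z"]
actual = {"a", "b", "c"}
N = len(ranking)

# delta[k-1] is δ(k): whether the lemma at rank k is correct.
delta = [x in actual for x in ranking]

def s(n):
    # s(n) = Σ_{i=1}^n δ(i)
    return sum(delta[:n])

def p(k):
    # precision@k = s(k)/k
    return s(k) / k

# ap = Σ_{k=1}^N p(k) δ(k) / s(N)
ap = sum(p(k) * delta[k - 1] for k in range(1, N + 1)) / s(N)
print(ap)  # (1/2 + 2/4 + 3/5) / 3 ≈ 0.533
```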
§ Mean Average Precision at K.
I believe this to be an incoherent concept. Recall that we chose to define average precision as the area under the precision-recall curve. This is a sensible quantity because it is normalized (recall is between $0$ and $1$). We got the expression in terms of $k$ via a change of variables. We did not start at $k$! We started from $r$ and got to $k$.
We can try to hack the expression for $ap$ to artificially create $ap@K$. Let's try.
First, to go from $K$ to $r$, we find the recall level $r_K$ such that $r(K) = r_K$.
 Next, we calculate $ap@K \equiv \int_0^{r_K} p(r) dr$.
We must now find new lower and upper bounds in terms of $k$.
The lower bound needs us to solve $0 = r(l)$, giving $l = 0$.
The upper bound needs us to solve $r_K = r(u)$, giving $u = K$.
We will next have to calculate $dr(k)\,dk$. Previously, we set $dk = 1$ and calculated $dr(k) \equiv r(k) - r(k-1)$, which gives $dr(k) \equiv \delta(k)/s(N)$. But note that $s(N)$ counts the correct lemmas over all documents, not just the top $K$. So let's use $s(K)$ instead: a hack!
Combining these, we get the formula $ap@K \equiv \int_{k=0}^{K} p(r(k))\,dr(k) = \sum_{k=1}^K p(k)\,\delta(k)/s(K)$.
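A sketch of the hacked formula, truncating the running example at $K = 4$:

```python
ranking = ["x", "a", "y", "b", "c", "z"]
actual = {"a", "b", "c"}
K = 4

delta = [x in actual for x in ranking]

def s(n):
    # s(n) = Σ_{i=1}^n δ(i)
    return sum(delta[:n])

def p(k):
    # precision@k
    return s(k) / k

# ap@K = Σ_{k=1}^K p(k) δ(k) / s(K): note the s(K) normalizer, the "hack".
ap_at_K = sum(p(k) * delta[k - 1] for k in range(1, K + 1)) / s(K)
print(ap_at_K)  # (1/2 + 2/4) / 2 = 0.5
```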

ap@K feels very unprincipled, and I don't feel that it carries mathematical weight. Is there an alternative derivation that sheds light on why this formula makes sense?
§ R-precision
 Recall that the recall is a nondecreasing function of $k$. The precision can vary any way it wants with respect to $k$.
 We will try to find a point where precision equals recall.
Consider the equation $p(K) = r(K)$. Using our previous formulation, this reduces to $s(K)/K = s(K)/s(N)$, which (assuming $s(K) \neq 0$) gives us $K = s(N)$.
 So, at the index $K$ which is equal to the total number of correct lemmas, we will have the precision equal the recall.
This value is called the R-precision: the precision $p(R)$ at the index $R = s(N)$, the total number of correct lemmas.
Empirically, R-precision correlates well with mean average precision.
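A sketch of R-precision on the running example, where $R = 3$:

```python
ranking = ["x", "a", "y", "b", "c", "z"]
actual = {"a", "b", "c"}

R = len(actual)  # R = s(N): the total number of correct lemmas
hits = sum(1 for x in ranking[:R] if x in actual)  # correct among the top R
r_precision = hits / R
print(r_precision)  # 1/3: precision and recall coincide at rank R
```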
§ $F_1$ Score
 A sensible derivation is offered by Van Rijsbergen in his PhD thesis.
First, a quick and dirty definition: $F_1$ is the harmonic mean of precision and recall. This gives us $F_1 = 2/(1/p + 1/r)$, which works out to $2/[(tp + fp)/tp + (tp + fn)/tp]$. This simplifies to $2tp/(2tp + fp + fn)$.
Now consider the quantity $E := 1 - F_1$. Think of $E$ as error. This works out to $(fp + fn)/(2tp + fp + fn)$. See that this quantity works out to the symmetric difference of the $A$ctual and $P$redicted sets, divided by the sum of the sizes of the $A$ctual and $P$redicted sets! $|A \Delta P| = fp + fn$, $|A| = tp + fn$, and $|P| = tp + fp$.
Thus, the full expression for $E$ becomes $E \equiv |A \Delta P| / [|A| + |P|]$, which is a genuinely sensible quantity!
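A small sketch checking this identity numerically on hypothetical sets:

```python
# Hypothetical actual and predicted sets.
A = {"a", "b", "c"}
P = {"a", "b", "x", "y"}

tp = len(A & P)  # true positives
fp = len(P - A)  # false positives
fn = len(A - P)  # false negatives

f1 = 2 * tp / (2 * tp + fp + fn)    # F1 = 2tp / (2tp + fp + fn)
E = len(A ^ P) / (len(A) + len(P))  # |A Δ P| / (|A| + |P|)

print(f1, E)  # E equals 1 - f1
```

Here `^` is Python's symmetric-difference operator, so the last line computes $|A \Delta P| / (|A| + |P|)$ directly from the sets, with no reference to $tp$, $fp$, or $fn$.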