DM825  Introduction to Machine Learning
Sheet 11, Spring 2013 [pdf format]

Exercise 1 – Probability theory
Prove the following rule:
p(x_{i}x_{−i})=  p(x_{1},…,x_{N}) 

∫ 
p(x_{1},…,x_{N}) dx_{i} 


where x_{−i}={x_{1},…,x_{N}} ∖ x_{i}.
Exercise 2 – Naive Bayes
Consider the binary classification problem of spam email in which a
binary label Y ∈ {0, 1} is to be predicted from a feature vector
X = (X_{1}, X_{2}, …, X_{n}), where X_{i}=1 if the word i is present
in the email and 0 otherwise. Consider a naive Bayes model, in which
the components X_{i} are assumed mutually conditionally independent
given the class label Y.
 [a] Draw a directed graphical model corresponding to the naive
Bayes model.
 Find a mathematical expression for the posterior
class probability p(Y = 1  x), in terms of the prior class
probability p(Y = 1) and the classconditional densities p(x_{i} 
y).
 Make now explicit the hyperparameters of the Bernoulli
distributions for Y and X_{i}. Call them, µ and θ_{i},
respectively. Assume a beta distribution for the prior of these
hyperparameters and show how to learn the hyperparameters from a set
of training data (y^{j},x^{→j})_{j=1}^{m} using a Bayesian
approach. Compare this solution with the one developed in class via
maximum likelihood.
Exercise 3 – Directed Graphical Models
Consider the graph in Figure left.
 Write down the standard factorization for the given
graph.
 For what pairs (i, j) does the statement X_{i} is independent of
X_{j} hold? (Don’t assume any conditioning in this part.)
 Suppose that we condition on {X_{2}, X_{9}}, shown shaded in the
graph. What is the largest set A for which the statement X_{1} is
conditionally independent of X_{A} given {X_{2}, X_{9}} holds?
 What is the largest set B for which X_{8} is conditionally
independent of X_{B} given {X_{2}, X_{9}} holds?
 Suppose that I wanted to draw a sample from the marginal
distribution p(x_{5}) = Pr[X_{5} = x_{5}]. (Don’t assume that X_{2} and
X_{9} are observed.) Describe an efficient algorithm to do so without
actually computing the marginal.
Figure 1: A directed graph. 