DM825 - Introduction to Machine Learning
Sheet 11, Spring 2013



Exercise 1 – Probability theory

Prove the following rule:

p(x_i \mid x_{\setminus i}) = \frac{p(x_1, \ldots, x_N)}{\int p(x_1, \ldots, x_N) \, dx_i}

where x_{\setminus i} = \{x_1, \ldots, x_N\} \setminus \{x_i\}.
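
Since the rule only combines the product and sum rules, it can be checked numerically on a small discrete joint distribution, where the integral over x_i becomes a sum. Below is a minimal Python sketch; the factors p(x1) and p(x2 | x1) are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical factors: p(x1) and p(x2 | x1), chosen only for illustration.
p_x1 = np.array([0.3, 0.7])
p_x2_given_x1 = np.array([[0.9, 0.1],   # p(x2 | x1 = 0)
                          [0.2, 0.8]])  # p(x2 | x1 = 1)

# Joint p(x1, x2) via the product rule.
joint = p_x1[:, None] * p_x2_given_x1

# The rule with i = 2: p(x2 | x1) = p(x1, x2) / sum_{x2} p(x1, x2),
# the sum playing the role of the integral for a discrete variable.
recovered = joint / joint.sum(axis=1, keepdims=True)

assert np.allclose(recovered, p_x2_given_x1)
print(recovered)
```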



Exercise 2 – Naive Bayes

Consider the binary classification problem of spam email, in which a binary label Y ∈ {0, 1} is to be predicted from a feature vector X = (X1, X2, …, Xn), where Xi = 1 if word i is present in the email and Xi = 0 otherwise. Consider a naive Bayes model, in which the components Xi are assumed to be mutually conditionally independent given the class label Y.


  1. Draw a directed graphical model corresponding to the naive Bayes model.
  2. Find a mathematical expression for the posterior class probability p(Y = 1 | x), in terms of the prior class probability p(Y = 1) and the class-conditional densities p(xi | y).
  3. Now make explicit the parameters of the Bernoulli distributions for Y and the Xi, calling them µ and θi, respectively. Assume a beta prior for each of these parameters and show how to learn them from a set of training data (yj, xj), j = 1, …, m, using a Bayesian approach. Compare this solution with the one developed in class via maximum likelihood (a numerical sketch follows this list).
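
To make items 2 and 3 concrete, here is a minimal Python sketch of Bernoulli naive Bayes, assuming independent Beta(a, b) priors on µ and on each θi and using the posterior-mean estimates; the toy data and the choice a = b = 1 are hypothetical, for illustration only. Setting a = b = 0 recovers the maximum-likelihood estimates from class.

```python
import numpy as np

# Toy word-presence vectors x_j and labels y_j (hypothetical data).
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])
a, b = 1.0, 1.0          # Beta prior hyperparameters (a = b = 1: Laplace smoothing)

def fit(X, y, a=0.0, b=0.0):
    # a = b = 0 gives the maximum-likelihood estimates; a, b > 0 gives the
    # Bayesian posterior-mean estimates (count + a) / (n + a + b).
    m = len(y)
    mu = (y.sum() + a) / (m + a + b)                             # p(Y = 1)
    theta1 = (X[y == 1].sum(0) + a) / ((y == 1).sum() + a + b)   # p(Xi = 1 | Y = 1)
    theta0 = (X[y == 0].sum(0) + a) / ((y == 0).sum() + a + b)   # p(Xi = 1 | Y = 0)
    return mu, theta0, theta1

def posterior(x, mu, theta0, theta1):
    # p(Y = 1 | x) by Bayes' rule with the naive (factorised) likelihood.
    lik1 = mu * np.prod(theta1**x * (1 - theta1)**(1 - x))
    lik0 = (1 - mu) * np.prod(theta0**x * (1 - theta0)**(1 - x))
    return lik1 / (lik1 + lik0)

x_new = np.array([1, 0, 0])
mu, t0, t1 = fit(X, y, a, b)                  # Bayesian (posterior-mean) fit
print(posterior(x_new, mu, t0, t1))           # 0.75
mu_ml, t0_ml, t1_ml = fit(X, y)               # maximum-likelihood fit
print(posterior(x_new, mu_ml, t0_ml, t1_ml))  # 1.0
```

On this toy data the maximum-likelihood fit drives p(Y = 1 | x) to exactly 1 because one of the word counts is zero, while the beta-smoothed estimates avoid the zero count; this contrast is the comparison item 3 asks for.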



Exercise 3 – Directed Graphical Models

Consider the directed graph in Figure 1.


Figure 1: A directed graph.