DM825 - Introduction to Machine Learning
Sheet 7, Spring 2013



Exercise 1 – Linear discriminants

  1. Develop analytically the formulas of a generative algorithm with Gaussian likelihood for a k-way classification problem. In particular, estimate the model parameters.
  2. Derive the explicit formula of the decision boundaries in the case of two predictor variables.
  3. Implement the analysis in R using the data:
    Iris <- data.frame(cbind(iris[, c(2, 3)], Sp = rep(c("s", "c", "v"), rep(50, 3))))
    train <- sample(1:150, 75)
    table(Iris$Sp[train])

    Plot the contours of the Gaussian distributions and the linear discriminants.

  4. Compare your results with those of the lda function from the package MASS in R.

    Deepening: read section 4.3.3 of B2 and inspect the outcome of lda when run on the full data with all 4 predictors, i.e.:

    Iris <- data.frame(cbind(iris, Sp = rep(c("s", "c", "v"), rep(50, 3))))
    z <- lda(Sp ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             Iris, prior = c(1, 1, 1)/3, subset = train)
    # predict(z, Iris[-train, ])$class
    plot(z, dimen = 1)
    plot(z, type = "density", dimen = 1)
    plot(z, dimen = 2)
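A minimal sketch of how items 1–4 might be put together in R: fit class-conditional Gaussians with a shared covariance matrix (the LDA model) by hand on the two iris predictors, classify the held-out half, and compare against MASS::lda on the same split. The fixed seed and the use of training-set class proportions as priors are assumptions, not part of the exercise statement.

```r
library(MASS)

## Data and train/test split as in item 3 (seed fixed for reproducibility)
Iris <- data.frame(cbind(iris[, c(2, 3)], Sp = rep(c("s", "c", "v"), rep(50, 3))))
set.seed(1)
train <- sample(1:150, 75)

X <- as.matrix(Iris[train, 1:2])
y <- factor(Iris$Sp[train])

## ML estimates: class priors, class means, pooled covariance
priors <- table(y) / length(y)
mus    <- t(sapply(levels(y), function(k) colMeans(X[y == k, , drop = FALSE])))
Sigma  <- Reduce(`+`, lapply(levels(y), function(k) {
  Xc <- scale(X[y == k, , drop = FALSE], center = mus[k, ], scale = FALSE)
  t(Xc) %*% Xc
})) / (nrow(X) - nlevels(y))

## Linear discriminant scores: delta_k(x) = x' S^-1 mu_k - mu_k' S^-1 mu_k / 2 + log pi_k
Sinv  <- solve(Sigma)
delta <- function(x) sapply(levels(y), function(k)
  x %*% Sinv %*% mus[k, ] - 0.5 * mus[k, ] %*% Sinv %*% mus[k, ] + log(priors[k]))

pred <- levels(y)[apply(Iris[-train, 1:2], 1, function(x) which.max(delta(x)))]

## Compare with MASS::lda (default priors = class proportions, as above)
z <- lda(Sp ~ ., Iris, subset = train)
mean(pred == predict(z, Iris[-train, ])$class)
```

The agreement with lda should be at or very near 1, since both use the same Gaussian model with pooled covariance; small discrepancies can only arise from the normalising constant used for the pooled covariance estimate.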



Exercise 2 – Naive Bayes

You decide to make a text classifier. To begin with you attempt to classify documents as either sport or politics. You decide to represent each document as a (row) vector of attributes describing the presence or absence of words.

x = (goal, football, golf, defence, offence, wicket, office, strategy)

Training data from sport documents and from politics documents is represented below using a matrix in which each row represents a (row) vector of the 8 attributes.

xpolitics =

    1 0 1 1 1 0 1 1
    0 0 0 1 0 0 1 1
    1 0 0 1 1 0 1 0
    0 1 0 0 1 1 0 1
    0 0 0 1 1 0 1 1
    0 0 0 1 1 0 0 1
xsport =

    1 1 0 0 0 0 0 0
    0 0 1 0 0 0 0 0
    1 1 0 1 0 0 0 0
    1 1 0 1 0 0 0 1
    1 1 0 1 1 0 0 0
    0 0 0 1 0 1 0 0
    1 1 1 1 1 0 1 0

Using a Naive Bayes classifier, what is the probability that the document x = (1, 0, 0, 1, 1, 1, 1, 0) is about politics?
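A sketch of the computation in R, as a check on the hand calculation: a Bernoulli Naive Bayes with maximum-likelihood parameter estimates (per-class attribute frequencies, no Laplace smoothing) and class priors taken as the document proportions 6/13 and 7/13. These modelling choices are the standard ones but are assumptions here, since the exercise does not fix them.

```r
## Training matrices transcribed from above, one document per row
xpol <- matrix(c(1,0,1,1,1,0,1,1,
                 0,0,0,1,0,0,1,1,
                 1,0,0,1,1,0,1,0,
                 0,1,0,0,1,1,0,1,
                 0,0,0,1,1,0,1,1,
                 0,0,0,1,1,0,0,1), ncol = 8, byrow = TRUE)
xspo <- matrix(c(1,1,0,0,0,0,0,0,
                 0,0,1,0,0,0,0,0,
                 1,1,0,1,0,0,0,0,
                 1,1,0,1,0,0,0,1,
                 1,1,0,1,1,0,0,0,
                 0,0,0,1,0,1,0,0,
                 1,1,1,1,1,0,1,0), ncol = 8, byrow = TRUE)

## ML estimates: P(word present | class) and class priors
theta.pol <- colMeans(xpol)
theta.spo <- colMeans(xspo)
prior.pol <- nrow(xpol) / (nrow(xpol) + nrow(xspo))
prior.spo <- 1 - prior.pol

x <- c(1, 0, 0, 1, 1, 1, 1, 0)

## Bernoulli likelihood under the naive (conditional independence) assumption
lik <- function(theta, x) prod(theta^x * (1 - theta)^(1 - x))

## Bayes' rule for the posterior of the politics class
post.pol <- prior.pol * lik(theta.pol, x) /
            (prior.pol * lik(theta.pol, x) + prior.spo * lik(theta.spo, x))
post.pol   # approximately 0.83
```

With these estimates the posterior P(politics | x) comes out to roughly 0.83, i.e. the document is classified as politics.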