MATH 4323: Data Science and Statistical Learning
Fall 2022, Homework # 3
Instructions: Submit the solutions as a typed file (.pdf or Word document; no hand-written solutions) via UH Blackboard. Keep responses brief and to the point.
For code & output: include only the pieces that are most relevant to the question.
Conceptual.
1. This problem involves hyperplanes in two dimensions.
(a) Sketch the hyperplane X1 − X2 = 0. Indicate (by using either different colors or
symbols) the set of points for which X1 − X2 > 0, as well as the set of points for
which X1 − X2 < 0. To which class do the following points belong:
i. X1 = 1, X2 = 1,
ii. X1 = 0, X2 = −1,
iii. X1 = 0, X2 = 1.
(b) On a different plot, sketch the hyperplane X1 + X2 = 0. Indicate (by using
either different colors or symbols) the set of points for which X1 + X2 > 0, as
well as the set of points for which X1 + X2 < 0. To which class do the following
points belong:
i. X1 = 1, X2 = 1,
ii. X1 = −1, X2 = −1,
iii. X1 = 1, X2 = −1.
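If you would like to double-check the hand sketches in parts (a) and (b), a minimal base-R sketch along the following lines draws both lines and evaluates X1 − X2 and X1 + X2 at the listed points; the sign of each value tells you on which side of the corresponding hyperplane the point falls (the axis limits and object names here are arbitrary choices).
> # Part (a): the hyperplane X1 - X2 = 0 is the line X2 = X1
> plot(0, 0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), xlab = "X1", ylab = "X2")
> abline(a = 0, b = 1)                         # X1 - X2 = 0
> pts.a <- rbind(c(1, 1), c(0, -1), c(0, 1))   # points i-iii from part (a)
> pts.a[, 1] - pts.a[, 2]                      # sign of X1 - X2 at each point
> # Part (b): the hyperplane X1 + X2 = 0 is the line X2 = -X1
> abline(a = 0, b = -1, col = "red")           # X1 + X2 = 0
> pts.b <- rbind(c(1, 1), c(-1, -1), c(1, -1)) # points i-iii from part (b)
> pts.b[, 1] + pts.b[, 2]                      # sign of X1 + X2 at each point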
2. We have seen that in p = 2 dimensions, a linear decision boundary takes the form
β0 + β1 X1 + β2 X2 = 0. We now investigate a non-linear decision boundary.
(a) Sketch the curves
(1 + X1)² + (2 − X2)² = 4
and
(1 + X1)² + (2 − X2)² = 16.
(b) On your sketch, indicate the set of points for which
(1 + X1)² + (2 − X2)² > 16,
16 ≥ (1 + X1)² + (2 − X2)² ≥ 4,
as well as the set of points for which
(1 + X1)² + (2 − X2)² < 4.
(c) Suppose that a classifier assigns an observation to the blue class if
16 ≥ (1 + X1)² + (2 − X2)² ≥ 4,
and to the red class otherwise. To what class is the observation (0, 0) classified?
(−1, 1)? (2, 2)? (3, 4)?
(d) Argue that while the decision boundary in (c) is not linear in terms of X1 and
X2, it is linear in terms of X1, X1², X2, and X2².
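As a check on the hand sketch in part (a): both equations describe circles centered at (X1, X2) = (−1, 2), with radii 2 and 4. The base-R sketch below (the grid resolution, axis limits, and the helper name lhs are arbitrary illustrative choices) traces the two circles and evaluates the left-hand side (1 + X1)² + (2 − X2)² at the four points from part (c).
> theta <- seq(0, 2 * pi, length.out = 360)
> plot(-1 + 2 * cos(theta), 2 + 2 * sin(theta), type = "l", asp = 1,
+      xlim = c(-6, 4), ylim = c(-3, 7), xlab = "X1", ylab = "X2")
> lines(-1 + 4 * cos(theta), 2 + 4 * sin(theta), col = "red")
> # evaluate (1 + X1)^2 + (2 - X2)^2 at the points from part (c)
> lhs <- function(x1, x2) (1 + x1)^2 + (2 - x2)^2
> lhs(c(0, -1, 2, 3), c(0, 1, 2, 4))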
3. Here we explore the maximal margin classifier on a toy data set.
(a) We are given n = 8 observations in p = 2 dimensions. For each observation,
there is an associated class label.
Obs.  X1  X2  Y
 1     1   4  Blue
 2     3   4  Red
 3     4   3  Red
 4     2   2  Blue
 5     4   2  Red
 6     4   4  Red
 7     1   2  Blue
 8     1   3  Blue
Sketch the observations (a short R plotting sketch appears after part (h)).
(b) Sketch the optimal separating hyperplane, and provide the equation for this
hyperplane (of the form (9.1) in the textbook).
(c) Describe the classification rule for the maximal margin classifier. It should be
something along the lines of "Classify to Red if β0 + β1 X1 + β2 X2 > 0, and
classify to Blue otherwise." Provide the values for β0, β1, and β2.
(d) On your sketch, indicate the margin for the maximal margin hyperplane.
(e) Indicate the support vectors for the maximal margin classifier.
(f) Argue that a slight movement of the seventh observation would not affect the
maximal margin hyperplane.
(g) Sketch a hyperplane that is not the optimal separating hyperplane, and provide
the equation for this hyperplane.
(h) Draw an additional observation on the plot so that the two classes are no longer
separable by a hyperplane.
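For part (a), the observations can be entered and plotted directly in base R, as in the sketch below (the axis limits are an arbitrary choice). If you want to verify the hyperplane you find by eye in part (b), one option, not required here, is to fit a support vector classifier with a very large cost using the e1071 package from the lab.
> x1 <- c(1, 3, 4, 2, 4, 4, 1, 1)
> x2 <- c(4, 4, 3, 2, 2, 4, 2, 3)
> y  <- c("Blue", "Red", "Red", "Blue", "Red", "Red", "Blue", "Blue")
> plot(x1, x2, col = ifelse(y == "Red", "red", "blue"), pch = 19,
+      xlim = c(0, 5), ylim = c(0, 5), xlab = "X1", ylab = "X2")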
Applied.
4. Using the Boston data set from the MASS library, fit classification models in order to
predict whether a given suburb has a house value (medv) above or below the median
(median(medv)). In particular, explore various KNN models.
(a) Create a medv01 variable, where
medv01 = 0 if medv ≤ median(medv),
medv01 = 1 if medv > median(medv).
Add it as a column to the Boston data frame, while disposing of the original medv
variable (via Boston$medv = NULL); a code sketch of this step appears after part (e).
(b) Decide on three values of K and three subsets of predictors (including the full set
of all 13 predictors). This is entirely your judgment call. When picking predictor subsets,
you could use
you could use
• Your own logical considerations (which variables appear most important to
predict the price),
• Correlation matrix (to determine whether there are groups of correlated variables,
in which case you could retain just one variable from each group, dropping the rest).
(c) What stable method was introduced in class for comparing the predictive
quality of different models? Proceed to use this method to obtain test error
estimates for all 3 × 3 = 9 models. Which model (the combination of K &
predictor subset) won?
(d) Are all the variables in the Boston data set on the same scale? If not, how do we
deal with it?
(e) Proceed to apply scaling to the predictors in the Boston data, and repeat part (c)
for the standardized data set. Did the results (the test errors & the winning
model) change?
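The following is a minimal, hedged sketch of one possible workflow for this problem, assuming the knn() function from the class package. The K values, the predictor subsets, the number of folds, and the helper name cv.knn() are placeholder choices for illustration; substitute your own decisions from part (b) and the comparison method you name in part (c).
> library(MASS)    # Boston data set
> library(class)   # knn()
>
> # (a) create the binary response and drop the original medv
> Boston$medv01 <- ifelse(Boston$medv > median(Boston$medv), 1, 0)
> Boston$medv   <- NULL
>
> # (b) placeholder choices -- substitute your own K values and predictor subsets
> K.vals  <- c(1, 5, 10)
> subsets <- list(full   = setdiff(names(Boston), "medv01"),
+                 medium = c("crim", "nox", "rm", "age", "tax", "ptratio", "lstat"),
+                 small  = c("rm", "lstat", "ptratio"))
>
> # (c) k-fold cross-validation estimate of the test error for one (subset, K) pair
> cv.knn <- function(dat, vars, K, nfolds = 10) {
+   folds <- sample(rep(1:nfolds, length.out = nrow(dat)))
+   errs  <- sapply(1:nfolds, function(j) {
+     pred <- knn(train = dat[folds != j, vars, drop = FALSE],
+                 test  = dat[folds == j, vars, drop = FALSE],
+                 cl    = dat$medv01[folds != j], k = K)
+     mean(pred != dat$medv01[folds == j])
+   })
+   mean(errs)
+ }
> set.seed(1)
> cv.knn(Boston, subsets$small, K = 5)   # one of the 3 x 3 combinations
>
> # (e) standardize the predictors, then rerun the same comparison
> pred.cols  <- setdiff(names(Boston), "medv01")
> Boston.std <- Boston
> Boston.std[, pred.cols] <- scale(Boston[, pred.cols])
Looping cv.knn() over the chosen K values and subsets (for example with nested sapply() calls) yields the nine test-error estimates requested in part (c), and rerunning the same loop on Boston.std addresses part (e).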
5. In this problem, you will use support vector approaches in order to predict whether a
given car gets high or low gas mileage based on the Auto data set.
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the
median, and a 0 for cars with gas mileage below the median. Make sure to
dispose of the original mpg variable in the end (Auto$mpg = NULL); a code sketch for
this problem appears after the hint in part (d).
(b) Run set.seed(1). Fit a support vector classifier to the data with various values
of cost (similar to the lab), in order to predict whether a car gets high or low
gas mileage. Report the cross-validation errors associated with different values
of this parameter.
(c) Which cost value gave you the lowest cross-validation misclassification error? Print the
confusion matrix for the model fit with that optimal cost. Describe the two types of errors
your model can make, and which of those two types does it commit more frequently?
(d) Plot the fitted boundary of the optimal support vector classifier for a couple of
predictor pairings.
Hint: In the lab, we used the plot() function for svm objects only in cases with
p = 2. When p > 2, you can use the plot() function to create plots displaying
pairs of variables at a time. Essentially, instead of typing
> plot(svmfit, dat)
where svmfit contains your fitted model and dat is a data frame containing your
data, you can type
> plot(svmfit, dat, x1 ~ x4)
in order to plot just the first and fourth variables. However, you must replace
x1 and x4 with the correct variable names. To find out more, type ?plot.svm.
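A hedged sketch of one possible workflow for this problem is given below, assuming the e1071 package used in the lab. The variable name mpg01, the decision to drop the car-name column, and the particular cost grid are illustrative assumptions, not requirements.
> library(ISLR)    # Auto data set
> library(e1071)   # svm(), tune()
>
> # (a) binary mileage variable; drop the original mpg
> Auto$mpg01 <- factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))
> Auto$mpg   <- NULL
> Auto$name  <- NULL   # illustrative choice: drop the car-name factor as a predictor
>
> # (b) cross-validate the support vector classifier over an example grid of cost values
> set.seed(1)
> tune.out <- tune(svm, mpg01 ~ ., data = Auto, kernel = "linear",
+                  ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))
> summary(tune.out)                 # CV error for each value of cost
>
> # (c) confusion matrix for the best cost found above
> best <- tune.out$best.model
> table(predicted = predict(best, Auto), truth = Auto$mpg01)
>
> # (d) decision-boundary plots for a couple of predictor pairings
> plot(best, Auto, weight ~ horsepower)
> plot(best, Auto, acceleration ~ displacement)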
6. This problem involves the OJ data set which is part of the ISLR package.
(a) Use set.seed(1). Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
(b) Fit a support vector classifier to the training data using cost = 0.01, with
Purchase as the response and the other variables as predictors. Use the summary()
function to produce summary statistics, and describe the results obtained.
(c) What are the training and test error rates?
(d) Use the tune() function to select an optimal cost. Consider values in the range
0.01 to 10 (similar to what we did in the lab); a code sketch appears after part (e).
(e) Compute the training and test error rates using this new value for cost.
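As with the previous problem, the sketch below (assuming the ISLR and e1071 packages) outlines one way to work through parts (a)–(e); the particular cost grid used in part (d) is only an example of values spanning 0.01 to 10.
> library(ISLR)    # OJ data set
> library(e1071)   # svm(), tune()
>
> # (a) 800-observation training set; the rest is the test set
> set.seed(1)
> train.id <- sample(nrow(OJ), 800)
> OJ.train <- OJ[train.id, ]
> OJ.test  <- OJ[-train.id, ]
>
> # (b) support vector classifier with cost = 0.01
> svm.fit <- svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.01)
> summary(svm.fit)
>
> # (c) training and test error rates
> mean(predict(svm.fit, OJ.train) != OJ.train$Purchase)
> mean(predict(svm.fit, OJ.test)  != OJ.test$Purchase)
>
> # (d) tune cost over an example grid between 0.01 and 10
> set.seed(1)
> tune.out <- tune(svm, Purchase ~ ., data = OJ.train, kernel = "linear",
+                  ranges = list(cost = c(0.01, 0.05, 0.1, 0.5, 1, 5, 10)))
> summary(tune.out)
>
> # (e) error rates for the best cost selected by tune()
> best.fit <- tune.out$best.model
> mean(predict(best.fit, OJ.train) != OJ.train$Purchase)
> mean(predict(best.fit, OJ.test)  != OJ.test$Purchase)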