1.
Casey Deesel is a sports agent negotiating a contract for Titus Johnston, an athlete in the National Football League (NFL). An important aspect of any NFL contract is the amount of guaranteed money over the life of the contract. Casey has gathered data on 506 NFL athletes who have recently signed new contracts. Each observation (NFL athlete) includes values for percentage of his team’s plays that the athlete is on the field (SnapPercent), the number of awards an athlete has received recognizing on-field performance (Awards), the number of games the athlete has missed due to injury (GamesMissed), and millions of dollars of guaranteed money in the athlete’s most recent contract (Money, dependent variable).
Casey has trained a full regression tree on 304 observations and then used the validation set to prune the tree to obtain a best-pruned tree. The best-pruned tree (as applied to the 202 observations in the validation set) is:
(a) Titus Johnston’s variable values are: SnapPercent = 94, Awards = 6, and GamesMissed = 3. How much guaranteed money does the regression tree predict that a player with Titus Johnston’s profile should earn in his contract? If required, round your answers to two decimal places.
The predicted result is $____ million of guaranteed money.
(b) Casey feels that Titus was denied an additional award in the past season due to questionable voting by some sports media. If Titus had won this additional award, how much additional guaranteed money would the regression tree predict for Titus versus the prediction in part (a)?
I. An additional award would not change the amount of guaranteed money.
II. An additional award would increase the amount of guaranteed money by $8.91 million.
III. An additional award would increase the amount of guaranteed money by $13.99 million.
IV. An additional award would increase the amount of guaranteed money by $17.79 million.
V. An additional award would increase the amount of guaranteed money by $26.02 million.
Select your answer: [I / II / III / IV / V]
(c) As Casey reviews the best-pruned tree, he is confused by the leaf node corresponding to the sequence of decision rules "SnapPercent ≥ 90.28, SnapPercent < 95.37, Awards < 6.75, GamesMissed < 1.5." This sequence of decision rules results in an estimate of $50 million of guaranteed money, but the tree states that zero observations occur in the corresponding partition. If zero observations occur in this partition, how can the regression tree provide an estimate of $50 million? Explain this part of the regression tree to Casey by referring to how the best-pruned tree is obtained.
The predicted guaranteed money of $50 million for observations satisfying "SnapPercent ≥ 90.28, SnapPercent < 95.37, Awards < 6.75, GamesMissed < 1.5" is based on the average guaranteed money of the observations in the [training / validation] set that satisfy this sequence of decision rules. The best-pruned tree is obtained by [removing leaf nodes from / adding leaf nodes to] the initial regression tree to obtain the tree with the [fewest / greatest] leaf nodes while achieving the minimum classification error rate on the [training / validation] set. In this case, the [training / validation] set has zero observations that satisfy "SnapPercent ≥ 90.28, SnapPercent < 95.37, Awards < 6.75, GamesMissed < 1.5", which just means that this leaf node [does not contribute / contributes] to the classification error rate of this tree.
2.
Mary Jay is a salesperson for a cosmetics company that relies on direct marketing to sell its products. A classification method was developed to predict whether a customer will purchase if contacted with a targeted marketing pitch. This classification method generated output used to create the following decile-wise lift chart on a test set of 10,000 customers, 400 of whom actually purchased the product when solicited with targeted marketing.
Use the following information:
The height of the bar that corresponds to the first decile is 2.50.
The height of the bar that corresponds to the second decile is 2.45.
The height of the bar that corresponds to the third decile is 1.95.
(a) In the top 1,000 customers deemed most likely to purchase in response to direct marketing, how many actually made a purchase? ____ purchasers
(b) In the top 2,000 customers deemed most likely to purchase in response to direct marketing, how many actually made a purchase? ____ purchasers
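The link between a decile's bar height and a purchaser count can be sketched in Python. This is a minimal sketch, assuming the usual definition of decile-wise lift (purchasers found in the decile divided by the purchasers expected in a random sample of the same size). The function name `purchasers_in_decile` is illustrative and not part of the exercise.

```python
def purchasers_in_decile(lift, total_customers, total_purchasers, n_deciles=10):
    """Convert a decile's lift into the number of actual purchasers in it."""
    decile_size = total_customers / n_deciles
    base_rate = total_purchasers / total_customers
    # Purchasers expected in one decile if customers were picked at random.
    expected_random = decile_size * base_rate
    return lift * expected_random

# Bar heights for the first three deciles, as stated in the exercise.
lifts = [2.50, 2.45, 1.95]
counts = [purchasers_in_decile(l, 10_000, 400) for l in lifts]

top_1000 = counts[0]               # first decile only
top_2000 = counts[0] + counts[1]   # first and second deciles combined
print(round(top_1000), round(top_2000))
```

The cumulative count for the top 2,000 customers is the sum of the first two decile counts, because each decile covers a disjoint block of 1,000 customers.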
3.
The dating web site Oollama.com requires its users to create profiles based on a survey in which they rate their interest (on a scale from 0 to 3) in five categories: physical fitness, music, spirituality, education, and alcohol consumption. A new Oollama customer, Erin O’Shaughnessy, has reviewed the profiles of 40 prospective dates and classified whether she is interested in learning more about them.
Based on Erin’s classification of these 40 profiles, Oollama has applied a logistic regression to predict Erin’s interest in other profiles that she has not yet viewed. The resulting logistic regression model is as follows:
For the 40 profiles (observations) on which Erin classified her interest, this logistic regression model generates the following probabilities of Interested.

                          Probability of                               Probability of
Observation  Interested   Interested       Observation  Interested   Interested
35           1            1.000            13           1            0.412
21           1            0.999            2            0            0.285
29           1            0.999            3            0            0.219
25           1            0.999            7            0            0.168
39           1            0.999            9            0            0.168
26           1            0.990            12           0            0.168
23           1            0.981            18           0            0.168
33           1            0.974            22           1            0.168
1            0            0.882            31           1            0.168
24           1            0.882            6            0            0.128
28           1            0.882            20           0            0.128
36           1            0.882            15           0            0.029
16           0            0.791            5            0            0.020
27           1            0.791            14           0            0.015
30           1            0.791            19           0            0.011
32           1            0.791            8            0            0.008
34           1            0.791            10           0            0.001
37           1            0.791            17           0            0.001
40           1            0.791            4            0            0.001
38           0            0.732            11           0            0.000

(a) Using a cutoff value of 0.5 to classify a profile observation as Interested or not, construct the confusion matrix for this 40-observation training set.

            Predicted
Actual      0       1
0           ____    ____
1           ____    ____

Compute sensitivity, specificity, and precision measures and interpret them within the context of Erin’s dating prospects. If required, round your answers to two decimal places. Do not round intermediate calculations.
The sensitivity of the model is ____. This suggests that the model is reasonably [good / bad] at identifying the profiles that Erin is interested in.
The specificity of the model is ____. This suggests that the model is reasonably [good / bad] at avoiding recommending profiles to Erin that she will not be interested in.
The precision of the model is ____. This suggests that the model is reasonably [good / bad] at suggesting profiles of interest to Erin.
(b) Oollama understands that its clients have a limited amount of time for dating and therefore uses decile-wise lift charts to evaluate its classification models. For the training data, what is the first decile lift resulting from the logistic regression model? Interpret this value.
The first decile lift of this classification is ____. It means that the first decile of the logistic regression model [halves / doubles / triples / does not change] the number of profiles that Erin is interested in versus random selection.
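The confusion-matrix counts and the first decile lift in parts (a) and (b) can be tallied directly from the probability table in this exercise. The following is a minimal sketch, assuming the (Interested, probability) pairs have been transcribed correctly from that table and that a profile is predicted Interested when its probability is at least the cutoff. The variable names are illustrative.

```python
# (actual Interested, predicted probability) pairs transcribed from the
# exercise's table, sorted by descending probability.
records = [
    (1, 1.000), (1, 0.999), (1, 0.999), (1, 0.999), (1, 0.999),
    (1, 0.990), (1, 0.981), (1, 0.974), (0, 0.882), (1, 0.882),
    (1, 0.882), (1, 0.882), (0, 0.791), (1, 0.791), (1, 0.791),
    (1, 0.791), (1, 0.791), (1, 0.791), (1, 0.791), (0, 0.732),
    (1, 0.412), (0, 0.285), (0, 0.219), (0, 0.168), (0, 0.168),
    (0, 0.168), (0, 0.168), (1, 0.168), (1, 0.168), (0, 0.128),
    (0, 0.128), (0, 0.029), (0, 0.020), (0, 0.015), (0, 0.011),
    (0, 0.008), (0, 0.001), (0, 0.001), (0, 0.001), (0, 0.000),
]

CUTOFF = 0.5  # assumption: probability >= cutoff is predicted Interested

tp = sum(1 for y, p in records if y == 1 and p >= CUTOFF)
fn = sum(1 for y, p in records if y == 1 and p < CUTOFF)
fp = sum(1 for y, p in records if y == 0 and p >= CUTOFF)
tn = sum(1 for y, p in records if y == 0 and p < CUTOFF)

sensitivity = tp / (tp + fn)   # share of interesting profiles that are found
specificity = tn / (tn + fp)   # share of uninteresting profiles that are avoided
precision = tp / (tp + fp)     # share of recommended profiles that are interesting

# First decile lift: hit rate among the top 10% versus the overall rate.
decile_size = len(records) // 10
top_decile = sorted(records, key=lambda r: r[1], reverse=True)[:decile_size]
top_rate = sum(y for y, _ in top_decile) / decile_size
base_rate = sum(y for y, _ in records) / len(records)
first_decile_lift = top_rate / base_rate

print(tp, fn, fp, tn)
print(sensitivity, specificity, precision, first_decile_lift)
```

Because every probability in the table is well away from 0.5, the choice between "at least" and "strictly greater than" the cutoff does not affect these counts.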
(c) A recently posted profile has values of Fitness = 3, Music = 1, Education = 3, and Alcohol = 2. Use the estimated logistic regression equation to compute the probability of Erin’s interest in this profile. If required, round your answers to three decimal places. Do not round intermediate calculations.
Log odds =
Probability of Interest =
(d) Now that Oollama has trained a logistic regression model based on Erin’s initial evaluations of 40 profiles, what should its next steps be in the modeling process? Oollama should use its model to suggest profiles [of interest / with lack of interest / of interest and with lack of interest] to Erin in order to compute classification accuracy measures on a validation set.
4.
A university is applying classification methods in order to identify alumni who may be interested in donating money. The university has a database of 58,205 alumni profiles containing numerous variables. Of these 58,205 alumni, only 576 have donated in the past. The university has oversampled the data and trained a random forest of 100 classification trees. For a cutoff value of 0.5, the following confusion matrix summarizes the performance of the random forest on a validation set:
                        Predicted
Actual          Donation    No Donation
Donation             265             23
No Donation        5,327         23,487
The following table lists some information on individual observations from the validation set:
Observation ID   Actual Class   Probability of Donation   Predicted Class
A                Donation       0.7                       Donation
B                Donation       0.9                       Donation
C                Donation       0.3                       No Donation

(a) Choose the correct explanation for how the probability of Donation was computed for the three observations.
(i) The probability of Donation for each observation is the proportion of the 100 individual classification trees that classified the observation as "Donation."
(ii) The probability of Donation for each observation is the proportion of the 100 individual classification trees that classified the observation as "No Donation."
(iii) The probability of Donation for each observation is the ratio of the individual classification trees that classified the observation as "Donation" and those that classified it as "No Donation."
(iv) The probability of Donation for each observation is the ratio of the individual classification trees that classified the observation as "No Donation" and those that classified it as "Donation."
Select your answer: [Option (i) / Option (ii) / Option (iii) / Option (iv)]
Why were Observations A and B classified as Donation and Observation C classified as No Donation? If required, round your answers to one decimal place.
The probability of Donation for Observation A is ____. It is [greater / less] than 0.5, so Observation A is classified as Donation by the random forest.
The probability of Donation for Observation B is ____. It is [greater / less] than 0.5, so Observation B is classified as Donation by the random forest.
The probability of Donation for Observation C is ____. It is [greater / less] than 0.5, so Observation C is classified as No Donation by the random forest.
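One common way an ensemble of classification trees turns individual tree votes into a probability and then a class can be sketched as follows. This is a minimal sketch, assuming each tree casts a single hard vote and the probability is the share of "Donation" votes; some software averages per-tree class probabilities instead. The vote tallies below are hypothetical, chosen only to reproduce the probabilities listed for Observations A, B, and C.

```python
def forest_probability(tree_votes):
    """Share of trees voting 'Donation' (1 = Donation, 0 = No Donation)."""
    return sum(tree_votes) / len(tree_votes)

def classify(probability, cutoff=0.5):
    """Label an observation by comparing its probability with the cutoff."""
    # Assumption: a probability strictly greater than the cutoff means Donation.
    return "Donation" if probability > cutoff else "No Donation"

# Hypothetical votes from a 100-tree forest for each observation.
votes = {
    "A": [1] * 70 + [0] * 30,
    "B": [1] * 90 + [0] * 10,
    "C": [1] * 30 + [0] * 70,
}

for obs_id, tree_votes in votes.items():
    p = forest_probability(tree_votes)
    print(obs_id, p, classify(p))
```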
(b) Compute the values of accuracy, sensitivity, specificity, and precision. Explain why accuracy is a misleading measure to consider in this case. Evaluate the performance of the random forest, particularly commenting on the precision measure. If required, round your answer to three decimal places.
Accuracy =
If required, round your answers to the nearest whole percentage. Accuracy is not the best measure to use for unbalanced data sets because less than ____% of the alumni in the data have donated. If required, round your answers for Sensitivity and Specificity to three decimal places and round your answer for Precision to four decimal places.
Sensitivity =
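The four measures requested in part (b) follow directly from the cells of the validation confusion matrix. The sketch below uses the counts given in the exercise and treats Donation as the positive class; the variable names are illustrative.

```python
# Validation-set confusion matrix from the exercise (Donation = positive class).
tp, fn = 265, 23         # actual Donation: predicted Donation / No Donation
fp, tn = 5_327, 23_487   # actual No Donation: predicted Donation / No Donation

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)   # donors correctly flagged
specificity = tn / (tn + fp)   # non-donors correctly passed over
precision = tp / (tp + fp)     # flagged alumni who actually donated

# Class imbalance in the full database: 576 donors among 58,205 alumni.
donor_share = 576 / 58_205

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} precision={precision:.4f} "
      f"donor_share={donor_share:.2%}")
```

With so few donors, a model that always predicts No Donation would score a very high accuracy while finding no donors at all, which is why sensitivity and precision are the more informative measures here.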
Expected Profit of Coupon Offer = P(coupon used) × Profit if coupon used + (1 - P(coupon used)) × Profit if coupon not used
determine which customers should be sent the coupon.
Customer   Probability of Using Coupon
1          0.46
2          0.35
3          0.26
4          0.03
5          0.02

Determine the expected profit for each customer. Round your answers to the nearest cent. Enter a negative value as a negative number, if any.

Customer   Expected Profit
1          $____
2          $____
3          $____
4          $____
5          $____

The expected profit is positive for customers [2 and 3 / 4 and 5 / 1, 2, and 3 / 3, 4, and 5 / 1, 2, 3, and 4 / 1, 2, 3, and 5 / 2, 3, 4, and 5 / 1, 2, 3, 4, and 5], so these customers [should be offered / should not be offered] the coupon.
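The expected-profit formula can be applied to each customer as in the sketch below. The two profit amounts are not stated in this part of the text, so `PROFIT_IF_USED` and `PROFIT_IF_NOT_USED` are hypothetical placeholders to be replaced with the values from the exercise; the results printed here are illustrative only and are not the exercise's answers.

```python
def expected_profit(p_used, profit_if_used, profit_if_not_used):
    """Expected profit of sending a coupon to one customer."""
    return p_used * profit_if_used + (1 - p_used) * profit_if_not_used

# HYPOTHETICAL profit amounts (assumptions, not given in the text above).
PROFIT_IF_USED = 10.00
PROFIT_IF_NOT_USED = -1.00

# Probabilities of using the coupon, from the exercise's table.
p_use = {1: 0.46, 2: 0.35, 3: 0.26, 4: 0.03, 5: 0.02}

profits = {
    customer: round(expected_profit(p, PROFIT_IF_USED, PROFIT_IF_NOT_USED), 2)
    for customer, p in p_use.items()
}

# Send the coupon only where the expected profit is positive.
send_to = [customer for customer, value in profits.items() if value > 0]
print(profits, send_to)
```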