Question 1
Physicians are recommending more exercise for patients, especially those who are overweight. One benefit of regular exercise is thought to be a reduction in bad cholesterol. To study the relationship, a doctor selected a sample of patients who did not exercise regularly and measured their cholesterol levels. She then started the patients on an exercise program and asked them to record the number of minutes per week that they exercised. After 4 months, she re-measured their cholesterol levels. The data are contained in the file Cholesterol.xls.

Plot the data. Does it appear that the amount of exercise and the change in cholesterol level are related? The minutes of exercise and the change in cholesterol level appear to have a positive linear relationship. Determine the regression equation relating cholesterol reduction to the amount of exercise, and find a 95% confidence interval for the coefficient (slope) of the independent variable (exercise). Provide a brief, meaningful written interpretation of the coefficients and the confidence interval. Can we conclude that exercise affects the change in the exerciser's cholesterol level? How well does the linear model fit these data? Justify your response using the regression output.

Question 2
Hardwood trees are harvested selectively for the manufacture of fine furniture. Environmental groups want as few trees as possible to be selected for cutting, while companies feel that they need a certain amount of wood for manufacturing. To help each group predict the volume of lumber in a selected tree, various measurements are made before the tree is cut. Unfortunately, volume is not easily determined before harvesting. Two common measurements made before cutting down the tree are DBH (the diameter of the tree at breast height, 4.5 feet off the ground) and the height of the tree, measured with sighting instruments. After the tree is harvested, the volume of lumber may be measured.
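The simple regression asked for in Question 1 (and the two simple regressions of volume on DBH or on height in Question 2) uses the least-squares slope b1 = Sxy / Sxx, intercept b0 = ybar - b1 * xbar, and the slope interval b1 ± t(0.025, n - 2) * s_e / sqrt(Sxx), where s_e = sqrt(SSE / (n - 2)). The sketch below is a minimal, stdlib-only illustration under stated assumptions: the function names `t_quantile` and `simple_ols` are hypothetical helpers, the t critical value is computed numerically (Simpson's rule plus bisection) as a stand-in for a t table or `scipy.stats.t.ppf`, and reading Cholesterol.xls into two lists is not shown. The toy numbers at the end are invented for illustration and are not the assignment data.

```python
import math


def t_quantile(p, df):
    """Student t quantile for 0.5 < p <= 0.99, stdlib only.

    Integrates the t density with Simpson's rule and inverts the CDF by
    bisection; a stand-in for a t table or scipy.stats.t.ppf.
    """
    # log of the density's normalizing constant (lgamma avoids overflow)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def cdf(x):
        # P(T <= x) for x >= 0: 0.5 plus the area under the density on [0, x]
        n = 2000  # even number of Simpson subintervals
        h = x / n
        s = pdf(0.0) + pdf(x)
        for i in range(1, n):
            s += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simple_ols(x, y, level=0.95):
    """Least-squares fit y = b0 + b1 * x with a confidence interval for b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate
    se_b1 = s_e / math.sqrt(sxx)    # standard error of the slope
    t_crit = t_quantile(1 - (1 - level) / 2, n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "t_b1": b1 / se_b1,  # t statistic for H0: slope = 0
        "r2": 1 - sse / sst,  # coefficient of determination
        "s_e": s_e,
    }


# Hypothetical toy data (NOT the Cholesterol.xls values)
minutes = [1, 2, 3, 4, 5]
reduction = [2, 4, 5, 4, 5]
print(simple_ols(minutes, reduction))
```

If the interval for b1 excludes zero, the data support a nonzero linear effect of exercise at the 5% level; r2 and s_e summarize how well the line fits. For the toy data the interval straddles zero, so no such conclusion would follow.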
Both groups believe that a regression model relating volume to diameter and/or height will be helpful. The data file Wood.xlsx gives the diameters, heights, and volumes of 31 trees harvested in the Allegheny National Forest in Pennsylvania.

Estimate the two simple regression models and the multiple regression model that are appropriate for these data based on the description above. This will require you to conduct three separate regressions. DO NOT try to use any other model for the volume estimate. Other models are available, but for the purpose of this assignment you are limited to the models described above. Which of the three models would you recommend that the two groups use? Why? HINT: you should analyze the correlation matrix to answer this question.

A specific tree with a height of 62 feet and a diameter of 17.9 inches has just arrived at the mill. Do you believe your model is appropriate for making a point estimate for this tree? What volume can be expected? Build a 95% confidence interval for the volume of this tree.

Question 3
Lotteries are important sources of revenue for governments and charities. Many people have criticized lotteries, however, as taxes on the poor and uneducated. To explore the issue, 100 sampled adults were asked how much they spend on lottery tickets, and a number of socio-economic variables were also recorded. The study was meant to test the following beliefs:
I. Relatively uneducated people spend more on lotteries than do educated people.
II. Older people spend more on lotteries than do younger people.
III. People with more children spend more than people with fewer children.
IV. Relatively poor people spend a greater proportion of their income on lotteries than the better off do.
The file Lottery.xls contains data for the 100 respondents on the amount spent on lottery tickets as a percentage of household income, number of years of education, age, number of children, and personal income (in thousands of dollars). Ignoring any collinearity issues, develop a single multiple regression model relating lottery expenditures to all of the independent variables described above, and test the four beliefs listed above at the 95% confidence level. What would be the appropriate conclusions from this analysis?

Now check the correlation matrix for this problem. Describe any concerns this matrix raises regarding the usefulness of the independent variables and the relationships between them. Without actually creating a new model, describe which independent variables you would use to improve the quality of the model compared to the full model developed above.
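Testing the four beliefs amounts to one-sided t-tests on the coefficients of the full multiple regression: belief I suggests a negative education coefficient, II a positive age coefficient, III a positive children coefficient, and IV a negative income coefficient (the response is already a percentage of income). Each t statistic b / se(b) is compared with the one-sided 5% critical value on n - k - 1 degrees of freedom (here 100 - 4 - 1 = 95, roughly 1.66). The sketch below is a minimal, stdlib-only illustration under stated assumptions: `multiple_ols`, `invert`, `correlation_matrix`, and `supports_belief` are hypothetical helpers, the critical value is passed in rather than computed, and loading Lottery.xls (or Wood.xlsx for the multiple regression in Question 2) is not shown. The correlation matrix it also computes is the one to inspect for highly correlated predictors.

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix (list of lists), partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def multiple_ols(columns, y):
    """OLS of y on predictor columns; returns coefficients, SEs, t stats, R^2, df."""
    n, k = len(y), len(columns)
    p = k + 1
    # design matrix with a leading column of ones for the intercept
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    xtx_inv = invert(xtx)
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(b[i] * X[r][i] for i in range(p)) for r in range(n)]
    sse = sum(e * e for e in resid)
    y_bar = sum(y) / n
    sst = sum((v - y_bar) ** 2 for v in y)
    mse = sse / (n - p)  # n - k - 1 residual degrees of freedom
    se = [math.sqrt(mse * xtx_inv[i][i]) for i in range(p)]
    t = [bi / si for bi, si in zip(b, se)]
    return {"b": b, "se": se, "t": t, "r2": 1 - sse / sst, "df": n - p}


def supports_belief(t_stat, expected_sign, t_crit):
    """One-sided test: True if t lies beyond t_crit in the hypothesized direction."""
    return expected_sign * t_stat > t_crit


def correlation_matrix(columns):
    """Pearson correlation between every pair of columns."""
    def corr(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
        suu = sum((a - mu) ** 2 for a in u)
        svv = sum((c - mv) ** 2 for c in v)
        return suv / math.sqrt(suu * svv)
    return [[corr(u, v) for v in columns] for u in columns]
```

For Question 3 one would call `multiple_ols([education, age, children, income], spend_pct)`, where those list names are placeholders for the columns of Lottery.xls, then apply `supports_belief` to `fit["t"][1:]` with expected signs -1, +1, +1, -1. Large off-diagonal entries in `correlation_matrix([education, age, children, income])` (a common rule of thumb flags |r| above about 0.7) point to predictors that carry overlapping information.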