El siguiente juego de datos es sobre el endurecimiento de cemento Portland, famoso por su nada fácil modelación.
Variable | Descripción |
---|---|
\(x_1\) | Cantidad de tricalcio de aluminato, \(3CaO \cdot Al_2O_3\) |
\(x_2\) | Cantidad de tricalcio de silicato, \(3CaO \cdot SiO_2\) |
\(x_3\) | Cantidad de tricalcio de aluminio ferrito, \(4CaO \cdot Al_2O_3 \cdot Fe_2O_2\) |
\(x_4\) | Cantidad de dicalcio de silicato, \(2CaO \cdot SiO_2\) |
\(y\) | Calor en calorías por gramo de cemento |
Lectura de datos y características de los mismos:
hald.data <- read.csv("~/Desktop/EstApl2/Markdown_MGA/RM_tema_6/Hald.dat", sep="")
dim(hald.data)
## [1] 13 5
str(hald.data)
## 'data.frame': 13 obs. of 5 variables:
## $ x1: int 7 1 11 11 7 11 3 1 2 21 ...
## $ x2: int 26 29 56 31 52 55 71 31 54 47 ...
## $ x3: int 6 15 8 8 6 9 17 22 18 4 ...
## $ x4: int 60 52 20 47 33 22 6 44 22 26 ...
## $ y : num 78.5 74.3 104.3 87.6 95.9 ...
head(hald.data)
## x1 x2 x3 x4 y
## 1 7 26 6 60 78.5
## 2 1 29 15 52 74.3
## 3 11 56 8 20 104.3
## 4 11 31 8 47 87.6
## 5 7 52 6 33 95.9
## 6 11 55 9 22 109.2
summary(hald.data)
## x1 x2 x3 x4
## Min. : 1.000 Min. :26.00 Min. : 4.00 Min. : 6
## 1st Qu.: 2.000 1st Qu.:31.00 1st Qu.: 8.00 1st Qu.:20
## Median : 7.000 Median :52.00 Median : 9.00 Median :26
## Mean : 7.462 Mean :48.15 Mean :11.77 Mean :30
## 3rd Qu.:11.000 3rd Qu.:56.00 3rd Qu.:17.00 3rd Qu.:44
## Max. :21.000 Max. :71.00 Max. :23.00 Max. :60
## y
## Min. : 72.50
## 1st Qu.: 83.80
## Median : 95.90
## Mean : 95.42
## 3rd Qu.:109.20
## Max. :115.90
## $Stat
## p q s S2 SC_res R2 R2adj Cp
## 1 0 1 15.043723 226.313590 2715.76308 NA NA 442.916687
## 2 1 2 10.726716 115.062432 1265.68675 0.5339480 0.4915797 202.548769
## 3 1 2 9.077126 82.394213 906.33634 0.6662683 0.6359290 142.486407
## 4 1 2 13.278145 176.309134 1939.40047 0.2858727 0.2209521 315.154284
## 5 1 2 8.963902 80.351538 883.86692 0.6745420 0.6449549 138.730833
## 6 2 3 2.406335 5.790448 57.90448 0.9786784 0.9744140 2.678242
## 7 2 3 11.077328 122.707206 1227.07206 0.5481667 0.4578001 198.094653
## 8 2 3 2.734266 7.476211 74.76211 0.9724710 0.9669653 5.495851
## 9 2 3 6.445485 41.544273 415.44273 0.8470254 0.8164305 62.437716
## 10 2 3 9.321374 86.888013 868.88013 0.6800604 0.6160725 138.225920
## 11 2 3 4.192112 17.573800 175.73800 0.9352896 0.9223476 22.373112
## 12 3 4 2.312061 5.345624 48.11061 0.9822847 0.9763796 3.041280
## 13 3 4 2.308745 5.330303 47.97273 0.9823355 0.9764473 3.018233
## 14 3 4 2.376648 5.648458 50.83612 0.9812811 0.9750415 3.496824
## 15 3 4 2.863846 8.201617 73.81455 0.9728200 0.9637599 7.337474
## 16 4 5 2.446008 5.982955 47.86364 0.9823756 0.9735634 5.000000
## AIC
## 1 NA
## 2 102.41187
## 3 98.07040
## 4 107.95980
## 5 97.74404
## 6 64.31239
## 7 104.00908
## 8 67.63411
## 9 89.92954
## 10 99.52173
## 11 78.74499
## 12 63.90360
## 13 63.86629
## 14 64.61995
## 15 69.46829
## 16 65.83669
##
## $Coef
## p q Intercept x1 x2 x3 x4
## 1 0 1 95.42308 NA NA NA NA
## 2 1 2 81.47934 1.868748 NA NA NA
## 3 1 2 57.42368 NA 0.7891248 NA NA
## 4 1 2 110.20266 NA NA -1.2557813 NA
## 5 1 2 117.56793 NA NA NA -0.7381618
## 6 2 3 52.57735 1.468306 0.6622505 NA NA
## 7 2 3 72.34899 2.312468 NA 0.4944682 NA
## 8 2 3 103.09738 1.439958 NA NA -0.6139536
## 9 2 3 72.07467 NA 0.7313296 -1.0083862 NA
## 10 2 3 94.16007 NA 0.3109047 NA -0.4569419
## 11 2 3 131.28241 NA NA -1.1998512 -0.7246001
## 12 3 4 48.19363 1.695890 0.6569149 0.2500176 NA
## 13 3 4 71.64831 1.451938 0.4161098 NA -0.2365402
## 14 3 4 111.68441 1.051854 NA -0.4100433 -0.6427961
## 15 3 4 203.64196 NA -0.9234160 -1.4479712 -1.5570449
## 16 4 5 62.40537 1.551103 0.5101676 0.1019094 -0.1440610
##
## attr(,"class")
## [1] "modelSelect" "list"
library(MASS)
modAIC <- stepAIC(lm(y~1,data = hald.data), scope = y~x1+x2+x3+x4,
direction = c("both","forward","backward")[2],trace = 3)
## Start: AIC=71.44
## y ~ 1
##
## Df Sum of Sq RSS AIC
## + x4 1 1831.90 883.87 58.852
## + x2 1 1809.43 906.34 59.178
## + x1 1 1450.08 1265.69 63.519
## + x3 1 776.36 1939.40 69.067
## <none> 2715.76 71.444
##
## Step: AIC=58.85
## y ~ x4
##
## Df Sum of Sq RSS AIC
## + x1 1 809.10 74.76 28.742
## + x3 1 708.13 175.74 39.853
## <none> 883.87 58.852
## + x2 1 14.99 868.88 60.629
##
## Step: AIC=28.74
## y ~ x4 + x1
##
## Df Sum of Sq RSS AIC
## + x2 1 26.789 47.973 24.974
## + x3 1 23.926 50.836 25.728
## <none> 74.762 28.742
##
## Step: AIC=24.97
## y ~ x4 + x1 + x2
##
## Df Sum of Sq RSS AIC
## <none> 47.973 24.974
## + x3 1 0.10909 47.864 26.944
print(summary(modAIC))
##
## Call:
## lm(formula = y ~ x4 + x1 + x2, data = hald.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0919 -1.8016 0.2562 1.2818 3.8982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.6483 14.1424 5.066 0.000675
## x4 -0.2365 0.1733 -1.365 0.205395
## x1 1.4519 0.1170 12.410 5.78e-07
## x2 0.4161 0.1856 2.242 0.051687
##
## Residual standard error: 2.309 on 9 degrees of freedom
## Multiple R-squared: 0.9823, Adjusted R-squared: 0.9764
## F-statistic: 166.8 on 3 and 9 DF, p-value: 3.323e-08
print(anova(modAIC))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x4 1 1831.90 1831.90 343.6758 1.771e-08
## x1 1 809.10 809.10 151.7934 6.150e-07
## x2 1 26.79 26.79 5.0259 0.05169
## Residuals 9 47.97 5.33