I use the pdp package to compute partial dependence for a linear regression model, and it works without any warnings. However, when I switch to a classification (logistic) objective with xgboost, I get a warning saying that the model type could not be determined and that regression is being assumed, as shown below. Does the code need to be changed so that the xgboost classification model is passed to pdp correctly and the partial dependence is right, as described here? Or can I ignore the warning because the result is already correct? I know that with randomForest this works without any warning messages.
Code:
# Load required packages
library(pdp)
library(xgboost)
# Simulate training data with ten million records
set.seed(101)
trn <- as.data.frame(mlbench::mlbench.friedman1(n = 1e+07, sd = 1))
# Keep a random sample of 500 rows and binarize the response
trn <- trn[sample(nrow(trn), 500), ]
trn$y <- ifelse(trn$y > 16, 1, 0)
# Fit an XGBoost classification (logistic) model
set.seed(102)
bst <- xgboost(data = data.matrix(subset(trn, select = -y)),
               label = trn$y,
               objective = "reg:logistic",
               nrounds = 100,
               max_depth = 2,
               eta = 0.1)
# Compute partial dependence for x.1
pd <- partial(bst$handle,
              pred.var = c("x.1"),
              grid.resolution = 10,
              train = data.matrix(subset(trn, select = -y)),
              prob = TRUE,
              plot = FALSE,
              .progress = "text")
Warning message:
In superType.default(object) :
`type` could not be determined; assuming `type = "regression"`