In this version I have:

* Fixed some potential bugs in `split_bins`, `woe_transfer`.
* `time_series_proc` for time series data processing.
* `ranking_percent_proc`, `ranking_percent_dict` for processing ranking percent variables and generating a ranking percent dictionary.
* Renamed `read_dt` to `read_data` and added a `pattern` parameter for matching files.
* `traing_xgb`, 'xgb_params'
* Renamed `save_dt` to `save_data`; `save_data` also supports multiple data frames.

In this version I have:

* `pred_xgb` for using an xgboost model to predict new data.
* `get_psi_plots`, `psi_plot` to plot the PSI of your data.
* `p_to_score` for transforming probability to score.
* `multi_left_jion` for fast left joins over a list of datasets.
* `read_data` for loading csv or txt data fast.

In this version I have:

* `xgb_filter`, `feature_selector`, `split_bins`, `ks_table_plot`, `ks_psi_plot`, `ks_value`.
* `pred_score` for predicting new data using a scorecard.
* `lr_params_search`, `xgb_params_search` for searching the optimal parameters. "random_search", "grid_search", and "local_search" are available.
* `partial_dependence_plot`, `get_partial_dependence_plots` for generating partial dependence plots.
* `cohort_analysis`, `cohort_table`, `cohort_plot` for cohort (vintage) analysis and visualization.
* `perf_table`, `roc_plot`, `ks_plot`, `lift_plot`, `psi_plot` for model validation plots.

In this version I have:

* Fixed some potential bugs in `get_names`, `digits_num`.

In this version I have:

* `data_exploration` for data exploration.
* `missing_proc`, `outliers_proc`, `get_names`.
* `lasso_filter`: `AUC` & `K-S` are added to select the best lambda. This selects the set of variables that maximizes AUC or K-S and also keeps multicollinearity to a minimum, which is difficult to achieve with AIC in stepwise regression. The optimal combination of variables can therefore be selected by lasso instead of stepwise regression.
* `K-S` or `AUC` values corresponding to different lambda.
* `auc_value`, `ks_value`, which can calculate Kolmogorov-Smirnov (K-S) & AUC of multiple model results quickly.
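
To make the two metrics in the last item concrete, here is a minimal Python sketch of how K-S and AUC can be computed for several sets of model scores at once. It is only an illustration: the names `auc`, `ks`, and `evaluate_models` are hypothetical and are not the package's `auc_value` / `ks_value` API, and the labels are assumed to be 0/1 with both classes present.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank-sum identity (average ranks for ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ks(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """K-S: largest gap between cumulative positive and negative rates."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    best = 0.0
    for idx, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Compare the two rates only at the end of a group of tied scores.
        if idx + 1 == len(pairs) or pairs[idx + 1][0] != score:
            best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def evaluate_models(
    y_true: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> Dict[str, Dict[str, float]]:
    """Compute both metrics for every named vector of model scores."""
    return {
        name: {"auc": auc(y_true, s), "ks": ks(y_true, s)}
        for name, s in model_scores.items()
    }


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    results = evaluate_models(
        labels,
        {"model_a": [0.1, 0.2, 0.8, 0.9], "model_b": [0.1, 0.4, 0.35, 0.8]},
    )
    for name, metrics in results.items():
        print(name, metrics)
```

The same pair of metrics could be evaluated once per candidate lambda to compare lasso fits, which is the selection idea described for `lasso_filter` above.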