This project aims to demonstrate the effectiveness of ensemble classifiers by comparing and contrasting several classifier families. The problem space is deliberately simple: an insurance dataset and a binary classification target. The goal is to push this simple setup to its limit and achieve very high test set accuracy.¶
import pandas as pd
import numpy as np
import seaborn as sns
import math
import category_encoders as ce
from collections import defaultdict
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier
Read the data from CSV and coerce the data types.
pstr = pd.StringDtype()
dtype_dict = {'State': pstr, 'Customer Lifetime Value ': np.float64, 'Response': pstr, 'Coverage': pstr, 'Education': pstr,
'EmploymentStatus': pstr, 'Gender': pstr, 'Income': int, 'Location Code': pstr, 'Marital Status': pstr,
'Monthly Premium Auto': int, 'Months Since Last Claim': int, 'Number of Open Complaints': int,
'Number of Policies': int, 'Policy': pstr, 'Renew Offer Type': pstr, 'Sales Channel': pstr,
'Total Claim Amount': np.float64, 'Vehicle Class': pstr}
df_raw = pd.read_csv("Customer relationship marketing (CRM).csv", parse_dates=['Effective To Date'], dtype=dtype_dict)
df_raw.head()
The data set consists of 19 features and one binary label variable, Response. The objective is to predict a customer's response to an insurance policy offer from the accompanying feature variables.
df_raw.info()
The data does not contain any missing values, so no imputation is necessary. Feature engineering and data encoding follow. For now, we will keep all feature variables and continue with determining an encoding scheme. When doing predictive modeling with mixed categorical/metric features, especially when the feature count and/or the feature cardinalities are high, the choice of encoding scheme can be very important.
There is a moderately high number of features, most of which have low cardinality, so the curse of dimensionality (CoD) is a moderate concern that our encoding scheme must consider. Additionally, since the label variable (Response) is binary categorical we will be performing classification, and we are likely to rely on decision tree models, whose performance can degrade as the feature count grows (they are sensitive to CoD). The lowest feature class counts are given below.
def column_value_counts(df_in):
    # melt to (column, value) pairs, then cross-tabulate values against columns
    value_table = pd.crosstab(**df_in.melt(var_name='columns', value_name='index'))
    # back to long form: one row per (column, value, count)
    value_table = value_table.melt(var_name='column', value_name='count', ignore_index=False).reset_index()
    value_table = value_table[value_table['count'] != 0].rename(columns={'index': 'value'})
    # reorder to (column, value, count) and sort by count
    cols = value_table.columns.to_list()
    value_table = value_table[[cols[1], cols[0], cols[2]]].sort_values('count', ascending=False).reset_index(drop=True)
    return value_table
column_value_counts(df_raw[df_raw.columns[df_raw.dtypes == pstr]]).tail(10)
All class counts for the categorical features are high enough that there is no need to lump fringe class values into an 'other' category. Gender is binary, and Coverage and Education contain strong ordinal elements, so all three will be label encoded. Policy has the highest cardinality at nine, so we want to avoid one-hot encoding it if possible. Policy appears to be a combination of two features, each of cardinality three. Splitting Policy gives a nominal feature and an ordinal feature, the latter of which can be label encoded, saving five dimensions over one-hot encoding Policy. Renew Offer Type cannot be assumed to be ordinal despite its appearance. All remaining unencoded categorical variables will be one-hot encoded.
# feature engineering data frame
df_ml = df_raw.copy(deep=True)
# split Policy into its nominal type prefix and its trailing ordinal level digit
df_ml['Policy Level'] = df_ml['Policy'].map(lambda val: val[-1]).astype(int)
df_ml['Policy'] = df_ml['Policy'].map(lambda val: val[:-3]).astype(pstr)
# encoded data
categorical_cols = df_ml.columns[df_ml.dtypes == pstr].to_list()
df_enc = df_ml[categorical_cols].drop(columns=['Response']).copy(deep=True)
y_label = df_ml['Response'].map({'No': 0, 'Yes': 1})
# ordinal encoding
def get_ce_map(col_name, zip_obj):
    # build a category_encoders mapping spec for one column
    return {'col': col_name, 'mapping': {x: y for x, y in zip_obj}}
classes = [['Basic', 'Extended', 'Premium'],
['High School or Below', 'College', 'Bachelor', 'Master', 'Doctor']]
rules = [zip(l1, list(range(1, len(l1)+1))) for l1 in classes]
col_list = ['Coverage', 'Education']
col_rule = zip(col_list, rules)
ce_1 = ce.OrdinalEncoder(mapping=[get_ce_map(col, rule) for col, rule in col_rule], cols=col_list)
df_enc = ce_1.fit_transform(df_enc)
df_enc = df_enc.astype({x: 'category' for x in df_enc.columns.to_series()[df_enc.dtypes == pstr].values})
# one-hot (k degrees of freedom)
ce_2 = ce.OneHotEncoder()
df_enc_hot = ce_2.fit_transform(df_enc)
# dummy encoding (k-1 degrees of freedom)
df_enc_dummy = pd.get_dummies(df_enc, drop_first=True)
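To make the k versus k-1 distinction concrete, a quick illustrative check of the column counts of the two frames built above; the difference should equal the number of categorical features encoded.
print(f'one-hot columns: {df_enc_hot.shape[1]}')
print(f'dummy columns:   {df_enc_dummy.shape[1]}')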
We have a date variable which varies from January 1st 2011 to February 28th 2011. Date variables often exhibit linear and seasonal trends. To give the date variable the greatest opportunity to demonstrate potential relationships with the target variable it will be decomposed into several linear and cyclical components, the latter of which can be modeled with two sinusoidal components. If the new 'date derived' components fail to account for any significant percentage of the variance of the target variable, they will be removed in feature selection.
def get_date_features(date_vec):
    # default data dict
    data_dict = defaultdict(list)
    # pseudo-constant terms (denominators for the cyclical terms)
    data_dict['Days_in_Year'] = [366 if x.is_leap_year else 365 for x in date_vec]
    data_dict['Days_in_Month'] = [x.days_in_month for x in date_vec]
    # linear terms
    data_dict['Month'] = [x.month for x in date_vec]
    data_dict['Day_of_Month'] = [x.day for x in date_vec]
    data_dict['Day_of_Week'] = [x.day_of_week for x in date_vec]
    data_dict['Day_of_Year'] = [x.day_of_year for x in date_vec]
    # cyclical terms: each cycle is encoded by a cosine/sine pair
    data_dict['Day_of_Month_Cyclical_1'] = [math.cos(2*math.pi*x/y) for x, y in zip(data_dict['Day_of_Month'], data_dict['Days_in_Month'])]
    data_dict['Day_of_Month_Cyclical_2'] = [math.sin(2*math.pi*x/y) for x, y in zip(data_dict['Day_of_Month'], data_dict['Days_in_Month'])]
    data_dict['Day_of_Week_Cyclical_1'] = [math.cos(2*math.pi*x/7) for x in data_dict['Day_of_Week']]
    data_dict['Day_of_Week_Cyclical_2'] = [math.sin(2*math.pi*x/7) for x in data_dict['Day_of_Week']]
    data_dict['Day_of_Year_Cyclical_1'] = [math.cos(2*math.pi*x/y) for x, y in zip(data_dict['Day_of_Year'], data_dict['Days_in_Year'])]
    data_dict['Day_of_Year_Cyclical_2'] = [math.sin(2*math.pi*x/y) for x, y in zip(data_dict['Day_of_Year'], data_dict['Days_in_Year'])]
    return pd.DataFrame(data=data_dict)
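As a quick illustrative sanity check of the cyclical encoding, applying the function above to two hypothetical adjacent dates shows that the last day of January and the first day of February land close together on the cosine/sine circle, unlike the raw day number, which jumps from 31 to 1.
check = get_date_features(pd.Series(pd.to_datetime(['2011-01-31', '2011-02-01'])))
print(check[['Day_of_Month', 'Day_of_Month_Cyclical_1', 'Day_of_Month_Cyclical_2']])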
The date features dataframe and the metric data are merged with the feature engineering dataframe and the encoded dataframes, as the date features need no encoding. The ML dataframe keeps track of our features while the encoded dataframes are passed to the models.
df_date_features = get_date_features(df_ml['Effective To Date']).drop(columns=['Days_in_Year', 'Days_in_Month'])
df_ml = df_ml.drop(['Effective To Date'], axis=1)
df_ml = df_ml.join(df_date_features)
df_enc_hot = df_enc_hot.join(df_ml.drop(categorical_cols + ['Response'], axis=1))
df_enc_dummy = df_enc_dummy.join(df_ml.drop(categorical_cols + ['Response'], axis=1))
We have to consider the assumptions of the ML models we will use; some models require that metric data be standardized. These models also require dummy encoding (each feature to k-1 features) instead of one-hot encoding (each feature to k features), which removes redundant information. Many other models, such as decision tree models, can benefit from keeping all the columns.
# standardized data
df_std = pd.DataFrame(data=StandardScaler().fit_transform(X=df_enc_dummy), columns=df_enc_dummy.columns.to_list())
rf_c = RandomForestClassifier(random_state=42)
rf_c_param = { 'n_estimators': [200],
'max_depth': [10, 15],
'min_samples_leaf': [1, 2]}
gs_cv_rf_c = GridSearchCV(rf_c, rf_c_param, scoring=['accuracy', 'roc_auc_ovo', 'f1_macro'], refit='f1_macro')
gs_cv_rf_c.fit(df_enc_hot, y_label);
df_importances = pd.DataFrame(data={'importance': gs_cv_rf_c.best_estimator_.feature_importances_, 'feature': df_enc_hot.columns.to_list()})
df_importances['features_combine'] = np.array([col[:-2] if col[-2] == '_' else col for col in df_enc_hot.columns])
# sum the per-column importances back up to the original features
importance_collapsed = []
features_collapsed = []
for feature in df_importances['features_combine'].unique():
    importance = df_importances['importance'][df_importances['features_combine'] == feature].sum()
    importance_collapsed.append(importance)
    features_collapsed.append(feature)
df_importances_collapsed = pd.DataFrame(data={'importance': importance_collapsed, 'feature': features_collapsed}).sort_values('importance', ascending=False)
sns.set(rc={'figure.figsize':(20,8.5)})
ax = sns.barplot(data=df_importances_collapsed, x='feature', y='importance')
ax.set_xticklabels(ax.get_xticklabels(), rotation=80);
gs_cv_rf_c.best_estimator_.score(df_enc_hot, y_label)
The classifier fit very well, possibly overfit: it correctly classified all of the training data. Some of our cyclical date components performed well, and the most important feature was employment status. State, policy, month, policy level, and number of complaints were not very useful to the classifier and may be removed.
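The perfect resubstitution score above is not a generalization estimate; the cross-validated score tracked by the grid search (exposed as best_score_ since refit='f1_macro' was set) is a fairer gauge of overfitting.
print(f'Best CV f1_macro: {gs_cv_rf_c.best_score_}')
print(gs_cv_rf_c.best_params_)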
remove_features = df_importances_collapsed['feature'][-7:].values
xgb_c = XGBClassifier(use_label_encoder=False)
xgb_c_param = {'n_estimators': [200],
'eval_metric': ['logloss'],
'max_depth': [10, 14],
'subsample': [0.8, 0.9]}
gs_cv_xgb_c = GridSearchCV(xgb_c, xgb_c_param, scoring=['accuracy', 'roc_auc_ovo', 'f1_macro'], refit='f1_macro')
gs_cv_xgb_c.fit(df_enc_hot, y_label);
df_importances = pd.DataFrame(data={'importance': gs_cv_xgb_c.best_estimator_.feature_importances_, 'feature': df_enc_hot.columns.to_list()})
df_importances['features_combine'] = np.array([col[:-2] if col[-2] == '_' else col for col in df_enc_hot.columns])
# sum the per-column importances back up to the original features
importance_collapsed = []
features_collapsed = []
for feature in df_importances['features_combine'].unique():
    importance = df_importances['importance'][df_importances['features_combine'] == feature].sum()
    importance_collapsed.append(importance)
    features_collapsed.append(feature)
df_importances_collapsed = pd.DataFrame(data={'importance': importance_collapsed, 'feature': features_collapsed}).sort_values('importance', ascending=False)
sns.set(rc={'figure.figsize':(20,8.5)})
ax = sns.barplot(data=df_importances_collapsed, x='feature', y='importance')
ax.set_xticklabels(ax.get_xticklabels(), rotation=80);
gs_cv_xgb_c.best_estimator_.score(df_enc_hot, y_label)
XGBoost ranks the features differently. This is good: it suggests XGBoost is finding different relationships than Random Forest, which will be useful later when the classifiers are stacked. XGBoost found employment status to be the most important feature, and policy, day of the week, gender, policy level, and the day-of-the-year cyclical component to be the least important. XGBoost uses gradient boosting, which prioritizes correcting erroneous predictions, rather than the bagging that Random Forest uses (a minimal sketch of the contrast follows). We will prioritize optimizing a Random Forest classifier and therefore drop the features least useful to Random Forest.
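A minimal sketch of the contrast, using stock scikit-learn estimators rather than the tuned models above: bagging fits independent trees on bootstrap resamples and averages their votes, while gradient boosting fits trees sequentially, each one targeting the remaining errors of the ensemble so far.
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
# bagging: independent trees on bootstrap samples, predictions averaged
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
# boosting: trees added sequentially, each correcting the current errors
boosted = GradientBoostingClassifier(n_estimators=50, random_state=42)
# either can then be fit the usual way, e.g. bagged.fit(df_enc_hot, y_label)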
std_drop = ['State_California', 'State_Nevada', 'State_Oregon', 'State_Washington',
'Number of Open Complaints', 'Coverage', 'Policy Level', 'Policy_Personal', 'Policy_Special', 'Month']
df_std.drop(std_drop, axis=1, inplace=True)
hot_drop = ['State_1', 'State_2', 'State_3', 'State_4', 'State_5',
'Number of Open Complaints', 'Coverage', 'Policy Level', 'Policy_1',
'Policy_2', 'Policy_3', 'Month']
df_enc_hot.drop(hot_drop, axis=1, inplace=True)
X_train_hot, X_test_hot, y_train_hot, y_test_hot = train_test_split(df_enc_hot, y_label, test_size=0.25, random_state=42)
X_train_std, X_test_std, y_train_std, y_test_std = train_test_split(df_std, y_label, test_size=0.25, random_state=42)
svc_c = SVC(random_state=42)
svc_c_param = {'gamma': ['scale', 'auto'],
'C': [0.9, 1, 1.1]}
gs_cv_svc_c = GridSearchCV(svc_c, svc_c_param, scoring=['accuracy', 'f1_macro'], refit='f1_macro')
gs_cv_svc_c.fit(X_train_std, y_train_std)
print(f'Training Data Accuracy: {gs_cv_svc_c.best_estimator_.score(X_train_std, y_train_std)}')
print(f'Testing Data Accuracy: {gs_cv_svc_c.best_estimator_.score(X_test_std, y_test_std)}')
The SVM classifier did adequately, with not too much overfitting and about 90% accuracy on the test data. Although the default RBF kernel allows non-linear decision boundaries, the SVM is not the optimal classifier for our application; it does, however, make a good baseline for future comparison.
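As a hedged check on the kernel's role, one can compare a linear kernel against the tuned RBF model on the same split; a large gap would point to non-linear structure in the data.
svc_lin = SVC(kernel='linear', random_state=42).fit(X_train_std, y_train_std)
print(f'Linear kernel test accuracy: {svc_lin.score(X_test_std, y_test_std)}')
print(f'RBF kernel test accuracy:    {gs_cv_svc_c.best_estimator_.score(X_test_std, y_test_std)}')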
nb_c = GaussianNB()
nb_c_param = {'var_smoothing': [1e-10, 1e-09, 1e-08]}  # original grid repeated 1e-8; distinct values assumed
gs_cv_nb_c = GridSearchCV(nb_c, nb_c_param, scoring=['accuracy', 'roc_auc_ovo', 'f1_macro'], refit='f1_macro')
gs_cv_nb_c.fit(X_train_std, y_train_std)
print(f'Training Data Accuracy: {gs_cv_nb_c.best_estimator_.score(X_train_std, y_train_std)}')
print(f'Testing Data Accuracy: {gs_cv_nb_c.best_estimator_.score(X_test_std, y_test_std)}')
The naive Bayes classifier performed very poorly, obtaining only about 40% accuracy on either dataset. The classifier assumes a Gaussian distribution for each feature, which we know is untrue for many of the features; there are several count variables, for example.
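To illustrate the violated assumption, a quick skewness scan of the standardized training features; strongly skewed columns, such as counts, are far from Gaussian. The scipy import is assumed available, as scikit-learn already depends on it.
from scipy.stats import skew
feature_skew = X_train_std.apply(skew).sort_values(key=abs, ascending=False)
print(feature_skew.head(10))  # most skewed, i.e. least Gaussian, features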
mlp_c = MLPClassifier()
mlp_c_param = {'learning_rate': ['adaptive'],
'max_iter': [500],
'hidden_layer_sizes': [[64], [64, 32], [128, 64]]}
gs_cv_mlp_c = GridSearchCV(mlp_c, mlp_c_param, scoring=['accuracy', 'roc_auc_ovo', 'f1_macro'], refit='f1_macro')
gs_cv_mlp_c.fit(X_train_std, y_train_std)
print(f'Training Data Accuracy: {gs_cv_mlp_c.best_estimator_.score(X_train_std, y_train_std)}')
print(f'Testing Data Accuracy: {gs_cv_mlp_c.best_estimator_.score(X_test_std, y_test_std)}')
The multi-layer perceptron fit the training data perfectly and suffered only a small amount of overfitting, giving about 99% accuracy on the test set. The hyperparameter tuned was the network architecture parameter 'hidden_layer_sizes', which governs how many hidden layers there are and how many units are in each layer. The MLP is essentially a deep learning model and as such has a strong capacity to fit highly non-linear relationships. The model's success over the SVM classifier is likely due in part to the presence of highly non-linear relationships in our data.
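For reference, the architecture the grid search settled on, along with its cross-validated score, can be read off the fitted search object.
print(gs_cv_mlp_c.best_params_)
print(f'Best CV f1_macro: {gs_cv_mlp_c.best_score_}')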
xgb_c = XGBClassifier(use_label_encoder=False)
xgb_c_param = {'n_estimators': [200],
'eval_metric': ['logloss'],
'max_depth': [10, 14, 18],
'subsample': [0.7, 0.8, 0.9]}
gs_cv_xgb_c = GridSearchCV(xgb_c, xgb_c_param, scoring=['accuracy', 'roc_auc_ovo', 'f1_macro'], refit='f1_macro')
gs_cv_xgb_c.fit(X_train_hot, y_train_hot);
print(f'Training Data Accuracy: {gs_cv_xgb_c.best_estimator_.score(X_train_hot, y_train_hot)}')
print(f'Testing Data Accuracy: {gs_cv_xgb_c.best_estimator_.score(X_test_hot, y_test_hot)}')
XGBoost performed very well, fitting perfectly on the training data and achieving 99.65% accuracy on the test data. The hyperparameters tuned were the depth of the trees and the fraction of the data used for subsampling.
rf_c = RandomForestClassifier(random_state=42)
rf_c_param = { 'n_estimators': [250],
'max_depth': [10, 15, 20],
'min_samples_leaf': [1, 2, 3]}
gs_cv_rf_c = GridSearchCV(rf_c, rf_c_param, scoring=['accuracy', 'roc_auc_ovo', 'f1_macro'], refit='f1_macro')
gs_cv_rf_c.fit(X_train_hot, y_train_hot)
gs_cv_rf_c.best_params_
print(f'Training Data Accuracy: {gs_cv_rf_c.best_estimator_.score(X_train_hot, y_train_hot)}')
print(f'Testing Data Accuracy: {gs_cv_rf_c.best_estimator_.score(X_test_hot, y_test_hot)}')
The random forest classifier is wildly successful. With a perfect fit on the training data and 99.78% accuracy on the test data, the model barely overfit. The work put in during the encoding and feature engineering stages, which was partially targeted at optimizing decision tree classifiers, has proven very useful. The hyperparameters optimized were the max depth of the trees and the minimum number of samples required at a leaf node.
estimators_r = [('mlp', gs_cv_mlp_c.best_estimator_), ('xgb', gs_cv_xgb_c.best_estimator_), ('rf', gs_cv_rf_c.best_estimator_)]
stack_r = StackingClassifier(estimators=estimators_r,
final_estimator=RandomForestClassifier(random_state=42 , **gs_cv_rf_c.best_params_),
passthrough=True)
stack_r.fit(X_train_hot, y_train_hot);
print(f'Training Data Accuracy: {stack_r.score(X_train_hot, y_train_hot)}')
print(f'Testing Data Accuracy: {stack_r.score(X_test_hot, y_test_hot)}')
Employing a stacked classifier improved our results to 100% accuracy on the test data. The stacked classifier is an ensemble of our three best models (MLP, XGBoost, Random Forest) whose predictions, together with the original features (passthrough=True), are passed to a final Random Forest classifier. The final Random Forest has the same hyperparameters as our previously tuned Random Forest, and the stack is likewise trained on the training data. The classifier has high model complexity and appears to capture all the learnable relationships in our data, reaching the ceiling for this application.
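Since a 100% test score invites scrutiny, a sanity check worth running (slow, as it refits the stack several times) is to cross-validate the same stack configuration on the training portion.
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(stack_r, X_train_hot, y_train_hot, scoring='accuracy')
print(f'CV accuracy: {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}')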
The use of ensemble models was critical to the success of our predictive modeling. The best approach was a stacked classifier composed of the best pre-trained models, which achieved a perfect 100% accuracy on the test data. It is possible to employ an ensemble model without a decision tree: in a stacked classifier, the results of the base models are fed to a final model for classification, and neither the base models nor the final model need be decision-tree based. In our implementation, two of the three base models were decision-tree based, as was the final model. An example of a fully non-decision-tree ensemble is given below, where the base models are the three non-ensemble models from earlier and the final model is a default logistic regression classifier.
estimators_r = [('mlp', gs_cv_mlp_c.best_estimator_), ('nb', gs_cv_nb_c.best_estimator_), ('svc', gs_cv_svc_c.best_estimator_)]
stack_r = StackingClassifier(estimators=estimators_r,
passthrough=False)
stack_r.fit(X_train_std, y_train_std);
print(f'Training Data Accuracy: {stack_r.score(X_train_std, y_train_std)}')
print(f'Testing Data Accuracy: {stack_r.score(X_test_std, y_test_std)}')
The non-decision-tree ensemble classifier did very well, with 98.90% accuracy on the test set, but it is still not as powerful as the decision-tree based ensemble, which performed perfectly.