Utility functions
 

get_default_param_grid

get_default_param_grid()

Create and return a default grid search param grid.
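import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier


def get_default_param_grid() -> list:
    """Create and return a default grid search param grid"""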

    lr_grid = [
        {
            "classifier": [LogisticRegression()],
            "classifier__solver": ["lbfgs"],
            "classifier__penalty": ["l2"],
            "classifier__C": np.logspace(-2, 2, 5),  # default=1.0
            "classifier__max_iter": [1000, 5000, 10000],  # default=100
        },
        {
            "classifier": [LogisticRegression()],
            "classifier__solver": ["liblinear"],
            "classifier__penalty": ["l1", "l2"],
            "classifier__C": np.logspace(-2, 2, 5),
            "classifier__max_iter": [1000, 5000, 10000],
        },
    ]
    svm_grid = [
        {
            "classifier": [LinearSVC()],
            "classifier__loss": ["hinge"],  # default=squared_hinge
            "classifier__penalty": ["l2"],  # default=l2
            "classifier__C": np.logspace(-2, 2, 5),  # default=1.0
            "classifier__max_iter": [10000, 20000],  # default=1000
        },
        {
            "classifier": [LinearSVC()],
            "classifier__loss": ["squared_hinge"],
            "classifier__penalty": ["l1", "l2"],
            "classifier__dual": [False],  # the l1 penalty is only supported with dual=False
            "classifier__C": np.logspace(-2, 2, 5),
            "classifier__max_iter": [10000, 20000],
        },
    ]
    xgb_grid = [
        {
            "classifier": [XGBClassifier()],
            # "classifier__gamma": (5, 10),
            "classifier__learning_rate": np.linspace(0.03, 0.3, 4),  # default 0.1
            "classifier__max_depth": [3, 4, 5, 6],  # default 6
            "classifier__n_estimators": [100, 300],  # default 100
            # "classifier__subsample": (0.5, 1),
        }
    ]

    return [lr_grid, svm_grid, xgb_grid]
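
The grid keys are all prefixed with `classifier`, which suggests the grids are meant for a scikit-learn `Pipeline` whose final step is named `classifier`. The following is a minimal sketch of that usage, not the author's own workflow: the synthetic data, the `scaler` step, and the trimmed stand-in grid are assumptions made so the example runs quickly on its own. Note that the function returns a list of three grids, while `GridSearchCV` expects a dict or a flat list of dicts, so each grid is probably meant to be passed separately (or concatenated first).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for one element of get_default_param_grid()'s return value,
# trimmed (assumption) so the search finishes in a moment.
lr_grid = [
    {
        "classifier": [LogisticRegression()],
        "classifier__solver": ["lbfgs"],
        "classifier__C": np.logspace(-2, 2, 5),
        "classifier__max_iter": [1000],
    }
]

# Synthetic data (assumption) in place of a real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# The last step must be called "classifier" to match the grid keys.
pipe = Pipeline(
    [("scaler", StandardScaler()), ("classifier", LogisticRegression())]
)

search = GridSearchCV(pipe, param_grid=lr_grid, cv=3, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["classifier__C"]
print("best C:", best_C, "mean CV accuracy:", round(search.best_score_, 3))
```

With the real return value, one option would be to loop over the three grids and run one search per model family, keeping the best result of each.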

visualize_2pcs

visualize_2pcs(pcs:ndarray, y:ndarray)

Visualize 2 principal components.
| Parameter | Type | Default | Details |
|---|---|---|---|
| pcs | ndarray | | dimensionality-reduced 'X' numpy ndarray |
| y | ndarray | | 'y' numpy array |
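
The implementation of `visualize_2pcs` is not shown here, so the following is only a sketch of what such a helper might do, assuming `pcs` comes from scikit-learn's `PCA` and `y` holds class labels. The function name `sketch_visualize_2pcs` and the synthetic data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA


def sketch_visualize_2pcs(pcs, y):
    """Scatter the first two principal components, one colour per label."""
    fig, ax = plt.subplots(figsize=(6, 5))
    for label in np.unique(y):
        mask = y == label
        ax.scatter(pcs[mask, 0], pcs[mask, 1], s=15, label=str(label))
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend(title="y")
    return ax


# Synthetic data (assumption) reduced to two components.
X, y = make_classification(n_samples=150, n_features=6, random_state=0)
pcs = PCA(n_components=2).fit_transform(X)
ax = sketch_visualize_2pcs(pcs, y)
```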

visualize_3pcs

visualize_3pcs(pcs:ndarray, y:ndarray)

Visualize 3 principal components.
| Parameter | Type | Default | Details |
|---|---|---|---|
| pcs | ndarray | | dimensionality-reduced 'X' numpy ndarray |
| y | ndarray | | 'y' numpy array |
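
A three-component plot can be sketched with matplotlib's built-in 3D projection. This is an illustration under assumptions (synthetic data, points coloured by `y`), not the body of `visualize_3pcs` itself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic data (assumption) reduced to three components.
X, y = make_classification(n_samples=150, n_features=6, random_state=0)
pcs = PCA(n_components=3).fit_transform(X)

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")  # 3D axes
ax.scatter(pcs[:, 0], pcs[:, 1], pcs[:, 2], c=y, cmap="viridis", s=15)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
```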

train_predict

train_predict(X_train:ndarray, y_train:ndarray, X_test:ndarray, y_test:ndarray)

Utility helper function to quickly train and predict
| Parameter | Type | Default |
|---|---|---|
| X_train | ndarray | |
| y_train | ndarray | |
| X_test | ndarray | |
| y_test | ndarray | |
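
The page does not say which model `train_predict` fits or what it returns. As a rough sketch of the train-then-predict pattern the signature implies, the hypothetical helper below assumes a `LogisticRegression` classifier and returns the predictions together with the test accuracy; both choices are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def sketch_train_predict(X_train, y_train, X_test, y_test):
    """Fit on the training split, predict the test split, score the result."""
    model = LogisticRegression(max_iter=1000)  # assumed model choice
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return y_pred, accuracy_score(y_test, y_pred)


# Synthetic data (assumption) split into train and test sets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

y_pred, acc = sketch_train_predict(X_train, y_train, X_test, y_test)
print(f"accuracy: {acc:.3f}")
```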

visualize_elbow

visualize_elbow(X:ndarray, ks:list, random_state:int=10)

Visualize elbow plot for KMeans.
| Parameter | Type | Default | Details |
|---|---|---|---|
| X | ndarray | | 'X' numpy ndarray |
| ks | list | | list of 'k' values to try, ideally 2 to 10 |
| random_state | int | 10 | random state for KMeans |
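
An elbow plot usually shows the KMeans inertia (within-cluster sum of squared distances) for each candidate `k`. The sketch below assumes that is what `visualize_elbow` plots. The helper name, the `n_init=10` setting, and the synthetic blobs are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def sketch_visualize_elbow(X, ks, random_state=10):
    """Fit KMeans for each k and plot the resulting inertia."""
    inertias = []
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias.append(km.inertia_)

    fig, ax = plt.subplots()
    ax.plot(list(ks), inertias, marker="o")
    ax.set_xlabel("k (number of clusters)")
    ax.set_ylabel("inertia")
    return inertias


# Synthetic data (assumption) with four generated clusters.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
inertias = sketch_visualize_elbow(X, ks=list(range(2, 11)))
```

The "elbow" is the value of `k` after which the curve flattens, meaning additional clusters reduce the inertia only marginally.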

plot_silhouette_scores

plot_silhouette_scores(max_clusters:int, X:ndarray, random_state:int=10)

List and plot silhouette scores for KMeans.
| Parameter | Type | Default | Details |
|---|---|---|---|
| max_clusters | int | | max value for 'k' for KMeans clustering |
| X | ndarray | | the 'X' numpy ndarray |
| random_state | int | 10 | random state for KMeans |
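
The silhouette score needs at least two clusters, so a helper like this presumably evaluates `k` from 2 up to `max_clusters`. The following sketch makes that assumption explicit. The function name, the returned dict, and the synthetic data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def sketch_silhouette_scores(max_clusters, X, random_state=10):
    """List and plot the mean silhouette score for k = 2..max_clusters."""
    scores = {}
    for k in range(2, max_clusters + 1):
        labels = KMeans(
            n_clusters=k, n_init=10, random_state=random_state
        ).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
        print(f"k={k}: silhouette score = {scores[k]:.3f}")

    fig, ax = plt.subplots()
    ax.plot(list(scores), list(scores.values()), marker="o")
    ax.set_xlabel("k (number of clusters)")
    ax.set_ylabel("mean silhouette score")
    return scores


# Synthetic data (assumption) with four generated clusters.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
scores = sketch_silhouette_scores(6, X)
```

Scores lie between -1 and 1, and higher values indicate better-separated clusters, so the `k` with the highest score is a reasonable candidate.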

visualize_clusters

visualize_clusters(clust_lbls:ndarray, X:ndarray)

Visualize clusters in a plot of first 2 principal components.
| Parameter | Type | Default | Details |
|---|---|---|---|
| clust_lbls | ndarray | | cluster labels after KMeans fitting |
| X | ndarray | | dimensionality-reduced X used for KMeans |
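
Putting the pieces together, a possible workflow is to reduce the data with PCA, fit KMeans on the reduced data, and colour the first two components by cluster label. The sketch below assumes that workflow. `sketch_visualize_clusters` and the synthetic data are hypothetical, not the library's own code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def sketch_visualize_clusters(clust_lbls, X):
    """Scatter the first two columns of X, coloured by cluster label."""
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(X[:, 0], X[:, 1], c=clust_lbls, cmap="viridis", s=15)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    return ax


# Synthetic 5-dimensional data (assumption), reduced before clustering.
X_raw, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)
X_reduced = PCA(n_components=2).fit_transform(X_raw)
clust_lbls = KMeans(n_clusters=3, n_init=10, random_state=10).fit_predict(X_reduced)

ax = sketch_visualize_clusters(clust_lbls, X_reduced)
```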