Learner class for supervised, unsupervised and semi-supervised learning with protein data.
Learner (X_train:ndarray, y_train:ndarray, X_test:ndarray, y_test:ndarray, ohe:bool=False, scaler:bool=False, pca:bool=True, pca_n_components:int=50, param_grids:list=None)
Class for training and prediction.
| | Type | Default | Details |
|---|---|---|---|
| X_train | ndarray | | X_train numpy ndarray |
| y_train | ndarray | | y_train numpy ndarray |
| X_test | ndarray | | X_test numpy ndarray |
| y_test | ndarray | | y_test numpy ndarray |
| ohe | bool | False | to use one hot encoding or not |
| scaler | bool | False | to use standard scaling or not |
| pca | bool | True | to use principal component analysis or not |
| pca_n_components | int | 50 | PCA number of components |
| param_grids | list | None | param_grid for grid search, if None - gets default grid from utils |
Learner.create_pipeline ()
Create and return pipeline
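
The constructor flags `ohe`, `scaler`, `pca` and `pca_n_components` suggest a pipeline assembled from optional preprocessing steps. The sketch below shows one way such a pipeline could be built with scikit-learn. It is not the library's implementation: `build_pipeline`, `to_dense`, the step names and the `LogisticRegression` placeholder are all assumptions made for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def to_dense(X):
    """Convert a scipy sparse matrix to a dense array; pass dense input through."""
    return X.toarray() if hasattr(X, "toarray") else X


def build_pipeline(ohe=False, scaler=False, pca=True, pca_n_components=50):
    """Hypothetical helper: optional preprocessing steps plus a placeholder classifier."""
    steps = []
    if ohe:
        # OneHotEncoder returns a sparse matrix by default, so densify it
        # before scaling and PCA.
        steps.append(("ohe", OneHotEncoder(handle_unknown="ignore")))
        steps.append(("dense", FunctionTransformer(to_dense)))
    if scaler:
        steps.append(("scaler", StandardScaler()))
    if pca:
        steps.append(("pca", PCA(n_components=pca_n_components)))
    # Placeholder final step (an assumption); a grid search can swap it out.
    steps.append(("clf", LogisticRegression(max_iter=1000)))
    return Pipeline(steps)


pipe = build_pipeline(ohe=True, scaler=True, pca=True, pca_n_components=5)
step_names = [name for name, _ in pipe.steps]
```

Keeping every step named makes it possible to address them from a parameter grid, for example `pca__n_components` or `clf__C`.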
Learner.train (scoring:str='accuracy', cv:int=5, n_jobs:int=-1)
Run GridSearchCV for all models on X_train and y_train of dataset.
Returns:
train_results: list of grid search results
grid_list: list of trained grid objects
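
As a rough illustration of running one grid search per parameter grid and collecting both outputs, the sketch below uses synthetic data and two made-up grids. The grids, the estimators and the data are assumptions, not the defaults the library loads from utils. It uses `n_jobs=1` to stay lightweight, whereas the documented default is `-1`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for X_train / y_train (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 8))
y_train = (X_train[:, 0] > 0).astype(int)

pipe = Pipeline([("clf", LogisticRegression())])

# Hypothetical grids: each one swaps a different classifier into the "clf" step.
param_grids = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0]},
    {"clf": [RandomForestClassifier(random_state=0)], "clf__n_estimators": [10, 20]},
]

train_results, grid_list = [], []
for grid in param_grids:
    gs = GridSearchCV(pipe, grid, scoring="accuracy", cv=5, n_jobs=1)
    gs.fit(X_train, y_train)
    train_results.append(gs.cv_results_)  # per-candidate scores and timings
    grid_list.append(gs)                  # fitted search object
```

Each entry of `grid_list` exposes `best_estimator_`, `best_params_` and `best_score_` once fitted.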
Learner.get_top_5_train_results ()
Return top 5 results for each grid
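
One way to pull the best five candidates out of a fitted `GridSearchCV` is to sort by `rank_test_score` in `cv_results_`. The helper `top_n` below is hypothetical and the return format (a list of parameter/score pairs) is an assumption; the method itself may present the results differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption) with a noisy linear decision rule.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 8))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)

gs = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]},
    scoring="accuracy",
    cv=5,
)
gs.fit(X_train, y_train)


def top_n(grid, n=5):
    """Hypothetical helper: the n best candidates as (params, mean CV score) pairs."""
    res = grid.cv_results_
    order = np.argsort(res["rank_test_score"], kind="stable")[:n]
    return [(res["params"][i], float(res["mean_test_score"][i])) for i in order]


top5 = top_n(gs)
```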
Learner.predict ()
Get predictions on the dataset's X_test from best estimators of GridSearchCV.
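
A minimal sketch of predicting with the best estimator of each grid search follows. The data, the two searches and the `(model name, accuracy)` format of `predict_results` are assumptions for illustration; the source does not specify how results are stored.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic train/test split (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

# Stand-ins for the fitted grid objects returned by training.
grid_list = [
    GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, cv=5).fit(X_train, y_train),
    GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5]}, cv=5).fit(X_train, y_train),
]

predict_results = []
for gs in grid_list:
    # best_estimator_ is refit on the full training set by default (refit=True).
    y_pred = gs.best_estimator_.predict(X_test)
    predict_results.append((type(gs.best_estimator_).__name__, accuracy_score(y_test, y_pred)))
```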
Learner.pick_k (max_clusters:int=10, pca_n_components:int=50)
Plot elbow and silhouette curves and print silhouette scores to help determine the ideal 'k' for KMeans.
| | Type | Default | Details |
|---|---|---|---|
| max_clusters | int | 10 | max number of clusters to try out |
| pca_n_components | int | 50 | number of components to reduce to in PCA |
The pick_k method does the following to help determine the ideal k for KMeans:
- It first concatenates X_train and X_test of this dataset into a single ndarray 'X'
- then encodes X using OneHotEncoder
- then scales X using StandardScaler
- then reduces the dimensionality of X using PCA
- then plots elbow and silhouette plots for X, prints the silhouette scores, and returns the PCA-reduced X.
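
The steps above can be sketched roughly as follows, with plotting left out and the elbow (inertia) and silhouette values collected into dictionaries instead. The integer-coded synthetic sequences, the reduced `n_components=10`, and the range of `k` starting at 2 are assumptions made to keep the example small.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in: sequences of length 12 over a 20-letter alphabet, coded as ints.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 20, size=(80, 12))
X_test = rng.integers(0, 20, size=(20, 12))

# 1. concatenate, 2. one-hot encode, 3. scale, 4. reduce with PCA
X = np.concatenate([X_train, X_test], axis=0)
X_ohe = OneHotEncoder().fit_transform(X).toarray()  # sparse by default, so densify
X_scaled = StandardScaler().fit_transform(X_ohe)
X_pca = PCA(n_components=10, random_state=0).fit_transform(X_scaled)

# 5. collect the values an elbow plot and a silhouette plot would show
max_clusters = 6
inertias, sil_scores = {}, {}
for k in range(2, max_clusters + 1):  # silhouette needs at least 2 clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_pca)
    inertias[k] = km.inertia_
    sil_scores[k] = silhouette_score(X_pca, km.labels_)
```

A common reading is to look for the `k` where inertia stops dropping sharply and the silhouette score is comparatively high.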
Learner.analyze_clusters (X_pca:ndarray, k:int, random_state:int=10)
Perform KMeans clustering, print cluster counts and plot clusters from the result.
| | Type | Default | Details |
|---|---|---|---|
| X_pca | ndarray | | dim reduced X numpy ndarray |
| k | int | | the chosen value of k for KMeans |
| random_state | int | 10 | random state for KMeans |
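
A small sketch of the clustering and counting part, without the plot: fit KMeans with the chosen `k` and tally how many points land in each cluster. The three synthetic 2-D blobs standing in for `X_pca` are an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic blobs stand in for the PCA-reduced data (assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X_pca = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=10).fit(X_pca)

# Count the members of each cluster.
labels, counts = np.unique(km.labels_, return_counts=True)
cluster_counts = dict(zip(labels.tolist(), counts.tolist()))
```

Plotting the first two columns of `X_pca` coloured by `km.labels_` would give a visual check of the same result.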
Learner.run_label_spreading (pca_n_components:int=50)
Run Label Spreading, print report, append results to predict_results.
| | Type | Default | Details |
|---|---|---|---|
| pca_n_components | int | 50 | number of components to reduce to in PCA |
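
The source does not say how labelled and unlabelled samples are combined, so the sketch below makes an assumption: the test samples are treated as unlabelled (label `-1`), scikit-learn's `LabelSpreading` is fitted on the PCA-reduced concatenation, and the transduced labels for the test portion are compared with `y_test`. The synthetic data, the `knn` kernel and its `n_neighbors` value are also assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.semi_supervised import LabelSpreading

# Synthetic two-class data with a clear shift between classes (assumption).
rng = np.random.default_rng(0)
y_all = rng.integers(0, 2, size=100)
X_all = rng.normal(size=(100, 20)) + 3.0 * y_all[:, None]
X_train, X_test = X_all[:70], X_all[70:]
y_train, y_test = y_all[:70], y_all[70:]

# Mark the test samples as unlabelled with -1, the convention LabelSpreading expects.
X = np.concatenate([X_train, X_test], axis=0)
y = np.concatenate([y_train, np.full(len(y_test), -1)])

X_pca = PCA(n_components=5, random_state=0).fit_transform(X)
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X_pca, y)

# transduction_ holds the label assigned to every sample, including unlabelled ones.
y_pred = model.transduction_[len(y_train):]
report = classification_report(y_test, y_pred)
print(report)
```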