This post collects several problems I encountered while training multi-class classification models in PyTorch. It serves as a reminder of concepts and issue-handling techniques that may come up again in the future.
Code
Create the data with preprocessing
During preprocessing, note that y_blob is cast to LongTensor: in PyTorch, when using nn.CrossEntropyLoss to compute the loss, the target tensor (the labels) must be of type torch.long. This is because the loss function expects the target tensor to contain class indices as 64-bit integers, which can represent a large range of class labels. In short, torch.nn.CrossEntropyLoss requires the label tensor to be a LongTensor.
```python
import torch
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# transform from numpy arrays to tensors
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)  # must be long type because the loss function does not accept float indices

# split the data
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob,
                                                                        y_blob,
                                                                        test_size=0.2,
                                                                        random_state=RANDOM_SEED)

# plot the data
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu)
```
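To make the dtype requirement concrete, here is a minimal, self-contained check (the batch size and class count are arbitrary assumptions, not values from this post):

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(8, 4)          # (batch_size, num_classes)
labels = torch.randint(0, 4, (8,))  # class indices; dtype is torch.int64 (long) by default
loss = loss_fn(logits, labels)      # works: targets are long integers
# loss_fn(logits, labels.float())   # fails: float targets of this shape are not valid class indices
```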
Build the model
We can define the constructor with multiple explicit parameters (such as the number of input features, output features, and hidden units), but during training only the input tensor is passed to the model, because the forward function takes a single argument, as sketched below.
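The model definition itself is elided here; a minimal sketch of what such a model could look like (the class name, hidden_units, and output size are assumptions, with input_features=2 matching the two plotted feature columns):

```python
from torch import nn

class BlobModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):
        # the constructor can take several explicit parameters...
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_features),
        )

    def forward(self, x):
        # ...but forward only ever receives the input tensor
        return self.linear_layer_stack(x)

model_4 = BlobModel(input_features=2, output_features=4)
```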
```python
# CrossEntropyLoss is the standard choice for multi-class classification
loss_fn = nn.CrossEntropyLoss()

# the most common optimizers are SGD and Adam
optimizer = torch.optim.SGD(params=model_4.parameters(), lr=0.01)
```
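If SGD converges slowly, swapping in Adam is a one-line change (an alternative, not the choice used in this post):

```python
optimizer = torch.optim.Adam(params=model_4.parameters(), lr=0.01)
```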
Train the model
Note here that nn.CrossEntropyLoss() accepts only logits as input (it applies log-softmax internally, so it does not want values that have already gone through softmax). However, we still compute a y_pred after softmax because we need it to calculate the accuracy.
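The training step itself is not shown above; one epoch following the usual PyTorch pattern might look like this (the epoch count and the inline accuracy computation are my assumptions, and the test block shown further below runs inside this same loop):

```python
epochs = 1000  # assumed value

for epoch in range(epochs):
    model_4.train()

    y_logits = model_4(X_blob_train)                       # raw logits go straight to the loss
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)  # softmax + argmax only for accuracy

    loss = loss_fn(y_logits, y_blob_train)                 # CrossEntropyLoss consumes logits directly
    acc = (y_pred == y_blob_train).float().mean() * 100

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```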
Also note a very important thing here: dim=1 means we calculate the metric across the columns within each row, so softmax and argmax both produce one result per row by operating over that row's column values. dim=1 literally stands for "keep the row fixed, and get the result from the different columns in that row."
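A tiny concrete example of this (values chosen arbitrarily):

```python
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3]])
probs = torch.softmax(logits, dim=1)  # each row now sums to 1
preds = probs.argmax(dim=1)           # tensor([0, 1]): one class index per row
```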
```python
# test (this block runs inside the training loop)
model_4.eval()
with torch.inference_mode():
    test_logits = model_4(X_blob_test)
    test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)   # note here
    test_loss = loss_fn(test_logits, y_blob_test)                 # needed for the print below
    test_acc = (test_pred == y_blob_test).float().mean() * 100    # needed for the print below

if epoch % 100 == 0:
    print(f"Epoch: {epoch} | Loss: {loss:.4f}, Acc: {acc:.2f}% | Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%")
```
Evaluate the model
```python
model_4.eval()
with torch.inference_mode():
    y_logits = model_4(X_blob_test)

# remember to manually activate the logits by applying softmax and argmax
y_pred_probs = torch.softmax(y_logits, dim=1)
y_preds = torch.argmax(y_pred_probs, dim=1)
```
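From here, a quick sanity check of the predictions against the test labels (a sketch, not part of the original snippet):

```python
test_acc = (y_preds == y_blob_test).float().mean().item() * 100
print(f"Test accuracy: {test_acc:.2f}%")
```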