Refer to the code below, which shows how to compute nn.CrossEntropyLoss for a batch of sequence predictions.
import torch
import torch.nn as nn

# Assume a batch size of 2, a sequence length of 3, and a vocabulary size of 5.
# The predicted logits then have shape (batch_size, seq_length, vocab_size).
logits = torch.tensor([
    [[0.1, 0.2, 0.3, 0.4, 0.5], [0.5, 0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4, 0.5]],
    [[0.5, 0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4, 0.5], [0.5, 0.4, 0.3, 0.2, 0.1]]
])
# Reshape logits to 2D (N, C), where N = batch_size * seq_length and C = vocab_size.
logits = logits.view(-1, logits.shape[-1])

# Similarly, the labels have shape (batch_size, seq_length). These are example labels.
labels = torch.tensor([
    [0, 1, 2],
    [2, 1, 0]
])
labels = labels.view(-1)  # Reshape labels to 1D (N,)

loss_function = nn.CrossEntropyLoss()  # Initialize the loss function
loss = loss_function(logits, labels)   # Compute the loss
print(loss)                            # Print the loss
In this example, logits and labels are explicitly defined tensors. The values in logits represent the model's output for each token of each sequence in the batch, and the labels tensor holds the correct class index for each of those tokens. After both tensors are flattened as shown above, nn.CrossEntropyLoss() computes the loss between the predicted logits and the actual labels.
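To see what nn.CrossEntropyLoss is doing under the hood, note that it applies log-softmax to the logits and then averages the negative log-probability assigned to each correct label. Below is a minimal sketch that checks this against the built-in loss, continuing from the flattened logits and labels defined above (manual_loss and builtin_loss are just illustrative names):

import torch.nn.functional as F

# CrossEntropyLoss = log-softmax followed by negative log-likelihood, averaged by default
log_probs = F.log_softmax(logits, dim=-1)                   # shape (N, C)
nll = -log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # negative log-prob of each correct label, shape (N,)
manual_loss = nll.mean()                                    # default reduction is 'mean'

builtin_loss = nn.CrossEntropyLoss()(logits, labels)
print(torch.allclose(manual_loss, builtin_loss))            # prints True

In a real training loop, sequences in a batch are usually padded to a common length; constructing the loss as nn.CrossEntropyLoss(ignore_index=pad_id), where pad_id is whatever index marks padding positions in your labels, tells the loss to skip those positions so they do not contribute to the average.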
Thank you.