Showing posts with label Deep learning.

9/30/2024

How gradient calculation works with batch size.

 Let's use a simplified example with just 2 data points and walk through the process with actual numbers. This will help illustrate how gradients are calculated and accumulated for a batch.

Let's assume we have a very simple model with one parameter w, currently set to 1.0. Our loss function is the squared error, and we're using basic gradient descent with a learning rate of 0.1.

Data points:

  1. x1 = 2, y1 = 4
  2. x2 = 3, y2 = 5

Batch size = 2 (both data points in one batch)

Step 1: Forward pass

  • For x1: prediction = w * x1 = 1.0 * 2 = 2
  • For x2: prediction = w * x2 = 1.0 * 3 = 3

Step 2: Calculate losses

  • Loss1 = (prediction1 - y1)^2 = (2 - 4)^2 = 4
  • Loss2 = (prediction2 - y2)^2 = (3 - 5)^2 = 4
  • Total batch loss = (Loss1 + Loss2) / 2 = (4 + 4) / 2 = 4

Step 3: Backward pass (calculate gradients)

  • Gradient1 = 2 * (prediction1 - y1) * x1 = 2 * (2 - 4) * 2 = -8
  • Gradient2 = 2 * (prediction2 - y2) * x2 = 2 * (3 - 5) * 3 = -12

Step 4: Accumulate gradients

  • Total gradient = (Gradient1 + Gradient2) / 2 = (-8 + -12) / 2 = -10

Step 5: Update weight (once for the batch)

  • New w = old w - learning_rate * total gradient
  • New w = 1.0 - 0.1 * (-10) = 2.0

So, after processing this batch of 2 data points:

  • We calculated 2 individual gradients (-8 and -12)
  • We accumulated these into one total gradient (-10)
  • We performed one weight update, changing w from 1.0 to 2.0

This process would then repeat for the next batch. In this case, we've processed all our data, so this completes one epoch.
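
The same numbers can be reproduced with a few lines of PyTorch. This is only a sketch of the example above (one parameter w, squared-error loss averaged over the batch, learning rate 0.1):

.

import torch

w = torch.tensor(1.0, requires_grad=True)  # single parameter, w = 1.0
x = torch.tensor([2.0, 3.0])               # batch of 2 inputs
y = torch.tensor([4.0, 5.0])               # targets

pred = w * x                               # forward pass: [2.0, 3.0]
loss = ((pred - y) ** 2).mean()            # batch loss: (4 + 4) / 2 = 4

loss.backward()                            # backward pass accumulates the averaged gradient
print(loss.item())                         # 4.0
print(w.grad.item())                       # -10.0 (average of -8 and -12)

with torch.no_grad():
    w -= 0.1 * w.grad                      # one update for the whole batch
print(w.item())                            # 2.0

..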

7/15/2023

Combining a custom FC layer with a Hugging Face model; good to remember and adapt when making modifications.

 refer to code:


.

    def model_forward(self, pixel_values, labels):
        # Original ViT encoder-decoder outputs
        outputs = self.model(pixel_values=pixel_values, labels=labels, output_hidden_states=True)
        # Get the last hidden state
        last_hidden_state = outputs.decoder_hidden_states[-1]  # batch_size, seq_len, hidden_size, ex) 5, 15, 768
        return last_hidden_state

    def fc_part(self, last_hidden_state):
        # Reshape the last hidden state
        reshaped_logits = last_hidden_state.view(-1, self.model.config.decoder.hidden_size)  # batch_size*seq_len, hidden_size
        # Apply the fully connected layer
        new_logits = self.custom_decoder_fc(reshaped_logits)  # batch_size*seq_len, vocab_size
        return new_logits

    def compute_loss(self, new_logits, labels):
        # Reshape labels to match the logits dimension
        reshaped_labels = labels.view(-1)  # batch_size, seq_len -> batch_size*seq_len
        # Calculate loss
        # [batch_size*seq_len, vocab_size] vs [batch_size*seq_len], ex) [70, 13] vs [70]
        loss = self.loss_f(new_logits, reshaped_labels)  # scalar tensor
        return loss

    def forward_pass(self, pixel_values, labels):
        last_hidden_state = self.model_forward(pixel_values, labels)  # batch_size, seq_len, hidden_size
        new_logits = self.fc_part(last_hidden_state)  # batch_size*seq_len, vocab_size
        loss = self.compute_loss(new_logits, labels)  # scalar tensor
        # Reshape new_logits to match the labels dimension
        new_logits = new_logits.view(labels.shape[0], labels.shape[1], -1)  # batch_size, seq_len, vocab_size

        return {'logits': new_logits, 'loss': loss}

..


forward_pass runs these steps one by one.

At the end it returns the logits computed from the last hidden state, together with the loss.
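
For reference, the attributes this code relies on (self.model, self.custom_decoder_fc, self.loss_f) might be set up roughly like below. This is only a sketch based on how they are used above, not the original class; the class name and checkpoint argument are placeholders.

.

import torch.nn as nn
from transformers import VisionEncoderDecoderModel

class CustomVED(nn.Module):  # hypothetical wrapper class
    def __init__(self, model_name, vocab_size):
        super().__init__()
        # Any ViT encoder-decoder checkpoint could be loaded here
        self.model = VisionEncoderDecoderModel.from_pretrained(model_name)
        hidden_size = self.model.config.decoder.hidden_size
        # Custom FC head applied on top of the decoder's last hidden state
        self.custom_decoder_fc = nn.Linear(hidden_size, vocab_size)
        # -100 is the usual Hugging Face convention for padded label positions
        self.loss_f = nn.CrossEntropyLoss(ignore_index=-100)

..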


Thank you.

www.marearts.com

🙇🏻‍♂️

7/13/2023

Beam search function for image-to-text or NLP inference.

  refer to code first.

.

# This beam search only deals with batch size 1
def beam_search(self, pixel_value, max_length):
    beam_size = self.cfg.num_beams
    alpha = self.cfg.beam_alpha  # Length normalization coefficient
    temperature = self.cfg.beam_temp  # Temperature for softmax

    # Initialize input ids as bos_token_id
    first_sequence = torch.full((pixel_value.shape[0], 1), self.model.config.decoder_start_token_id).to(pixel_value.device)
    # ic(first_sequence)  # tensor([[1]])

    # Predict the second token id
    outputs = self.forward_pass(pixel_value, first_sequence)
    # ic(outputs.keys())  # dict_keys(['logits', 'loss'])
    # We only need the logits corresponding to the last prediction
    next_token_logits = outputs['logits'][:, -1, :]
    # ic(outputs['logits'].shape)  # [1, 1, 13] batch, seq, vocab_size
    # ic(outputs['logits'][:, -1, :].shape)  # [1, 13] batch, vocab_size

    # Apply temperature
    # ic(next_token_logits)
    # [-5.0641, 32.7805, -2.6743, -4.6459, 0.8130, -1.3443, -1.2016, -4.0770,
    #  -3.5401, 0.2425, -5.3685, -1.8074, -5.2606]],
    # next_token_logits /= temperature
    # ic(next_token_logits)
    # [-7.2344, 46.8292, -3.8204, -6.6370, 1.1614, -1.9205, -1.7166, -5.8243,
    #  -5.0573, 0.3464, -7.6693, -2.5820, -7.5152]],

    # Select top k tokens
    next_token_probs = F.softmax(next_token_logits, dim=-1)
    top_k_probs, top_k_ids = torch.topk(next_token_probs, beam_size)
    # ic(F.softmax(next_token_logits, dim=-1))
    # tensor([[3.3148e-24, 1.0000e+00, 1.0072e-22, 6.0241e-24, 1.4680e-20, 6.7340e-22,
    #          8.2570e-22, 1.3579e-23, 2.9239e-23, 6.4976e-21, 2.1458e-24, 3.4751e-22,
    #          2.5034e-24]]
    # ic(top_k_probs, top_k_ids)
    # top_k_probs: tensor([[1.]], grad_fn=<TopkBackward0>)
    # top_k_ids: tensor([[1]])

    # Prepare the next sequences. Each top-k token is appended to the first_sequence
    # ic(first_sequence.shape)  # [1, 1]
    next_sequences = first_sequence.repeat_interleave(beam_size, dim=0)
    # ic(next_sequences.shape)  # [10, 1] 10 is beam size, 1 is seq length
    next_sequences = torch.cat([next_sequences, top_k_ids.view(-1, 1)], dim=-1)
    # ic(next_sequences.shape)  # [10, 2] 10 is beam size, 2 is seq length
    # ic(next_sequences)

    # Also prepare a tensor to hold the cumulative score of each sequence,
    # i.e. the sum of the log probabilities of each token in the sequence
    sequence_scores = (torch.log(top_k_probs).view(-1))  # / (1 + 1) ** alpha
    # ic(sequence_scores)  # [ 0.0000, -15.9837]

    # We'll need to repeat the pixel_values for each sequence in each beam
    pixel_value = pixel_value.repeat_interleave(beam_size, dim=0)
    # ic(pixel_value.shape)  # [10, 3, 224, 224], 10 is beam size, 3 is channel, 224 is image size

    for idx in range(max_length - 1):  # We already generated one token
        # ic(idx, '--------------------')
        outputs = self.forward_pass(pixel_value, next_sequences)
        next_token_logits = outputs['logits'][:, -1, :]
        # ic(outputs['logits'].shape, outputs['logits'])  # [2, 2, 13], batch, seq, vocab_size
        # ic(next_token_logits.shape, next_token_logits)

        # Apply temperature
        # next_token_logits /= temperature

        # Convert logits to probabilities and calculate new scores
        next_token_probs = F.softmax(next_token_logits, dim=-1)
        # ic(next_token_probs.shape, next_token_probs)  # [2, 13], batch, vocab_size
        next_token_scores = torch.log(next_token_probs)
        # ic(next_token_scores.shape, next_token_scores)  # [2, 13], batch, vocab_size

        new_scores = sequence_scores.unsqueeze(1) + next_token_scores
        # ic(sequence_scores.unsqueeze(1))
        # ic(new_scores.shape, new_scores)  # [2, 13], batch, vocab_size

        # Select top k sequences
        # ic(new_scores.view(-1), new_scores.view(-1).shape)
        top_k_scores, top_k_indices = torch.topk(new_scores.view(-1), beam_size)

        # ic(top_k_scores, top_k_indices)

        # Get the beam and token that each of the top k sequences comes from
        beams_indices = top_k_indices // self.cfg.num_tokens
        token_indices = top_k_indices % self.cfg.num_tokens
        # ic(beams_indices, token_indices)

        # Update pixel values, sequences, and scores
        # pixel_value = pixel_value[beams_indices]
        # ic(next_sequences)
        next_sequences = next_sequences[beams_indices]
        # ic(next_sequences)
        next_sequences = torch.cat([next_sequences, token_indices.unsqueeze(1)], dim=-1)
        # ic(next_sequences)
        sequence_scores = top_k_scores  # / (idx + 3) ** alpha

        # ic('-------------------')
        # if idx > 2: break

    # Select the best sequence
    max_score, max_score_idx = torch.max(sequence_scores, 0)
    # Select the sequence with the highest score
    best_sequence = next_sequences[max_score_idx]

    # ic(best_sequence, max_score)
    return best_sequence, max_score

..


This is a portion of my class.

Some code is omitted, in particular forward_pass, but it will work properly if you adapt it carefully.

You can also pick up some ideas from it.
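
The key bookkeeping trick in the loop is taking topk over the flattened (beam x vocab) score matrix and then recovering which beam and which token each winner came from with integer division and modulo. Here is a tiny standalone sketch of just that step, with dummy scores, beam size 2 and vocab size 5:

.

import torch

beam_size, vocab_size = 2, 5
# Dummy cumulative log-prob scores: one row per beam, one column per vocab token
new_scores = torch.tensor([[-0.1, -2.0, -3.0, -4.0, -5.0],
                           [-1.5, -0.2, -2.5, -3.5, -4.5]])

# Top-k over the flattened (beam * vocab) scores
top_k_scores, top_k_indices = torch.topk(new_scores.view(-1), beam_size)

# Recover the beam index and token index of each winner
beams_indices = top_k_indices // vocab_size
token_indices = top_k_indices % vocab_size

print(top_k_scores)   # tensor([-0.1000, -0.2000])
print(beams_indices)  # tensor([0, 1]) -> winners came from beam 0 and beam 1
print(token_indices)  # tensor([0, 1]) -> token 0 on beam 0, token 1 on beam 1

..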

Thank you.

🙇🏻‍♂️

www.marearts.com


6/29/2023

Graph Neural Network Study Tutorial

 

Stanford CS224W Tutorials

https://data.pyg.org/img/cs224w_tutorials.png

The Stanford CS224W course has collected a set of graph machine learning tutorial blog posts, fully realized with PyG. Students worked on projects spanning all kinds of tasks, model architectures and applications. All tutorials also link to a Google Colab notebook with the code in the tutorial for you to follow along with as you read it!

PyTorch Geometric Tutorial Project

The PyTorch Geometric Tutorial project provides video tutorials and Colab notebooks for a variety of different methods in PyG:

  1. Introduction [YouTube, Colab]

  2. PyTorch basics [YouTube, Colab]

  3. Graph Attention Networks (GATs) [YouTube, Colab]

  4. Spectral Graph Convolutional Layers [YouTube, Colab]

  5. Aggregation Functions in GNNs [YouTube, Colab]

  6. (Variational) Graph Autoencoders (GAE and VGAE) [YouTube, Colab]

  7. Adversarially Regularized Graph Autoencoders (ARGA and ARGVA) [YouTube, Colab]

  8. Graph Generation [YouTube]

  9. Recurrent Graph Neural Networks [YouTube, Colab (Part 1), Colab (Part 2)]

  10. DeepWalk and Node2Vec [YouTube (Theory), YouTube (Practice), Colab]

  11. Edge analysis [YouTube, Colab (Link Prediction), Colab (Label Prediction)]

  12. Data handling in PyG (Part 1) [YouTube, Colab]

  13. Data handling in PyG (Part 2) [YouTube, Colab]

  14. MetaPath2vec [YouTube, Colab]

  15. Graph pooling (DiffPool) [YouTube, Colab]

6/02/2023

torch tensor padding example code:

 refer to code:


.

import torch
import torch.nn.functional as F
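
# Note: F.pad reads the pad tuple in pairs starting from the LAST dimension:
# (left, right) pads dim -1, the next pair pads dim -2, and so on.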

tensor = torch.randn(2, 3, 4) # Original tensor
print("Original tensor shape:", tensor.shape)

# Case 1: Pad the last dimension (dimension -1) -> resulting shape: [2, 3, 8]
padding_size = 4
padded_tensor = F.pad(tensor, (padding_size, 0)) # Add padding to the left of the last dimension
print("Case 1 tensor shape:", padded_tensor.shape)

# Case 2: Pad the second-to-last dimension (dimension -2) -> resulting shape: [2, 8, 4]
padding_size = 5
padded_tensor = F.pad(tensor, (0, 0, padding_size, 0)) # Add padding to the left of the second-to-last dimension
print("Case 2 tensor shape:", padded_tensor.shape)

# Case 3: Pad the first dimension (dimension 0) -> resulting shape: [7, 3, 4]
padding_size = 5
padded_tensor = F.pad(tensor, (0, 0, 0, 0, padding_size, 0)) # Add padding to the left of the first dimension
print("Case 3 tensor shape:", padded_tensor.shape)

..


www.marearts.com

Thank you. 🙇🏻‍♂️

5/23/2023

Simple code to create a custom tokenizer.

In the sample code, the vocabulary is "0", "1", "2", "3", "4" (plus the special tokens) and the max length is 20.


.

from typing import List, Union

class CustomTokenizer:
    def __init__(self, vocab: Union[str, List[str]], pad_token="<PAD>", cls_token="<BOS>", sep_token="<SEP>", max_len=20):
        if isinstance(vocab, str):
            with open(vocab, 'r') as f:
                self.vocab = {word.strip(): i for i, word in enumerate(f.readlines())}
        elif isinstance(vocab, list):
            self.vocab = {word: i for i, word in enumerate(vocab)}
        else:
            raise ValueError("vocab must be either a filepath (str) or a list of words")
        print('vocab: ', self.vocab)
        self.pad_token = pad_token
        self.cls_token = cls_token
        self.sep_token = sep_token
        self.max_len = max_len
        self.inv_vocab = {v: k for k, v in self.vocab.items()}

    def tokenize(self, text: str):
        tokens = [c for c in text if c in self.vocab]
        tokens = tokens[:self.max_len]
        padding_length = self.max_len - len(tokens)
        return [self.cls_token] + tokens + [self.sep_token] + [self.pad_token] * padding_length

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab.get(token, self.vocab.get(self.pad_token)) for token in tokens]

    def convert_ids_to_tokens(self, ids):
        return [self.inv_vocab.get(id, self.pad_token) for id in ids]



vocab = ["<PAD>", "<BOS>", "<SEP>", "0", "1", "2", "3", "4"]
with open('vocab.txt', 'w') as f:
    for token in vocab:
        f.write(token + '\n')

# Initialize your custom tokenizer
tokenizer = CustomTokenizer(vocab='vocab.txt')

# Now you can use this tokenizer to tokenize your data, study.marearts.com
tokenized_text = tokenizer.tokenize('22342')
print("tokenized_text: ", tokenized_text)

# Convert tokens to ids
token_ids = tokenizer.convert_tokens_to_ids(tokenized_text)
print("token_ids: ", token_ids)

# Convert ids back to tokens, marearts.com
tokens = tokenizer.convert_ids_to_tokens(token_ids)
print("tokens: ", tokens)

..


Thank you.

🙇🏻‍♂️


5/02/2023

Yolo V7 vs V8

 

V7 vs V8 comparison

https://youtu.be/k1dOZFcLOek

https://youtu.be/tpOGDclq7KY

https://youtu.be/u5qxN2ACEP4

https://youtu.be/85SH08jN4dY

These are comparison videos between YOLOv7 and YOLOv8.

Here is the test environment information:

Testing Computer :

  • Intel(R) Core(TM) i7-9800X CPU @ 3.80GHz
  • RTX 4090

Some code that might be useful:

  • YOLOv8: video writer for the detection results
import cv2
import time
from ultralytics import YOLO

def process_video(model, video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Create a VideoWriter object to save the annotated video
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

    while cap.isOpened():
        success, frame = cap.read()

        if success:
            start_time = time.time()
            results = model(frame)
            end_time = time.time()
            processing_time = end_time - start_time
            fps = 1 / processing_time
            # Visualize the results on the frame
            annotated_frame = results[0].plot()
            # Display the processing time on the annotated frame
            cv2.putText(annotated_frame, f"Processing time: {processing_time:.4f} seconds / {fps:.4f} fps",
                        (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

            # Write the annotated frame to the output video
            out.write(annotated_frame)

            # cv2.imshow("YOLOv8 Inference", annotated_frame)
            # if cv2.waitKey(1) & 0xFF == ord("q"):
            #     break
        else:
            break

    cap.release()
    out.release()

def main():
    # Load the YOLO model
    model = YOLO('yolov8x.pt')

    # List of video files
    video_paths = [
        "../video/videoplayback-1.mp4",
        "../video/videoplayback-2.mp4",
        "../video/videoplayback-3.mp4",
        "../video/videoplayback-4.mp4",
    ]

    # Loop through video files and process them
    for i, video_path in enumerate(video_paths):
        output_path = f"../video/yolo_88_output_{i+1}.mp4"
        process_video(model, video_path, output_path)

    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()
  • combine two videos side by side (see the sketch after the link below)

Combine Two Videos Side by Side with OpenCV python
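
As a rough idea (not the exact code from the linked post), a minimal sketch that reads two videos and writes them side by side could look like this; the file names are placeholders:

import cv2

# Minimal sketch: stack frames from two videos horizontally (file names are placeholders)
cap1 = cv2.VideoCapture("left.mp4")
cap2 = cv2.VideoCapture("right.mp4")

width = int(cap1.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap1.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap1.get(cv2.CAP_PROP_FPS)

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter("side_by_side.mp4", fourcc, fps, (width * 2, height))

while True:
    ok1, frame1 = cap1.read()
    ok2, frame2 = cap2.read()
    if not (ok1 and ok2):
        break
    # Resize the second frame to match the first, then concatenate horizontally
    frame2 = cv2.resize(frame2, (width, height))
    out.write(cv2.hconcat([frame1, frame2]))

cap1.release()
cap2.release()
out.release()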

Thank you! 😺

4/25/2023

ViT encoder + Transformer decoder model - ONNX export example

refer to this code:

.



# If you want to combine a Vision Transformer (ViT) as an encoder with a Transformer-based decoder,
# you can follow the steps below.
# We will use the Hugging Face Transformers library and PyTorch.

# Install the required libraries:
# pip install torch torchvision transformers onnx

# Define the combined model:
# -----------------------------------------
import torch
import torch.nn as nn
from transformers import ViTModel, ViTConfig, AutoModelForSeq2SeqLM

class ViTTransformer(nn.Module):
    def __init__(self, vit_model, transformer_decoder):
        super(ViTTransformer, self).__init__()
        self.vit = vit_model
        self.transformer_decoder = transformer_decoder

    def forward(self, x, decoder_input_ids, **kwargs):
        encoder_outputs = self.vit(x)
        outputs = self.transformer_decoder(decoder_input_ids, encoder_outputs=encoder_outputs, **kwargs)
        return outputs
# -----------------------------------------

# Load the ViT and Transformer decoder models:
# Assuming you have a pre-trained ViT model and a pre-trained Transformer decoder model, load them as follows:

# -----------------------------------------
vit_config = ViTConfig()
vit_model = ViTModel(vit_config)
transformer_decoder = AutoModelForSeq2SeqLM.from_pretrained("your-pretrained-transformer-decoder")


# Create the combined model and load the checkpoint if you have one:
# -----------------------------------------
combined_model = ViTTransformer(vit_model, transformer_decoder)
# -----------------------------------------

# # If you have a checkpoint, load it as follows:
# # checkpoint = torch.load('path/to/checkpoint.pth')
# # combined_model.load_state_dict(checkpoint['model_state_dict'])
# Export the combined model to ONNX format:
# The process of exporting the combined model to ONNX is more complicated due to the dynamic nature of the Transformer-based decoder.
# You might need to modify the export code depending on your specific use case.
# However, here is a general example:

# -----------------------------------------
# # Set the combined model to evaluation mode
combined_model.eval()
# Create dummy input tensors with the correct dimensions
# (B x C x H x W) for image input and (B x seq_len) for decoder input
dummy_image_input = torch.randn(1, 3, 224, 224)
dummy_decoder_input = torch.randint(0, transformer_decoder.config.vocab_size, (1, 5))

# Export the combined model to ONNX format
torch.onnx.export(
    combined_model,
    (dummy_image_input, dummy_decoder_input),
    "vit_transformer.onnx",
    input_names=["image_input", "decoder_input"],
    output_names=["output"],
    dynamic_axes={
        "image_input": {0: "batch_size"},
        "decoder_input": {0: "batch_size", 1: "sequence_length"},
        "output": {0: "batch_size", 1: "sequence_length"},
    },
    opset_version=12,
)
# -----------------------------------------

# This code will create an ONNX file (vit_transformer.onnx) containing the combined ViT and Transformer decoder model.
# Note that you might need to adjust the code according to the specific needs of your application.

..
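
After exporting, you can sanity-check the ONNX file with onnxruntime, roughly like below. This is a sketch; the dummy vocab size of 1000 used for the decoder ids is an arbitrary placeholder.

.

import numpy as np
import onnxruntime as ort

# Load the exported model on CPU
session = ort.InferenceSession("vit_transformer.onnx", providers=["CPUExecutionProvider"])

# Dummy inputs matching the shapes and dtypes used for the export
image_input = np.random.randn(1, 3, 224, 224).astype(np.float32)
decoder_input = np.random.randint(0, 1000, size=(1, 5)).astype(np.int64)

outputs = session.run(None, {"image_input": image_input, "decoder_input": decoder_input})
print([o.shape for o in outputs])

..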

Thank you.🙇🏻‍♂️

2/17/2023

Overview of Image Retrieval Applications for Finding Images by Visual and Text Features

 Here are a few examples of image retrieval applications:

  1. Google Images: A popular image search engine that allows you to search for images using keywords and filters, such as color, size, and type. Google Images uses a combination of text and visual features to match images to search queries.

  2. TinEye: A reverse image search engine that allows you to find where an image appears online or to search for similar images based on visual features. TinEye uses image recognition technology to analyze the content of images and identify matches.

  3. Clarifai: An image and video recognition platform that allows you to search for images based on visual features such as color, texture, and object category, as well as text features such as captions and tags. Clarifai uses deep learning models to extract and analyze visual and textual features from images.

  4. Microsoft Bing Visual Search: A search engine that allows you to search for images using visual and text features, such as color, object category, and image similarity. Bing Visual Search uses deep learning models to analyze visual features and search algorithms to find similar images.

  5. Amazon Rekognition: An image and video analysis service that allows you to search for images based on visual features such as faces, objects, and scenes, as well as text features such as captions and tags. Amazon Rekognition uses deep learning models to extract and analyze visual and textual features from images.



Thank you.

www.marearts.com

🙇🏻‍♂️

2/16/2023

Example of replacing the ConvNext classifier layer

 refer to code


..

import torch.nn as nn
import torchvision

# Load the pre-trained ConvNext model
model = torchvision.models.convnext_base(pretrained=True, stochastic_depth_prob=0.1, layer_scale=1e-4)

# Define a new linear layer with 10 output channels
new_linear_layer = nn.Linear(1024, 10)

# Replace the last linear layer in the classifier with the new one
classifier = model.classifier
classifier[-1] = new_linear_layer

# Set the modified classifier as the new classifier for the model
model.classifier = classifier

# Print the modified model architecture
print(model)

..



In this code, we first load the pre-trained ConvNext model using torchvision.models.convnext_base. 
We then define a new linear layer with 10 output channels using nn.Linear. 
Next, we extract the existing classifier from the model using model.classifier, and replace the last linear layer in the classifier with the new one using indexing ([-1]). 
Finally, we set the modified classifier as the new classifier for the ConvNext model using model.classifier = classifier.

This code should print out the modified model architecture, which will have a classifier layer that is identical to the original ConvNext classifier except for the number of output channels in the last linear layer.
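
If you want to confirm the change, here is a small sanity-check sketch. Pretrained weights are skipped here because only the output shape matters:

..

import torch
import torch.nn as nn
import torchvision

# Rebuild the modified model without downloading weights and check the new output size
model = torchvision.models.convnext_base(pretrained=False)
model.classifier[-1] = nn.Linear(1024, 10)

model.eval()
with torch.no_grad():
    dummy = torch.randn(2, 3, 224, 224)  # batch of 2 RGB 224x224 images
    out = model(dummy)

print(out.shape)  # expected: torch.Size([2, 10])

..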

Thank you.