Showing posts with label pytorch. Show all posts

9/16/2024

PyTorch model to MLIR -> LLVM -> executable file on MacBook M1


# Step 1: Define and train a simple PyTorch CNN model

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Define a simple CNN
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        # dropout2 runs on the flattened (2-D) activations, so plain Dropout is used
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = nn.functional.relu(x)
        x = self.conv2(x)
        x = nn.functional.relu(x)
        x = nn.functional.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = nn.functional.log_softmax(x, dim=1)
        return output

# Train the model (simplified for brevity)
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

# Assume we've trained the model...

# Save the trained model
torch.save(model.state_dict(), "simple_cnn.pth")



# Step 2: Compile the model with torch-mlir

import torch_mlir

# Load the trained model
model = SimpleCNN()
model.load_state_dict(torch.load("simple_cnn.pth"))
model.eval()

# Create an example input tensor
example_input = torch.randn(1, 1, 28, 28)

# Compile the model to MLIR
mlir_module = torch_mlir.compile(model, example_input, output_type="linalg-on-tensors")

# Save the MLIR module to a file
with open("simple_cnn.mlir", "w") as f:
    f.write(str(mlir_module))



# Step 3: Lower MLIR to LLVM IR
# This step typically requires using the MLIR tools from the command line.
# Pass names vary between LLVM/MLIR releases, and linalg-on-tensors output
# generally has to be bufferized before the loop and LLVM conversions apply.

# mlir-opt simple_cnn.mlir --convert-linalg-to-loops --convert-scf-to-cf --convert-vector-to-llvm --convert-memref-to-llvm --convert-func-to-llvm --reconcile-unrealized-casts | mlir-translate --mlir-to-llvmir > simple_cnn.ll

# Step 4: Compile LLVM IR to machine code
# Use Clang to compile for M1 Mac (arm64 architecture).
# The module has no main(), so compile it to an object file instead of linking it.

# clang -O3 -arch arm64 -c simple_cnn.ll -o simple_cnn.o

# The result is an object file named 'simple_cnn.o'

# Step 5 (optional): Create a C++ wrapper to use the compiled model

#include <iostream>
#include <vector>

// Declare the function generated from our PyTorch model.
// Assumption: the lowered module exports a C-callable symbol with this name
// and signature. By default MLIR expands memref arguments into descriptors,
// so the real name and signature depend on the lowering options used.
extern "C" void simple_cnn(float* input, float* output);

int main() {
    // Prepare input (28x28 image flattened to 1D array)
    std::vector<float> input(784, 0.0f); // Initialize with zeros for simplicity

    // Prepare output (10 classes for MNIST)
    std::vector<float> output(10, 0.0f);

    // Call the compiled model
    simple_cnn(input.data(), output.data());

    // Print the output (log-probabilities, because the model ends in log_softmax)
    for (int i = 0; i < 10; ++i) {
        std::cout << "Class " << i << " log-probability: " << output[i] << std::endl;
    }

    return 0;
}

# Compile the C++ wrapper and link it with the compiled model
# clang++ -O3 -arch arm64 wrapper.cpp simple_cnn.o -o final_executable
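
The `9216` passed to `nn.Linear(9216, 128)` in Step 1 is the flattened size of the feature map that reaches `fc1` for a 28x28 input. A minimal sketch of that arithmetic in plain Python, assuming unpadded, undilated layers as in the model above (the helper name `conv_out` is made up for illustration):

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded, undilated conv/pool along one axis."""
    return (size - kernel) // stride + 1

side = 28                                    # MNIST images are 28x28
side = conv_out(side, kernel=3)              # conv1: 28 -> 26
side = conv_out(side, kernel=3)              # conv2: 26 -> 24
side = conv_out(side, kernel=2, stride=2)    # max_pool2d(x, 2): 24 -> 12
channels = 64                                # output channels of conv2
flat_features = channels * side * side
print(flat_features)  # 9216
```

If the input resolution or a kernel size changes, this number changes with it, and the first argument of `fc1` has to follow.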

1/30/2024

Checking that torch + CUDA are installed correctly

Run this script:

.

import torch
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

def check_cuda_setup():
    cuda_available = torch.cuda.is_available()
    print(f"CUDA available: {cuda_available}")

    if cuda_available:
        cuda_version = torch.version.cuda
        print(f"CUDA version (PyTorch): {cuda_version}")

    try:
        # Attempt to create a CUDA extension
        ext = CUDAExtension(
            name='test_ext',
            sources=[]
        )
        print("CUDAExtension can be created successfully.")
    except Exception as e:
        print(f"Error creating CUDAExtension: {e}")

    try:
        # Attempt to create a BuildExtension object
        build_ext = BuildExtension()
        print("BuildExtension can be created successfully.")
    except Exception as e:
        print(f"Error creating BuildExtension: {e}")

if __name__ == "__main__":
    check_cuda_setup()


..

If it prints 'CUDA available: False', PyTorch cannot see a usable GPU and you need to fix your installation.

Thank you.


6/02/2023

torch tensor padding example code:

 refer to code:


.

import torch
import torch.nn.functional as F

tensor = torch.randn(2, 3, 4) # Original tensor
print("Original tensor shape:", tensor.shape)

# Case 1: Pad the last dimension (dimension -1) -> resulting shape: [2, 3, 8]
padding_size = 4
padded_tensor = F.pad(tensor, (padding_size, 0)) # Add padding to the left of the last dimension
print("Case 1 tensor shape:", padded_tensor.shape)

# Case 2: Pad the second-to-last dimension (dimension -2) -> resulting shape: [2, 8, 4]
padding_size = 5
padded_tensor = F.pad(tensor, (0, 0, padding_size, 0)) # Add padding to the left of the second-to-last dimension
print("Case 2 tensor shape:", padded_tensor.shape)

# Case 3: Pad the first dimension (dimension 0) -> resulting shape: [7, 3, 4]
padding_size = 5
padded_tensor = F.pad(tensor, (0, 0, 0, 0, padding_size, 0)) # Add padding to the left of the first dimension
print("Case 3 tensor shape:", padded_tensor.shape)

..
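
The three cases above follow one rule: the pad tuple is read in (left, right) pairs, starting from the last dimension and moving backwards. A small sketch of that rule in plain Python, with no PyTorch needed (`padded_shape` is a hypothetical helper written for this post, not a PyTorch function):

```python
def padded_shape(shape, pad):
    """Shape after F.pad-style padding: pad pairs run from the last dim backwards."""
    if len(pad) % 2 != 0 or len(pad) // 2 > len(shape):
        raise ValueError("pad needs (left, right) pairs for at most len(shape) dims")
    out = list(shape)
    for i in range(len(pad) // 2):
        left, right = pad[2 * i], pad[2 * i + 1]
        out[-(i + 1)] += left + right   # pair i applies to dimension -(i + 1)
    return tuple(out)

print(padded_shape((2, 3, 4), (4, 0)))              # (2, 3, 8)
print(padded_shape((2, 3, 4), (0, 0, 5, 0)))        # (2, 8, 4)
print(padded_shape((2, 3, 4), (0, 0, 0, 0, 5, 0)))  # (7, 3, 4)
```

This is why padding the first dimension of a 3-D tensor needs a 6-element tuple: the two inner dimensions must be listed first, even if their padding is zero.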


www.marearts.com

Thank you. 🙇🏻‍♂️

2/17/2023

How to print the full contents of a PyTorch tensor

 refer to code:


..

import torch

# create a tensor
x = torch.randn(3, 4)

# set print options to display the full tensor
torch.set_printoptions(precision=10, profile="full")

# print the full tensor
print(x)

..

In this example, we set the precision to 10 to display up to 10 decimal places, and select the "full" profile so that large tensors are printed in full instead of being summarized with "...". Arguments left as None keep their current value, so passing threshold=None or edgeitems=None does not by itself disable truncation. You can adjust these settings to your preference, depending on the size and precision of your tensor.



Thank you. 
🙇🏻‍♂️
www.marearts.com

1/25/2023

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

 

I struggled with the error in the title for a few hours.

The cause was that the PyTorch build was made for a different CUDA version than the one installed on my machine.

My CUDA version is 11.6, but the installed PyTorch build targets 11.7.


I printed "torch.__version__"

It returned "1.13.1+cu117"


You can check your CUDA version using this command (nvidia-smi shows the highest CUDA version the driver supports; nvcc --version shows the installed toolkit)

> nvidia-smi


So, remove the torch and torchvision packages

> pip uninstall torch torchvision

and install again

ex)

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116


Then it works well.
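
The "+cu117" suffix in the version string is what gives the mismatch away. As a rough sketch, the suffix can be turned into a CUDA version with plain Python. The helper name `cuda_tag_to_version` is made up for illustration, and it assumes the last digit of the tag is the minor version:

```python
import re

def cuda_tag_to_version(torch_version):
    """Return e.g. '11.7' for '1.13.1+cu117', or None if there is no +cuXYZ tag."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"

print(cuda_tag_to_version("1.13.1+cu117"))  # 11.7
print(cuda_tag_to_version("1.13.1+cu116"))  # 11.6
print(cuda_tag_to_version("1.13.1"))        # None
```

With PyTorch installed, `torch.version.cuda` reports the same information directly, so the parsing is only useful when all you have is a version string, for example from `pip list`.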


Thank you.

🙇🏻‍♂️

1/10/2023

SSLCertVerificationError when downloading pytorch model or datasets via torchvision

 

I tried to download the resnet101 model via torchvision.models

ex) torchvision.models.resnet101(pretrained=True)

But it raised the following error

---------------------------------------------------------------------------
SSLCertVerificationError                  Traceback (most recent call last)
F........


This line works around the issue by disabling HTTPS certificate verification:

..

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

..
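
Because the two lines above switch off certificate verification for the whole process, it can be worth limiting the change to the download itself. A minimal sketch using only the standard library (the context manager name `unverified_https` is hypothetical; it relies on the same private `ssl` attributes as the workaround above):

```python
import ssl
from contextlib import contextmanager

@contextmanager
def unverified_https():
    """Temporarily disable HTTPS certificate checks, then restore the default."""
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Always put the verifying context back, even if the download fails.
        ssl._create_default_https_context = original

# Usage (the download call is the one from this post and needs torchvision):
# with unverified_https():
#     model = torchvision.models.resnet101(pretrained=True)
```

Once the `with` block exits, later HTTPS requests are verified as usual.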


Thank you.


www.marearts.com

4/17/2022

'your model' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

 


I tried to save a sub-model of a seq-to-seq model, namely the encoder part.

I used the following code to save and load it, and it failed with the error in the title.

* Failed case

..

torch.save(model.lstm_auto_model.lstm_encoder.lstms, 'idx-78-encoder_lstm_encoder_lstms.mare')

torch.load(lstm_abyss_model.lstm_encoder, 'idx-78-encoder_lstm_lstm_encoder.mare')

..

* error message

AttributeError: 'LSTM' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

..


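The load call fails because torch.load expects a file path or file object as its first argument, and here it receives the module instead. My solution was to save and load the model's 'state_dict()'.

Refer to the code below, which solved it for me.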

..

torch.save(model.lstm_auto_model.lstm_encoder.lstms.state_dict(), 'idx-78-encoder_lstm_encoder_lstms_dict.mare')

lstm_abyss_model.lstm_encoder.lstms.load_state_dict(torch.load('idx-78-encoder_lstm_encoder_lstms_dict.mare'))

..


Thank you.

www.marearts.com



4/11/2022

LSTM Autoencoder pytorch

 

Code 

...

import torch
import torch.nn as nn
from torchinfo import summary
import copy

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dims, num_layers, num_LSTM):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims
        self.num_layers = num_layers
        LSTMs = []
        fDim = self.input_dim
        for i in range(num_LSTM):
            LSTMs.append(nn.LSTM(input_size=fDim, hidden_size=hidden_dims[i], num_layers=self.num_layers, batch_first=True))
            fDim = hidden_dims[i]
        self.lstms = nn.ModuleList(LSTMs)

    def forward(self, x):
        # feed the output sequence of each LSTM into the next one
        for i, lstm in enumerate(self.lstms):
            lstm_out, (hidden_out, cell_out) = lstm(x)
            x = lstm_out
        last_sequence_hidden_dim = x[:, -1, :]  # lstm_out[:,-1,:]
        return x, last_sequence_hidden_dim

class regressor(nn.Module):
    def __init__(self, input_dim, output_dim, dropout=0.1):
        super(regressor, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.dropout = dropout

        self.regressor = self.make_regressor()

    def make_regressor(self):
        layers = []
        layers.append(nn.Dropout(self.dropout))
        layers.append(nn.Linear(self.input_dim, self.input_dim // 2))
        layers.append(nn.ReLU())
        layers.append(nn.Linear(self.input_dim // 2, self.output_dim))
        regressor = nn.Sequential(*layers)
        return regressor

    def forward(self, x):
        x = self.regressor(x)
        return x

class LSTM_autoencoder(nn.Module):
    def __init__(self, input_dim, encoder_hidden_dims, num_layers, num_LSTM, input_seq):
        super(LSTM_autoencoder, self).__init__()

        self.input_dim = input_dim  # 5
        self.encoder_hidden_dims = copy.deepcopy(encoder_hidden_dims)  # e.g. [256, 128, 64]
        # reversed copy, so the caller's list is not modified in place
        self.decoder_hidden_dims = list(reversed(encoder_hidden_dims))  # e.g. [64, 128, 256]
        self.num_layers = num_layers  # 2
        self.num_LSTM = num_LSTM  # 3
        self.input_seq = input_seq

        # LSTM model encoder
        self.lstm_encoder = LSTM(input_dim, self.encoder_hidden_dims, num_layers, num_LSTM)
        # LSTM model decoder
        self.lstm_decoder = LSTM(self.decoder_hidden_dims[0], self.decoder_hidden_dims, num_layers, num_LSTM)
        # LSTM regressor model
        self.lstm_regressor = regressor(self.encoder_hidden_dims[0], input_dim)

    def forward(self, x):
        input_encoder = x
        _, output_encoder = self.lstm_encoder(input_encoder)
        print(f'1 - lstm encoder input:{input_encoder.shape} output:{output_encoder.shape}')
        # repeat the encoded vector along the time axis to feed the decoder
        x_inter = torch.unsqueeze(output_encoder, 1)
        input_decoder = x_inter.repeat(1, self.input_seq, 1)
        print(f'2 - input_decoder: {input_decoder.shape}')
        output_decoder, _ = self.lstm_decoder(input_decoder)
        print(f'3 - input decoder: {input_decoder.shape} output decoder:{output_decoder.shape}')

        output_regressor = self.lstm_regressor(output_decoder)
        print(f'4 - regressor input: {output_decoder.shape} output:{output_regressor.shape}')
        return output_regressor

...


Test class and show summary

..

input_dim = 5
num_LSTM = 2
encoder_hidden_dims = [256, 128]
num_layers = 2
input_seq = 140
batch_size=100

lstm_auto_model = LSTM_autoencoder(input_dim, encoder_hidden_dims, num_layers, num_LSTM, input_seq)
summary(lstm_auto_model, input_size=(batch_size, input_seq, input_dim))

..
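
The shapes that `forward` prints for these test parameters can also be worked out by hand. The sketch below traces them in plain Python without building the model; `trace_shapes` is a hypothetical helper that only mirrors the structure of the classes above:

```python
def trace_shapes(batch, seq, input_dim, encoder_hidden_dims):
    """Follow tensor shapes through encoder -> repeat -> decoder -> regressor."""
    decoder_hidden_dims = list(reversed(encoder_hidden_dims))
    shapes = {}
    shapes["encoder_input"] = (batch, seq, input_dim)
    # The encoder keeps only the last time step of its final LSTM.
    shapes["encoder_output"] = (batch, encoder_hidden_dims[-1])
    # That vector is repeated along the time axis to feed the decoder.
    shapes["decoder_input"] = (batch, seq, encoder_hidden_dims[-1])
    # The decoder's last LSTM has the first encoder hidden size.
    shapes["decoder_output"] = (batch, seq, decoder_hidden_dims[-1])
    # The regressor maps every time step back to input_dim features.
    shapes["regressor_output"] = (batch, seq, input_dim)
    return shapes

for name, shape in trace_shapes(100, 140, 5, [256, 128]).items():
    print(f"{name}: {shape}")
```

For the parameters used in the test, the encoder compresses each sequence to a 128-dimensional vector and the final output has the same shape as the input, which is what an autoencoder needs for a reconstruction loss.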

output



..


Refer to my ugly drawing


Thank you.

www.marearts.com


time-distributed dense (TDD, TimeDistributed) layer in PyTorch

 

refer to code:

..

import torch
m = torch.nn.Linear(256, 5)
#batch, sequence(time), dim
input = torch.randn(100, 140, 256)
output = m(input) #100, 140, 256 -> 100, 140, 5
print(output.size())
#torch.Size([100, 140, 5])

..


Thank you.

www.marearts.com

torch repeat example

 Refer to code

..

import torch

batch_size=100
input_seq=140
input_dim=5
rand_input = torch.rand(batch_size, input_seq, input_dim)

repeat_output = rand_input.repeat(1, 1, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(1, 10, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(1, 1, 10)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(10, 1, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

..

>>

input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 140, 5])
input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 1400, 5])
input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 140, 50])
input :torch.Size([100, 140, 5]), repeat:torch.Size([1000, 140, 5])



Thank you.
www.marearts.com



4/10/2022

pytorch module list example

 

ex1)

import torch.nn as nn

linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

..


ex2)

linears = []
for i in range(10):
    linears.append(nn.Linear(10, 10))
linears = nn.ModuleList(linears)

..



print module list

>

ModuleList(
  (0): Linear(in_features=10, out_features=10, bias=True)
  (1): Linear(in_features=10, out_features=10, bias=True)
  (2): Linear(in_features=10, out_features=10, bias=True)
  (3): Linear(in_features=10, out_features=10, bias=True)
  (4): Linear(in_features=10, out_features=10, bias=True)
  (5): Linear(in_features=10, out_features=10, bias=True)
  (6): Linear(in_features=10, out_features=10, bias=True)
  (7): Linear(in_features=10, out_features=10, bias=True)
  (8): Linear(in_features=10, out_features=10, bias=True)
  (9): Linear(in_features=10, out_features=10, bias=True)
)



4/08/2022

Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance

 

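Try #1 (recent PyTorch Lightning versions)

import pytorch_lightning as pl
from pytorch_lightning.strategies.ddp import DDPStrategy

trainer = pl.Trainer(
    strategy=DDPStrategy(find_unused_parameters=False),
    accelerator='gpu',
    devices=3,
)

..

Try #2 (older PyTorch Lightning versions that still provide DDPPlugin)

import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin

# checkpoint_callback and early_stop_callback are assumed to be defined elsewhere
trainer = pl.Trainer(
    val_check_interval=0.1,
    gpus=-1,
    accelerator="ddp",
    callbacks=[checkpoint_callback, early_stop_callback],
    plugins=DDPPlugin(find_unused_parameters=False),
    precision=16,
)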

..


Thank you.


2/25/2022

list tensor to batch tensor, Pytorch

A list of 4 tensors -> one 4 x 200 x 200 x 3 tensor

--

# list_torch is a list of 4 tensors, each of shape [1, 200, 200, 3]
print('---- input list tensor')
print('length: ', len(list_torch))
for g in list_torch:
    print(g.shape)
print('---- convert batch tensor')
torch_batch = torch.cat(list_torch, dim=0)  # concatenate along the batch dimension
print(torch_batch.shape)
print('----')

--

output

---- input list tensor
length: 4
torch.Size([1, 200, 200, 3])
torch.Size([1, 200, 200, 3])
torch.Size([1, 200, 200, 3])
torch.Size([1, 200, 200, 3])
---- convert batch tensor
torch.Size([4, 200, 200, 3])
----

--


Thank you.

www.marearts.com

🙇🏻‍♂️

12/02/2021

How to combine multiple criterions to a loss function? Multiple loss function for single model.

You can simply refer to the code below:


ex1)

b = nn.MSELoss()(output_x, x_labels)
a = nn.CrossEntropyLoss()(output_y, y_labels)
loss = a + b
loss.backward()


ex2)

b = nn.MSELoss()
a = nn.CrossEntropyLoss()
loss_b = b(output_x, x_labels)
loss_a = a(output_y, y_labels)
loss = loss_a + loss_b
loss.backward()


There are many more opinions here:

https://discuss.pytorch.org/t/how-to-combine-multiple-criterions-to-a-loss-function/348/27


Thank you.

www.marearts.com

🙇🏻‍♂️

9/22/2021

pytorch cuda definition

Their syntax varies slightly, but they are equivalent:

              .to(name)       .to(device)                   .cuda()

CPU           to('cpu')       to(torch.device('cpu'))       cpu()

Current GPU   to('cuda')      to(torch.device('cuda'))      cuda()

Specific GPU  to('cuda:1')    to(torch.device('cuda:1'))    cuda(device=1)

Note: the current cuda device is 0 by default, but this can be set with torch.cuda.set_device().

7/24/2021

Check whether PyTorch and TensorFlow can use the GPU

 

Test whether TensorFlow can use the GPU

#method 1
import tensorflow as tf
tf.test.is_built_with_cuda()
> True

#method 2
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
> ..

Test whether PyTorch can use the GPU

#method 3
import torch
torch.cuda.is_available()
>>> True

torch.cuda.current_device()
>>> 0

torch.cuda.device(0)
>>> <torch.cuda.device at 0x7efce0b03be0>

torch.cuda.device_count()
>>> 1

torch.cuda.get_device_name(0)
>>> 'GeForce GTX 950M'



Thank you.
www.MareArts.com

9/23/2020

Pytorch, Infinite DataLoader using iter & next

 


# create dataloader-iterator
data_iter = iter(data_loader)

# iterate over dataset
# alternatively you could use while(True)
for i in range(NUM_ITERS_YOU_WANT):
    try:
        data = next(data_iter)
    except StopIteration:
        # StopIteration is raised when the dataset ends
        # reinitialize data loader
        data_iter = iter(data_loader)
        data = next(data_iter)
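
The same restart-on-exhaustion idea can be wrapped in a generator, which keeps the try/except out of the training loop. A minimal sketch, assuming a non-empty loader (the name `infinite_batches` is made up, and a plain list stands in for the DataLoader so the snippet runs without PyTorch):

```python
def infinite_batches(loader):
    """Yield batches forever, restarting the loader whenever it is exhausted."""
    while True:
        # Each pass calls iter(loader) again, just like the manual version.
        for batch in loader:
            yield batch

# Any re-iterable works; a list stands in for a DataLoader here.
stand_in_loader = ["batch0", "batch1", "batch2"]
batches = infinite_batches(stand_in_loader)
first_seven = [next(batches) for _ in range(7)]
print(first_seven)
# ['batch0', 'batch1', 'batch2', 'batch0', 'batch1', 'batch2', 'batch0']
```

Since every pass starts a fresh iterator, a DataLoader created with shuffle=True would reshuffle on each restart.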

8/20/2020

RuntimeError: set_sizes_contiguous is not allowed on Tensor created from .data or .detach(), in Pytorch 1.1.0

change old -> new 


old

v.data.resize_(data.size()).copy_(data)


NEW

with torch.no_grad():
    v.resize_(data.size()).copy_(data)