5/30/2022

find optimal number of clusters using silhouette evaluation

 

To find the optimal number of clusters using the silhouette metric:

The code below runs KMeans for every k in a range, evaluates each clustering result, and shows the scores as a figure.

A larger silhouette value means a better clustering.

..

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples
import numpy as np
import matplotlib.pyplot as plt

# cluster_df: the feature matrix (DataFrame or array) to cluster
silhouette_vals = []
sk, ek = 2, 20
for i in range(sk, ek):
    kmeans_plus = KMeans(n_clusters=i, init='k-means++')
    pred = kmeans_plus.fit_predict(cluster_df)
    silhouette_vals.append(np.mean(silhouette_samples(cluster_df, pred, metric='euclidean')))

plt.plot(range(sk, ek), silhouette_vals, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette')
plt.show()

..

For example here, k=20 gives the best clustering result.
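To read the best k off programmatically instead of from the figure, a minimal sketch using the variables from the code above:

..

best_k = sk + int(np.argmax(silhouette_vals))
print('best k by silhouette:', best_k)

..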


Thank you.


5/24/2022

t-SNE visualisation example code in Python

 

Refer to code


..


from sklearn.manifold import TSNE
from sklearn.datasets import load_iris
import seaborn as sns
import pandas as pd


iris = load_iris()
x = iris.data
y = iris.target


# from sklearn.utils import shuffle
# x, y = shuffle(x, y)


tsne = TSNE(n_components=2, verbose=1, random_state=123)
z = tsne.fit_transform(x)


df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]


sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 3),
                data=df).set(title="Iris data T-SNE projection")



..

output: a 2-D scatter plot of the iris classes in t-SNE space (figure omitted)




5/19/2022

check ubuntu version on terminal

 

lsb_release -a
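If lsb_release is not installed, the same information is available in /etc/os-release:

cat /etc/os-release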



5/14/2022

pathlib, path, pathlib.PosixPath,

 

Make a path using pathlib.

Refer to the code:

..

from pathlib import Path
image_dir = 'dataset/images'
images = '1.png'
print(str(Path(image_dir) / images))
print( type(Path(image_dir) / images ))
# dataset/images/1.png
# <class 'pathlib.PosixPath'>

..
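Path objects also support globbing; for example (hypothetical directory), listing all png files under image_dir:

..

from pathlib import Path
for p in Path('dataset/images').glob('*.png'):
    print(p.name)

..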


www.marearts.com


yaml to dict, adding argparse to dict (easydict, yaml)

 

Simple code showing how to read a YAML file and convert it to a dict.

One more thing: it also adds argparse parameters to the dict made from the YAML.

We use easydict for this.

Refer to the code below; you will understand it at a glance.


..


from easydict import EasyDict
import yaml

def load_setting(setting):
    with open(setting, 'r') as f:
        cfg = yaml.load(f, Loader=yaml.FullLoader)
    return EasyDict(cfg)

#----------------------------
cfg = load_setting('test.yaml')
print(cfg)
#{'V1': 'abc', 'V2': {'sub': [1, 2, 3]}}
#----------------------------

#----------------------------
import argparse
parser = argparse.ArgumentParser()
args = parser.parse_args([])
args.batch_size = 10
args.epoch = 10
#----------------------------

cfg.update(vars(args))
print(cfg, type(cfg))
#{'V1': 'abc', 'V2': {'sub': [1, 2, 3]}, 'batch_size': 10, 'epoch': 10} <class 'easydict.EasyDict'>
#----------------------------

..
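For reference, a test.yaml that would produce the output above:

..

V1: abc
V2:
  sub: [1, 2, 3]

..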


Thank you.

www.marearts.com


5/13/2022

convert simple transformer ner model to onnx

 

..

!python -m transformers.onnx --model=./checkpoint-21-epoch-11 --feature=token-classification onnx/

..
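To sanity-check the export, the result can be opened with onnxruntime; a small sketch assuming the default output file name model.onnx:

..

import onnxruntime as ort
sess = ort.InferenceSession('onnx/model.onnx', providers=['CPUExecutionProvider'])
print([i.name for i in sess.get_inputs()])

..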



tokens to word, transformer

 

Refer to the code below to figure out how the tokens for each word are composed.

The code shows the token list for every word.


..

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

example = "This is a tokenization example"

print('input sentence: ', example)
print('---')
print('tokens :')
print( tokenizer.encode(example, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) )
print('---')
print('word and tokens :')
print({x : tokenizer.encode(x, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) for x in example.split()})
print('---')
idx = 1
enc = [tokenizer.encode(x, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) for x in example.split()]
desired_output = []
for token in enc:
    tokenoutput = []
    for ids in token:
        tokenoutput.append(idx)
        idx += 1
    desired_output.append(tokenoutput)

print('tokens in grouped list')
print(desired_output)
print('---')

..


input sentence:  This is a tokenization example
---
tokens :
[713, 16, 10, 19233, 1938, 1246]
---
word and tokens :
{'This': [713], 'is': [354], 'a': [102], 'tokenization': [46657, 1938], 'example': [46781]}
---
tokens in grouped list
[[1], [2], [3], [4, 5], [6]]
---

Note that the per-word IDs can differ from the full-sentence encoding: encoding a word on its own drops the leading-space marker that the BPE tokenizer attaches to words inside a sentence.


Thank you.
www.marearts.com

5/11/2022

python dict order shuffle

 


..

import random
d = {'a':1, 'b':2, 'c':3, 'd':4}
l = list(d.items())
random.shuffle(l)
d = dict(l)
print(d)

..

{'a': 1, 'c': 3, 'b': 2, 'd': 4}

This works because dicts preserve insertion order (guaranteed since Python 3.7).




5/09/2022

BERT Tokenizer, string to token, token to string

 

Examples for understanding BERT-style tokenizer tokens:

..

text = "I am e/mail"
# text = "I am a e-mail"
tokens = tokenizer.tokenize(text)
print(f'Tokens: {tokens}')
print(f'Tokens length: {len(tokens)}')
encoding = tokenizer.encode(text)
print(f'Encoding: {encoding}')
print(f'Encoding length: {len(encoding)}')
tok_text = tokenizer.convert_tokens_to_string(tokens)
print(f'token to string: {tok_text}')

..

output:

Tokens: ['I', 'Ġam', 'Ġe', '/', 'mail']
Tokens length: 5
Encoding: [0, 100, 524, 364, 73, 6380, 2]
Encoding length: 7
token to string: I am e/mail

--
Thank you.
www.marearts.com

5/04/2022

Python Convert List into a space-separated string

 

refer to code

..

lst = ['I', 'am', 'a', 'human']
strlist = ' '.join(lst)
print(strlist, type(strlist))

..

I am a human <class 'str'>
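Going the other way, str.split() recovers the list:

..

print(strlist.split())
# ['I', 'am', 'a', 'human']

..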

Thank you.

4/29/2022

simple example for EDA(Exploratory Data Analysis) using Tensorflow data validation

 

refer to this page for more detail

: https://www.tensorflow.org/tfx/data_validation/get_started

..

!pip install tensorflow_data_validation

import tensorflow_data_validation as tfdv

# path: location of a TFRecord file to analyze
stats = tfdv.generate_statistics_from_tfrecord(data_location=path)
tfdv.visualize_statistics(stats)

..
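The library can also profile CSV data directly (per the linked guide); a sketch with a hypothetical file name:

..

stats = tfdv.generate_statistics_from_csv(data_location='data.csv')
tfdv.visualize_statistics(stats)

..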



Thank you.


4/22/2022

Measuring processing time python

Two ways to measure processing time in Python:


..

# method 1: time.time()
import time
start = time.time()
print("hello")
end = time.time()
print(end - start)


# method 2: timeit.default_timer (an alias for time.perf_counter)
from timeit import default_timer as timer
start = timer()
# ...
end = timer()
print(end - start) # Time in seconds, e.g. 5.38091952400282

..


Thank you.

www.marearts.com


4/19/2022

Object of type float32 is not JSON serializable

 

Convert the values to Python float to avoid the error; refer to the code:

..

import json

face_dict = {'x1': 240.54083251953125, 'y1': 470.02429199218744, 'x2': 479.535400390625, 'y2': 655.3250732421875, 'LeyeX': 382.76947021484375, 'LeyeY': 538.7545166015624, 'ReyeX': 383.48541259765625, 'ReyeY': 621.1448364257811, 'NoseX': 332.7269287109375, 'NoseY': 590.6889648437499, 'LlipsX': 300.84881591796875, 'LlipsY': 542.9485473632811, 'RlipsX': 301.3223876953125, 'RlipsY': 615.5052490234374, 'conf': 0.9999992}

data_convert = {k: float(v) for k, v in face_dict.items()}
with open('./data_convert.json', 'w') as fp:
    json.dump(data_convert, fp, indent=5)


..
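A variation that avoids the manual comprehension is json.dump's default hook, which is called for any value the encoder cannot serialize on its own; passing float converts the numpy scalars:

..

with open('./data_convert.json', 'w') as fp:
    json.dump(face_dict, fp, indent=5, default=float)

..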


Another solution is:

..

with open('./data_convert.json', 'w') as fp:
    json.dump(str(face_dict), fp, indent=5)

..

But in this case, the whole dict is saved as a single JSON string.


Thank you.

www.marearts.com


rectangle, box face -> mosaic, pixelate

 

refer to mosaic function

..

import cv2

def mosaic(img, rect, size):
    (x1, y1, x2, y2) = rect
    w = x2 - x1
    h = y2 - y1
    i_rect = img[y1:y2, x1:x2]
    # shrink the region to size x size, then blow it back up -> pixelated blocks
    i_small = cv2.resize(i_rect, (size, size))
    i_mos = cv2.resize(i_small, (w, h), interpolation=cv2.INTER_AREA)
    img2 = img.copy()
    img2[y1:y2, x1:x2] = i_mos
    return img2

#.... detect face
for face_dict in faces_dict:
    x1, y1 = (int(face_dict['x1']), int(face_dict['y1']))
    x2, y2 = (int(face_dict['x2']), int(face_dict['y2']))

    # image = anonymize_face_pixelate(image[y1:y2, x1:x2, :], blocks=3)
    image = mosaic(image, (x1, y1, x2, y2), 10)
#.... face mosaic

..


Ex) result


Thank you.

www.marearts.com


Remove duplicated dict elements in a list, remove the same dict elements across two lists

 refer to code:


..

a=[{'a':1, 'b':3}, {'a':2, 'b':4}]
b=[{'a':3, 'b':3}, {'a':2, 'b':4}]
a.extend(b)
[dict(t) for t in {tuple(d.items()) for d in a}]

>> [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}, {'a': 3, 'b': 3}]

..

Note: this set-of-tuples trick requires the dict values to be hashable, and the order of the result is not guaranteed.


Thank you.

www.marearts.com

python: calculate intersection over union (IoU) between two boxes

 refer to code:

..

def IoU(box1, box2):
    # box = (x1, y1, x2, y2)
    box1_area = (box1[2] - box1[0] + 1) * (box1[3] - box1[1] + 1)
    box2_area = (box2[2] - box2[0] + 1) * (box2[3] - box2[1] + 1)

    # obtain x1, y1, x2, y2 of the intersection
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    # special case: if one box fully contains the other, treat it as a
    # perfect match (note: strict IoU would be the area ratio, not 1.0)
    x11, y11, x12, y12 = box1
    x21, y21, x22, y22 = box2
    if x21 < x11 and y21 < y11 and x22 > x12 and y22 > y12:
        return 1.0
    if x21 > x11 and y21 > y11 and x22 < x12 and y22 < y12:
        return 1.0

    # compute the width and height of the intersection
    w = max(0, x2 - x1 + 1)
    h = max(0, y2 - y1 + 1)

    inter = w * h
    iou = inter / (box1_area + box2_area - inter)
    return iou

..
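A quick sanity check with two hypothetical 11x11 boxes overlapping in a 6x6 region, so IoU = 36 / (121 + 121 - 36) ≈ 0.175:

..

print(IoU((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.1748

..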


Thank you

www.marearts.com


4/17/2022

python yaml to dict

 refer to code

..

import yaml

with open('hparams.yaml', 'r') as stream:
    try:
        parsed_yaml = yaml.safe_load(stream)
        print(parsed_yaml, type(parsed_yaml))
    except yaml.YAMLError as exc:
        print(exc)

..

> example output

{'batch_size': 1000, 'data_path': ['../npy/x_train_coin_eth.npy', '../npy/y_train_coin_eth.npy', '../npy/x_val_coin_eth.npy', '../npy/y_val_coin_eth.npy', '../npy/x_test_coin_eth.npy', '../npy/y_test_coin_eth.npy'], 'encoder_hidden_dims': [4, 2], 'input_dim': 5, 'input_seq': 128, 'learning_rate': 0.0001, 'num_LSTM': 2, 'num_layers': 1} <class 'dict'>


Thank you.
www.marearts.com


'your model' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

 


I tried to save a sub-model of a seq-to-seq model, namely the encoder part.

What I used for saving and loading is the following code, and it failed with the error in the title.

* Failed case

..

torch.save(model.lstm_auto_model.lstm_encoder.lstms, 'idx-78-encoder_lstm_encoder_lstms.mare')

# fails: torch.load expects a file path or a seekable file-like object
# as its first argument, not a module
torch.load(lstm_abyss_model.lstm_encoder, 'idx-78-encoder_lstm_lstm_encoder.mare')

..

* error message

AttributeError: 'LSTM' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

..


My solution is to use state_dict() to save the model.

Refer to the code below, which was the solution for me.

..

torch.save(model.lstm_auto_model.lstm_encoder.lstms.state_dict(), 'idx-78-encoder_lstm_encoder_lstms_dict.mare')

lstm_abyss_model.lstm_encoder.lstms.load_state_dict(torch.load('idx-78-encoder_lstm_encoder_lstms_dict.mare'))

..


Thank you.

www.marearts.com



4/16/2022

opencv, cv2, rotate image & point

 

Refer to the code.

The sample point is (100, 100).

First, I rotate the image by 90 degrees, together with the point.

Then I compare the original image & point against the rotated ones to check that the rotated point lands in the right position.

..

import numpy as np
import cv2
from matplotlib import pyplot as plt

def rotate_image(angle, width, height):
    image_center = (width / 2, height / 2)
    rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.)
    # expand the bounds so the rotated image is not cropped
    abs_cos = abs(rotation_mat[0, 0])
    abs_sin = abs(rotation_mat[0, 1])
    bound_w = int(height * abs_sin + width * abs_cos)
    bound_h = int(height * abs_cos + width * abs_sin)
    rotation_mat[0, 2] += bound_w / 2 - image_center[0]
    rotation_mat[1, 2] += bound_h / 2 - image_center[1]
    return rotation_mat, (bound_w, bound_h)

#open image
image = cv2.imread('img.png')
sample_pt =(100,100)

#get rotate mat, bound
height, width = image.shape[:2]
rotation_mat, (bound_w, bound_h) = rotate_image(90, width, height)

#rotate image
image_90 = cv2.warpAffine(image, rotation_mat, (bound_w, bound_h))
#rotate pt (homogeneous coordinates, so the 2x3 affine matrix applies directly)
sample = np.array([100, 100, 1])
sample_90 = np.matmul(rotation_mat, sample)


#origin image & pt
cv2.circle(image, sample_pt, 1, (0, 0, 255), 2, cv2.LINE_AA)

#rotated image & pt
cv2.circle(image_90, (int(sample_90[0]), int(sample_90[1])), 1, (0, 0, 255), 2, cv2.LINE_AA)


image = image[:,:,::-1]
plt.figure(figsize=(10, 10), dpi=100)
plt.imshow(image)

image_90 = image_90[:,:,::-1]
plt.figure(figsize=(10, 10), dpi=100)
plt.imshow(image_90)

..


original image with point (figure omitted)

90° rotated image and point (figure omitted)

Thank you.

www.marearts.com


4/11/2022

LSTM Autoencoder pytorch

 

Code 

...

import torch
import torch.nn as nn
from torchinfo import summary
import copy

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dims, num_layers, num_LSTM):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims
        self.num_layers = num_layers
        # stack num_LSTM LSTM blocks; each feeds its hidden size to the next
        LSTMs = []
        fDim = self.input_dim
        for i in range(num_LSTM):
            LSTMs.append(nn.LSTM(input_size=fDim, hidden_size=hidden_dims[i],
                                 num_layers=self.num_layers, batch_first=True))
            fDim = hidden_dims[i]
        self.lstms = nn.ModuleList(LSTMs)

    def forward(self, x):
        for i, lstm in enumerate(self.lstms):
            lstm_out, (hidden_out, cell_out) = lstm(x)
            x = lstm_out
        last_sequence_hidden_dim = x[:, -1, :]  # lstm_out[:, -1, :]
        return x, last_sequence_hidden_dim

class regressor(nn.Module):
    def __init__(self, input_dim, output_dim, dropout=0.1):
        super(regressor, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.dropout = dropout
        self.regressor = self.make_regressor()

    def make_regressor(self):
        layers = []
        layers.append(nn.Dropout(self.dropout))
        layers.append(nn.Linear(self.input_dim, self.input_dim // 2))
        layers.append(nn.ReLU())
        layers.append(nn.Linear(self.input_dim // 2, self.output_dim))
        regressor = nn.Sequential(*layers)
        return regressor

    def forward(self, x):
        x = self.regressor(x)
        return x

class LSTM_autoencoder(nn.Module):
    def __init__(self, input_dim, encoder_hidden_dims, num_layers, num_LSTM, input_seq):
        super(LSTM_autoencoder, self).__init__()

        self.input_dim = input_dim  # 5
        self.encoder_hidden_dims = copy.deepcopy(encoder_hidden_dims)  # [256, 128, 64]
        encoder_hidden_dims.reverse()
        self.decoder_hidden_dims = copy.deepcopy(encoder_hidden_dims)  # [64, 128, 256]
        self.num_layers = num_layers  # 2
        self.num_LSTM = num_LSTM  # 3
        self.input_seq = input_seq

        # LSTM model encoder
        self.lstm_encoder = LSTM(input_dim, self.encoder_hidden_dims, num_layers, num_LSTM)
        # LSTM model decoder
        self.lstm_decoder = LSTM(self.decoder_hidden_dims[0], self.decoder_hidden_dims, num_layers, num_LSTM)
        # LSTM regressor model
        self.lstm_regressor = regressor(self.encoder_hidden_dims[0], input_dim)

    def forward(self, x):
        input_encoder = x
        _, output_encoder = self.lstm_encoder(input_encoder)
        print(f'1 - lstm encoder input:{input_encoder.shape} output:{output_encoder.shape}')
        # repeat the encoder's last hidden state across the sequence length
        x_inter = torch.unsqueeze(output_encoder, 1)
        input_decoder = x_inter.repeat(1, self.input_seq, 1)
        print(f'2 - input_decoder: {input_decoder.shape}')
        output_decoder, _ = self.lstm_decoder(input_decoder)
        print(f'3 - input decoder: {input_decoder.shape} output decoder:{output_decoder.shape}')

        output_regressor = self.lstm_regressor(output_decoder)
        print(f'4 - output_regressor input: {output_decoder.shape} output decoder:{output_regressor.shape}')
        return output_regressor

...


Test class and show summary

..

input_dim = 5
num_LSTM = 2
encoder_hidden_dims = [256, 128]
num_layers = 2
input_seq = 140
batch_size=100

lstm_auto_model = LSTM_autoencoder(input_dim, encoder_hidden_dims, num_layers, num_LSTM, input_seq)
summary(lstm_auto_model, input_size=(batch_size, input_seq, input_dim))

..

output: (torchinfo summary omitted)


Refer to my ugly drawing


Thank you.

www.marearts.com


time-distributed dense (TDD, TimeDistributed) layer in PyTorch

 

PyTorch doesn't need a separate TimeDistributed wrapper: nn.Linear is applied to the last dimension only, so on a (batch, time, features) tensor it already acts time-distributed. Refer to the code:

..

import torch
m = torch.nn.Linear(256, 5)
#batch, sequence(time), dim
input = torch.randn(100, 140, 256)
output = m(input) #100, 140, 256 -> 100, 140, 5
print(output.size())
#torch.Size([100, 140, 5])

..
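Applying the layer step by step over the time axis gives the same result, which confirms the time-distributed behavior; a quick check using m, input, and output from the block above:

..

stepwise = torch.stack([m(input[:, t, :]) for t in range(input.size(1))], dim=1)
print(torch.allclose(output, stepwise))
# True

..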


Thank you.

www.marearts.com

torch repeat example

 Refer to code

..

batch_size=100
input_seq=140
input_dim=5
rand_input = torch.rand(batch_size, input_seq, input_dim)

repeat_output = rand_input.repeat(1, 1, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(1, 10, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(1, 1, 10)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(10, 1, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

..

>>

input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 140, 5])
input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 1400, 5])
input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 140, 50])
input :torch.Size([100, 140, 5]), repeat:torch.Size([1000, 140, 5])



Thank you.
www.marearts.com



4/10/2022

pytorch module list example

 

ex1)

import torch.nn as nn

linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

..


ex2)

linears = []
for i in range(10):
    linears.append(nn.Linear(10, 10))
linears = nn.ModuleList(linears)

..



Printing the module list gives:

>

ModuleList(
  (0): Linear(in_features=10, out_features=10, bias=True)
  (1): Linear(in_features=10, out_features=10, bias=True)
  (2): Linear(in_features=10, out_features=10, bias=True)
  (3): Linear(in_features=10, out_features=10, bias=True)
  (4): Linear(in_features=10, out_features=10, bias=True)
  (5): Linear(in_features=10, out_features=10, bias=True)
  (6): Linear(in_features=10, out_features=10, bias=True)
  (7): Linear(in_features=10, out_features=10, bias=True)
  (8): Linear(in_features=10, out_features=10, bias=True)
  (9): Linear(in_features=10, out_features=10, bias=True)
)
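The point of using nn.ModuleList instead of a plain Python list is that the sub-modules get registered, so their parameters show up in .parameters() and are visible to optimizers; a quick check using ex1's linears:

..

print(sum(p.numel() for p in linears.parameters()))
# 10 layers x (10*10 weights + 10 biases) = 1100

..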