5/30/2022

find optimal number of clusters using silhouette evaluation

 

To find the optimal number of clusters using the silhouette metric:

The code below runs KMeans for every k in a range, evaluates each clustering result, and shows the scores as a figure.

A larger silhouette value means a better clustering.

..

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples
import numpy as np
import matplotlib.pyplot as plt

# cluster_df: the feature matrix (DataFrame or array) to cluster
silhouette_vals = []
sk, ek = 2, 20
for i in range(sk, ek):
    kmeans_plus = KMeans(n_clusters=i, init='k-means++')
    pred = kmeans_plus.fit_predict(cluster_df)
    silhouette_vals.append(np.mean(silhouette_samples(cluster_df, pred, metric='euclidean')))

plt.plot(range(sk, ek), silhouette_vals, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette')
plt.show()

..

For example here, k=20 gives the best clustering result.
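To read the best k off programmatically instead of from the figure, a minimal sketch using the variables from the code above:

..

best_k = sk + int(np.argmax(silhouette_vals))
print('best k by silhouette:', best_k)

..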


Thank you.


5/24/2022

t-SNE visualisation example code in Python

 

Refer to code


..


from sklearn.manifold import TSNE
from sklearn.datasets import load_iris
import seaborn as sns
import pandas as pd


iris = load_iris()
x = iris.data
y = iris.target


# from sklearn.utils import shuffle
# x, y = shuffle(x, y)


tsne = TSNE(n_components=2, verbose=1, random_state=123)
z = tsne.fit_transform(x)


df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]


sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 3),
                data=df).set(title="Iris data T-SNE projection")



..

output: a 2-D scatter plot of the iris classes in t-SNE space (figure omitted)




5/19/2022

check ubuntu version on terminal

 

lsb_release -a
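If lsb_release is not installed, the same information is available in /etc/os-release:

cat /etc/os-release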



5/14/2022

pathlib, path, pathlib.PosixPath,

 

Make a path using pathlib.

Refer to the code:

..

from pathlib import Path
image_dir = 'dataset/images'
images = '1.png'
print(str(Path(image_dir) / images))
print( type(Path(image_dir) / images ))
# dataset/images/1.png
# <class 'pathlib.PosixPath'>

..
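Path objects also support globbing; for example (hypothetical directory), listing all png files under image_dir:

..

from pathlib import Path
for p in Path('dataset/images').glob('*.png'):
    print(p.name)

..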


www.marearts.com


yaml to dict, adding argparse to dict (easydict, yaml)

 

Simple code showing how to read a YAML file and convert it to a dict.

One more thing: it also adds argparse parameters to the dict made from the YAML.

We use easydict for this.

Refer to the code below; you will understand it at a glance.


..


from easydict import EasyDict
import yaml

def load_setting(setting):
    with open(setting, 'r') as f:
        cfg = yaml.load(f, Loader=yaml.FullLoader)
    return EasyDict(cfg)

#----------------------------
cfg = load_setting('test.yaml')
print(cfg)
#{'V1': 'abc', 'V2': {'sub': [1, 2, 3]}}
#----------------------------

#----------------------------
import argparse
parser = argparse.ArgumentParser()
args = parser.parse_args([])
args.batch_size = 10
args.epoch = 10
#----------------------------

cfg.update(vars(args))
print(cfg, type(cfg))
#{'V1': 'abc', 'V2': {'sub': [1, 2, 3]}, 'batch_size': 10, 'epoch': 10} <class 'easydict.EasyDict'>
#----------------------------

..
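For reference, a test.yaml that would produce the output above:

..

V1: abc
V2:
  sub: [1, 2, 3]

..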


Thank you.

www.marearts.com


5/13/2022

convert simple transformer ner model to onnx

 

..

!python -m transformers.onnx --model=./checkpoint-21-epoch-11 --feature=token-classification onnx/

..
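To sanity-check the export, the result can be opened with onnxruntime; a small sketch assuming the default output file name model.onnx:

..

import onnxruntime as ort
sess = ort.InferenceSession('onnx/model.onnx', providers=['CPUExecutionProvider'])
print([i.name for i in sess.get_inputs()])

..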



tokens to word, transformer

 

Refer to the code below to figure out how the tokens for each word are composed.

The code shows the token list for every word.


..

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

example = "This is a tokenization example"

print('input sentence: ', example)
print('---')
print('tokens :')
print( tokenizer.encode(example, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) )
print('---')
print('word and tokens :')
print({x : tokenizer.encode(x, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) for x in example.split()})
print('---')
idx = 1
enc = [tokenizer.encode(x, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) for x in example.split()]
desired_output = []
for token in enc:
    tokenoutput = []
    for ids in token:
        tokenoutput.append(idx)
        idx += 1
    desired_output.append(tokenoutput)

print('tokens in grouped list')
print(desired_output)
print('---')

..


input sentence:  This is a tokenization example
---
tokens :
[713, 16, 10, 19233, 1938, 1246]
---
word and tokens :
{'This': [713], 'is': [354], 'a': [102], 'tokenization': [46657, 1938], 'example': [46781]}
---
tokens in grouped list
[[1], [2], [3], [4, 5], [6]]
---

Note that the per-word IDs can differ from the full-sentence encoding: encoding a word on its own drops the leading-space marker that the BPE tokenizer attaches to words inside a sentence.


Thank you.
www.marearts.com

5/11/2022

python dict order shuffle

 


..

import random
d = {'a':1, 'b':2, 'c':3, 'd':4}
l = list(d.items())
random.shuffle(l)
d = dict(l)
print(d)

..

{'a': 1, 'c': 3, 'b': 2, 'd': 4}

This works because dicts preserve insertion order (guaranteed since Python 3.7).




5/09/2022

BERT Tokenizer, string to token, token to string

 

Examples for understanding BERT-style tokenizer tokens:

..

text = "I am e/mail"
# text = "I am a e-mail"
tokens = tokenizer.tokenize(text)
print(f'Tokens: {tokens}')
print(f'Tokens length: {len(tokens)}')
encoding = tokenizer.encode(text)
print(f'Encoding: {encoding}')
print(f'Encoding length: {len(encoding)}')
tok_text = tokenizer.convert_tokens_to_string(tokens)
print(f'token to string: {tok_text}')

..

output:

Tokens: ['I', 'Ġam', 'Ġe', '/', 'mail']
Tokens length: 5
Encoding: [0, 100, 524, 364, 73, 6380, 2]
Encoding length: 7
token to string: I am e/mail

--
Thank you.
www.marearts.com

5/04/2022

Python Convert List into a space-separated string

 

refer to code

..

lst = ['I', 'am', 'a', 'human']
strlist = ' '.join(lst)
print(strlist, type(strlist))

..

I am a human <class 'str'>
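Going the other way, str.split() recovers the list:

..

print(strlist.split())
# ['I', 'am', 'a', 'human']

..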

Thank you.

4/29/2022

simple example for EDA(Exploratory Data Analysis) using Tensorflow data validation

 

refer to this page for more detail

: https://www.tensorflow.org/tfx/data_validation/get_started

..

!pip install tensorflow_data_validation

import tensorflow_data_validation as tfdv

# path: location of a TFRecord file to analyze
stats = tfdv.generate_statistics_from_tfrecord(data_location=path)
tfdv.visualize_statistics(stats)

..
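The library can also profile CSV data directly (per the linked guide); a sketch with a hypothetical file name:

..

stats = tfdv.generate_statistics_from_csv(data_location='data.csv')
tfdv.visualize_statistics(stats)

..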



Thank you.


4/22/2022

Measuring processing time python

Two ways to measure processing time in Python:


..

# method 1: time.time()
import time
start = time.time()
print("hello")
end = time.time()
print(end - start)


# method 2: timeit.default_timer (an alias for time.perf_counter)
from timeit import default_timer as timer
start = timer()
# ...
end = timer()
print(end - start) # Time in seconds, e.g. 5.38091952400282

..


Thank you.

www.marearts.com


4/19/2022

Object of type float32 is not JSON serializable

 

Convert the values to Python float to avoid the error; refer to the code:

..

import json

face_dict = {'x1': 240.54083251953125, 'y1': 470.02429199218744, 'x2': 479.535400390625, 'y2': 655.3250732421875, 'LeyeX': 382.76947021484375, 'LeyeY': 538.7545166015624, 'ReyeX': 383.48541259765625, 'ReyeY': 621.1448364257811, 'NoseX': 332.7269287109375, 'NoseY': 590.6889648437499, 'LlipsX': 300.84881591796875, 'LlipsY': 542.9485473632811, 'RlipsX': 301.3223876953125, 'RlipsY': 615.5052490234374, 'conf': 0.9999992}

data_convert = {k: float(v) for k, v in face_dict.items()}
with open('./data_convert.json', 'w') as fp:
    json.dump(data_convert, fp, indent=5)


..
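A variation that avoids the manual comprehension is json.dump's default hook, which is called for any value the encoder cannot serialize on its own; passing float converts the numpy scalars:

..

with open('./data_convert.json', 'w') as fp:
    json.dump(face_dict, fp, indent=5, default=float)

..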


Another solution is:

..

with open('./data_convert.json', 'w') as fp:
    json.dump(str(face_dict), fp, indent=5)

..

But in this case, the whole dict is saved as a single JSON string.


Thank you.

www.marearts.com


rectangle, box face -> mosaic, pixelate

 

refer to mosaic function

..

import cv2

def mosaic(img, rect, size):
    (x1, y1, x2, y2) = rect
    w = x2 - x1
    h = y2 - y1
    i_rect = img[y1:y2, x1:x2]
    # shrink the region to size x size, then blow it back up -> pixelated blocks
    i_small = cv2.resize(i_rect, (size, size))
    i_mos = cv2.resize(i_small, (w, h), interpolation=cv2.INTER_AREA)
    img2 = img.copy()
    img2[y1:y2, x1:x2] = i_mos
    return img2

#.... detect face
for face_dict in faces_dict:
    x1, y1 = (int(face_dict['x1']), int(face_dict['y1']))
    x2, y2 = (int(face_dict['x2']), int(face_dict['y2']))

    # image = anonymize_face_pixelate(image[y1:y2, x1:x2, :], blocks=3)
    image = mosaic(image, (x1, y1, x2, y2), 10)
#.... face mosaic

..


Ex) result


Thank you.

www.marearts.com


Remove duplicated dict elements in a list, remove the same dict elements across two lists

 refer to code:


..

a=[{'a':1, 'b':3}, {'a':2, 'b':4}]
b=[{'a':3, 'b':3}, {'a':2, 'b':4}]
a.extend(b)
[dict(t) for t in {tuple(d.items()) for d in a}]

>> [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}, {'a': 3, 'b': 3}]

..

Note: this set-of-tuples trick requires the dict values to be hashable, and the order of the result is not guaranteed.


Thank you.

www.marearts.com

python: calculate intersection over union (IoU) between two boxes

 refer to code:

..

def IoU(box1, box2):
    # box = (x1, y1, x2, y2)
    box1_area = (box1[2] - box1[0] + 1) * (box1[3] - box1[1] + 1)
    box2_area = (box2[2] - box2[0] + 1) * (box2[3] - box2[1] + 1)

    # obtain x1, y1, x2, y2 of the intersection
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    # special case: if one box fully contains the other, treat it as a
    # perfect match (note: strict IoU would be the area ratio, not 1.0)
    x11, y11, x12, y12 = box1
    x21, y21, x22, y22 = box2
    if x21 < x11 and y21 < y11 and x22 > x12 and y22 > y12:
        return 1.0
    if x21 > x11 and y21 > y11 and x22 < x12 and y22 < y12:
        return 1.0

    # compute the width and height of the intersection
    w = max(0, x2 - x1 + 1)
    h = max(0, y2 - y1 + 1)

    inter = w * h
    iou = inter / (box1_area + box2_area - inter)
    return iou

..
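A quick sanity check with two hypothetical 11x11 boxes overlapping in a 6x6 region, so IoU = 36 / (121 + 121 - 36) ≈ 0.175:

..

print(IoU((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.1748

..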


Thank you

www.marearts.com


4/17/2022

python yaml to dict

 refer to code

..

import yaml

with open('hparams.yaml', 'r') as stream:
    try:
        parsed_yaml = yaml.safe_load(stream)
        print(parsed_yaml, type(parsed_yaml))
    except yaml.YAMLError as exc:
        print(exc)

..

> example output

{'batch_size': 1000, 'data_path': ['../npy/x_train_coin_eth.npy', '../npy/y_train_coin_eth.npy', '../npy/x_val_coin_eth.npy', '../npy/y_val_coin_eth.npy', '../npy/x_test_coin_eth.npy', '../npy/y_test_coin_eth.npy'], 'encoder_hidden_dims': [4, 2], 'input_dim': 5, 'input_seq': 128, 'learning_rate': 0.0001, 'num_LSTM': 2, 'num_layers': 1} <class 'dict'>


Thank you.
www.marearts.com


'your model' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

 


I tried to save a sub-model of a seq-to-seq model, namely the encoder part.

What I used for saving and loading is the following code, and it failed with the error in the title.

* Failed case

..

torch.save(model.lstm_auto_model.lstm_encoder.lstms, 'idx-78-encoder_lstm_encoder_lstms.mare')

# fails: torch.load expects a file path or a seekable file-like object
# as its first argument, not a module
torch.load(lstm_abyss_model.lstm_encoder, 'idx-78-encoder_lstm_lstm_encoder.mare')

..

* error message

AttributeError: 'LSTM' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

..


My solution is to use state_dict() to save the model.

Refer to the code below, which was the solution for me.

..

torch.save(model.lstm_auto_model.lstm_encoder.lstms.state_dict(), 'idx-78-encoder_lstm_encoder_lstms_dict.mare')

lstm_abyss_model.lstm_encoder.lstms.load_state_dict(torch.load('idx-78-encoder_lstm_encoder_lstms_dict.mare'))

..


Thank you.

www.marearts.com



4/16/2022

opencv, cv2, rotate image & point

 

Refer to the code.

The sample point is (100, 100).

First, I rotate the image by 90 degrees, together with the point.

Then I compare the original image & point against the rotated ones to check that the rotated point lands in the right position.

..

import numpy as np
import cv2
from matplotlib import pyplot as plt

def rotate_image(angle, width, height):
    image_center = (width / 2, height / 2)
    rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.)
    # expand the bounds so the rotated image is not cropped
    abs_cos = abs(rotation_mat[0, 0])
    abs_sin = abs(rotation_mat[0, 1])
    bound_w = int(height * abs_sin + width * abs_cos)
    bound_h = int(height * abs_cos + width * abs_sin)
    rotation_mat[0, 2] += bound_w / 2 - image_center[0]
    rotation_mat[1, 2] += bound_h / 2 - image_center[1]
    return rotation_mat, (bound_w, bound_h)

#open image
image = cv2.imread('img.png')
sample_pt =(100,100)

#get rotate mat, bound
height, width = image.shape[:2]
rotation_mat, (bound_w, bound_h) = rotate_image(90, width, height)

#rotate image
image_90 = cv2.warpAffine(image, rotation_mat, (bound_w, bound_h))
#rotate pt (homogeneous coordinates, so the 2x3 affine matrix applies directly)
sample = np.array([100, 100, 1])
sample_90 = np.matmul(rotation_mat, sample)


#origin image & pt
cv2.circle(image, sample_pt, 1, (0, 0, 255), 2, cv2.LINE_AA)

#rotated image & pt
cv2.circle(image_90, (int(sample_90[0]), int(sample_90[1])), 1, (0, 0, 255), 2, cv2.LINE_AA)


image = image[:,:,::-1]
plt.figure(figsize=(10, 10), dpi=100)
plt.imshow(image)

image_90 = image_90[:,:,::-1]
plt.figure(figsize=(10, 10), dpi=100)
plt.imshow(image_90)

..


original image with point (figure omitted)

90° rotated image and point (figure omitted)

Thank you.

www.marearts.com


4/11/2022

LSTM Autoencoder pytorch

 

Code 

...

import torch
import torch.nn as nn
from torchinfo import summary
import copy

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dims, num_layers, num_LSTM):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims
        self.num_layers = num_layers
        # stack num_LSTM LSTM blocks; each feeds its hidden size to the next
        LSTMs = []
        fDim = self.input_dim
        for i in range(num_LSTM):
            LSTMs.append(nn.LSTM(input_size=fDim, hidden_size=hidden_dims[i],
                                 num_layers=self.num_layers, batch_first=True))
            fDim = hidden_dims[i]
        self.lstms = nn.ModuleList(LSTMs)

    def forward(self, x):
        for i, lstm in enumerate(self.lstms):
            lstm_out, (hidden_out, cell_out) = lstm(x)
            x = lstm_out
        last_sequence_hidden_dim = x[:, -1, :]  # lstm_out[:, -1, :]
        return x, last_sequence_hidden_dim

class regressor(nn.Module):
    def __init__(self, input_dim, output_dim, dropout=0.1):
        super(regressor, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.dropout = dropout
        self.regressor = self.make_regressor()

    def make_regressor(self):
        layers = []
        layers.append(nn.Dropout(self.dropout))
        layers.append(nn.Linear(self.input_dim, self.input_dim // 2))
        layers.append(nn.ReLU())
        layers.append(nn.Linear(self.input_dim // 2, self.output_dim))
        regressor = nn.Sequential(*layers)
        return regressor

    def forward(self, x):
        x = self.regressor(x)
        return x

class LSTM_autoencoder(nn.Module):
    def __init__(self, input_dim, encoder_hidden_dims, num_layers, num_LSTM, input_seq):
        super(LSTM_autoencoder, self).__init__()

        self.input_dim = input_dim  # 5
        self.encoder_hidden_dims = copy.deepcopy(encoder_hidden_dims)  # [256, 128, 64]
        encoder_hidden_dims.reverse()
        self.decoder_hidden_dims = copy.deepcopy(encoder_hidden_dims)  # [64, 128, 256]
        self.num_layers = num_layers  # 2
        self.num_LSTM = num_LSTM  # 3
        self.input_seq = input_seq

        # LSTM model encoder
        self.lstm_encoder = LSTM(input_dim, self.encoder_hidden_dims, num_layers, num_LSTM)
        # LSTM model decoder
        self.lstm_decoder = LSTM(self.decoder_hidden_dims[0], self.decoder_hidden_dims, num_layers, num_LSTM)
        # LSTM regressor model
        self.lstm_regressor = regressor(self.encoder_hidden_dims[0], input_dim)

    def forward(self, x):
        input_encoder = x
        _, output_encoder = self.lstm_encoder(input_encoder)
        print(f'1 - lstm encoder input:{input_encoder.shape} output:{output_encoder.shape}')
        # repeat the encoder's last hidden state across the sequence length
        x_inter = torch.unsqueeze(output_encoder, 1)
        input_decoder = x_inter.repeat(1, self.input_seq, 1)
        print(f'2 - input_decoder: {input_decoder.shape}')
        output_decoder, _ = self.lstm_decoder(input_decoder)
        print(f'3 - input decoder: {input_decoder.shape} output decoder:{output_decoder.shape}')

        output_regressor = self.lstm_regressor(output_decoder)
        print(f'4 - output_regressor input: {output_decoder.shape} output decoder:{output_regressor.shape}')
        return output_regressor

...


Test class and show summary

..

input_dim = 5
num_LSTM = 2
encoder_hidden_dims = [256, 128]
num_layers = 2
input_seq = 140
batch_size=100

lstm_auto_model = LSTM_autoencoder(input_dim, encoder_hidden_dims, num_layers, num_LSTM, input_seq)
summary(lstm_auto_model, input_size=(batch_size, input_seq, input_dim))

..

output: (torchinfo summary omitted)


Refer to my ugly drawing


Thank you.

www.marearts.com


time-distributed dense (TDD, TimeDistributed) layer in PyTorch

 

PyTorch doesn't need a separate TimeDistributed wrapper: nn.Linear is applied to the last dimension only, so on a (batch, time, features) tensor it already acts time-distributed. Refer to the code:

..

import torch
m = torch.nn.Linear(256, 5)
#batch, sequence(time), dim
input = torch.randn(100, 140, 256)
output = m(input) #100, 140, 256 -> 100, 140, 5
print(output.size())
#torch.Size([100, 140, 5])

..
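Applying the layer step by step over the time axis gives the same result, which confirms the time-distributed behavior; a quick check using m, input, and output from the block above:

..

stepwise = torch.stack([m(input[:, t, :]) for t in range(input.size(1))], dim=1)
print(torch.allclose(output, stepwise))
# True

..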


Thank you.

www.marearts.com

torch repeat example

 Refer to code

..

batch_size=100
input_seq=140
input_dim=5
rand_input = torch.rand(batch_size, input_seq, input_dim)

repeat_output = rand_input.repeat(1, 1, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(1, 10, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(1, 1, 10)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

repeat_output = rand_input.repeat(10, 1, 1)
print(f'input :{rand_input.shape}, repeat:{repeat_output.shape}')

..

>>

input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 140, 5])
input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 1400, 5])
input :torch.Size([100, 140, 5]), repeat:torch.Size([100, 140, 50])
input :torch.Size([100, 140, 5]), repeat:torch.Size([1000, 140, 5])



Thank you.
www.marearts.com



4/10/2022

pytorch module list example

 

ex1)

import torch.nn as nn

linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

..


ex2)

linears = []
for i in range(10):
    linears.append(nn.Linear(10, 10))
linears = nn.ModuleList(linears)

..



Printing the module list gives:

>

ModuleList(
  (0): Linear(in_features=10, out_features=10, bias=True)
  (1): Linear(in_features=10, out_features=10, bias=True)
  (2): Linear(in_features=10, out_features=10, bias=True)
  (3): Linear(in_features=10, out_features=10, bias=True)
  (4): Linear(in_features=10, out_features=10, bias=True)
  (5): Linear(in_features=10, out_features=10, bias=True)
  (6): Linear(in_features=10, out_features=10, bias=True)
  (7): Linear(in_features=10, out_features=10, bias=True)
  (8): Linear(in_features=10, out_features=10, bias=True)
  (9): Linear(in_features=10, out_features=10, bias=True)
)
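The point of using nn.ModuleList instead of a plain Python list is that the sub-modules get registered, so their parameters show up in .parameters() and are visible to optimizers; a quick check using ex1's linears:

..

print(sum(p.numel() for p in linears.parameters()))
# 10 layers x (10*10 weights + 10 biases) = 1100

..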