7/29/2020

ROC & AUC example code in face detector model case



..

#https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
import numpy as np
from sklearn import metrics
import matplotlib.pyplot as plt

#model #1
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
scores = np.array([0.64, 0.47, 0.46, 0.77, 0.72, 0.9, 0.85, 0.7, 0.87, 0.92, 0.89, 0.93, 0.85, 0.81, 0.88, 0.48, 0.1, 0.35, 0.68, 0.47])
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
roc_auc = metrics.auc(fpr, tpr)

# plot
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

..
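As a cross-check on the AUC value, the area under the ROC curve can also be read as the chance that a randomly picked face sample scores higher than a randomly picked non-face sample (ties count half). The sketch below computes that in plain Python for the same labels and scores as model #1; the helper name pairwise_auc is made up for this note and is not part of sklearn.

```python
# Same labels and scores as model #1 above
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.64, 0.47, 0.46, 0.77, 0.72, 0.9, 0.85, 0.7, 0.87, 0.92,
          0.89, 0.93, 0.85, 0.81, 0.88, 0.48, 0.1, 0.35, 0.68, 0.47]

def pairwise_auc(labels, values):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly."""
    pos = [v for l, v in zip(labels, values) if l == 1]
    neg = [v for l, v in zip(labels, values) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))

auc_value = pairwise_auc(y, scores)
print('AUC = %0.2f' % auc_value)
```

Only the face sample with score 0.7 is outranked by two non-face samples (0.77 and 0.72), so 98 of the 100 pairs are in the right order.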


7/28/2020

Example model metrics using sklearn in face detector case


..

from sklearn.metrics import classification_report
#model 1
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
target_names = ['Non Face', 'Face']
print(classification_report(y_true, y_pred, target_names=target_names, digits=3))
..



..

#model 2
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
target_names = ['Non Face', 'Face']
print(classification_report(y_true, y_pred, target_names=target_names, digits=3))
..
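To see where the numbers in the report come from, here is a minimal sketch that counts true positives, false positives and false negatives for the 'Face' class by hand. The function name face_metrics is only an illustration, not an sklearn API.

```python
def face_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall and F1 for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model1_pred = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model2_pred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

m1 = face_metrics(y_true, model1_pred)
m2 = face_metrics(y_true, model2_pred)
print('model 1: precision=%.3f recall=%.3f f1=%.3f' % m1)
print('model 2: precision=%.3f recall=%.3f f1=%.3f' % m2)
```

Model 1 misses one face (lower recall), while model 2 raises one false alarm (lower precision).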


7/07/2020

extract year, month, day from file on Ubuntu, python example


...
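import os, time

full_path = 'example.txt' # placeholder: path of the file to inspect
# note: on Linux, getctime() returns the last metadata-change time, not the creation time
date_created_obj = time.localtime(os.path.getctime(full_path))
print('Year: {:4d}'.format(date_created_obj.tm_year)) # e.g. Year: 2020
print('Month: {:2d}'.format(date_created_obj.tm_mon)) # e.g. Month:  2
print('Day: {:2d}'.format(date_created_obj.tm_mday)) # e.g. Day: 10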

...
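If the modification date is what you actually need, a small variation with datetime may be easier to read. This is only a sketch: it creates a throwaway temp file (demo_path is a made-up name) so that it runs anywhere, and it uses getmtime() instead of getctime().

```python
import os
import tempfile
from datetime import datetime

# Create a throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'example')
    demo_path = tmp.name

# getmtime() = last content modification time of the file
modified = datetime.fromtimestamp(os.path.getmtime(demo_path))
print('Year: {:4d}'.format(modified.year))
print('Month: {:2d}'.format(modified.month))
print('Day: {:2d}'.format(modified.day))

# Clean up the temp file
os.remove(demo_path)
```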


7/06/2020

how to merge two csr_matrix, example python source code

let's see the code.

..
from scipy.sparse import csr_matrix
import numpy as np

#first matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 1, 1, 1, 1, 1])
mtx = csr_matrix((data, (row, col)), shape=(3, 3))

#second matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 1, 2, 0, 1, 2])
data = np.array([1, 1, 1, 1, 1, 1])
mtx2 = csr_matrix((data, (row, col)), shape=(3, 3))

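#merge the two matrices (merge_two_csr_mtx is a helper that is not shown in this snippet;
#judging by the result below, it sums the entries of both matrices)
mtx3 = merge_two_csr_mtx(mtx, mtx2)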

#check
print('1st\n',mtx)
print('2nd\n',mtx2)
print('merge\n',mtx3)
..

result
1st
   (0, 0) 1
  (0, 2) 1
  (1, 2) 1
  (2, 0) 1
  (2, 1) 1
  (2, 2) 1
2nd
   (0, 0) 1
  (0, 1) 1
  (1, 2) 1
  (2, 0) 1
  (2, 1) 1
  (2, 2) 1
merge
   (0, 0) 2.0
  (0, 1) 1.0
  (0, 2) 1.0
  (1, 2) 2.0
  (2, 0) 2.0
  (2, 1) 2.0
  (2, 2) 2.0
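
The snippet above calls merge_two_csr_mtx without defining it. Going by the printed result (overlapping entries become 2.0), one possible version is a plain element-wise sum. This is a guess at the behavior, not the original helper, and it assumes both matrices have the same shape.

```python
from scipy.sparse import csr_matrix
import numpy as np

def merge_two_csr_mtx(a, b):
    """Hypothetical stand-in: element-wise sum of two same-shape CSR matrices."""
    if a.shape != b.shape:
        raise ValueError('both matrices must have the same shape')
    # Adding two sparse matrices sums overlapping entries and keeps the rest;
    # the float cast mirrors the 2.0 / 1.0 values in the result above.
    return (a + b).astype(float).tocsr()

row = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 1, 1, 1, 1, 1])
mtx = csr_matrix((data, (row, np.array([0, 2, 2, 0, 1, 2]))), shape=(3, 3))
mtx2 = csr_matrix((data, (row, np.array([0, 1, 2, 0, 1, 2]))), shape=(3, 3))

mtx3 = merge_two_csr_mtx(mtx, mtx2)
print(mtx3.toarray())
```

If the shapes differ, scipy.sparse.vstack or hstack would be the tools to look at instead of a sum.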

How to convert a scipy csr_matrix back into lists of row, col and data?

refer to code


..
#define the matrix & check its values
from scipy.sparse import csr_matrix
import numpy as np
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 1, 2, 0, 1, 2])
data = np.array([1, 1, 1, 1, 1, 1])
mtx2 = csr_matrix((data, (row, col)), shape=(3, 3))
print(mtx2) #matrix print out
print(mtx2.toarray()) #print out by array

>
(0, 0) 1
  (0, 1) 1
  (1, 2) 1
  (2, 0) 1
  (2, 1) 1
  (2, 2) 1
>
[[1 1 0]
 [0 0 1]
 [1 1 1]]
..


...
#get back the row, col and data values from the matrix
c = mtx2.tocoo()
print(c.row)
print(c.col)
print(c.data)

>
[0 0 1 2 2 2]
[0 1 2 0 1 2]
[1 1 1 1 1 1]
...

6/09/2020

sentence embedding, sentence to vector using bert

refer to source code

.
#pip install -U sentence-transformers
#https://github.com/UKPLab/sentence-transformers
from sentence_transformers import SentenceTransformer, LoggingHandler

# Load Sentence model (based on BERT) from URL
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Embed a list of sentences
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)

# The result is a list of sentence embeddings as numpy arrays
for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding.shape, type(embedding))
    print("")
.

result is like this:
Sentence: This framework generates embeddings for each input sentence
Embedding: (768,) <class 'numpy.ndarray'>

Sentence: Sentences are passed as a list of string.
Embedding: (768,) <class 'numpy.ndarray'>

Sentence: The quick brown fox jumps over the lazy dog.
Embedding: (768,) <class 'numpy.ndarray'>
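
A common next step with sentence vectors is comparing them by cosine similarity. Here is a minimal sketch in plain Python. The three short vectors are toy stand-ins for the real 768-dimensional embeddings, and cosine_similarity is a helper written for this note, not a sentence-transformers function.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for real sentence embeddings
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]    # same direction as emb_a -> similarity close to 1
emb_c = [-3.0, 0.0, 1.0]   # orthogonal to emb_a -> similarity close to 0

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

The same function should also accept the numpy arrays returned by model.encode, since it only iterates over the values.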

5/25/2020

install poppler in ubuntu

Try these commands:

sudo apt-get update -y
sudo apt-get install -y poppler-utils

😁

5/19/2020

Ways to sort list of dictionaries by values in Python – Using lambda function


.
#example list
dict_list = [{ "idx":1, "value1":32.44, "value2":123.2}, { "idx":2, "value1":32.414, "value2":133.2}, { "idx":3, "value1":32.244, "value2":113.2}]

#sort by ascending order
sorted_dict_list = sorted(dict_list, key = lambda i: i['value1'])
#sort by descending order
r_sorted_dict_list = sorted(dict_list, key = lambda i: i['value1'],reverse=True)

#show result
print(sorted_dict_list)
# [{'idx': 3, 'value1': 32.244, 'value2': 113.2}, {'idx': 2, 'value1': 32.414, 'value2': 133.2}, {'idx': 1, 'value1': 32.44, 'value2': 123.2}]

print(r_sorted_dict_list)
# [{'idx': 1, 'value1': 32.44, 'value2': 123.2}, {'idx': 2, 'value1': 32.414, 'value2': 133.2}, {'idx': 3, 'value1': 32.244, 'value2': 113.2}]
.
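
The lambda can also be swapped for operator.itemgetter from the standard library, and list.sort() sorts in place instead of building a new list. A small sketch with the same example data:

```python
from operator import itemgetter

dict_list = [
    {"idx": 1, "value1": 32.44, "value2": 123.2},
    {"idx": 2, "value1": 32.414, "value2": 133.2},
    {"idx": 3, "value1": 32.244, "value2": 113.2},
]

# new list, ascending by value2
by_value2 = sorted(dict_list, key=itemgetter('value2'))
print([d['idx'] for d in by_value2])

# sort the original list in place, descending by value1
dict_list.sort(key=itemgetter('value1'), reverse=True)
print([d['idx'] for d in dict_list])
```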


5/15/2020

multi-thread example python source code

The code starts 10 threads that each run single_function.
If you look at the index (pid) in the result, you can see that the threads do not finish in submission order: whichever thread finishes first is printed first.

..
import queue
from concurrent.futures import ThreadPoolExecutor

#function for thread
def single_function(n, pid, out_queue):
    total = 0
    for i in range(0, n):
        for j in range(0, n):
            for k in range(0, n):
                total = total + 1

    out_queue.put({'index': pid, 'result': total})

#run thread
my_queue = queue.Queue()
with ThreadPoolExecutor(max_workers=10) as executor:
    for pid in range(0, 10):
        executor.submit(single_function, 100, pid, my_queue)

#get result of each thread (the with-block above waits until all threads are done)
while not my_queue.empty():
    get = my_queue.get()
    print(get)

#all threads are finished here
..

result

{'index': 1, 'result': 1000000}
{'index': 3, 'result': 1000000}
{'index': 2, 'result': 1000000}
{'index': 0, 'result': 1000000}
{'index': 5, 'result': 1000000}
{'index': 4, 'result': 1000000}
{'index': 8, 'result': 1000000}
{'index': 6, 'result': 1000000}
{'index': 9, 'result': 1000000}
{'index': 7, 'result': 1000000}
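
The queue is not strictly needed: executor.submit() returns a Future, and the return value of the function can be read from it. A minimal sketch of that variant follows (count_loops is a made-up, smaller version of single_function). One side benefit is that future.result() re-raises any exception from the thread, which the queue version would silently miss.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def count_loops(n, pid):
    """Hypothetical worker: returns its result instead of pushing to a queue."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1
    return {'index': pid, 'result': total}

results = {}
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(count_loops, 50, pid) for pid in range(10)]
    # as_completed yields futures in the order they finish
    for future in as_completed(futures):
        item = future.result()
        results[item['index']] = item['result']

for pid in sorted(results):
    print(pid, results[pid])
```

Keep in mind that for pure-Python CPU-bound loops like this one, threads mostly take turns because of the GIL, so the example shows the API rather than a speed-up.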

5/02/2020

get image rect list from pdf

Extract the bounding rectangles of all images on a PDF page using PyMuPDF.
Look at the sample code.

..

#pip install PyMuPDF
#document : https://pymupdf.readthedocs.io/en/latest/

#pip install opencv-python
#github : https://github.com/skvark/opencv-python

import fitz

img_bbox = []
doc1 = fitz.open('test.pdf')
page1 = doc1[0] #first page

d = page1.getText("dict") #newer PyMuPDF releases name this method get_text
blocks = d["blocks"]
imgblocks = [b for b in blocks if b["type"] == 1] #type 1 = image block
for v in imgblocks:
    [x1, y1, x2, y2] = v['bbox']
    #print(x1, y1, x2, y2)
    img_bbox.append({'left':int(x1), 'top':int(y1), 'right':int(x2), 'bottom':int(y2)})
..

4/23/2020

remove all image from pdf file, python source code

input
output


PyMuPDF is needed
pip install PyMuPDF
..

import fitz

def remove_img_on_pdf(idoc, page):
    #image list
    img_list = idoc.getPageImageList(page)
    con_list = idoc[page]._getContents()

    # xref 274 is the only /Contents object of the page (could be
    for i in con_list:
        c = idoc._getXrefStream(i) # read the stream source
        #print(c)
        if c is not None:
            for v in img_list:
                arr = bytes(v[7], 'utf-8') # v[7] is the image's reference name
                r = c.find(arr) # try find the image display command
                if r != -1:
                    cnew = c.replace(arr, b"")
                    idoc._updateStream(i, cnew)
                    c = idoc._getXrefStream(i)
    return idoc


doc=fitz.open('example.PDF')
rdoc = remove_img_on_pdf(doc, 0) #first page
rdoc.save('no_img_example.PDF')
..

reference : https://github.com/pymupdf/PyMuPDF/issues/338



4/16/2020

Python OpenCV Image to byte string for json transfer

code :
import cv2
import base64
import json
import numpy as np

######################################################
#read image
img = cv2.imread('./code_backup/test_img.jpg')
#cv2 to string
image_string = cv2.imencode('.jpg', img)[1]
image_string = base64.b64encode(image_string).decode()
#make string image dict (named img_dict so the built-in dict is not shadowed)
img_dict = {'img':image_string}
#save dict to json file
with open('./code_backup/cv2string.json', 'w') as fp:
    json.dump(img_dict, fp, indent=5)
######################################################


######################################################
#read json (the with-block closes the file afterwards)
with open('./code_backup/cv2string.json', 'r') as fp:
    response = json.load(fp)
#get image string
string = response['img']
#convert string to image
jpg_original = base64.b64decode(string)
jpg_as_np = np.frombuffer(jpg_original, dtype=np.uint8)
img = cv2.imdecode(jpg_as_np, flags=1)
#show image
cv2.imshow('show image', img)
cv2.waitKey(0)
######################################################
..

this is input image

this is the json file (truncated)
{ "img": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAIBAQEBAQIBAQECAgICAgQDAgICAgUEBAMEBgUGBgYFBgYGBwkIBgcJBwYGCAsICQoKCgoKBggLDAsKDAkKCgr/2wBDAQICAgICAgUDAwUKBwYHCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgr/wAARCAFoAeADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2BP8Ag34/4JIE4P7I4/8AC617/wCTqsJ/wb4/8EiTx/wyR/5fWv8A/wAnV9nPH2x+FSwx14/t6vc+i+r4f+Q+Mk/4N7P+CQ+P+TRx/wC ........."
}
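
The core of the trick is independent of OpenCV: JSON cannot hold raw bytes, so the bytes are base64-encoded into an ASCII string and decoded again on the other side. A minimal sketch with stand-in bytes (raw_bytes here plays the role of the encoded JPEG buffer):

```python
import base64
import json

# Stand-in for the JPEG bytes produced by cv2.imencode
raw_bytes = bytes(range(256))

# bytes -> base64 text -> JSON string
payload = json.dumps({'img': base64.b64encode(raw_bytes).decode('ascii')})

# JSON string -> base64 text -> bytes
encoded_text = json.loads(payload)['img']
restored = base64.b64decode(encoded_text)

print(restored == raw_bytes)
print(len(raw_bytes), len(encoded_text))
```

Base64 turns every 3 bytes into 4 characters, so expect the JSON to be roughly a third larger than the image file.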

4/03/2020

Example python code for : Download s3 object as opencv image in memory and upload too

Just see the code
It's not difficult.

...

...
import cv2
import numpy as np
...

def lambda_handler(event, context):
    # s3_client is assumed to be created in the elided part above (e.g. boto3.client('s3'))
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    s3_path = event['Records'][0]['s3']['object']['key']
    #download object
    obj = s3_client.get_object(Bucket=bucket_name, Key=s3_path)
    #obj to cv2
    nparr = np.frombuffer(obj['Body'].read(), np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    #simple image processing
    reimg = cv2.resize(img, (100,100) )
    #cv2 to bytes (tobytes() replaces the deprecated tostring())
    image_string = cv2.imencode('.png', reimg)[1].tobytes()
    #upload
    s3_client.put_object(Bucket='thum-prj-output', Key = s3_path, Body=image_string)
...

...

4/02/2020

PDF to OpenCV as page by page using PyMuPDF library (python example code)

Just see the below example code 😊

pip install PyMuPDF
document : https://pymupdf.readthedocs.io/en/latest/

I think this is a better library than PyPDF2 🤔
..

import fitz
import numpy as np
import cv2
fname = 'information-10-00248-v2'
doc = fitz.open(fname+'.pdf')

#split pages
for i, page in enumerate(doc.pages()):
    print(i)
    zoom = 1
    mat = fitz.Matrix(zoom, zoom)
    #newer PyMuPDF releases name these methods get_pixmap() and tobytes()
    pix = page.getPixmap(matrix = mat)
    imgData = pix.getImageData("png")

    #save image from byte
    with open('./save_by_byte_{}_{}.png'.format(fname, i), 'wb') as f:
        f.write(imgData)

    #save image from opencv
    nparr = np.frombuffer(imgData, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    print(img.shape)
    cv2.imwrite('./save_by_opencv_{}_{}.png'.format(fname, i),img)

..

3/27/2020

Python : How to copy files from one location to another using shutil.copy()

Just check the example code

..

import shutil
# Copy file to another directory
newPath = shutil.copy('sample1.txt', '/home/varung/test')
print("Path of copied file : ", newPath)
#Path of copied file :  /home/varung/test/sample1.txt

#Copy a file with new name
newPath = shutil.copy('sample1.txt', '/home/varung/test/sample2.txt')
print("Path of copied file : ", newPath)
#Path of copied file :  /home/varung/test/sample2.txt

# Copy the target file pointed to by a symbolic link (default: follow_symlinks=True)
newPath = shutil.copy('/home/varung/test/link.csv', '/home/varung/test/sample2.csv')
print("Path of copied file : ", newPath)
#Path of copied file :  /home/varung/test/sample2.csv

# Copy a symbolic link as a new link
newPath = shutil.copy('/home/varung/test/link.csv', '/home/varung/test/newlink.csv', follow_symlinks=False)
print("Path of copied file : ", newPath)
#Path of copied file :  /home/varung/test/newlink.csv
..
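
The follow_symlinks behavior is easy to verify in a temp directory. The sketch below assumes a Linux/macOS system where os.symlink works without special rights; all file names are made up for the test.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'sample1.txt')
link = os.path.join(workdir, 'link.txt')
with open(target, 'w') as f:
    f.write('hello')
os.symlink(target, link)  # link.txt -> sample1.txt

# Default (follow_symlinks=True): the file behind the link is copied
followed = shutil.copy(link, os.path.join(workdir, 'copy_of_target.txt'))
# follow_symlinks=False: the link itself is recreated at the destination
not_followed = shutil.copy(link, os.path.join(workdir, 'copy_of_link.txt'),
                           follow_symlinks=False)

followed_is_link = os.path.islink(followed)
not_followed_is_link = os.path.islink(not_followed)
print(followed_is_link, not_followed_is_link)

# Clean up the temp directory
shutil.rmtree(workdir)
```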

How to delete a file or folder? (python code)

Use the os or shutil library
os.remove() removes a file.
os.rmdir() removes an empty directory.
shutil.rmtree() deletes a directory and all its contents.


EX)
import os
os.remove("/tmp/<file_name>.txt")
os.rmdir("/tmp/")

import shutil
shutil.rmtree("/tmp/")
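
To try the three calls without touching anything important, here is a sketch that works inside a fresh temp directory (all names are made up). It also shows that os.rmdir() refuses a directory that still has content.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
file_path = os.path.join(base, 'note.txt')
empty_dir = os.path.join(base, 'empty')
full_dir = os.path.join(base, 'full')

open(file_path, 'w').close()
os.mkdir(empty_dir)
os.makedirs(os.path.join(full_dir, 'nested'))

os.remove(file_path)   # removes a file
os.rmdir(empty_dir)    # removes an empty directory
try:
    os.rmdir(full_dir)  # raises: the directory is not empty
except OSError as err:
    print('rmdir refused:', type(err).__name__)
shutil.rmtree(full_dir)  # removes the directory and everything in it

remaining = os.listdir(base)
print(remaining)
os.rmdir(base)
```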

3/23/2020

WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
...


simple solution :
ssh-keygen -R <host>

Ex)
ssh-keygen -R 192.168.3.10
ssh-keygen -R ec2-30.30.30.eu-west-1.compute.amazonaws.com





3/22/2020

use docker without sudo

1.Create the docker group.
> sudo groupadd docker

2.Add your user to the docker group.
> sudo usermod -aG docker $USER

3.Log out and log back in so that your group membership is re-evaluated, or activate the new group in the current shell right away.
> newgrp docker

4.Verify that you can run docker commands without sudo.
>docker run hello-world

3/21/2020

transfer local docker image to another machine


Transfer docker image A -> B
1. Make a tar archive of the image on A
docker save -o ./example.tar image_name

2. Transfer example.tar  A to B using scp, rsync, or in some way

3. Load image on B
docker load -i ./example.tar

install docker on ubuntu 18.04

Step 1: Update Software Repositories
sudo apt-get update

Step 2: Uninstall Old Versions of Docker
sudo apt-get remove docker docker-engine docker.io

Step 3: Install Docker
sudo apt install docker.io

Step 4: Start and Automate Docker
sudo systemctl start docker
sudo systemctl enable docker

Step 5 (Optional): Check Docker Version
docker --version

3/17/2020

get unique value from list (python source code)

..
import numpy as np

def unique(list1):
    x = np.array(list1)
    x = np.unique(x) #sorted unique values
    return x.tolist() #plain Python values instead of numpy scalars

list1 = [10, 20, 10, 30, 40, 40]
list1 = unique(list1)
print(list1)
..
output
[10, 20, 30, 40]
..
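
If numpy is not needed otherwise, the standard library is enough. np.unique returns sorted values; the sketch below shows both a sorted variant and one that keeps the first-seen order (unique_keep_order is a name invented for this note).

```python
def unique_keep_order(items):
    """Drop duplicates while keeping the order of first appearance."""
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(items))

list1 = [30, 10, 30, 20, 10]
kept_order = unique_keep_order(list1)
sorted_unique = sorted(set(list1))
print(kept_order)
print(sorted_unique)
```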

3/11/2020

shuffle two related list python

method #1
..
a=['aaaaa','bbbb','cccc','dddd']
b=[1, 2, 3, 4]

from sklearn.utils import shuffle
list_1, list_2 = shuffle(a, b)
print(list_1, list_2)
#['cccc', 'aaaaa', 'dddd', 'bbbb'] [3, 1, 4, 2]
..

method #2
...
import random

list1_shuf = []
list2_shuf = []
index_shuf = list(range(len(a)))
random.shuffle(index_shuf) #shuffles in place; sklearn's shuffle() returns a copy instead
for i in index_shuf:
    list1_shuf.append(a[i])
    list2_shuf.append(b[i])

print(list1_shuf)
print(list2_shuf)
#['cccc', 'aaaaa', 'bbbb', 'dddd']
#[3, 1, 2, 4]
...
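
A third option, sketched here with only the standard library: zip the two lists into pairs, shuffle the pairs, and unzip them again. The pairing is kept because each (a, b) tuple moves as one unit.

```python
import random

a = ['aaaaa', 'bbbb', 'cccc', 'dddd']
b = [1, 2, 3, 4]

pairs = list(zip(a, b))   # [('aaaaa', 1), ('bbbb', 2), ...]
random.shuffle(pairs)     # in-place shuffle of the pairs
a_shuf, b_shuf = (list(t) for t in zip(*pairs))

print(a_shuf, b_shuf)
```

Calling random.seed() with a fixed number beforehand makes the order reproducible.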



3/02/2020

Using Virtual Environments in Jupyter Notebook and Python

activate your virtualenv
>source yourenv/bin/activate

install ipykernel (no --user flag here: pip normally refuses --user installs inside a virtualenv)
>(yourenv) pip install ipykernel


Add your virtualenv to jupyter
>(yourenv) python -m ipykernel install --user --name=myenv

Then you can choose myenv as the kernel when creating a new notebook.



key commands:
pip install ipykernel
python -m ipykernel install --user --name=myenv

3/01/2020

denied: Your Authorization Token has expired. Please run 'aws ecr get-login' to fetch a new one.

error:
denied: Your Authorization Token has expired. Please run 'aws ecr get-login' to fetch a new one.

this worked for me (AWS CLI v1; AWS CLI v2 removed get-login in favor of get-login-password)
>eval $( aws ecr get-login --no-include-email --region eu-west-1 )