4/16/2023

Python example: converting a NumPy ndarray to a pandas DataFrame

Refer to the code:



.

import numpy as np
import pandas as pd

# Create a numpy ndarray
array = np.random.rand(5, 3)
print('array: \n', array)

# Convert the numpy ndarray to a pandas DataFrame
df = pd.DataFrame(array, columns=['Column1', 'Column2', 'Column3'])

# Print the DataFrame
print('df: \n',df)

..



Thank you.

www.marearts.com

🙇🏻‍♂️

4/14/2023

Get a list of YouTube video URLs from a YouTube playlist URL.

Refer to the code:

.

from pytube import Playlist

# Replace with your playlist URL
playlist_url = 'https://www.youtube.com/playlist?list=yourlist'

playlist = Playlist(playlist_url)

# Fetch video URLs
video_urls = playlist.video_urls

# Print video URLs
for url in video_urls:
    print(url)

..



Download YouTube video - I fall in love too easily



Download app

🗓️ Version 2.0 - 2023-04-29

🙅🏽 Don't worry there is no virus!! It's very clean code.

📦 Download link -> Buy Me a Coffee : https://www.buymeacoffee.com/trurg28/e/131820



Refer to the code:

.

#pip install pytube
from pytube import YouTube

# Replace the URL below with the URL of the video you want to download
video_url = 'https://www.youtube.com/watch?v=YOUR_VIDEO_ID'

# Creating a YouTube object
yt = YouTube(video_url)

# Getting the highest resolution video stream
video = yt.streams.get_highest_resolution()

# Downloading the video
video.download()

print("Video downloaded successfully.")

..


You can find the source code here:

https://study.marearts.com/2023/09/download-youtube-video-url-to-local.html



4/13/2023

Create the output folder if it doesn't exist, python example

 refer to code:


.

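import os

# Create the output folder if it doesn't exist (output_file is the target file path)
output_folder = os.path.dirname(output_file)
if output_folder:  # dirname is '' when the path has no folder part
    os.makedirs(output_folder, exist_ok=True)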

..
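The same idea can also be sketched with `pathlib`. The helper name `ensure_parent_dir` and the example path are hypothetical, chosen only for illustration; the sketch assumes you hold the path of a file you are about to write.

```python
from pathlib import Path
import tempfile


def ensure_parent_dir(file_path):
    """Create the parent folder of file_path if it is missing (hypothetical helper)."""
    parent = Path(file_path).parent
    # parents=True creates intermediate folders; exist_ok=True makes repeat calls safe
    parent.mkdir(parents=True, exist_ok=True)
    return parent


# Usage with a throwaway temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results" / "run1" / "output.txt"
    folder = ensure_parent_dir(target)
    print(folder.is_dir())
```

For a bare filename with no folder part, `Path(...).parent` is the current directory, so the call is a harmless no-op.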



Count the number of objects (files) in a given S3 folder. Python and CLI example

Refer to the code:

.

import boto3

def count_files_in_bucket(bucket_name, prefix=''):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    file_count = 0

    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects
        for obj in page.get('Contents', []):
            # Skip "folder" placeholder keys that end with '/'
            if not obj['Key'].endswith('/'):
                file_count += 1
    return file_count


bucket_name = 'your_bucket_name'
prefix = 'sub_folder_name'
file_count = count_files_in_bucket(bucket_name, prefix)
print(f'There are {file_count} files in the "{prefix}" folder of the "{bucket_name}" bucket.')

..






4/12/2023

The long-tail problem in an unbalanced dataset

The long-tail problem in an unbalanced dataset is a situation where a few classes have a large number of samples, while a majority of classes have few samples. This can lead to biased models that perform poorly on underrepresented classes. To address this issue, you can use various techniques, including:

  1. Resampling methods:
    a. Oversampling: Increase the number of instances in the underrepresented classes by creating copies or generating synthetic samples.
      • Random oversampling: Duplicate random instances from the minority classes.
      • Synthetic Minority Over-sampling Technique (SMOTE): Generate synthetic samples by interpolating between instances in the minority class.
      • Adaptive Synthetic (ADASYN): Similar to SMOTE, but with a focus on generating samples for difficult-to-classify instances.
    b. Undersampling: Reduce the number of instances in the overrepresented classes.
      • Random undersampling: Randomly remove instances from the majority class.
      • Tomek links: Identify and remove majority class instances that are close to minority class instances.
      • Neighborhood Cleaning Rule (NCR): Remove majority class instances that are misclassified by their nearest neighbors.
  2. Cost-sensitive learning: Assign higher misclassification costs to underrepresented classes during the training process, encouraging the model to be more sensitive to these classes.
  3. Ensemble methods: Combine multiple models to improve classification performance.
    a. Balanced Random Forest: A variation of the Random Forest algorithm that balances the class distribution by either undersampling the majority class or oversampling the minority class in each tree.
    b. EasyEnsemble: Train an ensemble of classifiers, each using a random under-sampling of the majority class.
    c. RUSBoost: An adaptation of the boosting algorithm that incorporates random under-sampling of the majority class during the training process.
  4. Transfer learning: Pre-train a model on a balanced dataset or a dataset from a related domain, then fine-tune it on the imbalanced dataset.
  5. Evaluation metrics: Use appropriate evaluation metrics such as precision, recall, F1-score, or the area under the precision-recall curve (AUPRC) to measure the model's performance on the minority class. This helps ensure that the model's performance is not skewed by the imbalanced class distribution.
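As a rough illustration of random oversampling (item 1a), here is a minimal sketch in plain Python. The function name `random_oversample` and the toy data are assumptions made for this example, not part of any library; in practice a package such as imbalanced-learn is the more usual choice.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Size of the largest class; every other class is topped up to this size
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Keep all originals, then draw the shortfall with replacement
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: class "a" has 6 samples, "b" has 2, "c" has 1
samples = list(range(9))
labels = ["a"] * 6 + ["b"] * 2 + ["c"]
new_samples, new_labels = random_oversample(samples, labels)
print(Counter(new_labels))
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into validation or test data.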

Remember to experiment with different techniques to find the best approach for your specific dataset and problem.
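For cost-sensitive learning (item 2), one common starting point is to weight each class inversely to its frequency. The sketch below assumes the heuristic n_samples / (n_classes * class_count); the function name `balanced_class_weights` is hypothetical, and the resulting weights would still need to be passed to whatever loss function or classifier you use.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes get weights below 1
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy labels: class "a" has 6 samples, "b" has 2, "c" has 1
labels = ["a"] * 6 + ["b"] * 2 + ["c"]
weights = balanced_class_weights(labels)
print(weights)
```

With these toy labels, the rarest class "c" receives six times the weight of the most frequent class "a".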



4/09/2023

Combine Two Videos Side by Side with OpenCV in Python

 refer to code:

.

import cv2

# Open the two video files
video1 = cv2.VideoCapture('video1.mp4')
video2 = cv2.VideoCapture('video2.mp4')

# Get video properties
width1 = int(video1.get(cv2.CAP_PROP_FRAME_WIDTH))
height1 = int(video1.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps1 = video1.get(cv2.CAP_PROP_FPS)

width2 = int(video2.get(cv2.CAP_PROP_FRAME_WIDTH))
height2 = int(video2.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps2 = video2.get(cv2.CAP_PROP_FPS)

# Check if videos have the same FPS and height
assert fps1 == fps2, "Videos should have the same FPS"
assert height1 == height2, "Videos should have the same height"

# Create a VideoWriter object to save the combined video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mp4', fourcc, fps1, (width1 + width2, height1))

while video1.isOpened() and video2.isOpened():
    ret1, frame1 = video1.read()
    ret2, frame2 = video2.read()

    if not ret1 or not ret2:
        break

    # Concatenate the frames side by side
    combined_frame = cv2.hconcat([frame1, frame2])

    # Write the combined frame to the output video
    out.write(combined_frame)

    # Display the combined frame
    cv2.imshow('Combined Video', combined_frame)

    # Press 'q' to stop the process and close the window
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video files and the output video
video1.release()
video2.release()
out.release()

cv2.destroyAllWindows()

..

