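The data contains repeated eeg_id values: several rows share the same eeg_id. A plain random split could put rows with the same eeg_id into both train and val, so we split by group with GroupShuffleSplit, which keeps all rows of one eeg_id in the same subset.
See the code below: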
.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
# Load your dataset
train = pd.read_csv('./train.csv')
# Display the shape of the dataset
print("Dataset shape:", train.shape)
# Count unique eeg_id values
unique_eeg_id_count = train['eeg_id'].nunique()
print("Unique eeg_id count:", unique_eeg_id_count)
# Initialize the GroupShuffleSplit
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
# Split the dataset based on the 'eeg_id' to ensure group cohesion
# (n_splits=1, so this loop runs exactly once)
for train_idx, val_idx in gss.split(train, groups=train['eeg_id']):
    train_set = train.iloc[train_idx]
    val_set = train.iloc[val_idx]
# Now, train_set and val_set are split according to unique eeg_ids,
# ensuring that all records of a single eeg_id are in the same subset
print("Training set shape:", train_set.shape)
print("Validation set shape:", val_set.shape)
..
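To check that the split really keeps each eeg_id in one subset, a quick sanity check can help. The sketch below is not part of the original code: it uses a small made-up DataFrame (the `toy` frame and its `value` column are assumptions for illustration) instead of train.csv, so it can run on its own. Note that with GroupShuffleSplit, test_size is the share of groups (eeg_ids) that go to validation, not the share of rows.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for train.csv (made-up data): 10 eeg_ids with 3 rows each
toy = pd.DataFrame({
    "eeg_id": [i for i in range(10) for _ in range(3)],
    "value": range(30),
})

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# n_splits=1, so next() takes the single (train, val) index pair
train_idx, val_idx = next(gss.split(toy, groups=toy["eeg_id"]))
train_set, val_set = toy.iloc[train_idx], toy.iloc[val_idx]

# No eeg_id should appear in both subsets
shared_ids = set(train_set["eeg_id"]) & set(val_set["eeg_id"])
print("Shared eeg_ids:", shared_ids)
print("Train eeg_ids:", train_set["eeg_id"].nunique())
print("Val eeg_ids:", val_set["eeg_id"].nunique())
```

The same intersection check should work on the real train_set and val_set from the code above; an empty set means no eeg_id leaked between the two subsets.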
Thank you.