2/18/2024

GroupShuffleSplit, sklearn

 

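The data contains multiple rows with the same eeg_id. To keep all rows of one eeg_id together, we can split the data into train and validation sets by group using GroupShuffleSplit, so the same eeg_id never appears in both sets.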

Refer to the code:

.



import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Load your dataset
train = pd.read_csv('./train.csv')

# Display the shape of the dataset
print("Dataset shape:", train.shape)

# Count unique eeg_id values
unique_eeg_id_count = train['eeg_id'].nunique()
print("Unique eeg_id count:", unique_eeg_id_count)

# Initialize the GroupShuffleSplit
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# Split the dataset based on the 'eeg_id' to ensure group cohesion
for train_idx, val_idx in gss.split(train, groups=train['eeg_id']):
    train_set = train.iloc[train_idx]
    val_set = train.iloc[val_idx]

# Now, train_set and val_set are split according to unique eeg_ids,
# ensuring that all records of a single eeg_id are in the same subset
print("Training set shape:", train_set.shape)
print("Validation set shape:", val_set.shape)

..
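To double-check that the split really keeps each eeg_id on one side, here is a minimal sketch. It uses a small made-up DataFrame (the `toy` data and its `label` column are assumptions for illustration, not the real train.csv) and confirms that no eeg_id appears in both sets. Note that test_size in GroupShuffleSplit is the proportion of groups, not rows, so the row counts may not be exactly 80/20.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data standing in for train.csv:
# 10 distinct eeg_ids, each repeated 1 to 3 times
toy = pd.DataFrame({
    "eeg_id": [i for i in range(10) for _ in range(i % 3 + 1)],
})
toy["label"] = range(len(toy))  # placeholder column, for illustration only

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# With n_splits=1 we can take the single split directly
train_idx, val_idx = next(gss.split(toy, groups=toy["eeg_id"]))

train_ids = set(toy.iloc[train_idx]["eeg_id"])
val_ids = set(toy.iloc[val_idx]["eeg_id"])

# The intersection should be empty if the groups were kept intact
overlap = train_ids & val_ids
print("Overlapping eeg_ids:", len(overlap))
print("Train groups:", len(train_ids), "Val groups:", len(val_ids))
print("Train rows:", len(train_idx), "Val rows:", len(val_idx))
```

The same overlap check can be run on the real train_set and val_set by comparing their eeg_id columns.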

Thank you.

πŸ™‡πŸ»‍♂️
