The list_objects_v2 function returns at most 1,000 objects per call. To read every object in the bucket, you need to paginate.
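Paginating means calling list_objects_v2 repeatedly. While a response has IsTruncated set, you pass its NextContinuationToken back as ContinuationToken to fetch the next page. The sketch below writes that loop out by hand. It assumes `client` is a boto3 S3 client, or any object with the same list_objects_v2 method. `list_all_keys` is a hypothetical helper name and is not part of boto3.

```python
def list_all_keys(client, bucket, prefix=""):
    """Collect every key under prefix by following continuation tokens.

    `client` is assumed to be a boto3 S3 client (or a look-alike).
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is missing entirely when a page has no objects.
        for obj in response.get("Contents", []):
            keys.append(obj["Key"])
        # Stop once S3 reports that this was the last page.
        if not response.get("IsTruncated"):
            break
        # Request the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return keys
```

An example call is `list_all_keys(boto3.client('s3'), 'my-bucket', 'data/')`, where the bucket and prefix are placeholders. The paginator used in the code below runs this same loop for you.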

Refer to the code below, which uses a boto3 paginator:

You can change '.json' to whatever file extension fits your case.


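```python
import boto3

def get_origin_fn_list(ORIGIN_DATA_S3, ORIGIN_DATA_S3_prefix):
    """Return {uid: key} for every .json object under the prefix.

    The uid is the name of the folder that directly contains the file,
    so keys are assumed to look like '.../<uid>/<name>.json'.
    """
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    origin_path = {}
    total_objects = 0

    # The paginator follows continuation tokens until the whole prefix is listed.
    for response in paginator.paginate(Bucket=ORIGIN_DATA_S3, Prefix=ORIGIN_DATA_S3_prefix):
        # 'Contents' is absent when a page (or the whole prefix) is empty.
        for obj in response.get('Contents', []):
            total_objects += 1
            path = obj['Key']
            # endswith() avoids the off-by-one of slicing 4 chars for a 5-char suffix.
            if path.endswith('.json'):
                uid = path.split('/')[-2]  # name of the parent folder
                origin_path[uid] = path

    # Report matched keys against the total number of objects listed.
    print(f"get kv.json list: {len(origin_path)}/{total_objects}")
    return origin_path
```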
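The suffix filter and the folder-as-uid mapping do not need S3. One option is to move them into a plain function that takes the suffix as a parameter. Then "change '.json' for your case" just means passing a different argument, and you can test the logic on a hand-written list of keys. Here is a sketch, where `map_keys_by_parent` is a hypothetical helper name:

```python
def map_keys_by_parent(keys, suffix=".json"):
    """Map each key's parent folder name to the key, keeping only keys ending in suffix."""
    result = {}
    for key in keys:
        if not key.endswith(suffix):
            continue
        parts = key.split("/")
        # Keys at the bucket root have no parent folder to use as an id.
        if len(parts) < 2:
            continue
        # As in the original, a later key with the same parent overwrites an earlier one.
        result[parts[-2]] = key
    return result
```

Inside the paginator loop, you would collect `obj['Key']` values into a list and pass that list to this function.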


Thank you.