How to Get the Size of an S3 Bucket using Boto3 Python
Dear Reader, I hope you are doing well. In today’s post, you will learn to get the total size of an S3 bucket in various ways using boto3. A few days ago, I shared a tutorial to find out the size of an S3 bucket using the AWS console. Today, we will see how to automate things using boto3.
So are you ready?
Prerequisite
- An active AWS account: See how to set up your free tier AWS account in the right way.
- Python 3 installed on your system
- Boto3 installed on your system
- An access key and secret key for your AWS account
Connecting to S3 from Boto3
Before you try to get the size of an S3 bucket using boto3, you need to set up the credentials that it will use.
The easiest way to set them up on your system is with the aws configure command:
aws configure
Enter your access key, secret key, and the region you want to work with one by one, and you should be ready to write your first Python program using boto3.
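For reference, aws configure stores these values in plain-text files under your home directory, which boto3 reads automatically. With placeholder values, the result looks roughly like this:

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# ~/.aws/config
[default]
region = us-east-1
```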
Ways to Get the Size of an S3 Bucket using Boto3 Python
There are many different ways to calculate the size of an S3 bucket using boto3. We will discuss two prominent ones here.
- Get Bucket Size using CloudWatch Metrics
- Get S3 Bucket Size without CloudWatch Metrics
1. Get the size of an S3 bucket using CloudWatch
Permission: You need to have permission to access CloudWatch and retrieve metrics for the specified S3 bucket.
To find the size of an S3 bucket using CloudWatch and Boto3, you can utilize the CloudWatch metrics for S3 bucket storage. Specifically, you can use the “BucketSizeBytes” metric to retrieve the size of the bucket.
Typically, this metric gets updated every 24 hours, so unless you need real-time bucket size information, it should meet your needs without burning a hole in your pocket.
The CloudWatch client provides a get_metric_statistics() method that you can use to fetch the BucketSizeBytes metric. You can then parse the response to get the average size using size_in_bytes = response['Datapoints'][0]['Average'], and convert it into a human-readable format as per your need.
Here is a complete example of getting S3 bucket size using CloudWatch metrics from boto3:
import boto3
import datetime

cloudwatch_client = boto3.client('cloudwatch')

def calculate_bucket_size(bucket_name):
    print('Start Calculating Bucket Size using CloudWatch Metrics')

    # Fetch the BucketSizeBytes metric from CloudWatch
    response = cloudwatch_client.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {
                'Name': 'BucketName',
                'Value': bucket_name
            },
            {
                'Name': 'StorageType',
                'Value': 'StandardStorage'
            }
        ],
        StartTime=datetime.datetime.utcnow() - datetime.timedelta(days=2),
        EndTime=datetime.datetime.utcnow(),
        Statistics=['Average'],
        Period=86400
    )

    # Extract the average size from the response
    if 'Datapoints' in response and len(response['Datapoints']) > 0:
        size_in_bytes = response['Datapoints'][0]['Average']
        # Convert the size to human-readable formats
        size_in_gb = size_in_bytes / (1024 ** 3)
        size_in_mb = size_in_bytes / (1024 ** 2)
        size_in_kb = size_in_bytes / 1024
        print(f"Bucket Size in Bytes: {size_in_bytes} bytes")
        print(f"Bucket Size in GB: {size_in_gb:.2f} GB")
        print(f"Bucket Size in MB: {size_in_mb:.2f} MB")
        print(f"Bucket Size in KB: {size_in_kb:.2f} KB")
    else:
        print("No data available for the bucket size.")

calculate_bucket_size('techtalk-with-preeti')
Important Note: Before using the above example, make sure to replace the bucket name with the name of your own S3 bucket. The StartTime and EndTime parameters are set to fetch data from the past two days; since the metric is published only about once a day, a one-day window sometimes returns no datapoints, while a two-day window reliably returns a result.
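If you print sizes in several places, the three conversions above can be folded into a small reusable helper. Here is a minimal sketch (the format_size name is just an illustration, not part of boto3):

```python
def format_size(size_in_bytes):
    """Convert a byte count to a human-readable string."""
    size = float(size_in_bytes)
    # Step up through the units until the value fits, capping at TB
    for unit in ('bytes', 'KB', 'MB', 'GB', 'TB'):
        if size < 1024 or unit == 'TB':
            return f"{size:.2f} {unit}"
        size /= 1024

print(format_size(3221225472))  # → 3.00 GB
```

You can then call format_size(size_in_bytes) instead of repeating the divisions for KB, MB, and GB.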
2. Get S3 bucket size without using CloudWatch
The CloudWatch metric doesn't get updated in real time, so if you need to know the size accurately at any moment, you can't really rely on it.
However, you can still compute the size of an S3 bucket by summing up the sizes of all the objects in the bucket.
Here's a code snippet using Boto3 to calculate the size of an S3 bucket by iterating over its objects:
import boto3

s3 = boto3.resource('s3')
s3_bucket = s3.Bucket('techtalk-with-preeti')

size_in_bytes = 0
total_count = 0

# Sum the size of every object in the bucket
for key in s3_bucket.objects.all():
    total_count += 1
    size_in_bytes += key.size

# Convert the size to human-readable formats
size_in_gb = size_in_bytes / (1024 ** 3)
size_in_mb = size_in_bytes / (1024 ** 2)
size_in_kb = size_in_bytes / 1024
print(f"Bucket Size in Bytes: {size_in_bytes} bytes")
print(f"Bucket Size in GB: {size_in_gb:.2f} GB")
print(f"Bucket Size in MB: {size_in_mb:.2f} MB")
print(f"Bucket Size in KB: {size_in_kb:.2f} KB")
Which one to Use?
It all depends on your requirements. As I said, if you just want to know the overall bucket size, you can use the CloudWatch metrics method.
However, when you need real-time size information, use the s3_bucket.objects.all() way. Keep in mind that under the hood this uses the ListObjectsV2 API, which is chargeable at standard rates. If you have a large bucket, you might end up paying a lot more than intended. So be cautious before using it.
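To get a feel for why this matters, here is a rough back-of-the-envelope sketch. Each ListObjectsV2 call returns at most 1,000 keys, so a full scan needs one request per thousand objects. The price used here ($0.005 per 1,000 LIST requests) is an assumed example rate; check current S3 pricing for your region:

```python
import math

def estimate_list_cost(object_count, price_per_1000_requests=0.005):
    """Rough cost of listing every object in a bucket.

    Assumes each ListObjectsV2 call returns at most 1,000 keys and an
    example price of $0.005 per 1,000 LIST requests (verify against
    current S3 pricing for your region).
    """
    requests = math.ceil(object_count / 1000)
    return requests * (price_per_1000_requests / 1000)

# A bucket with 100 million objects needs 100,000 LIST requests
print(f"${estimate_list_cost(100_000_000):.2f}")  # → $0.50
```

Half a dollar per scan sounds small, but run it every few minutes from a monitoring job and it adds up quickly.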
Pro tip: Consider using the S3 Inventory feature to generate inventory files every day. You can parse those files to get all the info you need.
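As a sketch of that approach: an inventory report is delivered as CSV (or ORC/Parquet) files with no header row, listing one object per line with whichever fields you enabled in the inventory configuration. Assuming a CSV inventory with the Bucket, Key, and Size fields, summing the size column could look like this (the sample data below is made up for illustration):

```python
import csv
import io

# Hypothetical inventory snippet with Bucket, Key, Size columns;
# a real report is a gzipped CSV you would download from the
# configured destination bucket first.
sample_inventory = io.StringIO(
    '"techtalk-with-preeti","photos/cat.jpg","204800"\n'
    '"techtalk-with-preeti","logs/app.log","1048576"\n'
)

total_size = 0
for bucket, key, size in csv.reader(sample_inventory):
    total_size += int(size)

print(f"Total size from inventory: {total_size} bytes")
```

Since inventory delivery is far cheaper than listing millions of objects, this scales well for very large buckets where the exact column layout of your report is defined by your own inventory configuration.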
Conclusion
In this post, we learnt how to get the size of an S3 bucket using boto3 in Python. We learnt two different ways in which you can calculate the total size of an S3 bucket.
Additionally, I shared a tip at the end of the tutorial: set up the S3 Inventory feature and parse the inventory file to calculate the total size in a cost-effective way.
Were you able to get the total size of your bucket from boto3 using the above example? Let me know in the comment section. Also if you prefer any other way, do let us know and we would be happy to include that as well.