How to Get the Size of an S3 Bucket using Boto3 Python

Dear Reader, I hope you are doing well. In today’s post, you will learn how to get the total size of an S3 bucket in various ways using boto3. A few days ago, I shared a tutorial on finding the size of an S3 bucket using the AWS console. Today, we will see how to automate it using boto3.

So are you ready?


Prerequisites

  • An active AWS account: see how to set up your free tier AWS account the right way.
  • Python 3 installed on your system
  • Boto3 installed on your system
  • An access key and secret key

Connecting to S3 from Boto3

Before you try to get the size of an S3 bucket using boto3, you need to set up the credentials that it will use.

The easiest way to set them up on your system is with the aws configure command-

aws configure

Enter your access key, secret key, and the region you want to work with one by one, and you should be ready to write your first Python program using boto3.

Ways to Get the Size of an S3 Bucket using Boto3 Python

There are several ways to calculate the size of an S3 bucket using boto3. We discuss two prominent ones here.

  1. Get the bucket size using CloudWatch metrics
  2. Get the S3 bucket size without CloudWatch metrics

1. Get the size of an S3 bucket using CloudWatch

Permission: You need to have permission to access CloudWatch and retrieve metrics for the specified S3 bucket.

To find the size of an S3 bucket using CloudWatch and Boto3, you can utilize the CloudWatch metrics for S3 bucket storage. Specifically, you can use the “BucketSizeBytes” metric to retrieve the size of the bucket.

This metric is updated once every 24 hours, so unless you need real-time bucket size information, it should meet your need without burning a hole in your pocket.

The CloudWatch client provides a get_metric_statistics() method that you can use to retrieve the BucketSizeBytes metric. Then parse the response to get the average size using size_in_bytes = response['Datapoints'][0]['Average'], and convert it into a human-readable format as per your need.
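The byte-to-GB/MB/KB conversions used throughout this tutorial can be pulled into one small helper. Here is a sketch of a hypothetical format_size() function (it is not part of boto3) that picks the largest unit that fits:

```python
def format_size(size_in_bytes):
    """Convert a byte count into a human-readable string (hypothetical helper)."""
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if size_in_bytes < 1024 or unit == "TB":
            # Bytes are shown as an integer; larger units get two decimals
            if unit == "bytes":
                return f"{size_in_bytes} bytes"
            return f"{size_in_bytes:.2f} {unit}"
        size_in_bytes /= 1024

print(format_size(1536))         # 1.50 KB
print(format_size(5 * 1024**3))  # 5.00 GB
```

You can call this on the size you extract from either method below instead of repeating the divisions by 1024.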

Here is a complete example of getting the S3 bucket size using CloudWatch metrics with boto3:

import boto3
import datetime

cloudwatch_client = boto3.client('cloudwatch')

def calculate_bucket_size(bucket_name):

    print('Start Calculating Bucket Size using CloudWatch Metrics')
    
    # Get the BucketSizeBytes metric from CloudWatch
    response = cloudwatch_client.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {
                'Name': 'BucketName',
                'Value': bucket_name
            },
            {
                'Name': 'StorageType',
                'Value': 'StandardStorage'
            }
        ],
        StartTime=datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=2),
        EndTime=datetime.datetime.now(datetime.timezone.utc),
        Statistics=['Average'],
        Period=86400
    )
    
    # Extract the average size from the response
    if 'Datapoints' in response and len(response['Datapoints']) > 0:
        size_in_bytes = response['Datapoints'][0]['Average']
        
        # Let's convert the size to a human-readable format
        size_in_gb = size_in_bytes / (1024 ** 3)
        size_in_mb = size_in_bytes / (1024 ** 2)
        size_in_kb = size_in_bytes / 1024
        
        print(f"Bucket Size in Bytes: {size_in_bytes} bytes")
        print(f"Bucket Size in GB: {size_in_gb:.2f} GB")
        print(f"Bucket Size in MB: {size_in_mb:.2f} MB")
        print(f"Bucket Size in KB: {size_in_kb:.2f} KB")
    else:
        print("No data available for the bucket size.")


calculate_bucket_size('techtalk-with-preeti')

Important Note: Before running the above example, replace ‘techtalk-with-preeti‘ with the name of your own S3 bucket. The StartTime and EndTime parameters are set to fetch data from the past two days; since the metric is only published once a day, a one-day window sometimes returns no datapoints, and widening it to two days usually does.
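Also note that BucketSizeBytes is reported separately per storage class via the StorageType dimension, so querying StandardStorage alone misses objects stored in other classes. Below is a sketch that queries a few storage types and sums them; the list of StorageType values shown is an assumed subset, so check the S3 CloudWatch metrics documentation for the full set:

```python
import datetime

def extract_average(response):
    """Pull the average from a get_metric_statistics response; 0.0 if no datapoints."""
    datapoints = response.get('Datapoints', [])
    return datapoints[0]['Average'] if datapoints else 0.0

def bucket_size_all_storage_types(bucket_name):
    """Sum BucketSizeBytes across several storage classes (assumed StorageType values)."""
    import boto3  # imported here so extract_average stays usable without boto3
    cloudwatch = boto3.client('cloudwatch')
    total = 0.0
    # Assumed subset of StorageType values; see AWS docs for the complete list
    for storage_type in ('StandardStorage', 'StandardIAStorage', 'GlacierStorage'):
        response = cloudwatch.get_metric_statistics(
            Namespace='AWS/S3',
            MetricName='BucketSizeBytes',
            Dimensions=[
                {'Name': 'BucketName', 'Value': bucket_name},
                {'Name': 'StorageType', 'Value': storage_type},
            ],
            StartTime=datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=2),
            EndTime=datetime.datetime.now(datetime.timezone.utc),
            Statistics=['Average'],
            Period=86400,
        )
        total += extract_average(response)
    return total
```

Storage types with no data simply contribute zero, so the sum stays correct for buckets that use only one class.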

2. Get S3 bucket size without using CloudWatch

The CloudWatch metric is not updated in real time, so you can’t rely on it when you need an accurate, up-to-the-minute size.

However, you can still calculate the size of an S3 bucket by summing up the sizes of all the objects in the bucket.

Here’s a code snippet using Boto3 that calculates the size of an S3 bucket by iterating over its objects:

import boto3

s3 = boto3.resource('s3')
s3_bucket = s3.Bucket('techtalk-with-preeti')

size_in_bytes = 0
total_count = 0

for key in s3_bucket.objects.all():
    total_count += 1
    size_in_bytes += key.size

# Let's convert the size to a human-readable format
size_in_gb = size_in_bytes / (1024 ** 3)
size_in_mb = size_in_bytes / (1024 ** 2)
size_in_kb = size_in_bytes / 1024

print(f"Total Objects: {total_count}")
print(f"Bucket Size in Bytes: {size_in_bytes} bytes")
print(f"Bucket Size in GB: {size_in_gb:.2f} GB")
print(f"Bucket Size in MB: {size_in_mb:.2f} MB")
print(f"Bucket Size in KB: {size_in_kb:.2f} KB")

Which One to Use?

It all depends on your requirements. As I said, if you just want to know the overall bucket size, you can use the CloudWatch metrics method.

However, when you need real-time size information, use the bucket.objects.all() approach. Under the hood this calls ListObjectsV2, which is charged at standard request rates, one request per 1,000 objects. For a large bucket, you might end up paying more than intended, so be cautious before using it.
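If you prefer the low-level client API, the same real-time calculation can be done with the list_objects_v2 paginator. In the sketch below, the page-summing helper is pure Python so it can be exercised without AWS access, and the bucket name passed in is whatever you choose:

```python
def sum_object_sizes(pages):
    """Sum sizes and count objects across list_objects_v2 response pages."""
    total_bytes = 0
    total_count = 0
    for page in pages:
        for obj in page.get('Contents', []):  # 'Contents' is absent for empty pages
            total_bytes += obj['Size']
            total_count += 1
    return total_bytes, total_count

def bucket_size_via_paginator(bucket_name):
    """Iterate every object with a paginator; each page is one ListObjectsV2 call."""
    import boto3  # imported here so sum_object_sizes stays usable without boto3
    s3_client = boto3.client('s3')
    paginator = s3_client.get_paginator('list_objects_v2')
    return sum_object_sizes(paginator.paginate(Bucket=bucket_name))
```

The request cost is identical to the resource-based version above; the paginator just makes the one-call-per-page behavior explicit.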

Pro tip: Consider enabling the S3 Inventory feature, which generates inventory files on a daily or weekly schedule. You can parse those files to get all the info you need.
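S3 Inventory can emit CSV files that include an object size column when you select it in the inventory configuration. As a sketch, assuming inventory rows laid out as bucket, key, size (the actual column order depends on the fields you configure), you could sum the size column like this:

```python
import csv
import io

def total_size_from_inventory_csv(csv_text, size_column=2):
    """Sum the size column of S3 Inventory CSV rows (assumed column layout)."""
    total = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            total += int(row[size_column])
    return total

# Hypothetical inventory rows: bucket, key, size
sample = '"my-bucket","photos/a.jpg","1024"\n"my-bucket","docs/b.pdf","2048"\n'
print(total_size_from_inventory_csv(sample))  # 3072
```

Since the inventory is generated on a schedule, this costs far fewer requests than listing a large bucket object by object.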

Conclusion

In this post, we learnt how to get the size of an S3 bucket using boto3 in Python, covering two different ways to calculate the total size of an S3 bucket.

Additionally, I shared a tip at the end of the tutorial: set up the S3 Inventory feature and parse the inventory files to calculate the total size in a cost-effective way.

Were you able to get the total size of your bucket from boto3 using the above examples? Let me know in the comment section. If you prefer any other way, do let us know and we would be happy to include that as well.
