How to Export Object Metadata from AWS S3
S3 stores billions of objects for millions of organizations. Here are three ways to get your bucket metadata into a spreadsheet, from a single CLI command to automated inventory reports.
AWS provides S3 Inventory, a built-in feature that automatically generates CSV or Parquet reports of all objects in a bucket with their metadata. For one-time or ad-hoc exports, the AWS CLI gives you instant results. For custom exports with full programmatic control, boto3 (Python) is the standard tool. All three methods require appropriate IAM permissions (at minimum s3:ListBucket to list objects and their metadata).
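For reference, a minimal IAM policy covering all three methods might look like the sketch below. The bucket name is a placeholder, and the second statement is only needed if you plan to pull per-object tags or metadata with HEAD/tagging calls:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListSourceBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Sid": "ReadObjectMetadata",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectTagging"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```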
AWS CLI
The AWS CLI is pre-installed on most developer machines and in all AWS environments. A single command lists every object in a bucket with its key metadata. Pipe the output through jq or a short Python script to get a CSV.
Configure your credentials first with aws configure. You need s3:ListBucket permission on the target bucket.

# List all objects in a bucket as JSON
aws s3api list-objects-v2 \
--bucket your-bucket-name \
--output json > s3_objects.json
# Convert to CSV
python3 -c "
import json, csv, sys
with open('s3_objects.json') as f:
    data = json.load(f)
w = csv.writer(sys.stdout)
w.writerow(['Key','Size','LastModified','ETag','StorageClass'])
for obj in data.get('Contents', []):
    w.writerow([
        obj.get('Key',''),
        obj.get('Size',''),
        obj.get('LastModified',''),
        obj.get('ETag','').strip('\"'),
        obj.get('StorageClass','STANDARD')
    ])
" > s3_metadata.csv
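Once the CSV exists, a quick sanity check helps catch truncated exports. A minimal sketch (the file name matches the command above; the summary fields are my own choice, not part of any AWS tool):

```python
import csv
from collections import Counter

def summarize(csv_path):
    """Return object count, total bytes, and a per-storage-class tally
    for a CSV with the Key/Size/LastModified/ETag/StorageClass header."""
    total_bytes = 0
    classes = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            total_bytes += int(row["Size"] or 0)
            classes[row["StorageClass"]] += 1
    return sum(classes.values()), total_bytes, classes

# Example usage:
# count, total, classes = summarize("s3_metadata.csv")
# print(f"{count} objects, {total} bytes")
```

If the object count does not match what the console shows for the bucket, the listing was likely interrupted mid-pagination.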
# Quick alternative: list just keys and sizes
aws s3 ls s3://your-bucket-name --recursive --human-readable > s3_listing.txt

Tip: add --prefix "folder/subfolder/" to the list-objects-v2 command to limit the export to objects under a specific path. This is much faster than listing the entire bucket if you only need a subset of objects.

S3 Inventory (Built-in)
S3 Inventory is AWS's built-in solution for large-scale object metadata exports. Once configured, it automatically delivers daily or weekly CSV (or Parquet) reports to a destination bucket. It's the best option for ongoing metadata monitoring and works efficiently on buckets with millions of objects.
To set it up in the console, open the bucket's Management tab, create an inventory configuration, choose a destination bucket and prefix (e.g., inventory/), select CSV as the output format, and choose Daily or Weekly frequency.

# Create inventory configuration via CLI
aws s3api put-bucket-inventory-configuration \
--bucket your-source-bucket \
--id metadata-inventory \
--inventory-configuration '{
"Destination": {
"S3BucketDestination": {
"Bucket": "arn:aws:s3:::your-destination-bucket",
"Format": "CSV",
"Prefix": "inventory"
}
},
"IsEnabled": true,
"Id": "metadata-inventory",
"IncludedObjectVersions": "Current",
"OptionalFields": [
"Size", "LastModifiedDate", "StorageClass",
"ETag", "IsMultipartUploaded", "EncryptionStatus"
],
"Schedule": { "Frequency": "Daily" }
}'

Python + boto3
The boto3 SDK gives you the most flexibility. You can list objects with their standard metadata, then make individual HEAD requests to pull custom metadata (user-defined headers) or tag sets for each object. This is the best approach when you need fields beyond what the list-objects API returns.
Install the SDK with pip install boto3 and make sure your AWS credentials are configured (~/.aws/credentials or environment variables).

import boto3
import csv
s3 = boto3.client("s3")
BUCKET = "your-bucket-name"
def list_all_objects(bucket, prefix=""):
    """List all objects with pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get("Contents", []))
    return objects

objects = list_all_objects(BUCKET)

with open("s3_metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([
        "Key", "Size (bytes)", "Last Modified",
        "ETag", "Storage Class"
    ])
    for obj in objects:
        writer.writerow([
            obj["Key"],
            obj["Size"],
            obj["LastModified"].isoformat(),
            obj["ETag"].strip('"'),
            obj.get("StorageClass", "STANDARD"),
        ])

print(f"Exported {len(objects)} objects to s3_metadata.csv")
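When you do need per-object fields, the one-HEAD-per-object calls can be parallelised with a thread pool to cut wall-clock time. A sketch, assuming the s3 client, BUCKET, and objects list defined above; head_row is a hypothetical helper I'm introducing to flatten a head_object response into CSV columns:

```python
from concurrent.futures import ThreadPoolExecutor

def head_row(key, head):
    """Flatten a head_object response dict into one CSV row.

    Custom metadata (x-amz-meta-* headers) arrives in head["Metadata"]
    with the x-amz-meta- prefix already stripped by the SDK.
    """
    meta = head.get("Metadata", {})
    return [
        key,
        head.get("ContentType", ""),
        ";".join(f"{k}={v}" for k, v in sorted(meta.items())),
    ]

def fetch_rows(keys, max_workers=16):
    """HEAD each key concurrently; results come back in the order of keys.

    Assumes the s3 client and BUCKET defined in the export script above.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        heads = pool.map(lambda k: s3.head_object(Bucket=BUCKET, Key=k), keys)
        return [head_row(k, h) for k, h in zip(keys, heads)]
```

Keep max_workers modest; each HEAD is still a billed request, and this only postpones, not avoids, the per-object cost discussed below.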
# To include custom metadata and tags (slower, 1 API call per object):
# for obj in objects:
#     head = s3.head_object(Bucket=BUCKET, Key=obj["Key"])
#     custom_meta = head.get("Metadata", {})
#     content_type = head.get("ContentType", "")
#     tags_resp = s3.get_object_tagging(Bucket=BUCKET, Key=obj["Key"])
#     tags = {t["Key"]: t["Value"] for t in tags_resp["TagSet"]}

Note that the list-objects API does not return Content-Type, custom metadata (x-amz-meta-* headers), or S3 object tags. Getting those requires a HEAD request per object, which is slow and costly on large buckets. For 100,000+ objects, use S3 Inventory plus S3 Batch Operations instead.

What metadata fields can you export?
| Field | AWS CLI | S3 Inventory | boto3 |
|---|---|---|---|
| Object key (path) | ✓ | ✓ | ✓ |
| Object size | ✓ | ✓ | ✓ |
| Last modified date | ✓ | ✓ | ✓ |
| ETag (content hash) | ✓ | ✓ | ✓ |
| Storage class | ✓ | ✓ | ✓ |
| Content type | HEAD only | ✕ | HEAD only |
| Custom metadata headers | HEAD only | ✕ | HEAD only |
| Object tags | Separate call | ✕ | Separate call |
| Encryption status | HEAD only | ✓ | HEAD only |
| Replication status | ✕ | ✓ | ✕ |
| Is multipart upload | ✕ | ✓ | ✕ |
| Object version ID | With versioning | ✓ | With versioning |
| Object lock status | ✕ | ✓ | ✕ |
| Bucket name | ✓ | ✓ | ✓ |
- Custom metadata requires per-object API calls: The list-objects API only returns key, size, last modified, ETag, and storage class. Content-Type, custom metadata headers, and tags require a HEAD or GET request per object.
- S3 Inventory has a 48-hour delay: The first inventory report can take up to 48 hours. It's designed for ongoing monitoring, not instant one-time exports.
- Folder concepts are virtual: S3 does not have real folders. "Folders" are just common prefixes in object keys. The export will list every object with its full key path, not a hierarchical folder structure.
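If you adopt S3 Inventory for a large bucket, each delivery consists of a manifest.json plus one or more gzipped CSV chunks. A minimal sketch of pulling the chunk keys out of a parsed manifest (field names follow the published manifest layout; bucket names and the checksum are placeholders):

```python
import json

def chunk_keys(manifest):
    """Return the keys of the CSV data files listed in an inventory manifest."""
    return [entry["key"] for entry in manifest["files"]]

# Abridged example of a parsed manifest.json:
sample = {
    "sourceBucket": "your-source-bucket",
    "destinationBucket": "arn:aws:s3:::your-destination-bucket",
    "fileFormat": "CSV",
    "files": [
        {"key": "inventory/data/part-000.csv.gz", "size": 2048, "MD5checksum": "..."}
    ],
}
print(chunk_keys(sample))
```

From there, download each chunk with your S3 client of choice and decompress with gzip before concatenating.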
You have your metadata export.
Now score it.
Upload your CSV or Excel file to MQS and get a structural metadata health score out of 100 with dimension breakdowns and actionable diagnostics.