S3-Driven Trend Scraper: Automated Data Tiering
To stay ahead of 3D printing trends on platforms like MakerWorld, I needed to scrape metadata daily. But accumulating thousands of small JSON files gets expensive if every object sits in S3 Standard indefinitely. This project focuses on automated data tiering using S3 Lifecycle policies.
// Automated Data Tiering Pipeline
[ EventBridge (Daily) ] → [ Lambda (Scraper) ] → [ S3 Standard (Hot) ]
↓ (30 Days)
[ S3 Glacier Flexible Retrieval ]
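The scraper side of the diagram is a small Lambda writing one JSON batch per day. A minimal sketch is below; `fetch_trending` is a hypothetical stand-in for the real MakerWorld scraping logic, and the bucket name is a placeholder. The date-partitioned key layout matters: it lets Athena prune by day and lets the lifecycle rule age each daily batch as a unit.

```python
import json
from datetime import datetime, timezone

BUCKET = "trend-scrape-bucket"  # placeholder name


def object_key(platform: str, now: datetime) -> str:
    # Date-partitioned key, e.g. raw/makerworld/2024/05/01/trends.json.
    return f"raw/{platform}/{now:%Y/%m/%d}/trends.json"


def fetch_trending() -> list[dict]:
    # Hypothetical stub for the real scraping logic.
    return []


def handler(event, context):
    records = fetch_trending()
    key = object_key("makerworld", datetime.now(timezone.utc))
    # In the deployed function this is a real S3 write, e.g.:
    # boto3.client("s3").put_object(
    #     Bucket=BUCKET, Key=key, Body=json.dumps(records)
    # )
    return {"key": key, "count": len(records)}
```

EventBridge invokes `handler` on a daily schedule; nothing else in the pipeline needs to run on a server.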
The Lifecycle Strategy
Data from the last 30 days stays in S3 Standard for immediate querying via Athena. After 30 days, the data loses its tactical value but must be kept for historical year-over-year analysis. An S3 Lifecycle rule automatically transitions these objects to Glacier Flexible Retrieval, cutting per-GB storage costs by more than 80% with no manual operational overhead.
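The rule itself is a few lines of configuration. A sketch is below; the rule ID and prefix are assumptions matching the key layout described above, and note that Glacier Flexible Retrieval is named `GLACIER` in the S3 API.

```python
# Transition everything under raw/ to Glacier Flexible Retrieval
# 30 days after object creation.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-raw-scrapes-to-glacier",  # assumed rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Applied once to the bucket (requires AWS credentials), e.g.:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="trend-scrape-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )
```

After this one-time setup, S3 enforces the tiering itself; no cron job or sweeper Lambda ever touches old objects.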