Components:
- Ingestion: Lambda
- Data lake: S3
- Data discovery: Glue ETL
- Data dictionary: Glue Data Catalog
- Data federation: Athena
- Presentation: QuickSight

Ingestion (Lambda):
- can be triggered on a time schedule
- queries the EC2 Spot Instance API
- writes JSON files to S3
- needs an IAM role with appropriate permissions
- written in Python or another supported language

Data lake (S3):
- set of buckets containing JSON files
- deliberate folder structure that serves as the basis for metadata retrieval
- similar to how files are stored in Hadoop filesystems

Data discovery (Glue ETL):
- determines schemas
- scans the folder structure and JSON files to create the data catalog

Data federation (Athena):
- connects to S3 and the Glue Data Catalog
- provides a federated view of the underlying data
- can be queried using SQL
- compatible with, or provides an interface for, Hadoop-style queries
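
The ingestion and folder-structure notes above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the bucket name, the prefix, the default region, the key=value folder layout, and the choice of the spot price history call as "the spot instance API" are all assumptions that do not come from the notes.

```python
import json
from datetime import datetime, timezone

# Hypothetical placeholders, not taken from the notes.
BUCKET = "example-spot-datalake"
PREFIX = "spot-prices"


def build_key(region: str, ts: datetime) -> str:
    """Return an S3 key using Hadoop/Hive-style key=value folders."""
    return (
        f"{PREFIX}/region={region}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{ts:%H%M%S}.json"
    )


def to_json_lines(records) -> str:
    """Serialize records as one JSON object per line.

    default=str turns non-JSON values such as datetimes into strings.
    """
    return "\n".join(json.dumps(r, default=str) for r in records)


def handler(event, context):
    """Lambda entry point, intended to be invoked by a time-based schedule."""
    import boto3  # imported lazily so the helpers above work without boto3

    region = event.get("region", "eu-west-1")  # assumed default region
    now = datetime.now(timezone.utc)

    # Assumption: spot price history is the data of interest.
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_spot_price_history(
        StartTime=now,
        ProductDescriptions=["Linux/UNIX"],
        MaxResults=100,
    )

    key = build_key(region, now)
    body = to_json_lines(response["SpotPriceHistory"])
    boto3.client("s3").put_object(
        Bucket=BUCKET, Key=key, Body=body.encode("utf-8")
    )
    return {"bucket": BUCKET, "key": key}
```

With a layout like this, the role attached to the function would need at least permission to read spot price history and to put objects into the bucket. The key=value folders are the kind of structure a Glue crawler can expose as partition columns, so an Athena query could then filter on region, year, month, or day.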