This lab uses several AWS services, including Lambda, S3, IAM, and Bedrock. To learn more about cloud computing on AWS, see the AWS documentation and client libraries.
The pipeline consists of the following components:
- S3 Bucket: Stores uploaded PDF documents
- Lambda Function: Automatically triggered on file upload to S3
- LandingAI ADE:
- Processes documents and extracts chunks with bounding boxes
- Creates individual JSON files for each document chunk
- Storage:
- output/medical/: Markdown files
- output/medical_grounding/: Grounding data with bounding boxes
- output/medical_chunks/: Individual chunk JSON files for Knowledge Base
- output/medical_chunk_images/: Dynamically generated cropped chunk images
- AWS Bedrock Knowledge Base: Indexes individual chunk JSON files
- Metadata: Maintains chunk type, page number, and bounding box coordinates
- Strands Agent Framework: Orchestrates conversation flow
- Bedrock Memory Service: Maintains conversation context
- Visual Grounding:
- Extracts and crops specific chunk regions from PDFs
- Adds red border highlighting around chunks
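For reference, each per-chunk JSON file that the Knowledge Base indexes could look roughly like the record below. The field names are assumptions (ade_s3_handler.py defines the real schema), but the contents mirror the metadata listed above: chunk type, page number, and bounding box coordinates.

```python
# Hypothetical example of one record in output/medical_chunks/; field names are illustrative.
example_chunk = {
    "chunk_id": "Prevention_and_treatment_of_the_common_cold_p03_c07",  # made-up ID
    "source_document": "input/Prevention_and_treatment_of_the_common_cold.pdf",
    "chunk_type": "text",              # e.g. text, table, or figure
    "page_number": 3,
    "bounding_box": {"x0": 72.0, "y0": 140.5, "x1": 523.4, "y1": 310.2},
    "markdown": "...extracted chunk text in Markdown...",
}
```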
To replicate the lab, you must configure your own AWS account.
- Python
- Use version 3.10
- OS
- x86_64 is recommended
- AWS
- An AWS account with permissions for the following services:
- Lambda
- S3
- IAM
- Bedrock
- CloudWatch Logs
- In your account, set up the following resources:
- S3 Bucket
- Bedrock Knowledge Base
- LandingAI
- Vision Agent API Key
- Remember that you can create a free account at LandingAI
sc-landingai/
├── L6.ipynb # Main lab notebook
├── ade_s3_handler.py # Lambda function for document processing
├── lambda_helpers.py # Helper functions for Lambda deployment
├── visual_grounding_helper.py # Functions for creating cropped chunk images
├── medical/ # Sample medical PDF documents
│ ├── Common_cold_clinincal_evidence.pdf
│ ├── CT_Study_of_the_Common_Cold.pdf
│ ├── Evaluation_of_echinacea_for_the_prevention_and_treatment_of_the_common_cold.pdf
│ ├── Prevention_and_treatment_of_the_common_cold.pdf
│ ├── The_common_cold_a_review_of_the_literature.pdf
│ ├── Understanding_the_symptoms_of_the_common_cold_and_influenza.pdf
│ ├── Viruses_and_Bacteria_in_the_Etiology_of_the_Common_Cold.pdf
│ └── Vitamin_C_for_Preventing_and_Treating_the_Common_Cold.pdf
└── README.md # This file
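visual_grounding_helper.py produces the cropped chunk images; as a rough illustration of the approach (not the lab's actual implementation), a chunk region can be rendered from a PDF page with PyMuPDF and given a red border with Pillow. The function name and the assumption that bounding boxes arrive as PDF-point coordinates are mine:

```python
import fitz  # PyMuPDF
from PIL import Image, ImageOps

def crop_chunk_image(pdf_path, page_number, bbox, out_path):
    """Render one chunk region to a PNG and outline it in red (illustrative sketch).

    Assumes bbox = (x0, y0, x1, y1) in PDF points; the lab's helper may differ.
    """
    doc = fitz.open(pdf_path)
    page = doc[page_number]
    pix = page.get_pixmap(clip=fitz.Rect(*bbox), matrix=fitz.Matrix(2, 2))  # 2x zoom
    pix.save(out_path)
    doc.close()

    # Red border highlighting, as described in the pipeline overview.
    img = Image.open(out_path)
    ImageOps.expand(img, border=5, fill="red").save(out_path)

crop_chunk_image("medical/CT_Study_of_the_Common_Cold.pdf", 0, (72, 140, 523, 310), "chunk.png")
```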
- Make two folders in your S3 bucket called input/ and output/
- Connect the Bedrock Knowledge Base to the output/ folder
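These folders are just S3 key prefixes; if you would rather create them with boto3 than in the console, a minimal sketch (with a placeholder bucket name) is:

```python
import boto3

s3_client = boto3.client("s3")
bucket = "your-bucket-name"  # placeholder; use the bucket from your .env

# S3 "folders" are zero-byte objects whose keys end with a slash.
for prefix in ("input/", "output/"):
    s3_client.put_object(Bucket=bucket, Key=prefix)
```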
Create a .env file with your credentials:
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-west-2
S3_BUCKET=your-bucket-name
VISION_AGENT_API_KEY=your_landingai_api_key
BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
BEDROCK_KB_ID=your_knowledge_base_id

Install the required Python packages:

pip install boto3 python-dotenv Pillow PyMuPDF landingai-ade typing-extensions
pip install bedrock-agentcore strands-agents pandas

Open Lab-6.ipynb in Jupyter and follow the step-by-step instructions to:
- Deploy the Lambda function
- Set up S3 triggers
- Process medical documents (creates chunks automatically)
- Configure Bedrock Knowledge Base to index output/medical_chunks/
- Test chunk-based search with search_medical_chunks()
- Launch the interactive chatbot
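The monitoring and debugging snippets below refer to a few shared objects (logs_client, s3_client, bedrock_agent, and the bucket name). In the notebook these are built from your .env; a minimal sketch of that setup, assuming the variable names match, is:

```python
import os

import boto3
from dotenv import load_dotenv

load_dotenv()  # pulls AWS_*, S3_BUCKET, BEDROCK_KB_ID, etc. into the environment

region = os.environ["AWS_REGION"]
bucket = bucket_name = os.environ["S3_BUCKET"]   # later snippets use both names
BEDROCK_KB_ID = os.environ["BEDROCK_KB_ID"]
# DATA_SOURCE_ID (used in the ingestion call below) comes from your Knowledge Base's
# data source configuration; it is not part of the .env shown above.

s3_client = boto3.client("s3", region_name=region)
logs_client = boto3.client("logs", region_name=region)
bedrock_agent = boto3.client("bedrock-agent", region_name=region)  # KB control plane
```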
Monitor Lambda execution in AWS CloudWatch:
- Processing status for each document
- Error messages and stack traces
- Performance metrics and duration
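monitor_lambda_processing() (see lambda_helpers.py and the notebook) collects this for you; if you want to pull raw log events yourself, a small boto3 sketch along these lines works, where the log group name is an assumption based on whatever your deployed function is called:

```python
import time

# Placeholder: CloudWatch log groups for Lambda follow /aws/lambda/<function-name>.
log_group = "/aws/lambda/your-ade-function-name"

# Fetch ERROR lines from the last hour (CloudWatch expects epoch milliseconds).
events = logs_client.filter_log_events(
    logGroupName=log_group,
    filterPattern="ERROR",
    startTime=int((time.time() - 3600) * 1000),
)
for event in events.get("events", []):
    print(event["timestamp"], event["message"].strip())
```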
Check processed outputs:
# List all processed files
stats = monitor_lambda_processing(logs_client, s3_client, bucket_name)

Verify document ingestion:
response = bedrock_agent.start_ingestion_job(
knowledgeBaseId=BEDROCK_KB_ID,
dataSourceId=DATA_SOURCE_ID
)

Common issues:
- Lambda Timeout: Increase timeout in deployment (default: 900s)
- Memory Errors: Increase Lambda memory (default: 1024MB)
- IAM Permissions: Ensure role has S3 and CloudWatch access
- Python Version Mismatch: Use Python 3.10 for compatibility
- Knowledge Base Not Found: Verify KB ID and region settings
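If ingestion looks stuck, you can also poll the job started by start_ingestion_job above; a short, hedged sketch using the same IDs:

```python
# Poll the ingestion job started earlier; `response` is the start_ingestion_job result.
job_id = response["ingestionJob"]["ingestionJobId"]

job = bedrock_agent.get_ingestion_job(
    knowledgeBaseId=BEDROCK_KB_ID,
    dataSourceId=DATA_SOURCE_ID,
    ingestionJobId=job_id,
)["ingestionJob"]
print("Ingestion status:", job["status"])  # STARTING, IN_PROGRESS, COMPLETE, or FAILED
```

A few other quick checks from the notebook: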
# Check Lambda logs
monitor_lambda_processing(logs_client, s3_client, bucket)
# Verify S3 outputs
s3_client.list_objects_v2(Bucket=bucket, Prefix='output/')
# Test chunk-based search
results = search_medical_chunks("test query", s3_client, bucket)
# Test knowledge base search
test_result = search_knowledge_base("test query")
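search_knowledge_base() is defined in the notebook; for reference, a direct call to the Bedrock retrieval API that it presumably wraps looks roughly like this (the query text and result count are arbitrary):

```python
import boto3

# Runtime client for querying the Knowledge Base (separate from the bedrock-agent control plane).
bedrock_runtime = boto3.client("bedrock-agent-runtime", region_name=region)

retrieval = bedrock_runtime.retrieve(
    knowledgeBaseId=BEDROCK_KB_ID,
    retrievalQuery={"text": "What treatments shorten the duration of the common cold?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for hit in retrieval["retrievalResults"]:
    print(round(hit.get("score", 0.0), 3), hit["content"]["text"][:120])
```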