An AI-powered question generation system that extracts content from PDF files and generates Open edX-compatible questions using AWS Bedrock Claude models.
- PDF Content Extraction: Automatically extracts text from PDF course materials
- AI Question Generation: Uses AWS Bedrock Claude 3.5 Sonnet for intelligent question creation
- Open edX Integration: Publishes generated questions directly to Open edX content libraries
- Multiple Question Types: Supports single-select, multi-select, dropdown, numerical input, and text input questions
- Flexible Configuration: Environment-based configuration for different courses and libraries
PDF Content → Text Extraction → AWS Bedrock Claude → Question Generation → Open edX Publishing
- Content Ingestion: PDF files are processed and text is extracted
- AI Processing: AWS Bedrock Claude 3.5 Sonnet generates diverse questions based on content
- Question Formatting: Questions are converted to Open edX OLX format
- Library Publishing: Questions are published to specified Open edX content libraries
This system uses AWS Bedrock's Claude 3.5 Sonnet model (anthropic.claude-3-5-sonnet-20240620-v1:0) for question generation. The AI model:
- Analyzes PDF content and learning objectives
- Generates diverse question types appropriate for different cognitive levels
- Creates questions with proper explanations and metadata
- Ensures questions align with Bloom's taxonomy levels
- Single Select: Multiple choice with one correct answer
- Multi-select: Multiple choice with multiple correct answers
- Dropdown: Selection from dropdown menu
- Numerical Input: Number-based answers with tolerance
- Text Input: Short text responses
- Clone the repository:
git clone <repository-url>
cd duaa-lesson-content-generator- Install dependencies:
pip install -r requirements.txt- Configure AWS credentials:
# Set AWS credentials for Bedrock access
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1# AWS Bedrock Configuration
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20240620-v1:0
# Open edX Configuration
export OPENEDX_API_URL=http://studio.local.openedx.io:8001
export OPENEDX_LIBRARY_ID=lib:myorg:kg1_english_2025
export CSRF_TOKEN=your_csrf_token
export STUDIO_SESSION_ID=your_session_id
export JWT_HEADER_PAYLOAD=your_jwt_payload
# Content Configuration
export COURSE_CONTENT_DIR=/path/to/course/contentIf AWS Bedrock is not available, the system can fallback to OpenAI:
export OPENAI_API_KEY=your_openai_keyfrom src.main import generate_questions_for_openedx
# Generate questions from PDF content
result = generate_questions_for_openedx(
course_content_dir="./Course Content",
library_id="lib:myorg:kg1_english_2025",
topic="KG1 English Course",
num_questions=10,
aws_region='us-east-1'
)
if result['success']:
print(f"Generated {result['questions_generated']} questions")
print(f"Published {result['questions_published']} questions")python src/main.pyThe system supports creating organized question collections for different course days:
from src.utils.openedx_hierarchy import OpenEdXHierarchyManager
# Create day-level collections
manager = OpenEdXHierarchyManager(library_id, api_url, cookies)
for day in range(1, 11): # Day 1-10
collection = manager.create_collection(
f"Day {day} Questions",
f"Questions for Day {day} content"
)
# Generate and publish day-specific questions
questions = generate_questions_with_bedrock(
day_content, f"Day {day} Content", prompt, num_questions=5
)
publish_to_openedx(library_id, questions, api_url, cookies, collection['key'], day)duaa-lesson-content-generator/
├── src/
│ ├── main.py # Main entry point
│ ├── openai_utils/
│ │ └── prompts.py # AI prompts for question generation
│ └── utils/
│ ├── question_generator.py # Core question generation logic
│ └── openedx_hierarchy.py # Open edX collection management
├── requirements.txt # Python dependencies
└── README.md # This file
boto3: AWS SDK for Bedrock accessopenai: OpenAI API client (fallback)PyPDF2: PDF text extractionrequests: HTTP client for Open edX APIpython-dotenv: Environment variable management
The system integrates with Open edX through the tutor-contentlibrary-api plugin:
- Authentication: Uses CSRF tokens and session cookies
- Question Publishing: Creates problem blocks in content libraries
- OLX Format: Generates Open edX-compatible XML question format
- Collection Management: Organizes questions into day-level collections
The system includes comprehensive error handling for:
- PDF extraction failures
- AWS Bedrock API errors
- Open edX authentication issues
- Network connectivity problems
- Invalid question formats
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the error logs for detailed error messages
- Verify AWS credentials and permissions
- Ensure Open edX API access is properly configured
- Review environment variable settings
- Initial release with AWS Bedrock integration
- Open edX question publishing
- PDF content extraction
- Multiple question type support
- Day-level collection organization