Utilizing word embeddings to explore content relationships within OpenCourseWare at MIT.
This repository explores the application of word embedding techniques to enhance knowledge discovery and content organization within MIT's OpenCourseWare (OCW) platform. The project aims to improve how students, educators, and researchers can navigate and find relevant educational materials across MIT's extensive collection of course content.
MIT OpenCourseWare (OCW) is a web-based publication of virtually all MIT course content, made freely available to learners worldwide. With thousands of courses spanning multiple disciplines, it can be hard to find relevant content and to see how courses and topics relate to one another. This project uses natural language processing and word embedding techniques to create more intuitive ways to explore and connect educational materials.
- Content Discovery: Improve the ability to find relevant course materials across different disciplines
- Semantic Understanding: Create meaningful representations of course content using word embeddings
- Knowledge Mapping: Identify relationships and connections between different courses and topics
- Interface Enhancement: Develop tools and interfaces that make OCW content more accessible and navigable
The project explores various word embedding techniques including:
- Word2Vec: Creating vector representations of words from course content
- Doc2Vec: Extending embeddings to entire documents and course materials
- BERT/Transformer models: Leveraging pre-trained language models for better semantic understanding
- Custom embeddings: Training domain-specific embeddings on MIT course content
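
As a rough illustration of how word-level vectors can be turned into a representation of a whole course description, the sketch below averages word vectors into a single document vector. This is a simple baseline, not Doc2Vec and not necessarily what this repository does. The toy vectors and the names `TOY_VECTORS`, `tokenize`, and `document_vector` are hypothetical and exist only for this example.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors for illustration only.
# Trained models typically use hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "matrix": [0.9, 0.1, 0.0],
    "vector": [0.8, 0.2, 0.0],
    "protein": [0.0, 0.1, 0.9],
    "cell": [0.1, 0.0, 0.8],
}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (tok.strip(".,;:!?()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def document_vector(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known words in the text.

    Words missing from the vocabulary are ignored. If no word is known,
    a zero vector of the right dimension is returned.
    """
    dim = len(next(iter(vectors.values())))
    known = [vectors[tok] for tok in tokenize(text) if tok in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vectors one dimension at a time.
    return [sum(column) / len(known) for column in zip(*known)]


if __name__ == "__main__":
    print(document_vector("Matrix and vector operations.", TOY_VECTORS))
```

Averaging discards word order, which is one reason document-level and transformer-based models are listed above as alternatives.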
- Suggest related courses based on content similarity
- Identify prerequisite relationships between courses
- Recommend courses based on student interests and background
- Semantic search capabilities beyond keyword matching
- Find materials that discuss similar concepts using different terminology
- Cross-disciplinary content discovery
- Map relationships between concepts across different fields
- Identify interdisciplinary connections
- Create visual representations of knowledge domains
- Generate customized learning sequences
- Adapt content recommendations based on learning progress
- Identify knowledge gaps and suggest relevant materials
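
Several of the applications above, such as suggesting related courses, reduce to ranking items by the similarity of their embeddings. The sketch below ranks courses by cosine similarity to a query course. It assumes each course already has a vector. The course identifiers, the vectors, and the function names are hypothetical and are not taken from this repository.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_courses(
    query_id: str,
    course_vectors: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k other courses, most similar to the query first."""
    query = course_vectors[query_id]
    scored = [
        (course_id, cosine_similarity(query, vec))
        for course_id, vec in course_vectors.items()
        if course_id != query_id  # never recommend the query course itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical course vectors for illustration only.
COURSE_VECTORS: Dict[str, List[float]] = {
    "linear-algebra": [0.9, 0.1, 0.0],
    "multivariable-calculus": [0.8, 0.3, 0.1],
    "cell-biology": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for course_id, score in related_courses("linear-algebra", COURSE_VECTORS):
        print(f"{course_id}: {score:.3f}")
```

For a catalog with thousands of courses, a vectorized implementation or an approximate nearest-neighbor index would likely be preferable to this pairwise loop.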
ocw-knowledge-interface/
├── data/ # OCW content datasets and preprocessed files
├── embeddings/ # Word embedding models and trained vectors
├── notebooks/ # Jupyter notebooks for experimentation and analysis
├── src/ # Source code for embedding generation and analysis
├── interfaces/ # Web interface and visualization components
├── evaluation/ # Model evaluation scripts and metrics
├── docs/ # Documentation and research notes
└── requirements.txt # Python dependencies
- Python 3.7+
- Required packages (see requirements.txt)
- Access to OCW content data
# Clone the repository
git clone https://github.com/dseaton/ocw-knowledge-interface.git
cd ocw-knowledge-interface
# Install dependencies
pip install -r requirements.txt
# Download necessary data and models
python setup.py

# Generate embeddings from OCW content
python src/generate_embeddings.py --data-path data/ocw_content
# Run evaluation metrics
python evaluation/evaluate_embeddings.py
# Launch interactive interface
python interfaces/run_interface.py

The project uses several metrics to assess the quality of embeddings:
- Semantic similarity: Measuring how well embeddings capture conceptual relationships
- Course clustering: Evaluating how well similar courses are grouped together
- Recommendation accuracy: Testing the relevance of course and content suggestions
- User evaluation: Gathering feedback on interface usability and effectiveness
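
The repository's evaluation scripts are not shown here, so the following is only one plausible way to quantify recommendation accuracy: precision at k, the fraction of the top k suggestions that a human judged relevant. The function name `precision_at_k` and the sample course identifiers are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k recommendations that appear in the relevant set.

    The denominator is always k, so a list shorter than k is penalized
    for the slots it leaves empty.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical ranked suggestions and relevance judgments.
    suggestions = ["course-a", "course-b", "course-c"]
    judged_relevant = {"course-a", "course-c"}
    print(precision_at_k(suggestions, judged_relevant, k=2))
```

Averaging this score over many query courses gives a single number that can be compared across embedding models, provided the relevance judgments are held fixed.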
This project is released under the MIT License, consistent with MIT's commitment to open educational resources.
- MIT OpenCourseWare team for providing access to course content