Testing word embedding techniques for use with OpenCourseWare at MIT.

dseaton/ocw-knowledge-interface

OCW Knowledge Interface

Utilizing word embeddings to explore content relationships within OpenCourseWare at MIT.

Overview

This repository explores the application of word embedding techniques to enhance knowledge discovery and content organization within MIT's OpenCourseWare (OCW) platform. The project aims to improve how students, educators, and researchers can navigate and find relevant educational materials across MIT's extensive collection of course content.

Background

MIT OpenCourseWare (OCW) is a web-based publication of virtually all MIT course content, made freely available to learners worldwide. With thousands of courses spanning many disciplines, it can be hard to find relevant content and to see how courses and topics relate. This project applies natural language processing and word embedding techniques to create more intuitive ways to explore and connect educational materials.

Objectives

  • Content Discovery: Improve the ability to find relevant course materials across different disciplines
  • Semantic Understanding: Create meaningful representations of course content using word embeddings
  • Knowledge Mapping: Identify relationships and connections between different courses and topics
  • Interface Enhancement: Develop tools and interfaces that make OCW content more accessible and navigable

Technical Approach

The project explores various word embedding techniques including:

  • Word2Vec: Creating vector representations of words from course content
  • Doc2Vec: Extending embeddings to entire documents and course materials
  • BERT/Transformer models: Leveraging pre-trained language models for better semantic understanding
  • Custom embeddings: Training domain-specific embeddings on MIT course content
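
To make the idea of a vector representation concrete, here is a minimal sketch in plain Python. It builds count-based co-occurrence vectors, a much simpler stand-in for Word2Vec, and is not the repository's code. The toy corpus and the names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus standing in for OCW course text.
corpus = [
    "the matrix has an eigenvalue",
    "the matrix has an eigenvector",
    "the cell has a membrane",
]
vecs = cooccurrence_vectors(corpus)
sim_related = cosine(vecs["eigenvalue"], vecs["eigenvector"])
sim_unrelated = cosine(vecs["eigenvalue"], vecs["membrane"])
```

In this toy corpus, "eigenvalue" and "eigenvector" appear in the same contexts, so `sim_related` comes out higher than `sim_unrelated`. Learned embeddings such as Word2Vec rely on the same intuition but produce dense, low-dimensional vectors.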

Goals

1. Course Recommendation

  • Suggest related courses based on content similarity
  • Identify prerequisite relationships between courses
  • Recommend courses based on student interests and background

2. Content Search Enhancement

  • Semantic search capabilities beyond keyword matching
  • Find materials that discuss similar concepts using different terminology
  • Cross-disciplinary content discovery

3. Knowledge Graph Construction

  • Map relationships between concepts across different fields
  • Identify interdisciplinary connections
  • Create visual representations of knowledge domains
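
One simple way to turn similarity scores into a graph is to keep an edge wherever the score clears a threshold. The sketch below assumes pairwise scores are already available. The scores, the 0.5 threshold, and the function names are hypothetical choices, not values from the project.

```python
from collections import defaultdict


def build_graph(similarities, threshold=0.5):
    """Build an undirected adjacency map from pairwise similarity scores."""
    graph = defaultdict(set)
    for (a, b), score in similarities.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_component(graph, start):
    """Collect every concept reachable from `start` by depth-first search."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return seen


# Hypothetical similarity scores between concepts from different fields.
similarities = {
    ("eigenvalues", "principal component analysis"): 0.8,
    ("principal component analysis", "gene expression"): 0.6,
    ("eigenvalues", "gene expression"): 0.2,
    ("gene expression", "poetry"): 0.1,
}
graph = build_graph(similarities)
component = connected_component(graph, "eigenvalues")
```

Here "eigenvalues" and "gene expression" are not directly linked, but both connect through "principal component analysis", which is the kind of interdisciplinary bridge a knowledge graph is meant to surface.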

4. Personalized Learning Paths

  • Generate customized learning sequences
  • Adapt content recommendations based on learning progress
  • Identify knowledge gaps and suggest relevant materials
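
If prerequisite relationships were known, a learning sequence could be generated by ordering courses so that prerequisites always come first. The sketch below uses Kahn's topological sort for that. The prerequisite map and the `learning_path` function are hypothetical, and the README does not say how the project itself would build paths.

```python
from collections import deque


def learning_path(prerequisites):
    """Order courses so each prerequisite precedes the course that needs it.

    `prerequisites` maps a course to the list of courses it requires.
    Raises ValueError if the prerequisites contain a cycle.
    """
    courses = set(prerequisites)
    for required in prerequisites.values():
        courses.update(required)

    # Count unmet prerequisites and record which courses each one unlocks.
    remaining = {c: len(set(prerequisites.get(c, []))) for c in courses}
    dependents = {c: [] for c in courses}
    for course, required in prerequisites.items():
        for r in set(required):
            dependents[r].append(course)

    # Sorting keeps the output deterministic when several courses are ready.
    ready = deque(sorted(c for c in courses if remaining[c] == 0))
    order = []
    while ready:
        course = ready.popleft()
        order.append(course)
        for d in sorted(dependents[course]):
            remaining[d] -= 1
            if remaining[d] == 0:
                ready.append(d)
    if len(order) != len(courses):
        raise ValueError("prerequisite cycle detected")
    return order


# Hypothetical prerequisite map.
prerequisites = {
    "machine-learning": ["linear-algebra", "probability"],
    "linear-algebra": ["calculus"],
    "probability": ["calculus"],
}
path = learning_path(prerequisites)
```

For this map, `path` is `["calculus", "linear-algebra", "probability", "machine-learning"]`. Personalization could then amount to dropping courses the learner has already completed before sorting.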

Repository Structure

ocw-knowledge-interface/
├── data/              # OCW content datasets and preprocessed files
├── embeddings/        # Word embedding models and trained vectors
├── notebooks/         # Jupyter notebooks for experimentation and analysis
├── src/               # Source code for embedding generation and analysis
├── interfaces/        # Web interface and visualization components
├── evaluation/        # Model evaluation scripts and metrics
├── docs/              # Documentation and research notes
└── requirements.txt   # Python dependencies

Getting Started

Prerequisites

  • Python 3.7+
  • Required packages (see requirements.txt)
  • Access to OCW content data

Installation

# Clone the repository
git clone https://github.com/dseaton/ocw-knowledge-interface.git
cd ocw-knowledge-interface

# Install dependencies
pip install -r requirements.txt

# Download necessary data and models
python setup.py

Usage

# Generate embeddings from OCW content
python src/generate_embeddings.py --data-path data/ocw_content

# Run evaluation metrics
python evaluation/evaluate_embeddings.py

# Launch interactive interface
python interfaces/run_interface.py

Evaluation Metrics

The project uses several metrics to assess the quality of embeddings:

  • Semantic similarity: Measuring how well embeddings capture conceptual relationships
  • Course clustering: Evaluating how well similar courses are grouped together
  • Recommendation accuracy: Testing the relevance of course and content suggestions
  • User evaluation: Gathering feedback on interface usability and effectiveness
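
Recommendation accuracy is often summarized with precision at k, the share of the top k suggestions that a reviewer judged relevant. The sketch below shows that calculation. The README does not name the exact metric used, so treat this as one plausible choice, with a hypothetical function name and made-up data.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked suggestions and human relevance judgments.
recommended = ["machine-learning", "cell-biology", "signal-processing"]
relevant = {"machine-learning", "signal-processing"}

p_at_2 = precision_at_k(recommended, relevant, 2)
p_at_3 = precision_at_k(recommended, relevant, 3)
```

Here `p_at_2` is 0.5, because only one of the first two suggestions is relevant, and `p_at_3` is two thirds.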

License

This project is released under the MIT License, consistent with MIT's commitment to open educational resources.

Acknowledgments

  • MIT OpenCourseWare team for providing access to course content
