OCW Knowledge Interface

Utilizing word embeddings to explore content relationships within OpenCourseWare at MIT.

Overview

This repository explores the application of word embedding techniques to enhance knowledge discovery and content organization within MIT's OpenCourseWare (OCW) platform. The project aims to improve how students, educators, and researchers can navigate and find relevant educational materials across MIT's extensive collection of course content.

Background

MIT OpenCourseWare (OCW) is a web-based publication of virtually all MIT course content, made freely available to learners worldwide. With thousands of courses spanning many disciplines, it can be hard to find relevant content or to see how different courses and topics relate. This project uses natural language processing and word embedding techniques to create more intuitive ways to explore and connect educational materials.

Objectives

Content Discovery: Improve the ability to find relevant course materials across different disciplines
Semantic Understanding: Create meaningful representations of course content using word embeddings
Knowledge Mapping: Identify relationships and connections between different courses and topics
Interface Enhancement: Develop tools and interfaces that make OCW content more accessible and navigable

Technical Approach

The project explores several word embedding techniques, including:

Word2Vec: Creating vector representations of words from course content
Doc2Vec: Extending embeddings to entire documents and course materials
BERT/Transformer models: Leveraging pre-trained language models for better semantic understanding
Custom embeddings: Training domain-specific embeddings on MIT course content
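
To make the idea of comparing content through embeddings concrete, here is a minimal sketch that uses only the Python standard library. It is not the repository's implementation: the word vectors in `TOY_VECTORS` are made-up three-dimensional values, and `embed_text` and `cosine_similarity` are hypothetical helper names. Averaging word vectors is used here as a simple stand-in for a trained Word2Vec or Doc2Vec model.

```python
import math

# Toy word vectors. The values are invented for illustration only;
# a real model would learn them from course text.
TOY_VECTORS = {
    "matrix": [0.9, 0.1, 0.0],
    "vector": [0.8, 0.2, 0.1],
    "eigenvalue": [0.9, 0.0, 0.1],
    "cell": [0.0, 0.9, 0.1],
    "protein": [0.1, 0.8, 0.2],
}


def embed_text(text, vectors=TOY_VECTORS):
    """Average the vectors of all known words; return None if none are known."""
    known = [vectors[word] for word in text.lower().split() if word in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


linear_algebra = embed_text("matrix vector eigenvalue")
biology = embed_text("cell protein")
query = embed_text("eigenvalue of a matrix")

print(cosine_similarity(query, linear_algebra))
print(cosine_similarity(query, biology))
```

With these toy values the query scores much closer to the linear algebra text than to the biology text, even though the wording differs.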

Goals

1. Course Recommendation

Suggest related courses based on content similarity
Identify prerequisite relationships between courses
Recommend courses based on student interests and background
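
Content-based recommendation can be sketched as a nearest-neighbor lookup over course vectors. The example below is an illustration under stated assumptions, not the project's code: the course labels and vectors in `COURSE_VECTORS` are invented, and `recommend` is a hypothetical function name.

```python
import heapq
import math

# Hypothetical course embeddings; labels and values are made up.
COURSE_VECTORS = {
    "linear-algebra": [0.9, 0.1, 0.1],
    "differential-equations": [0.8, 0.2, 0.2],
    "machine-learning": [0.6, 0.1, 0.7],
    "intro-biology": [0.1, 0.9, 0.1],
}


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(course, k=2, vectors=COURSE_VECTORS):
    """Return the k courses most similar to `course`, best match first."""
    target = vectors[course]
    scored = (
        (_cosine(target, vec), name)
        for name, vec in vectors.items()
        if name != course  # never recommend the course itself
    )
    return [name for _, name in heapq.nlargest(k, scored)]


print(recommend("linear-algebra"))
```

`heapq.nlargest` avoids sorting the whole catalog when only a few suggestions are needed. Prerequisite detection and interest-based recommendations would need additional signals beyond this similarity score.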

2. Content Search Enhancement

Semantic search capabilities beyond keyword matching
Find materials that discuss similar concepts using different terminology
Cross-disciplinary content discovery

3. Knowledge Graph Construction

Map relationships between concepts across different fields
Identify interdisciplinary connections
Create visual representations of knowledge domains
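
One simple way to turn similarity scores into a graph is to keep only the pairs above a threshold and then search the result for connections between fields. The sketch below assumes precomputed scores: the `SIMILARITIES` values are invented, and `build_graph` and `shortest_path` are hypothetical names rather than functions from this repository.

```python
from collections import defaultdict, deque

# Hypothetical pairwise similarity scores between concepts (made up).
SIMILARITIES = {
    ("linear algebra", "machine learning"): 0.82,
    ("linear algebra", "quantum mechanics"): 0.74,
    ("machine learning", "statistics"): 0.88,
    ("statistics", "genetics"): 0.61,
    ("genetics", "quantum mechanics"): 0.12,
}


def build_graph(similarities, threshold=0.6):
    """Create an undirected graph with an edge for every pair at or above threshold."""
    graph = defaultdict(set)
    for (a, b), score in similarities.items():
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(SIMILARITIES)
print(shortest_path(graph, "linear algebra", "genetics"))
```

The resulting path shows how two distant fields connect through intermediate concepts. The threshold is a tuning choice: a higher value gives a sparser graph with fewer, stronger links.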

4. Personalized Learning Paths

Generate customized learning sequences
Adapt content recommendations based on learning progress
Identify knowledge gaps and suggest relevant materials
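
A learning sequence can be derived from prerequisite relationships by visiting prerequisites before the course that needs them. The following sketch assumes such relationships are already known. The `PREREQUISITES` map is invented for illustration, and `learning_path` is a hypothetical name.

```python
# Hypothetical prerequisite map: course -> courses that should come first.
PREREQUISITES = {
    "calculus": [],
    "linear-algebra": ["calculus"],
    "probability": ["calculus"],
    "machine-learning": ["linear-algebra", "probability"],
}


def learning_path(target, prerequisites, completed=()):
    """Return the courses still needed to reach `target`, in a valid order.

    Courses listed in `completed` are skipped. Raises ValueError if the
    prerequisite map contains a cycle.
    """
    done = set(completed)
    visiting = set()  # courses on the current recursion stack
    path = []

    def visit(course):
        if course in done:
            return
        if course in visiting:
            raise ValueError(f"cyclic prerequisite involving {course!r}")
        visiting.add(course)
        for prereq in prerequisites.get(course, []):
            visit(prereq)
        visiting.discard(course)
        done.add(course)
        path.append(course)  # appended only after all prerequisites

    visit(target)
    return path


print(learning_path("machine-learning", PREREQUISITES))
print(learning_path("machine-learning", PREREQUISITES, completed=["calculus", "linear-algebra"]))
```

Passing the learner's completed courses adapts the sequence to their progress. The courses that remain in the returned list are the knowledge gaps between the learner and the target.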

Repository Structure

ocw-knowledge-interface/
├── data/              # OCW content datasets and preprocessed files
├── embeddings/        # Word embedding models and trained vectors
├── notebooks/         # Jupyter notebooks for experimentation and analysis
├── src/              # Source code for embedding generation and analysis
├── interfaces/       # Web interface and visualization components
├── evaluation/       # Model evaluation scripts and metrics
├── docs/             # Documentation and research notes
└── requirements.txt  # Python dependencies

Getting Started

Prerequisites

Python 3.7+
Required packages (see requirements.txt)
Access to OCW content data

Installation

# Clone the repository
git clone https://github.com/dseaton/ocw-knowledge-interface.git
cd ocw-knowledge-interface

# Install dependencies
pip install -r requirements.txt

# Download necessary data and models
python setup.py

Usage

# Generate embeddings from OCW content
python src/generate_embeddings.py --data-path data/ocw_content

# Run evaluation metrics
python evaluation/evaluate_embeddings.py

# Launch interactive interface
python interfaces/run_interface.py

Evaluation Metrics

The project uses several metrics to assess the quality of embeddings:

Semantic similarity: Measuring how well embeddings capture conceptual relationships
Course clustering: Evaluating how well similar courses are grouped together
Recommendation accuracy: Testing the relevance of course and content suggestions
User evaluation: Gathering feedback on interface usability and effectiveness
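
Recommendation accuracy can be measured in several ways. As one illustration, the sketch below computes precision at k, the fraction of the top k suggestions that a reviewer judged relevant. This metric is an assumption chosen for the example, not necessarily the one used in `evaluation/evaluate_embeddings.py`, and the course labels are invented.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical system output and human relevance judgments.
recommended = ["differential-equations", "machine-learning", "intro-biology"]
relevant = {"differential-equations", "probability"}

print(precision_at_k(recommended, relevant, k=1))
print(precision_at_k(recommended, relevant, k=2))
```

Dividing by k rather than by the number of returned items penalizes a system that returns fewer than k suggestions.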

License

This project is released under the MIT License, consistent with MIT's commitment to open educational resources.

Acknowledgments

MIT OpenCourseWare team for providing access to course content
