eyecite-js

A TypeScript/JavaScript library for extracting legal citations from text

Installation · Quick Start · Features · Report Bug · Request Feature

About The Project

eyecite-js is a TypeScript/JavaScript port of the Python eyecite library for extracting legal citations from text strings. It recognizes a wide variety of citations commonly found in legal documents, which makes it useful for legal tech applications.

Current Status (v2.7.6-alpha.24)

  • Complete Feature Parity: Full parity with Python eyecite v2.7.6
  • Production Ready: 354 passing tests, comprehensive citation coverage
  • Enhanced Features:
    • Overlap handling for multi-section citations
    • Clean options-based API
    • Id. citation resolution with section substitution
    • DOL Opinion support
  • Modern API: Refactored getCitations() with options object for better developer experience

See our ROADMAP.md for detailed feature parity status and development plans.

Key Features

  • 🚀 Complete Port: Full implementation of the Python eyecite library functionality
  • 📦 TypeScript Support: Written in TypeScript with complete type definitions
  • 🌐 JavaScript Compatible: Works in both Node.js and browser environments
  • 🔧 Extensible: Support for custom tokenizers and citation patterns
  • 🎯 Battle-tested: Based on the proven Python library used by CourtListener and Harvard's Caselaw Access Project
  • JavaScript Enhancements: Additional features, such as DOL Opinion citations, that are not in the Python version

Built With

  • TypeScript
  • Node.js
  • Bun

📦 Installation

🆕 What's New in v2.7.6-alpha.24

🎯 Cleaner API with Options Object

// Before (awkward)
getCitations(text, false, undefined, '', undefined, 'parent-only')

// After (clean)
getCitations(text, { overlapHandling: 'parent-only' })

🔄 Overlap Handling for Multi-Section Citations

Control how overlapping citations are returned:

  • 'all' (default): Returns all citations including nested ones
  • 'parent-only': Returns only encompassing citations
  • 'children-only': Returns only nested citations

🐛 Major Bug Fixes

  • Fixed multi-section law citations returning incorrect spans
  • Fixed overlapping citation annotation issues
  • Improved citation filtering logic

See the CHANGELOG for a complete list of changes.

Package Managers

Bun (recommended):

bun add @beshkenadze/eyecite

npm:

npm install @beshkenadze/eyecite

pnpm:

pnpm add @beshkenadze/eyecite

Yarn:

yarn add @beshkenadze/eyecite

Registries

npm Registry (default):

# Latest stable version
npm install @beshkenadze/eyecite

# Alpha version
npm install @beshkenadze/eyecite@alpha

# Beta version
npm install @beshkenadze/eyecite@beta

GitHub Packages:

# Configure registry (one-time setup)
npm config set @beshkenadze:registry https://npm.pkg.github.com

# Install from GitHub Packages
npm install @beshkenadze/eyecite

# Or install directly
npm install @beshkenadze/eyecite --registry=https://npm.pkg.github.com

🚀 Quick Start

Basic Usage

import { getCitations } from '@beshkenadze/eyecite'

const text = `
  Mass. Gen. Laws ch. 1, § 2 (West 1999) (barring ...).
  Foo v. Bar, 1 U.S. 2, 3-4 (1999) (overruling ...).
  Id. at 3.
  Foo, supra, at 5.
`

const citations = getCitations(text)
console.log(citations)

Using Options (New in v2.7.6-alpha.23)

// Handle overlapping citations in multi-section references
const text = 'See 29 C.F.R. §§ 778.113, 778.114, 778.115'

// Get only the parent citation (avoids overlaps)
const citations = getCitations(text, { 
  overlapHandling: 'parent-only',
  removeAmbiguous: true 
})

✨ Features

eyecite-js recognizes the following citation types:

  • Full case citations: Bush v. Gore, 531 U.S. 98, 99-100 (2000)
  • Short form citations: 531 U.S., at 99
  • Id. citations: Id., at 101 (with advanced section substitution)
  • Supra citations: Bush, supra, at 100
  • Law citations: 29 C.F.R. §§ 778.113, 778.114 (with multiple section support)
  • Journal citations: 1 Minn. L. Rev. 1
  • DOL Opinion Letters: DOL Opinion Letter FLSA 2009-19 (Jan. 16, 2009)
  • Bluebook formatting: Reorder parallel citations according to Bluebook hierarchy

Citation Types

Each citation type is represented by a specific class:

  • FullCaseCitation: Complete case citations with volume, reporter, page
  • ShortCaseCitation: Abbreviated case citations
  • FullLawCitation: Statutory and regulatory citations
  • FullJournalCitation: Law journal citations
  • IdCitation: "Id." citations referring to previous citations
  • IdLawCitation: "Id. § 123" citations with section references
  • SupraCitation: "Supra" citations referring to previous citations
  • ReferenceCitation: Reference citations using case names
  • DOLOpinionCitation: Department of Labor Opinion Letters

Multiple Section Support

eyecite-js properly handles multiple sections indicated by §§:

const text = 'See 29 C.F.R. §§ 778.113 (the "statutory method"), 778.114 (the FWW method).'
const citations = getCitations(text)
// Returns 2 separate FullLawCitation objects
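
To illustrate what splitting a §§ list involves, here is a deliberately small sketch that pulls section numbers out of the text after the §§ mark and skips parenthetical notes. The function name `sectionsAfterDoubleMark` is hypothetical, and the regex approach is an assumption for illustration, not the parser eyecite-js uses.

```typescript
// Pull section numbers out of a "§§" list, skipping parenthetical notes.
// A small regex sketch only; it would also pick up numbers in trailing prose.
function sectionsAfterDoubleMark(text: string): string[] {
  const markIndex = text.indexOf('§§')
  if (markIndex === -1) return []
  // Drop "(...)" notes so numbers inside them are not mistaken for sections.
  const tail = text.slice(markIndex + 2).replace(/\([^)]*\)/g, '')
  return tail.match(/\d+(?:\.\d+)*/g) ?? []
}

const sketchSections = sectionsAfterDoubleMark(
  'See 29 C.F.R. §§ 778.113 (the "statutory method"), 778.114 (the FWW method).'
)
// sketchSections → ['778.113', '778.114']
```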

Text Cleaning

Built-in text cleaning utilities help prepare text for citation extraction:

import { cleanText, getCitations } from '@beshkenadze/eyecite'

const dirtyText = '<p>foo   1  U.S.  1   </p>'
const cleanedText = cleanText(dirtyText, ['html', 'inline_whitespace'])
const citations = getCitations(cleanedText)
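
Conceptually, a list of cleaning steps is a pipeline of string-to-string functions applied in order. The sketch below shows that idea with naive stand-ins for the two step names used above. `cleanSketch` and `sketchSteps` are hypothetical names, and the library's own 'html' and 'inline_whitespace' steps may behave differently (for example, on malformed markup).

```typescript
// Each step is a plain string-to-string function.
type CleanStep = (text: string) => string

// Naive illustrations of the two step names; not the library's implementations.
const sketchSteps: Record<string, CleanStep> = {
  html: text => text.replace(/<[^>]*>/g, ''),              // drop anything that looks like a tag
  inline_whitespace: text => text.replace(/[ \t]+/g, ' '), // collapse runs of spaces and tabs
}

// Apply the named steps in order, failing loudly on an unknown name.
function cleanSketch(text: string, stepNames: string[]): string {
  return stepNames.reduce((current, stepName) => {
    const step = sketchSteps[stepName]
    if (!step) throw new Error(`Unknown cleaning step: ${stepName}`)
    return step(current)
  }, text)
}

const cleanedSketch = cleanSketch('<p>foo   1  U.S.  1   </p>', ['html', 'inline_whitespace'])
// cleanedSketch === 'foo 1 U.S. 1 '
```

Order matters here: stripping tags first can leave extra spaces behind, which the whitespace step then collapses.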

Citation Annotation

Add markup around citations in text:

import { annotateCitations, getCitations } from '@beshkenadze/eyecite'

const text = 'bob lissner v. test 1 U.S. 12, 347-348 (4th Cir. 1982)'
const citations = getCitations(text)
const annotated = annotateCitations(
  text, 
  citations.map(c => [c.span(), '<a href="#">', '</a>'])
)
// Returns: 'bob lissner v. test <a href="#">1 U.S. 12</a>, 347-348 (4th Cir. 1982)'
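
For intuition about span-based annotation, the following self-contained sketch inserts markup around character spans. It assumes spans are [start, end) offsets that do not overlap; `annotateSketch` and the `Annotation` type are hypothetical and only mirror the tuple shape passed to annotateCitations above.

```typescript
// Hypothetical annotation tuple: [[start, end], textBefore, textAfter].
type Annotation = [[number, number], string, string]

// Insert markup around each span. Working from the last span to the first
// keeps earlier offsets valid while the string grows.
function annotateSketch(text: string, annotations: Annotation[]): string {
  const ordered = [...annotations].sort((a, b) => b[0][0] - a[0][0])
  let result = text
  for (const [[start, end], before, after] of ordered) {
    result = result.slice(0, start) + before + result.slice(start, end) + after + result.slice(end)
  }
  return result
}

const sourceText = 'bob lissner v. test 1 U.S. 12, 347-348 (4th Cir. 1982)'
const citeStart = sourceText.indexOf('1 U.S. 12')
const marked = annotateSketch(sourceText, [
  [[citeStart, citeStart + '1 U.S. 12'.length], '<a href="#">', '</a>'],
])
// marked === 'bob lissner v. test <a href="#">1 U.S. 12</a>, 347-348 (4th Cir. 1982)'
```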

Overlap Handling

Handle overlapping citations in multi-section references:

const text = 'See 29 C.F.R. §§ 778.113, 778.114, 778.115 for details.'

// Default: returns all citations including overlapping ones
const all = getCitations(text)

// Option 1: Get only parent citations (no nested ones)
const parentOnly = getCitations(text, { overlapHandling: 'parent-only' })

// Option 2: Get only nested citations (no parent)
const childrenOnly = getCitations(text, { overlapHandling: 'children-only' })
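
The three modes can be pictured as a containment filter over citation spans. The sketch below is an illustration under stated assumptions, not the library's implementation: `filterBySpan` is a hypothetical helper, spans are [start, end) offsets chosen for this example, and a citation that is neither a parent nor nested is kept in both filtered modes.

```typescript
// Hypothetical span type: [start, end) character offsets.
type Span = readonly [number, number]
type OverlapMode = 'all' | 'parent-only' | 'children-only'

// True when `outer` fully encloses `inner` and the two spans differ.
function encloses(outer: Span, inner: Span): boolean {
  const sameSpan = outer[0] === inner[0] && outer[1] === inner[1]
  return !sameSpan && outer[0] <= inner[0] && inner[1] <= outer[1]
}

// Keep or drop spans according to the requested mode.
function filterBySpan(spans: Span[], mode: OverlapMode): Span[] {
  if (mode === 'all') return [...spans]
  const isNested = (s: Span) => spans.some(other => encloses(other, s))
  const isParent = (s: Span) => spans.some(other => encloses(s, other))
  return mode === 'parent-only'
    ? spans.filter(s => !isNested(s))
    : spans.filter(s => !isParent(s))
}

// One encompassing span and two spans nested inside it (illustrative offsets).
const sketchSpans: Span[] = [[4, 42], [26, 33], [35, 42]]
const parentSpans = filterBySpan(sketchSpans, 'parent-only')  // [[4, 42]]
const nestedSpans = filterBySpan(sketchSpans, 'children-only') // [[26, 33], [35, 42]]
```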

Citation Resolution

Resolve citations to their common references with advanced Id. support:

import { getCitations, resolveCitationsWithIdSubstitution } from '@beshkenadze/eyecite'

const text = 'first: 29 C.F.R. § 778.113. second: Id. § 778.114. third: Id.'
const citations = getCitations(text)
const resolved = resolveCitationsWithIdSubstitution(citations)
// Properly resolves Id. citations with section substitution
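
One way to picture section substitution is a single pass over citations in document order. The sketch below uses hypothetical, simplified records (`SketchCitation`, `ResolvedLaw`) instead of the library's classes, and it assumes a bare Id. repeats the most recently resolved reference. The actual behavior of resolveCitationsWithIdSubstitution may differ.

```typescript
// Hypothetical, simplified citation records for illustration only.
type SketchCitation =
  | { kind: 'full'; reporter: string; section: string }
  | { kind: 'id'; section?: string }

interface ResolvedLaw {
  reporter: string
  section: string
}

// Walk citations in order. A bare "Id." repeats the previous reference;
// "Id. § X" keeps the previous reporter but swaps in section X.
function resolveSketch(citations: SketchCitation[]): (ResolvedLaw | null)[] {
  const resolved: (ResolvedLaw | null)[] = []
  let previous: ResolvedLaw | null = null
  for (const citation of citations) {
    if (citation.kind === 'full') {
      previous = { reporter: citation.reporter, section: citation.section }
    } else if (previous !== null && citation.section !== undefined) {
      previous = { reporter: previous.reporter, section: citation.section }
    }
    // An "Id." with nothing before it stays unresolved (null).
    resolved.push(previous)
  }
  return resolved
}

const resolvedSketch = resolveSketch([
  { kind: 'full', reporter: 'C.F.R.', section: '778.113' },
  { kind: 'id', section: '778.114' },
  { kind: 'id' },
])
// Sections in order: '778.113', '778.114', '778.114'
```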

Bluebook Formatting

Format citations according to Bluebook rules:

import { formatBluebook, getCitations, getReporterType, ReporterType } from '@beshkenadze/eyecite'

// Reorder parallel citations according to Bluebook hierarchy
const text = 'Brown v. Jones, 2020 U.S. Dist. LEXIS 12345, 2020 WL 123456 (S.D.N.Y. 2020)'
const citations = getCitations(text)

// Format with Bluebook rules (WL before LEXIS, official reporters first)
const formatted = formatBluebook(citations, { reorderParallel: true })

// Check reporter types
const reporterType = getReporterType(citations[0]) // Returns ReporterType.ELECTRONIC_LEXIS
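
Reordering parallel citations can be thought of as a stable sort by reporter category. The following sketch assumes a three-level ranking taken from the comment above (official reporters first, then WL, then LEXIS). The category names, the `rankByKind` table, and `reorderParallelSketch` are hypothetical and far coarser than real Bluebook ordering rules or the library's ReporterType values.

```typescript
// Hypothetical reporter categories and ranks; lower rank sorts first.
type SketchReporterKind = 'official' | 'westlaw' | 'lexis'

const rankByKind: Record<SketchReporterKind, number> = { official: 0, westlaw: 1, lexis: 2 }

interface ParallelCite {
  text: string
  kind: SketchReporterKind
}

// Return a reordered copy. Array.prototype.sort is stable, so citations of
// the same kind keep their original relative order.
function reorderParallelSketch(cites: ParallelCite[]): ParallelCite[] {
  return [...cites].sort((a, b) => rankByKind[a.kind] - rankByKind[b.kind])
}

const reordered = reorderParallelSketch([
  { text: '2020 U.S. Dist. LEXIS 12345', kind: 'lexis' },
  { text: '2020 WL 123456', kind: 'westlaw' },
])
// reordered.map(c => c.text) → ['2020 WL 123456', '2020 U.S. Dist. LEXIS 12345']
```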

📚 API Documentation

getCitations(text, options?)

Extract citations from text.

Parameters:

  • text: The text to parse
  • options: Optional configuration object
    • removeAmbiguous: Remove ambiguous citations (default: false)
    • tokenizer: Custom tokenizer instance
    • markupText: Original markup text for enhanced extraction
    • cleanSteps: Text cleaning steps to apply
    • overlapHandling: How to handle overlapping citations (default: 'all')
      • 'all': Returns all citations including overlapping ones
      • 'parent-only': Returns only encompassing citations, excluding nested ones
      • 'children-only': Returns only nested citations, excluding parent citations

Returns: Array of citation objects
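
As a rough picture of how the documented defaults combine with caller-supplied values, the sketch below mirrors two of the options locally. `SketchOptions` and `withDefaults` are hypothetical; the exported GetCitationsOptions type may differ in detail, and tokenizer, markupText, and cleanSteps are left out because their types are not described here.

```typescript
// Local mirror of two documented options, for illustration only.
type OverlapMode = 'all' | 'parent-only' | 'children-only'

interface SketchOptions {
  removeAmbiguous?: boolean
  overlapHandling?: OverlapMode
}

// Fill in the documented defaults: removeAmbiguous false, overlapHandling 'all'.
function withDefaults(options: SketchOptions = {}): Required<SketchOptions> {
  return {
    removeAmbiguous: options.removeAmbiguous ?? false,
    overlapHandling: options.overlapHandling ?? 'all',
  }
}

const defaultOptions = withDefaults()
// { removeAmbiguous: false, overlapHandling: 'all' }
const customOptions = withDefaults({ overlapHandling: 'parent-only' })
// { removeAmbiguous: false, overlapHandling: 'parent-only' }
```

An options object lets callers name only the settings they change, which is the readability gain over positional arguments.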

Example with overlap handling:

const text = 'See 29 C.F.R. §§ 778.113, 778.114, 778.115 for details.'

// Get all citations (default behavior)
const allCitations = getCitations(text)
// Returns 3 citations: the full multi-section citation and two nested ones

// Get only the parent citation
const parentOnly = getCitations(text, { overlapHandling: 'parent-only' })
// Returns 1 citation: "29 C.F.R. §§ 778.113, 778.114, 778.115"

// Get only the nested citations
const childrenOnly = getCitations(text, { overlapHandling: 'children-only' })
// Returns 2 citations: "778.114" and "778.115"

Citation Objects

Each citation object contains:

  • span(): Text span [start, end] in source text
  • fullSpan(): Full span including context
  • groups: Parsed citation components
  • metadata: Additional citation metadata
  • year: Citation year (if available)
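
A span can be used to slice the matched text back out of the source. The sketch below relies only on the span() accessor listed above and assumes the end offset is exclusive, as in the Python library. `HasSpan`, `matchedText`, and the stand-in object are hypothetical.

```typescript
// Minimal hypothetical shape: only the span() accessor.
interface HasSpan {
  span(): [number, number]
}

// Recover the matched text for a citation, assuming `end` is exclusive.
function matchedText(source: string, citation: HasSpan): string {
  const [start, end] = citation.span()
  return source.slice(start, end)
}

const sampleText = 'Foo v. Bar, 1 U.S. 2, 3-4 (1999)'
const standIn: HasSpan = { span: () => [12, 20] } // hand-written offsets, not library output
const sliced = matchedText(sampleText, standIn)
// sliced === '1 U.S. 2'
```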

💻 TypeScript Support

eyecite-js is written in TypeScript and includes complete type definitions:

import { 
  getCitations, 
  FullCaseCitation, 
  FullLawCitation,
  GetCitationsOptions,
  OverlapHandling 
} from '@beshkenadze/eyecite'

// Use typed options
const options: GetCitationsOptions = {
  overlapHandling: 'parent-only',
  removeAmbiguous: true
}

const text = 'See 29 C.F.R. §§ 778.113, 778.114, 778.115'
const citations = getCitations(text, options)

// Type-safe citation handling
citations.forEach(citation => {
  if (citation instanceof FullCaseCitation) {
    console.log(`Case: ${citation.groups.volume} ${citation.groups.reporter} ${citation.groups.page}`)
  } else if (citation instanceof FullLawCitation) {
    console.log(`Law: ${citation.groups.reporter} ${citation.groups.section}`)
  }
})

🗺️ Roadmap

  • Core Python library port (95% complete)
  • TypeScript support with full type safety
  • Multiple section parsing (C.F.R. §§)
  • Id. citation resolution with section substitution
  • DOL Opinion Letter support
  • Performance optimizations (WebAssembly tokenizer)
  • Complete test infrastructure parity
  • Additional citation formats (patents, international)

See our detailed ROADMAP.md for the complete development plan and open issues for specific features.

🤝 Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Please make sure to:

  • Run tests: bun test
  • Run linter: bun run lint
  • Run type check: bun run typecheck

🧪 Testing

# Run all tests
bun test

# Run specific test file
bun test find.test.ts

# Run tests in watch mode
bun test --watch

📄 License

Distributed under the BSD 2-Clause License. See LICENSE for more information.

📧 Contact

Aleksandr Beshkenadze - @beshkenadze

Project Link: https://github.com/beshkenadze/eyecite-js

🙏 Acknowledgments

Made with ❤️ for the legal tech community
