
Refactor lexer logic #209

Merged
6cdh merged 6 commits into jeapostrophe:master from 6cdh:better-hover
Apr 14, 2026

Conversation

@6cdh
Collaborator

@6cdh 6cdh commented Apr 13, 2026

This is the first PR of the hover enhancement plan: #206. I expect to need 4 PRs in total for this better-hover plan.

This PR:

  • Fix some service behavior. When start = end (an empty range), it means the identifier is not found in the source file but exists in the expanded syntax. The old behavior added such items anyway by giving them a range of length 1. It no longer adds items of this kind.
  • Refactor and extract some helper functions from the doc-hover function.
  • Split the lexer logic out into doclib/lexer.rkt and add a thread-safe lazy cache abstraction for it. The Doc struct now has a field that stores a lazy cache of the lexer snapshot. The snapshot resets after each edit and is only recomputed in query functions when needed, so it's lazy. And it's safe because the lazy abstraction is semaphore-protected.
  • Rewrite the lexer callers to use the lexer module.
  • Remove the doc-get-symbols and doc-guess-token APIs because of their inappropriate design. doc-get-symbols returned an interval map of symbol tokens, which is the full data; a good design should provide a query-based interface rather than return everything. doc-guess-token had weird edge-case behavior that didn't match its name, and was also too complex to describe in a sentence.
  • Add a doc-token-at API that really returns the token at the given position, or false if no token exists at that position.
  • Add a doc-token-prefix-at API that returns the content of the token at the given position, but only the part before that position.
  • Fix an off-by-one bug in the document-symbol logic. It happened because the lexer returns 1-based indexed data while everywhere else expects 0-based indexing. The fix makes document-symbol slightly more useful. This kind of bug can't reappear now that all lexer logic lives in a single place.

For the lexer, each pass does a full tokenize of the buffer and stores all tokens in a sorted vector. Each pass takes ~2ms on doclib/doc.rkt. At most one pass runs if there are no further edits.
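One useful property of a full pass is that a single left-to-right scan emits the token vector already sorted by start position, so no separate sort step is needed before binary-search queries. A hedged Python sketch (the token pattern here is made up for illustration; the real tokenizer is the Racket lexer):

```python
import re

# Hypothetical token pattern for illustration only; the actual
# tokenization is done by the Racket lexer, not this regex.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*|\d+|[()\[\]]")

def tokenize(text: str) -> list[tuple[int, int, str]]:
    """One full pass over `text`, emitting (start, end, text) tuples.

    The result is sorted by start position for free, because
    finditer scans strictly left to right.
    """
    return [(m.start(), m.end(), m.group()) for m in TOKEN_RE.finditer(text)]
```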

I also explored other options for the lexer:

  • Full, eager lexer. The lexer runs eagerly after each edit. It's the simplest option, but the computation might be wasted, and each pass requires a full copy of the current text buffer, which partially defeats the purpose of the efficient text buffer.
  • Incremental, eager lexer. The lexer runs after each edit, but incrementally. This would be fast and efficient, but complex: the lexer can be stateful, so we'd need a data structure that stores the lexer state for each token, which is more complex than the currently used sorted vector.
  • Full, lazy lexer. This is the method used now. It's not simpler than option 1, but much simpler than option 2, with no wasted computation. The only risk is that it writes data on the query path, which is assumed to be read-only. I explored many concurrency options to make it safe, and a semaphore-protected lazy cache abstraction is the simplest. I didn't use data/interval-map because it's almost 30x slower than building a sorted vector.
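The lazy-cache idea in option 3 can be sketched as follows (a Python sketch with a threading.Lock standing in for Racket's semaphore; the class and method names are made up for illustration, not the actual API): the cache is cleared on every edit and recomputed under the lock on the query path, so concurrent queries never observe a partially written snapshot and at most one recomputation runs per burst of edits.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

class LazyCache(Generic[T]):
    """A lock-protected lazy cache: compute on first query, reuse until reset."""

    def __init__(self, compute: Callable[[], T]) -> None:
        self._compute = compute
        self._value: Optional[T] = None
        self._valid = False
        self._lock = threading.Lock()  # stands in for Racket's semaphore

    def reset(self) -> None:
        """Called after every edit: drop the cached snapshot."""
        with self._lock:
            self._valid = False
            self._value = None

    def force(self) -> T:
        """Called on the query path: recompute only if needed."""
        with self._lock:
            if not self._valid:
                self._value = self._compute()
                self._valid = True
            return self._value
```

This is why the write-on-read-path risk is contained: the only mutation a query performs is inside force, and it is serialized with every other force and reset by the same lock.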

6cdh added 6 commits April 8, 2026 20:10
When the text range is empty (start = end), ignore the item.
- Make lexer lazy, only runs when needed
- Use a sorted vector to store tokens
- Fix the document symbol off by one bug

API changes:

- Remove doc-get-symbols
- Remove doc-guess-token
- Add doc-token-at which returns the token at a given position
- Add doc-token-prefix-at which returns the token prefix before a given position
@6cdh
Collaborator Author

6cdh commented Apr 13, 2026

The Resyntax CI failed because of an upstream problem: sorawee/pretty-expressive#5

EDIT: fixed

@6cdh 6cdh requested a review from dannypsnl April 13, 2026 13:23
@6cdh 6cdh merged commit 5cbf431 into jeapostrophe:master Apr 14, 2026
19 of 20 checks passed
@6cdh 6cdh deleted the better-hover branch April 14, 2026 11:19
