
fix(heartbeat): add language fallback detection from file extension #5

Merged
AnnatarHe merged 2 commits into main from fix/language-fallback-detection
Dec 28, 2025

@AnnatarHe
Contributor

Summary

  • Add fallback language detection from file extension when vim.bo[bufnr].filetype is empty
  • Create lua/shelltime/utils/language.lua with extension-to-language mapping
  • Update heartbeat.lua to use lang.get_language(filetype, file_path)

Test plan

  • Run ./scripts/test.sh to verify all tests pass
  • Test with files that have no filetype set to verify fallback detection
  • Test with regular files to verify filetype is used when available

🤖 Generated with Claude Code

When vim.bo[bufnr].filetype is empty, detect language from file extension.
This ensures the language field is populated even when Neovim's filetype
detection doesn't recognize the file.

- Add lua/shelltime/utils/language.lua with extension mapping
- Update heartbeat.lua to use lang.get_language()
- Add comprehensive tests for language detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @AnnatarHe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the shelltime plugin's ability to accurately determine the programming language of a file. Previously, it relied solely on Neovim's buffer filetype, which could sometimes be unset. The changes introduce a new utility that intelligently infers the language from the file extension or specific filenames (like Dockerfile) as a fallback, ensuring that language information is consistently available for heartbeat events. This improves the reliability of language reporting within the plugin.

Highlights

  • Fallback Language Detection: Implemented a fallback mechanism to detect the language from a file's extension when the Neovim buffer's filetype (vim.bo[bufnr].filetype) is empty or not set.
  • New Language Utility Module: Introduced a new utility module at lua/shelltime/utils/language.lua which contains a mapping of file extensions to language identifiers and the logic for language detection.
  • Heartbeat Integration: Updated lua/shelltime/heartbeat.lua to utilize the new lang.get_language(filetype, file_path) function, ensuring more robust language identification for heartbeat events.
  • Comprehensive Testing: Added a new test file tests/language_spec.lua with extensive test cases for the language detection utility, covering various scenarios including filetype presence, extension-based detection, special filenames, and edge cases.

@claude

claude bot commented Dec 28, 2025

Code Review: Language Fallback Detection

Thanks for this contribution! Overall, this is a solid implementation that addresses the need for fallback language detection. Here's my detailed review:

Strengths

  1. Well-structured module: Follows the codebase conventions perfectly - clean separation of concerns, proper LuaDoc annotations, and consistent naming
  2. Comprehensive test coverage: 163 lines of tests covering edge cases, special filenames, case insensitivity, and error conditions
  3. Non-breaking change: The fallback behavior only activates when filetype is empty, preserving existing functionality
  4. Follows project conventions: 2-space indentation, snake_case naming, single quotes, and proper module structure

🐛 Potential Issues

1. Extension Mapping Inconsistency (lua/shelltime/utils/language.lua:23-24)

cc = 'cpp',
cxx = 'cpp',
h = 'c',
hpp = 'cpp',

The .h extension is mapped to 'c', but it could be either C or C++. Consider:

  • Mapping to 'cpp' by default (more common in modern codebases)
  • OR checking file content for C++ indicators (#include <iostream>, class, etc.)
  • OR documenting this assumption
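The content-based option could look roughly like the following. This is a minimal sketch, not code from the PR: `guess_header_language` and the indicator list are hypothetical, and because the review notes that the module currently does not read files, the caller is assumed to pass in the header text (for example, the first lines of the buffer).

```lua
-- Hypothetical helper (not part of the PR): guess whether the text of a
-- .h file looks like C++ by scanning for a few C++-only constructs.
local cpp_indicators = {
  '#%s*include%s*<iostream>',   -- C++ standard header
  '%f[%w_]class%s+[%a_]',       -- "class Foo" as a whole word
  '%f[%w_]namespace%s+[%a_]',   -- "namespace foo"
  '%f[%w_]template%s*<',        -- "template <typename T>"
}

---@param text string Header contents (or a prefix of them)
---@return string 'cpp' if any indicator matches, otherwise 'c'
local function guess_header_language(text)
  for _, pattern in ipairs(cpp_indicators) do
    if text:find(pattern) then
      return 'cpp'
    end
  end
  return 'c'
end

print(guess_header_language('#include <iostream>\nclass Foo {};'))
print(guess_header_language('struct point { int x; int y; };'))
```

A heuristic like this can misfire on comments or string literals that mention these keywords, so documenting the `h = 'c'` assumption remains the cheaper option.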

2. Redundant Optional Separator in Special Filename Detection (lua/shelltime/utils/language.lua:67-72)

local filename = file_path:match('[/\\]?([^/\\]+)$') or ''
local basename = filename:lower()
if basename == 'dockerfile' then

The leading [/\\]? in this Lua pattern makes the path separator optional, which is unnecessary: [^/\\]+$ on its own already matches the final path component. It could simply be:

local filename = file_path:match('[^/\\]+$') or ''
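To check that the simplification does not change behavior, the two patterns can be compared on a few representative paths. The helper names and sample paths below are illustrative only and do not come from the PR.

```lua
-- Pattern as written in the PR: optional leading separator plus a capture.
local function basename_original(path)
  return path:match('[/\\]?([^/\\]+)$')
end

-- Simplified pattern suggested in the review.
local function basename_simplified(path)
  return path:match('[^/\\]+$')
end

local paths = {
  'Dockerfile',
  'subdir/Makefile',
  '/path/to/file.lua',
  'C:\\path\\to\\file.lua',
}

for _, path in ipairs(paths) do
  -- Both patterns should yield the same final path component.
  assert(basename_original(path) == basename_simplified(path))
  print(path, basename_simplified(path))
end
```

Both forms return nil for a path that ends in a separator, so the `or ''` fallback is still needed either way.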

3. Missing Common Languages

The extension map is missing some common file types:

  • r → 'r' (R language)
  • ex, exs → 'elixir'
  • erl, hrl → 'erlang'
  • dart → 'dart'
  • vue → 'vue'
  • svelte → 'svelte'
  • scala → 'scala'
  • clj, cljs → 'clojure'

Not blocking, but worth considering for completeness.

Performance Considerations

  1. String Pattern Matching: The extension extraction uses file_path:match('%.([^%.]+)$') which is efficient. Good choice!
  2. Lowercase Conversion: Only done when needed (after filetype check fails). Well optimized.
  3. No Caching: For frequently accessed files, you might want to cache results. However, given that filetype is usually available, this is likely premature optimization.
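If caching ever became worthwhile, a per-path memo table would be one simple way to do it. This is only a sketch: `detect_from_path` is a hypothetical stand-in for the fallback branch of `get_language`, and the `calls` counter exists purely to make the cache hit visible.

```lua
-- Hypothetical stand-in for the extension-based fallback.
local calls = 0
local function detect_from_path(file_path)
  calls = calls + 1
  local ext = file_path:match('%.([^%.]+)$')
  return ext and ext:lower() or ''
end

-- Memoize results by file path so repeated heartbeats for the same file
-- skip the pattern matching entirely.
local cache = {}
local function cached_detect(file_path)
  local hit = cache[file_path]
  if hit == nil then
    hit = detect_from_path(file_path)
    cache[file_path] = hit
  end
  return hit
end
```

Calling `cached_detect` twice with the same path runs `detect_from_path` once. The table grows with the number of distinct paths seen, which is one more reason to treat this as optional.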

🔒 Security Concerns

No security issues identified. The module:

  • Doesn't execute external commands
  • Doesn't read files
  • Doesn't expose unsafe operations
  • Uses safe string pattern matching

📝 Code Quality Suggestions

1. Simplify Extension Detection (lua/shelltime/utils/language.lua:59-63)

Current code:

local ext = file_path:match('%.([^%.]+)$')
if ext then
  ext = ext:lower()
  return extension_map[ext] or ext
end

Could be more defensive:

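local ext = file_path:match('%.([^%.]+)$')
if not ext or ext == '' then
  -- No extension: check special filenames before returning empty
  goto check_special
end

ext = ext:lower()
-- `return` must be the last statement of its block, so it is wrapped in
-- do ... end to allow the label below (goto needs LuaJIT or Lua 5.2+)
do return extension_map[ext] or ext end

::check_special::
-- Special case handling...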

But current approach is fine - the special filename check handles this case.

2. Consider Adding Windows Path Support

The special filename detection uses [/\\] for path separators, which is good! But ensure this works correctly on Windows with backslashes. The current implementation should work, but worth testing.
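One path-related pitfall worth testing alongside backslashes is a dot in a directory name. The sketch below compares matching the extension against the whole path with matching it against the final path component only. Both function names are hypothetical, and whether the merged code takes the basename first is not shown in this conversation.

```lua
-- Naive: match the extension against the whole path.
local function extension_naive(file_path)
  return file_path:match('%.([^%.]+)$')
end

-- Safer: isolate the final path component (either separator style),
-- then look for an extension inside it.
local function extension_of(file_path)
  local filename = file_path:match('[^/\\]+$')
  if not filename then
    return nil
  end
  local ext = filename:match('%.([^%.]+)$')
  return ext and ext:lower() or nil
end

print(extension_of('C:\\path\\to\\file.lua'))        --> lua
print(extension_naive('C:\\my.project\\Makefile'))   --> project\Makefile
print(extension_of('C:\\my.project\\Makefile'))      --> nil
```

With the naive form, an extension-less file inside a dotted directory yields a bogus "extension" that contains a path separator, so the special-filename check is never reached.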

🧪 Test Coverage

Excellent test coverage! The tests cover:

  • ✅ Filetype priority over extension
  • ✅ Common extensions
  • ✅ Case insensitivity
  • ✅ Special filenames (Dockerfile, Makefile)
  • ✅ Edge cases (nil, empty strings, hidden files)
  • ✅ Multiple dots in filenames

Suggestion: Add a test for Windows-style paths with backslashes:

it('should handle Windows paths', function()
  assert.equals('lua', language.get_language('', 'C:\\path\\to\\file.lua'))
end)

📚 Documentation

The LuaDoc annotations are good:

---@param filetype string|nil Buffer filetype
---@param file_path string File path
---@return string Language identifier

Consider adding a module-level comment explaining the fallback strategy:

--- Language detection utilities for shelltime
--- Provides fallback language detection from file extension when filetype is empty
---
--- Detection order:
---   1. Use vim filetype if available (highest priority)
---   2. Detect from file extension (case-insensitive)
---   3. Check for special filenames (Dockerfile, Makefile)
---   4. Return unknown extension as-is or empty string
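The detection order above can be sketched as a single function. This is a reconstruction from the review discussion rather than the merged code: the tables are abbreviated and hypothetical, and it follows the four steps literally, so it does not include the later hidden-file fix.

```lua
-- Abbreviated, hypothetical mapping; the real table in language.lua is larger.
local extension_map = {
  lua = 'lua',
  py = 'python',
  ts = 'typescript',
  js = 'javascript',
  h = 'c',
  hpp = 'cpp',
}

local special_filenames = {
  dockerfile = 'dockerfile',
  makefile = 'makefile',
}

---@param filetype string|nil Buffer filetype
---@param file_path string|nil File path
---@return string Language identifier
local function get_language(filetype, file_path)
  -- 1. Prefer the filetype Neovim already detected.
  if filetype and filetype ~= '' then
    return filetype
  end
  if not file_path or file_path == '' then
    return ''
  end

  -- 2. Fall back to the extension (case-insensitive).
  --    4. Unknown extensions are passed through as-is.
  local ext = file_path:match('%.([^%.]+)$')
  if ext then
    ext = ext:lower()
    return extension_map[ext] or ext
  end

  -- 3. Extension-less special filenames such as Dockerfile and Makefile.
  local filename = file_path:match('[^/\\]+$')
  if filename then
    return special_filenames[filename:lower()] or ''
  end
  return ''
end

print(get_language('python', '/path/to/file.txt'))  --> python
print(get_language('', '/path/to/init.LUA'))        --> lua
print(get_language('', '/path/to/Dockerfile'))      --> dockerfile
```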

🎯 Integration Points

The integration in heartbeat.lua:107 is clean:

language = lang.get_language(vim.bo[bufnr].filetype, file_path),

✅ Passes both filetype and file_path
✅ Handles nil filetype gracefully
✅ No breaking changes to existing API

📋 Recommendations

Before merging:

  1. ✅ Run ./scripts/test.sh to ensure all tests pass
  2. ⚠️ Consider the .h file mapping (C vs C++)
  3. ⚠️ Optional: Add Windows path test
  4. ⚠️ Optional: Enhance documentation with detection order

Nice to have (future PRs):

  • Add more language extensions
  • Consider content-based detection for ambiguous extensions
  • Add caching if performance becomes an issue

📊 Overall Assessment

Quality: ⭐⭐⭐⭐ (4/5)
Test Coverage: ⭐⭐⭐⭐⭐ (5/5)
Maintainability: ⭐⭐⭐⭐⭐ (5/5)

This is a well-crafted PR that solves a real problem with minimal risk. The code follows project conventions, has excellent test coverage, and doesn't introduce breaking changes. The issues I've noted are minor and mostly suggestions for enhancement rather than critical fixes.

Recommendation: ✅ Approve with minor suggestions

Great work! 🎉


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively adds a fallback mechanism for language detection using file extensions, which is a great enhancement. The new language.lua utility is well-structured and comes with a comprehensive set of tests. I have a couple of suggestions to improve the maintainability of the new utility and its tests, primarily by using table-driven logic to reduce repetition and make future extensions easier.

Comment on lines +66 to +75
-- Special case: Dockerfile, Makefile, etc.
local filename = file_path:match('[/\\]?([^/\\]+)$') or ''
local basename = filename:lower()
if basename == 'dockerfile' then
  return 'dockerfile'
elseif basename == 'makefile' then
  return 'makefile'
end

return ''

medium

The special filename handling can be made more maintainable and scalable by using a lookup table instead of an if/elseif chain. This makes it easier to add more special filenames in the future.

For better performance, you could also consider moving the special_filenames table outside the get_language function to avoid re-creating it on every call.

  -- Special case: Dockerfile, Makefile, etc.
  local special_filenames = {
    dockerfile = 'dockerfile',
    makefile = 'makefile',
  }
  local filename = file_path:match('([^/\\]+)$')
  if filename then
    local lang = special_filenames[filename:lower()]
    if lang then
      return lang
    end
  end

  return ''

Comment on lines +11 to +39
describe('when filetype is available', function()
  it('should return filetype as-is for lua', function()
    assert.equals('lua', language.get_language('lua', '/path/to/file.lua'))
  end)

  it('should return filetype as-is for python', function()
    assert.equals('python', language.get_language('python', '/path/to/file.py'))
  end)

  it('should return filetype as-is for typescript', function()
    assert.equals('typescript', language.get_language('typescript', '/path/to/file.ts'))
  end)

  it('should return filetype as-is for javascript', function()
    assert.equals('javascript', language.get_language('javascript', '/path/to/file.js'))
  end)

  it('should return filetype as-is for sh', function()
    assert.equals('sh', language.get_language('sh', '/path/to/file.sh'))
  end)

  it('should return filetype as-is for vim', function()
    assert.equals('vim', language.get_language('vim', '/path/to/file.vim'))
  end)

  it('should return filetype even if extension differs', function()
    assert.equals('python', language.get_language('python', '/path/to/file.txt'))
  end)
end)

medium

This block of tests is quite repetitive. You can use a data-driven approach to make the tests more concise and easier to maintain. This involves defining a table of test cases and iterating over it to create the tests.

This pattern can also be applied to other repetitive test blocks in this file, such as 'when filetype is empty' and 'special filenames'.

    describe('when filetype is available', function()
      local test_cases = {
        { name = 'lua', filetype = 'lua', path = '/path/to/file.lua' },
        { name = 'python', filetype = 'python', path = '/path/to/file.py' },
        { name = 'typescript', filetype = 'typescript', path = '/path/to/file.ts' },
        { name = 'javascript', filetype = 'javascript', path = '/path/to/file.js' },
        { name = 'sh', filetype = 'sh', path = '/path/to/file.sh' },
        { name = 'vim', filetype = 'vim', path = '/path/to/file.vim' },
        { name = 'python with different extension', filetype = 'python', path = '/path/to/file.txt' },
      }

      for _, tc in ipairs(test_cases) do
        it('should return filetype as-is for ' .. tc.name, function()
          assert.equals(tc.filetype, language.get_language(tc.filetype, tc.path))
        end)
      end
    end)

Files like .hidden, .gitignore should return empty string, not the
filename after the dot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
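The hidden-file rule described in this commit message could be implemented by requiring at least one character before the final dot. The following is a sketch under that assumption; `extension_ignoring_dotfiles` is a hypothetical name and the actual change in the second commit is not shown here.

```lua
-- Return the lowercased extension of the final path component, or nil.
-- A leading dot alone does not count as an extension separator, so
-- '.gitignore' and '.hidden' have no extension, while '.eslintrc.json' does.
local function extension_ignoring_dotfiles(file_path)
  local filename = file_path:match('[^/\\]+$')
  if not filename then
    return nil
  end
  -- '^.+' forces one or more characters before the last dot.
  local ext = filename:match('^.+%.([^%.]+)$')
  return ext and ext:lower() or nil
end

print(extension_ignoring_dotfiles('/repo/.gitignore'))      --> nil
print(extension_ignoring_dotfiles('/repo/.eslintrc.json'))  --> json
print(extension_ignoring_dotfiles('/repo/init.lua'))        --> lua
```

A caller would then return an empty string when this yields nil and no special filename matches.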
@claude

claude bot commented Dec 28, 2025

Pull Request Review: Language Fallback Detection

Thanks for this contribution! This PR adds a useful feature to handle cases where Neovim's filetype detection fails. Here's my detailed review:

✅ Strengths

  1. Excellent Test Coverage - 163 lines of comprehensive tests covering:

    • Normal operation with filetype present
    • Extension-based fallback detection
    • Edge cases (hidden files, multiple dots, case insensitivity)
    • Special filenames (Dockerfile, Makefile)
    • The test-to-code ratio is excellent
  2. Clean Architecture - New functionality is properly isolated in lua/shelltime/utils/language.lua following the existing module structure

  3. Good Documentation - LuaDoc annotations are present and accurate

  4. Follows Conventions - Code style matches CLAUDE.md requirements:

    • 2-space indentation ✓
    • snake_case naming ✓
    • Single quotes ✓
    • Local variables ✓
  5. Backward Compatible - The change is non-breaking and maintains existing behavior when filetype is available

🔍 Potential Issues

1. Pattern Matching Edge Case (Minor)

Line lua/shelltime/utils/language.lua:60:

local filename = file_path:match('[/\\]?([^/\\]+)$') or ''

The leading [/\\]? makes the path separator optional, but it has no effect on the result:

  • file.lua (no directory) - works fine
  • subdir/file.lua - works fine
  • The ? is unnecessary since [^/\\]+ anchored with $ already matches the final path component, with or without a preceding separator

Suggestion: Simplify to file_path:match('([^/\\]+)$') for clarity.

2. Unknown Extension Behavior (Design Question)

Line lua/shelltime/utils/language.lua:71:

return extension_map[ext] or ext

When an unknown extension is encountered (e.g., .xyz), it returns the extension as-is. This could lead to inconsistent language identifiers being sent to the daemon.

Questions:

  • Is it better to return an empty string for unknown extensions?
  • Does the ShellTime daemon handle arbitrary extension strings gracefully?
  • The test at line 110-112 explicitly expects this behavior - is this intentional?
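One way to keep both behaviors available while the design question is open is to make the pass-through explicit. This is a hypothetical sketch: the `pass_through_unknown` flag and the two-entry map are illustrative and not part of the PR.

```lua
-- Abbreviated, hypothetical mapping.
local extension_map = { lua = 'lua', py = 'python' }

---@param ext string Extension without the leading dot
---@param pass_through_unknown boolean Return unknown extensions as-is when true
---@return string Language identifier, or '' for unknown extensions in strict mode
local function language_from_extension(ext, pass_through_unknown)
  ext = ext:lower()
  local known = extension_map[ext]
  if known then
    return known
  end
  if pass_through_unknown then
    return ext
  end
  return ''
end

print(language_from_extension('PY', false))   --> python
print(language_from_extension('xyz', true))   --> xyz
```

With the flag set to false, an unknown extension such as `xyz` produces an empty string, so the daemon only ever receives identifiers from the map.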

3. Missing Integration Test (Minor)

The new language_spec.lua tests the module in isolation, but there's no test in heartbeat_spec.lua that verifies the integration of lang.get_language() into the heartbeat creation flow.

Suggestion: Add a test in heartbeat_spec.lua that mocks a buffer with empty filetype and verifies the heartbeat contains the extension-based language.

📊 Performance Considerations

  • Pattern Matching: The pattern-matching operations run on every heartbeat. Lua patterns have no separate compile step (they are interpreted directly at match time), and the inputs here are short path strings, so this should be fine.
  • String Operations: lower() and match() are efficient operations. No concerns here.
  • Extension Map Lookup: O(1) table lookup, very efficient.

Overall: No performance issues expected.

🔒 Security Concerns

No security issues identified:

  • No user input is executed
  • File paths are only pattern-matched, not accessed
  • No injection vectors present

📝 Minor Suggestions

  1. Dockerfile.* variants: The code handles Dockerfile but not common variants like Dockerfile.dev, Dockerfile.prod. Consider:

    if basename:match('^dockerfile') then
      return 'dockerfile'
    end
  2. More special files: Consider adding support for other common extension-less files:

    • .bashrc, .zshrc → bash/zsh
    • .vimrc → vim
    • Gemfile, Rakefile → ruby
    • CMakeLists.txt → cmake
  3. Comment clarity: The comment at line 62-63 could be more precise:

    -- Hidden files without extension (e.g., .gitignore, .hidden)
    -- Return empty string since we can't determine language
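The first two suggestions can be combined into one lookup, sketched below. Everything here is illustrative: `language_from_filename` is a hypothetical helper, the language identifiers follow the reviewer's list, and the stricter `^dockerfile%.` pattern is a choice made for this sketch so that names such as `dockerfile_notes` are not matched.

```lua
-- Hypothetical table of well-known filenames, keyed by lowercased name.
local special_filenames = {
  dockerfile = 'dockerfile',
  makefile = 'makefile',
  gemfile = 'ruby',
  rakefile = 'ruby',
  ['cmakelists.txt'] = 'cmake',
  ['.bashrc'] = 'bash',
  ['.zshrc'] = 'zsh',
  ['.vimrc'] = 'vim',
}

---@param file_path string File path with / or \ separators
---@return string Language identifier, or '' if the name is not special
local function language_from_filename(file_path)
  local filename = file_path:match('[^/\\]+$')
  if not filename then
    return ''
  end
  local basename = filename:lower()
  local lang = special_filenames[basename]
  if lang then
    return lang
  end
  -- Dockerfile.dev, Dockerfile.prod, and similar variants.
  if basename:match('^dockerfile%.') then
    return 'dockerfile'
  end
  return ''
end

print(language_from_filename('/app/Dockerfile.prod'))  --> dockerfile
print(language_from_filename('/app/CMakeLists.txt'))   --> cmake
```

Because `CMakeLists.txt` and `Dockerfile.dev` both contain a dot, a check like this would have to run before the extension lookup to take effect.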

🎯 Test Plan Verification

The PR description mentions a test plan but the checkboxes are empty. Before merging:

  • ✅ Run ./scripts/test.sh to verify all tests pass
  • ✅ Manual testing with files without filetype set
  • ✅ Manual testing with regular files

🏁 Verdict

Approve with minor suggestions

This is a solid PR that solves a real problem with good test coverage and clean implementation. The suggested improvements are minor and optional. The code is ready to merge as-is, though addressing the edge cases mentioned would make it even more robust.

Great work! 🎉

@AnnatarHe AnnatarHe merged commit 1d1706c into main Dec 28, 2025
3 checks passed
@AnnatarHe AnnatarHe deleted the fix/language-fallback-detection branch December 28, 2025 15:40
@codecov

codecov bot commented Dec 28, 2025

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️
