fix: type/method name detection across C, C++, Go, Ruby #10
Open
vnz wants to merge 1 commit into CSCSoftware:master from
Pre-existing extractor gaps where names were nested below the direct children of method/type nodes, surfaced by cross-language smoke testing during the HCL PR work (CSCSoftware#9).

- **C**: Recurse into `function_declarator` (and wrappers) to find the function name; previously every C function definition was silently dropped from the methods table.
- **C++**: Use tree-sitter's `declarator` field instead of pattern-matching direct children. This avoids confusing the return type for the function name (`std::string foo()` was indexing as `std::string`). Walks through `pointer_declarator`, `reference_declarator`, `array_declarator`, `parenthesized_declarator`, and `attributed_declarator` wrappers, and handles `qualified_identifier`, `operator_cast`, `destructor_name`, `operator_name`, and `template_function` leaves.
- **C++**: Drop `template_function` from CPP_METHOD_NODES. It was a workaround for the name-extraction gap and produced duplicate entries for template specializations like `template<> void A::foo<int>()`.
- **Go**: Accept `field_identifier` as a method name (used for receiver methods like `func (w *Widget) Greet()`).
- **Ruby**: Accept `constant` and `scope_resolution` as type names: classes are `class` nodes whose name is a `constant`, and namespaced classes (`class Foo::Bar`) use `scope_resolution`.

Verified with smoke tests across all 12 supported languages plus 4 synthetic C++ edge-case files (qualified return types, conversion operators, attributes, templates).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes pre-existing extractor gaps where method/type names were nested below the direct children of their parent node, causing them to be silently dropped from the index. Discovered during cross-language smoke testing while working on PR #9.
What was missing before
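Created a small smoke-test corpus (one `Widget` class + `greet` method per language) and indexed it. Result on master:

| Language | Construct | Problem on master |
|---|---|---|
| C | `void c_widget_greet(...)` | name nested in `function_declarator` |
| C++ | `class CppWidget { greet() { ... } }` | not extracted correctly |
| Go | `func (w *Widget) Greet()` | `field_identifier` not recognized |
| Ruby | `class RbWidget` | `constant` not recognized |

After the fix, all of these are extracted correctly.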
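The Go and Ruby gaps come down to which node types are accepted as a name. A minimal sketch of that idea follows, assuming a simplified child-list view of the syntax tree; `ChildSketch`, `pickName`, and the four candidate sets are hypothetical names for illustration, and the extractor's real lists may contain more entries.

```typescript
// Simplified view of a syntax node's direct children (hypothetical shape).
interface ChildSketch {
  type: string;
  text: string;
}

// Assumed candidate sets before and after the fix; not copied from the extractor.
const METHOD_NAME_BEFORE = new Set<string>(["identifier"]);
const METHOD_NAME_AFTER = new Set<string>(["identifier", "field_identifier"]);
const TYPE_NAME_BEFORE = new Set<string>(["identifier"]);
const TYPE_NAME_AFTER = new Set<string>(["identifier", "constant", "scope_resolution"]);

// Return the text of the first direct child whose type is an accepted name type.
function pickName(children: ChildSketch[], accepted: Set<string>): string | null {
  for (const child of children) {
    if (accepted.has(child.type)) return child.text;
  }
  return null;
}

// Direct children of `func (w *Widget) Greet()` (Go method declaration).
const goMethod: ChildSketch[] = [
  { type: "parameter_list", text: "(w *Widget)" },
  { type: "field_identifier", text: "Greet" },
  { type: "parameter_list", text: "()" },
];

// Direct children of `class Foo::Bar` (Ruby class node).
const rubyClass: ChildSketch[] = [{ type: "scope_resolution", text: "Foo::Bar" }];

const goBefore = pickName(goMethod, METHOD_NAME_BEFORE); // null: name is missed
const goAfter = pickName(goMethod, METHOD_NAME_AFTER); // "Greet"
const rubyBefore = pickName(rubyClass, TYPE_NAME_BEFORE); // null: name is missed
const rubyAfter = pickName(rubyClass, TYPE_NAME_AFTER); // "Foo::Bar"
```

With the narrower sets the lookup returns `null` and the entry is skipped, which matches the silent drops described here.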
Approach
C/C++: tree-sitter `declarator` field + recursive walker

Rather than enumerate every position the name might be in (which is fragile, since there are many wrapper combinations), use `node.childForFieldName('declarator')` to ask tree-sitter directly for the declarator, then recursively walk through any chain of declarator wrappers to find the name leaf.

Wrapper types covered: `function_declarator`, `pointer_declarator`, `reference_declarator`, `parenthesized_declarator`, `array_declarator`, `attributed_declarator`.

Leaf types covered: `identifier`, `field_identifier`, `qualified_identifier`, `destructor_name`, `operator_name`, `operator_cast`, `template_function`.

Using the `declarator` field instead of iterating all children is the key insight: it cleanly distinguishes the actual function name from confusable nodes like a qualified return type (`std::string foo()` would otherwise index as `std::string`).

C++: removed `template_function` from CPP_METHOD_NODES

This was a workaround for the previous name-extraction gap. With the field-aware lookup catching template specializations via their enclosing `function_definition`, listing `template_function` separately would produce duplicate entries.

Go and Ruby: small node-type additions

- `field_identifier` added to method-name candidates (Go method names on receivers)
- `constant` and `scope_resolution` added to type-name candidates (Ruby `class Foo`, `class Foo::Bar`)

Test cases verified
Beyond the basic per-language widget test, verified handling for:
| Input | Extracted name |
|---|---|
| `c_widget_greet` | `c_widget_greet` |
| `CppWidget::greet` | `greet` |
| `int* A::getPtr()` | `A::getPtr` |
| `A& A::operator=()` | `A::operator=` |
| `int (*make_table())[10]` | `make_table` |
| `Foo::operator bool() const` | `Foo::operator bool() const` |
| `std::string get_name()` | `get_name` (not `std::string`) |
| `const std::vector<int>& Foo::values()` | `Foo::values` |
| `template<> void A::foo<int>()` | `A::foo<int>` (no duplicate) |
| `template<> void freestanding<int>()` | `freestanding<int>` |
| `template<typename T> T identity(T x)` | `identity` |
| `[[nodiscard]] int compute()` | `compute` |
| `int compute2() [[nodiscard]]` | `compute2` |
| `func (w *Widget) Greet()` | `Greet` |
| `class RbWidget` | `RbWidget` |
| `class Foo::Bar` | `Foo::Bar` |
| `module A::B` | `A::B` |

Test plan
… `aidex_signature` … `Module::Class` form are searchable

Notes
`src/parser/extractor.ts` and `src/parser/languages/cpp.ts` (~78 lines).

🤖 Generated with Claude Code
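For readers who want a concrete picture of the declarator walk described under Approach, here is a minimal sketch. It is not the code from this PR: `SketchNode`, `innerDeclarator`, `findDeclaratorName`, `extractFunctionName`, and `mk` are hypothetical names, and the node type is a stand-in that mirrors only `type`, `text`, `namedChildren`, and `childForFieldName`. The fallback over named children is an assumption for wrappers that may not label their inner declarator with a field.

```typescript
// Minimal stand-in for a tree-sitter syntax node (hypothetical type for this sketch).
interface SketchNode {
  type: string;
  text: string;
  namedChildren: SketchNode[];
  childForFieldName(fieldName: string): SketchNode | null;
}

// Wrapper and leaf node types as listed in the PR description.
const WRAPPER_TYPES = new Set<string>([
  "function_declarator", "pointer_declarator", "reference_declarator",
  "parenthesized_declarator", "array_declarator", "attributed_declarator",
]);
const LEAF_TYPES = new Set<string>([
  "identifier", "field_identifier", "qualified_identifier", "destructor_name",
  "operator_name", "operator_cast", "template_function",
]);

// Step one level inward: prefer the `declarator` field, else (assumption) the
// first named child that is itself a wrapper or a name leaf.
function innerDeclarator(node: SketchNode): SketchNode | null {
  const viaField = node.childForFieldName("declarator");
  if (viaField) return viaField;
  for (const child of node.namedChildren) {
    if (WRAPPER_TYPES.has(child.type) || LEAF_TYPES.has(child.type)) return child;
  }
  return null;
}

// Follow the wrapper chain until a name leaf is reached; give up on unknown types.
function findDeclaratorName(start: SketchNode | null): string | null {
  let current = start;
  while (current) {
    if (LEAF_TYPES.has(current.type)) return current.text;
    if (!WRAPPER_TYPES.has(current.type)) return null;
    current = innerDeclarator(current);
  }
  return null;
}

// Entry point: only the `declarator` field of the definition is consulted,
// so a qualified return type among the other children is never mistaken for the name.
function extractFunctionName(definition: SketchNode): string | null {
  return findDeclaratorName(definition.childForFieldName("declarator"));
}

// Helper to build mock nodes by hand, for illustration only.
function mk(type: string, text: string, declarator: SketchNode | null = null,
            others: SketchNode[] = []): SketchNode {
  const namedChildren = declarator ? others.concat([declarator]) : others;
  return {
    type, text, namedChildren,
    childForFieldName: (fieldName: string) => (fieldName === "declarator" ? declarator : null),
  };
}

// `std::string get_name()`: the return type is a sibling, not the declarator.
const getNameDef = mk("function_definition", "std::string get_name()",
  mk("function_declarator", "get_name()", mk("identifier", "get_name")),
  [mk("qualified_identifier", "std::string")]);

// `int* A::getPtr()`: pointer wrapper around the function declarator.
const getPtrDef = mk("function_definition", "int* A::getPtr()",
  mk("pointer_declarator", "* A::getPtr()",
    mk("function_declarator", "A::getPtr()", mk("qualified_identifier", "A::getPtr"))));

const getNameResult = extractFunctionName(getNameDef); // "get_name"
const getPtrResult = extractFunctionName(getPtrDef); // "A::getPtr"
```

The loop stops at the first leaf it meets, so nesting depth does not matter and new wrapper kinds only need an entry in `WRAPPER_TYPES`.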