Tree-sitter Plugin System
kit includes a plugin system for extending and customizing symbol extraction in any language that has a tree-sitter grammar. With it you can:
- Extend existing languages with additional query patterns (e.g., detect FastAPI routes in Python)
- Register completely new languages with custom parsers and queries
- Load multiple query files per language for modular organization
- Use custom query directories for team-specific or project-specific patterns
Table of Contents
- Quick Start
- API Reference
- Extending Existing Languages
- Registering New Languages
- Real-World Examples
- Best Practices
- Troubleshooting
- Advanced Usage
Quick Start
Extending Python for Custom Patterns
```python
import os

from kit.tree_sitter_symbol_extractor import TreeSitterSymbolExtractor

# Define a custom query for detecting test functions
test_query = '''
(function_definition
  name: (identifier) @name
  (#match? @name "^test_")) @definition.test_function
'''

# Save the query to a file
with open('test_patterns.scm', 'w') as f:
    f.write(test_query)

# Extend Python; pass an absolute path, because relative paths are
# resolved against kit's queries directory
TreeSitterSymbolExtractor.extend_language("python", os.path.abspath("test_patterns.scm"))

# Extract symbols with the new patterns
code = '''
def test_user_login():
    pass

def regular_function():
    pass
'''

symbols = TreeSitterSymbolExtractor.extract_symbols(".py", code)
for symbol in symbols:
    print(f"{symbol['type']}: {symbol['name']}")

# Output:
# test_function: test_user_login
# function: test_user_login
# function: regular_function
```
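Because extend_language resolves relative paths against kit's queries directory, it can be safer to write query files to an explicit location and pass the absolute path. The helper below is a hypothetical convenience (not part of kit), written with the standard library only:

```python
from pathlib import Path


def write_query_file(directory: str, filename: str, query: str) -> str:
    """Write a tree-sitter query to directory/filename and return the absolute path."""
    target = Path(directory).expanduser().resolve() / filename
    # Create the directory (and any subdirectories in filename) if needed
    target.parent.mkdir(parents=True, exist_ok=True)
    # Drop the leading/trailing blank lines that triple-quoted strings add
    target.write_text(query.strip() + "\n", encoding="utf-8")
    return str(target)
```

For example, `path = write_query_file("./queries", "test_patterns.scm", test_query)` followed by `TreeSitterSymbolExtractor.extend_language("python", path)`.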
API Reference
Core Methods
extend_language(language: str, query_file: str) -> None
Extend an existing language with additional query patterns.
- language: Language name (e.g., "python", "javascript")
- query_file: Path to .scm query file (absolute or relative to queries directory)
register_language(name: str, extensions: List[str], query_files: List[str], query_dirs: Optional[List[str]] = None) -> None
Register a completely new language.
- name: Language name (should match tree-sitter-language-pack)
- extensions: File extensions (e.g., [".kt", ".kts"])
- query_files: List of .scm query files to load
- query_dirs: Optional custom directories to search for queries
list_supported_languages() -> Dict[str, List[str]]
Returns mapping of language names to their supported file extensions.
reset_plugins() -> None
Reset all custom languages and extensions. Useful for testing and cleanup.
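list_supported_languages() returns a mapping from language name to file extensions. A small lookup over that shape can confirm whether an extension is already claimed before you call register_language. Only the dictionary shape comes from the reference above; the function name language_for_extension and the sample data in the test are illustrative assumptions:

```python
from typing import Dict, List, Optional


def language_for_extension(supported: Dict[str, List[str]], ext: str) -> Optional[str]:
    """Return the language whose extension list contains ext, or None.

    `supported` has the shape returned by list_supported_languages():
    language name -> list of extensions such as [".kt", ".kts"].
    """
    # Accept "kt" as well as ".kt", and compare case-insensitively
    normalized = ext.lower() if ext.startswith(".") else "." + ext.lower()
    for language, extensions in supported.items():
        if normalized in (e.lower() for e in extensions):
            return language
    return None
```

In practice you would pass `TreeSitterSymbolExtractor.list_supported_languages()` as the first argument.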
Symbol Structure
Extracted symbols have this structure:
```python
{
    "name": "function_name",                 # Symbol name
    "type": "function",                      # Symbol type (function, class, method, etc.)
    "start_line": 0,                         # Starting line number (0-indexed)
    "end_line": 5,                           # Ending line number (0-indexed)
    "code": "def function_name():\n    pass",  # Full symbol code
    "subtype": "optional"                    # Optional subtype for specialized symbols
}
```
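The structure above can be written down as a TypedDict so that type checkers catch misspelled keys. This sketch assumes exactly the keys listed, with subtype optional; Symbol and display_range are hypothetical names, not kit exports. Because line numbers are 0-indexed, the helper converts them for editors and reports that count from 1:

```python
from typing import Tuple, TypedDict


class _SymbolBase(TypedDict):
    name: str
    type: str
    start_line: int  # 0-indexed
    end_line: int  # 0-indexed
    code: str


class Symbol(_SymbolBase, total=False):
    subtype: str  # only present for specialized symbols


def display_range(symbol: Symbol) -> Tuple[int, int]:
    """Return the symbol's (start, end) lines as 1-indexed numbers."""
    return symbol["start_line"] + 1, symbol["end_line"] + 1
```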
Extending Existing Languages
Python: FastAPI Routes
Detect FastAPI route decorators:
```python
fastapi_query = '''
; FastAPI route handlers
; In tree-sitter-python, decorator nodes are direct children of decorated_definition
(decorated_definition
  (decorator
    (call
      function: (attribute
        object: (identifier) @app_name
        attribute: (identifier) @http_method)
      arguments: (argument_list
        (string) @route_path)))
  definition: (function_definition
    name: (identifier) @name)) @definition.route_handler
'''

# Save fastapi_query to /path/to/fastapi.scm, then:
TreeSitterSymbolExtractor.extend_language("python", "/path/to/fastapi.scm")
```
Python: Django Models
Detect Django model fields:
```python
django_query = '''
; Django model fields (tree-sitter-python names the sides of an assignment left/right)
(assignment
  left: (identifier) @field_name
  right: (call
    function: (attribute
      object: (identifier) @models
      attribute: (identifier) @field_type))
  (#eq? @models "models")) @definition.model_field

; Django Meta classes
(class_definition
  name: (identifier) @name
  (#eq? @name "Meta")) @definition.meta_class
'''

# Save django_query to /path/to/django.scm, then:
TreeSitterSymbolExtractor.extend_language("python", "/path/to/django.scm")
```
JavaScript: React Components
Detect React functional components:
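```python
react_query = '''
; React functional components: capitalized functions that return JSX,
; either directly or wrapped in parentheses
(function_declaration
  name: (identifier) @name
  body: (statement_block
    (return_statement
      [
        (jsx_element)
        (jsx_self_closing_element)
        (parenthesized_expression [(jsx_element) (jsx_self_closing_element)])
      ]))
  (#match? @name "^[A-Z]")) @definition.react_component

; React hooks
(call_expression
  function: (identifier) @hook_name
  (#match? @hook_name "^use[A-Z]")) @definition.hook_usage
'''

# Save react_query to /path/to/react.scm, then:
TreeSitterSymbolExtractor.extend_language("javascript", "/path/to/react.scm")
```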
Registering New Languages
Example: Kotlin Support
```python
# Kotlin query patterns
# The node and field names below must exist in the Kotlin grammar bundled with
# tree-sitter-language-pack; check them against that grammar before relying on them.
kotlin_query = '''
; Function declarations
(function_declaration
  name: (identifier) @name) @definition.function

; Class declarations
(class_declaration
  name: (identifier) @name) @definition.class

; Property declarations
(property_declaration
  name: (identifier) @name) @definition.property

; Data classes
(class_declaration
  modifiers: (modifiers
    (modifier) @data_modifier
    (#eq? @data_modifier "data"))
  name: (identifier) @name) @definition.data_class
'''

# Save kotlin_query as kotlin.scm in /path/to/custom/queries, then register the language
TreeSitterSymbolExtractor.register_language(
    name="kotlin",
    extensions=[".kt", ".kts"],
    query_files=["kotlin.scm"],
    query_dirs=["/path/to/custom/queries"],
)
```
Example: Custom DSL
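```python
# Register a custom domain-specific language.
# kit expects the name to match a grammar in tree-sitter-language-pack,
# so replace "my_dsl" with that grammar's name.
TreeSitterSymbolExtractor.register_language(
    name="my_dsl",
    extensions=[".mydsl"],
    query_files=["base.scm", "advanced.scm"],
    query_dirs=[
        "/company/shared/queries",
        "/project/local/queries",
    ],
)
```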
Real-World Examples
Team Coding Standards
Enforce naming conventions across your codebase:
```scheme
(function_definition
  name: (identifier) @name
  (#match? @name "^(get|set|create|update|delete)_")) @definition.crud_function

(class_definition
  name: (identifier) @name
  (#match? @name ".*Service$")) @definition.service_class

(class_definition
  name: (identifier) @name
  (#match? @name ".*Repository$")) @definition.repository_class
```
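#match? takes a regular expression, and a common mistake is forgetting anchors. Before committing a pattern to a .scm file, you can try the regex against sample names in plain Python. This sketch assumes #match? behaves like an unanchored regex search over the captured text; check your tree-sitter binding if you rely on less common regex features. preview_match is a hypothetical helper:

```python
import re
from typing import Dict, Iterable, List


def preview_match(pattern: str, names: Iterable[str]) -> Dict[str, List[str]]:
    """Split sample names into those a #match? regex would accept or reject.

    Uses re.search (unanchored), so add ^ and $ to pin a pattern to the whole name.
    """
    regex = re.compile(pattern)
    result: Dict[str, List[str]] = {"matched": [], "rejected": []}
    for name in names:
        key = "matched" if regex.search(name) else "rejected"
        result[key].append(name)
    return result
```

Under search semantics, `.*Service$` behaves the same as `Service$`, and an unanchored `Meta` also accepts `MetaData`.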
API Documentation Generation
Extract API endpoints for documentation:
```scheme
(decorated_definition
  (decorator
    (call
      function: (attribute
        attribute: (identifier) @http_method)))
  definition: (function_definition
    name: (identifier) @name)
  (#match? @http_method "^(get|post|put|delete|patch)$")) @definition.api_endpoint
```
Testing Pattern Detection
Identify test functions and test classes:
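```scheme
(function_definition
  name: (identifier) @name
  (#match? @name "^test_")) @definition.test_function

(class_definition
  name: (identifier) @name
  (#match? @name "^Test")) @definition.test_class

; pytest fixtures: decorators live on decorated_definition, and @pytest.fixture
; parses as an attribute (this matches the bare form, not @pytest.fixture(...))
(decorated_definition
  (decorator
    (attribute
      object: (identifier) @decorator_module
      attribute: (identifier) @decorator_name))
  definition: (function_definition
    name: (identifier) @name)
  (#eq? @decorator_module "pytest")
  (#eq? @decorator_name "fixture")) @definition.test_fixture
```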
Best Practices
Query Organization
- Separate by Purpose: Create different .scm files for different concerns.
```
queries/
├── python/
│   ├── tags.scm      # Base language patterns
│   ├── django.scm    # Django-specific patterns
│   ├── fastapi.scm   # FastAPI-specific patterns
│   └── testing.scm   # Testing patterns
```
- Use Descriptive Names: Make symbol types self-documenting.
```
# Good
@definition.api_endpoint
@definition.model_field
@definition.test_fixture

# Avoid
@definition.thing
@definition.item
```
- Comment Your Queries: Explain complex patterns.
```scheme
; Match Django model fields with specific field types
; Captures both the field name and field type for analysis
(assignment
  left: (identifier) @field_name
  right: (call
    function: (attribute
      object: (identifier) @models
      attribute: (identifier) @field_type))) @definition.model_field
```
Performance Considerations
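- Use Specific Patterns: Narrower queries return fewer matches, so there is less output to build and post-process. Text predicates such as #match? filter matches after the structural match, so they reduce results rather than the cost of walking the tree.
```scheme
; Better - specific pattern
(function_definition
  name: (identifier) @name
  (#match? @name "^handle_")) @definition.handler

; Broader - matches every function, producing more symbols to process
(function_definition
  name: (identifier) @name) @definition.function
```
- Combine Related Patterns: Group similar patterns in one file.
- Test Query Performance: Use logging to monitor query compilation time.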
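Alongside logging, you can time extraction directly. The helper below is a generic, hypothetical timer built on time.perf_counter. It reports the first run separately on the assumption that kit may compile and cache queries on first use, which would make the first call slower than later ones:

```python
import statistics
import time
from typing import Callable, Dict, List


def time_callable(fn: Callable[[], object], repeat: int = 5) -> Dict[str, float]:
    """Call fn `repeat` times; return first, fastest, and median wall-clock seconds."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    samples: List[float] = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "first": samples[0],  # may include one-time setup such as query compilation
        "min": min(samples),
        "median": statistics.median(samples),
    }
```

For example, `time_callable(lambda: TreeSitterSymbolExtractor.extract_symbols(".py", code))`, run once before and once after extending a language, shows what an extra query file costs.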
Version Control
- Include Query Files: Check .scm files into version control
- Document Extensions: Maintain a README explaining custom queries
- Team Sharing: Use shared query directories for team standards
Troubleshooting
Common Query Errors
- Invalid Field Name: The field doesn't exist in the grammar.
```
Error: Invalid field name at row 5, column 10: slice
```
Solution: Check the tree-sitter grammar documentation for valid field names (each grammar's src/node-types.json lists the fields of every node type).
- Query Compilation Failed: Syntax error in the query.
```
Error: Query compile error for ext .py
```
Solution: Validate the query syntax and check that parentheses match.
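When a query fails to compile, a quick bracket check can locate the problem faster than reading the compile error. The sketch below is a hypothetical helper, not a kit API; it skips `;` comments and double-quoted strings so regex text like `"^(get|set)_"` is not counted. It only checks brackets: a query that passes can still fail on unknown node types or field names.

```python
from typing import List, Optional, Tuple

_OPENERS = {")": "(", "]": "["}


def find_unbalanced(query: str) -> Optional[str]:
    """Describe the first bracket problem in a tree-sitter query, or return None."""
    stack: List[Tuple[str, int, int]] = []  # (bracket, line, column) of each opener
    in_string = in_comment = escaped = False
    line, col = 1, 0
    for ch in query:
        col += 1
        if ch == "\n":
            line, col = line + 1, 0
            in_comment = False  # ; comments end at the newline
            continue
        if in_comment:
            continue
        if in_string:
            # Track escapes so \" does not end the string
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == ";":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch in "([":
            stack.append((ch, line, col))
        elif ch in ")]":
            if not stack or stack[-1][0] != _OPENERS[ch]:
                return f"unexpected {ch!r} at line {line}, column {col}"
            stack.pop()
    if in_string:
        return "unterminated string literal"
    if stack:
        opener, open_line, open_col = stack[-1]
        return f"unclosed {opener!r} opened at line {open_line}, column {open_col}"
    return None
```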
Debugging Tips
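- Enable Debug Logging:
```python
import logging

# Install a console handler; without one, DEBUG records are not printed
logging.basicConfig()
logging.getLogger('kit.tree_sitter_symbol_extractor').setLevel(logging.DEBUG)
```
- Test Queries Incrementally: Start with simple patterns and add complexity.
- Check Language Support: Verify the language is available in tree-sitter-language-pack, and call list_supported_languages() to see which languages and extensions kit currently maps.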
Reset and Recovery
If you encounter issues with cached queries:
```python
# Reset all plugins and start fresh
TreeSitterSymbolExtractor.reset_plugins()

# Re-register your extensions
TreeSitterSymbolExtractor.extend_language("python", "your_query.scm")
```
Advanced Usage
Multiple Query Directories
Load queries from multiple locations with fallback priority:
```python
TreeSitterSymbolExtractor.register_language(
    name="python",
    extensions=[".py"],
    query_files=["base.scm", "company.scm", "project.scm"],
    query_dirs=[
        "/project/queries",          # Highest priority
        "/company/shared/queries",   # Medium priority
        "/home/user/.kit/queries",   # Lowest priority
    ],
)
```
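The priority comments suggest that each file in query_files is looked up in query_dirs in order, and the first directory containing it wins. That lookup order is an assumption about kit's loader; the sketch below reproduces it with pathlib so you can check, before registering, which copy of each file would be picked up. Both function names are hypothetical:

```python
from pathlib import Path
from typing import Iterable, List, Optional


def resolve_query_file(filename: str, query_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing query_dirs/filename, searching directories in order."""
    for directory in query_dirs:
        candidate = Path(directory).expanduser() / filename
        if candidate.is_file():
            return candidate
    return None


def resolve_all(filenames: List[str], query_dirs: List[str]) -> List[Path]:
    """Resolve every query file, failing loudly on the first one that is missing."""
    resolved = []
    for name in filenames:
        path = resolve_query_file(name, query_dirs)
        if path is None:
            raise FileNotFoundError(f"{name} not found in any of {query_dirs}")
        resolved.append(path)
    return resolved
```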
Dynamic Query Loading
Load queries based on project configuration:
```python
import yaml

from kit.tree_sitter_symbol_extractor import TreeSitterSymbolExtractor


def load_project_queries():
    with open('.kit-config.yml') as f:
        config = yaml.safe_load(f) or {}  # an empty file loads as None

    for lang_config in config.get('languages', []):
        TreeSitterSymbolExtractor.extend_language(
            language=lang_config['name'],
            query_file=lang_config['query_file'],
        )


# Usage in project setup
load_project_queries()
```
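load_project_queries expects every entry under languages to have name and query_file keys and stops with a KeyError at the first one that doesn't. A validation pass over the parsed config (the dict returned by yaml.safe_load) can report every problem at once. The config layout is inferred from the loader above, and validate_language_config is a hypothetical helper:

```python
from typing import Any, Dict, List, Tuple


def validate_language_config(config: Dict[str, Any]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split config['languages'] into usable entries and error messages."""
    valid: List[Dict[str, str]] = []
    errors: List[str] = []
    languages = config.get("languages", [])
    if not isinstance(languages, list):
        return [], ["'languages' must be a list"]
    for index, entry in enumerate(languages):
        if not isinstance(entry, dict):
            errors.append(f"languages[{index}] must be a mapping")
            continue
        # Both keys are passed straight to extend_language, so they must be strings
        missing = [key for key in ("name", "query_file") if not isinstance(entry.get(key), str)]
        if missing:
            errors.append(f"languages[{index}] is missing string key(s): {', '.join(missing)}")
            continue
        valid.append({"name": entry["name"], "query_file": entry["query_file"]})
    return valid, errors
```

Calling extend_language only for the entries in `valid`, and failing the setup step if `errors` is non-empty, keeps a typo in one entry from hiding the others.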
Integration with CI/CD
Use plugins to enforce coding standards:
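```python
from pathlib import Path
from typing import List

from kit.tree_sitter_symbol_extractor import TreeSitterSymbolExtractor


def check_naming_conventions(file_path: str) -> List[str]:
    violations = []

    path = Path(file_path)
    code = path.read_text()

    # extract_symbols takes the file extension (e.g. ".py") and the source code
    symbols = TreeSitterSymbolExtractor.extract_symbols(path.suffix, code)

    for symbol in symbols:
        if symbol['type'] == 'function' and not symbol['name'].startswith(('get_', 'set_', 'create_')):
            violations.append(f"Function {symbol['name']} doesn't follow naming convention")

    return violations
```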
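check_naming_conventions combines file I/O, parsing, and the naming rule itself. One design alternative, not something kit prescribes, is to keep the rule as a pure function over symbol dictionaries of the documented shape so it can be unit-tested without parsing any files. naming_violations and its default prefixes (copied from the example above) are illustrative:

```python
from typing import Any, Dict, Iterable, List, Sequence

DEFAULT_PREFIXES = ("get_", "set_", "create_")


def naming_violations(
    symbols: Iterable[Dict[str, Any]], prefixes: Sequence[str] = DEFAULT_PREFIXES
) -> List[str]:
    """Report function symbols whose names lack one of the allowed prefixes."""
    allowed = tuple(prefixes)  # str.startswith needs a str or a tuple
    violations = []
    for symbol in symbols:
        if symbol.get("type") == "function" and not symbol.get("name", "").startswith(allowed):
            line = symbol.get("start_line", 0) + 1  # symbols use 0-indexed lines
            violations.append(
                f"line {line}: function {symbol.get('name')} doesn't follow naming convention"
            )
    return violations
```

In CI, `naming_violations(TreeSitterSymbolExtractor.extract_symbols(path.suffix, code))` applies the same rule, with a 1-indexed line number in each message.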
With the plugin system, you can adapt kit's symbol extraction to a codebase's own patterns and conventions by adding query files and registering languages, without modifying kit itself.