Search & Discovery Overview
Kit provides multiple ways to search and discover code, each optimized for different use cases. This guide helps you choose the right approach.
Quick Decision Guide
I know the exact string I’m looking for → Text Search
I want to find functions/classes by name → Symbol Search
I’m searching by meaning, not keywords → Semantic Search
I want to understand what code does → Docstring Search
Search Methods Compared
Text Search (Regex)
What it is: Fast regex-based text matching across files
Best for:
- Finding exact strings or patterns
- Security audits (finding “password”, “API_KEY”)
- TODO/FIXME tracking
- Finding imports or specific syntax
Speed: ⚡⚡⚡ Very fast
Setup: None required
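To make "regex-based text matching across files" concrete, here is a minimal standard-library sketch of the idea. It is not Kit's implementation: the helper name `grep_files` and the result keys `line_number` and `line` are assumptions chosen for illustration, and the demo builds a throwaway directory so it can run anywhere.

```python
import re
import tempfile
from pathlib import Path


def grep_files(root, pattern, file_pattern="**/*.py"):
    """Return one dict per matching line under `root` (illustrative only)."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(root).glob(file_pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                matches.append(
                    {
                        "file": path.relative_to(root).as_posix(),
                        "line_number": line_number,
                        "line": line.strip(),
                    }
                )
    return matches


def demo():
    """Build a tiny two-file tree and run two searches against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg").mkdir()
        (root / "pkg" / "handlers.py").write_text(
            "def login_handler(request):\n    pass  # TODO: validate input\n"
        )
        # Not matched by the default "**/*.py" file pattern.
        (root / "notes.txt").write_text("TODO: not a Python file\n")
        todos = grep_files(root, r"TODO")
        handlers = grep_files(root, r"def \w+_handler")
    return todos, handlers


todos, handlers = demo()
for match in todos + handlers:
    print(f"{match['file']}:{match['line_number']}: {match['line']}")
```

The file pattern narrows the search before any regex runs, which is one reason this kind of search needs no index or setup.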
Example:
    repo.search_text("TODO", file_pattern="**/*.py")
    repo.search_text(r"def \w+_handler", file_pattern="*.py")

Symbol Search (Tree-sitter)
What it is: Language-aware extraction of functions, classes, methods
Best for:
- Finding code by name (functions, classes)
- Building navigation tools
- Analyzing code structure
- Finding large/complex functions
Speed: ⚡⚡⚡ Very fast (cached)
Setup: None required
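As a rough illustration of what language-aware extraction produces, the sketch below uses Python's built-in `ast` module as a stand-in for tree-sitter. It only handles Python source and is not how Kit works internally. The `name` and `type` keys match the examples in this guide; `start_line`, `end_line`, and the helper name `extract_symbols_sketch` are assumptions made for the example.

```python
import ast

SOURCE = '''
class AuthService:
    def authenticate(self, user, password):
        return user == "admin"

def validate_token(token):
    return bool(token)

def helper():
    pass
'''


def extract_symbols_sketch(source):
    """Collect classes, methods, and functions as dicts, in source order."""
    symbols = []

    def entry(node, kind):
        return {
            "name": node.name,
            "type": kind,
            "start_line": node.lineno,
            "end_line": node.end_lineno,
        }

    def visit(node, inside_class=False):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                symbols.append(entry(child, "class"))
                visit(child, inside_class=True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                symbols.append(entry(child, "method" if inside_class else "function"))
                visit(child, inside_class=False)
            else:
                visit(child, inside_class)

    visit(ast.parse(source))
    return symbols


symbols = extract_symbols_sketch(SOURCE)
functions = [s for s in symbols if s["type"] == "function"]
auth_related = [s["name"] for s in symbols if "auth" in s["name"].lower()]
# Line spans make it easy to flag long definitions.
longest = max(symbols, key=lambda s: s["end_line"] - s["start_line"])
```

Because each symbol is structured data rather than a text match, filtering by kind, name, or size is a plain list comprehension.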
Example:
    symbols = repo.extract_symbols()
    functions = [s for s in symbols if s["type"] == "function"]
    auth_funcs = [s for s in symbols if "auth" in s["name"]]

Semantic Search (Embeddings)
What it is: Vector search using code embeddings
Best for:
- Finding code by meaning, not exact keywords
- “How is authentication handled?”
- “Where do we validate user input?”
- Queries where you don’t know the exact variable names
Speed: ⚡⚡ Medium (requires indexing)
Setup: Requires embedding model and index building
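The mechanics behind vector search are: embed each chunk of code, embed the query, then rank chunks by similarity. The sketch below shows that loop with a toy bag-of-words "embedding" so that it runs without any model. A real embedding model captures meaning, which word counts do not, so treat this purely as an illustration of the ranking step. The names `toy_embed`, `cosine`, `search`, and the sample `CHUNKS` are all hypothetical, and the embedding function signature Kit expects may differ.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Map text to token counts (a placeholder for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical code chunks keyed by file name.
CHUNKS = {
    "auth.py": "def check_password(user, password): verify user password hash",
    "cache.py": "def get_cached(key): return cache lookup for key",
    "forms.py": "def validate_input(form): validate user input fields",
}

# "Index building": embed every chunk once, up front.
INDEX = {name: toy_embed(text) for name, text in CHUNKS.items()}


def search(query, top_k=2):
    """Rank indexed chunks by similarity to the query, best first."""
    query_vec = toy_embed(query)
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in INDEX.items()),
        reverse=True,
    )
    return [{"file": name, "score": score} for score, name in scored[:top_k]]


results = search("validate user input")
```

The up-front `INDEX` step is the part that makes this approach slower to start than text or symbol search: every chunk must be embedded before the first query.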
Example:
    vs = repo.get_vector_searcher(embed_fn=my_embed_fn)
    vs.build_index()
    results = repo.search_semantic("authentication logic", embed_fn=my_embed_fn)

Docstring Search (LLM + Embeddings)
What it is: Search LLM-generated summaries of code
Best for:
- Understanding what code does at a high level
- Finding code by intent (“retry with exponential backoff”)
- Queries about purpose rather than implementation
- When code lacks comments
Speed: ⚡ Slower (LLM summarization required)
Setup: Requires LLM API key, embedding model, and index building
Example:
    from kit import DocstringIndexer

    summarizer = repo.get_summarizer()
    indexer = DocstringIndexer(repo, summarizer)
    indexer.build()
    searcher = indexer.get_searcher()
    results = searcher.search("retry logic with backoff")

Combining Approaches
Each method covers gaps in the others, so they are often most useful in combination:
Text + Symbol Search
    # Find all functions with "validate" in the name
    symbols = repo.extract_symbols()
    validate_funcs = [s for s in symbols if "validate" in s["name"].lower()]

    # Then search for their usage
    for func in validate_funcs:
        usages = repo.search_text(f"{func['name']}\\(", file_pattern="**/*.py")
        print(f"{func['name']}: {len(usages)} usages")

Semantic + Symbol Search
    # Use semantic search to find relevant code
    semantic_results = repo.search_semantic("user authentication", embed_fn=embed_fn, top_k=10)

    # Extract symbols from those files (each file once, even if it matched several times)
    auth_symbols = []
    for path in sorted({result["file"] for result in semantic_results}):
        file_symbols = repo.extract_symbols(path)
        auth_symbols.extend(file_symbols)

    # Now you have structured symbol data from semantically relevant files

Text Search → Context Assembly
    from kit import ContextAssembler

    # Find candidate files with text search
    matches = repo.search_text("database connection", file_pattern="**/*.py")

    # Get full file content for matches
    assembler = ContextAssembler(max_chars=10000)
    files = list(set(m["file"] for m in matches))
    context = assembler.from_files(files)

    # Feed to LLM
    # llm_response = my_llm.chat(f"Explain the database connection: {context}")

Performance Characteristics
| Method | Speed | Setup | Index Required | Best Query Type |
|---|---|---|---|---|
| Text Search | Very Fast | None | No | Exact strings, patterns |
| Symbol Search | Very Fast | None | No (cached) | Names, structure |
| Semantic Search | Medium | Embed model | Yes | Natural language |
| Docstring Search | Slow | LLM + embed | Yes | Intent, purpose |
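One way to read the table is as a list of prerequisites: the faster methods need nothing, while the slower ones need more setup. The sketch below encodes that reading as data. It is only an illustration of the table, not part of Kit, and the names `METHODS` and `available_methods` are hypothetical.

```python
# Rows mirror the comparison table: (method, needs embedding model, needs LLM).
# Order is fastest first, as listed in the table.
METHODS = [
    ("Text Search", False, False),
    ("Symbol Search", False, False),
    ("Semantic Search", True, False),
    ("Docstring Search", True, True),
]


def available_methods(have_embed_model, have_llm):
    """Return the methods usable with the given setup, fastest first."""
    usable = []
    for name, needs_embed, needs_llm in METHODS:
        if needs_embed and not have_embed_model:
            continue
        if needs_llm and not have_llm:
            continue
        usable.append(name)
    return usable


no_setup = available_methods(have_embed_model=False, have_llm=False)
embed_only = available_methods(have_embed_model=True, have_llm=False)
full_setup = available_methods(have_embed_model=True, have_llm=True)
print(no_setup)
print(embed_only)
print(full_setup)
```

A reasonable default is to start with the first usable method and move down the list only when the cheaper ones fail to answer the question.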
Common Workflows
Code Review Preparation
    # 1. Find changed files (external tool)
    changed_files = ["src/api/auth.py", "src/models/user.py"]

    # 2. Extract symbols to see what changed
    for file in changed_files:
        symbols = repo.extract_symbols(file)
        print(f"\n{file}:")
        for s in symbols:
            print(f"  {s['type']}: {s['name']}")

    # 3. Search for usage of changed symbols
    for file in changed_files:
        symbols = repo.extract_symbols(file)
        for s in symbols:
            usages = repo.search_text(s["name"], file_pattern="**/*.py")
            if len(usages) > 1:  # More than just the definition
                print(f"{s['name']} used in {len(usages)} places")

Codebase Exploration
    # 1. Get overview with symbols
    symbols = repo.extract_symbols()
    print(f"Total: {len(symbols)} symbols")

    # 2. Find interesting patterns with text search
    tests = repo.search_text(r"def test_", file_pattern="tests/**/*.py")
    print(f"Tests: {len(tests)}")

    # 3. Use semantic search to understand specific areas
    results = repo.search_semantic("how is caching implemented", embed_fn=embed_fn)
    print("Caching-related code:")
    for r in results[:5]:
        print(f"  {r['file']}")

See Also
- Text Search - Regex patterns and examples
- Symbol Search - Structural code discovery
- Semantic Search - Meaning-based search setup
- Docstring Indexing - LLM-powered summaries