
Search & Discovery Overview

Kit provides multiple ways to search and discover code, each optimized for different use cases. This guide helps you choose the right approach.

Quick Decision Guide

I know the exact string I’m looking for → Text Search

I want to find functions/classes by name → Symbol Search

I’m searching by meaning, not keywords → Semantic Search

I want to understand what code does → Docstring Search

Search Methods Compared

Text Search (Regex)

What it is: Fast regex-based text matching across files

Best for:

  • Finding exact strings or patterns
  • Security audits (finding “password”, “API_KEY”)
  • TODO/FIXME tracking
  • Finding imports or specific syntax

Speed: ⚡⚡⚡ Very fast

Setup: None required

Example:

from kit import Repository
repo = Repository(".")  # path to the repository you want to search
repo.search_text("TODO", file_pattern="**/*.py")
repo.search_text(r"def \w+_handler", file_pattern="*.py")
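The sketch below shows the mechanics of regex text search using only the standard library. It walks the files that match a glob, tests each line against the pattern, and records the file, line number, and line. It is not kit's implementation. The helper name `grep_files` and the result keys (`file`, `line_number`, `line`) are assumptions for illustration; check the Text Search guide for the exact shape that `search_text` returns.

```python
import re
import tempfile
from pathlib import Path


def grep_files(root: str, pattern: str, file_glob: str = "*.py") -> list[dict]:
    """Hypothetical regex search: one result dict per matching line under `root`."""
    regex = re.compile(pattern)
    matches = []
    # rglob recurses into subdirectories, so "*.py" covers the whole tree
    for path in sorted(Path(root).rglob(file_glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                matches.append({
                    "file": path.relative_to(root).as_posix(),
                    "line_number": lineno,
                    "line": line,
                })
    return matches


# Example on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "api").mkdir()
    Path(tmp, "api", "routes.py").write_text(
        "def login_handler():\n    pass  # TODO: add rate limiting\n"
    )
    handler_hits = grep_files(tmp, r"def \w+_handler")
    todo_hits = grep_files(tmp, "TODO")
```

No index is built, so each query rereads the files. That is cheap for line-oriented patterns but cannot match anything by meaning.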

Full Text Search Guide

Symbol Search (Tree-sitter)

What it is: Language-aware extraction of functions, classes, methods

Best for:

  • Finding code by name (functions, classes)
  • Building navigation tools
  • Analyzing code structure
  • Finding large/complex functions

Speed: ⚡⚡⚡ Very fast (cached)

Setup: None required

Example:

symbols = repo.extract_symbols()
functions = [s for s in symbols if s["type"] == "function"]
auth_funcs = [s for s in symbols if "auth" in s["name"].lower()]
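Finding large or complex functions comes down to filtering symbols by how many lines they span. kit does this with tree-sitter across many languages. The sketch below uses Python's standard `ast` module instead, so it runs without kit but only handles Python source. The `start_line`/`end_line` keys follow the style of the symbol dicts above but are assumptions here, as is the 50-line default threshold.

```python
import ast


def extract_python_symbols(source: str) -> list[dict]:
    """Return classes, methods, and top-level functions with their line spans."""
    symbols = []

    def visit(body, in_class=False):
        for node in body:
            if isinstance(node, ast.ClassDef):
                kind = "class"
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if in_class else "function"
            else:
                continue
            symbols.append({
                "name": node.name,
                "type": kind,
                "start_line": node.lineno,
                "end_line": node.end_lineno,  # available on Python 3.8+
            })
            if kind == "class":
                # Recurse so methods are reported; functions nested in functions are skipped
                visit(node.body, in_class=True)

    visit(ast.parse(source).body)
    return symbols


def large_symbols(symbols: list[dict], min_lines: int = 50) -> list[dict]:
    """Keep symbols whose span is at least `min_lines` lines long."""
    return [s for s in symbols if s["end_line"] - s["start_line"] + 1 >= min_lines]


SAMPLE = '''
class AuthService:
    def check_token(self, token):
        return token == "ok"

def authenticate(user, password):
    if not user:
        return False
    return password == "secret"
'''

symbols = extract_python_symbols(SAMPLE)
auth_funcs = [s for s in symbols if "auth" in s["name"].lower()]
```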

Full Symbol Search Guide

Semantic Search (Embeddings)

What it is: Vector search using code embeddings

Best for:

  • Finding code by meaning, not exact keywords
  • “How is authentication handled?”
  • “Where do we validate user input?”
  • Queries where you don’t know the exact variable names

Speed: ⚡⚡ Medium (requires indexing)

Setup: Requires embedding model and index building

Example:

vs = repo.get_vector_searcher(embed_fn=my_embed_fn)
vs.build_index()
results = repo.search_semantic("authentication logic", embed_fn=my_embed_fn)
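In this sketch, `embed_fn` is any callable that turns a string into a vector. Building the index embeds every code chunk once, and a query is answered by ranking chunks by cosine similarity to the embedded query. The flow below uses a deliberately crude, hypothetical `toy_embed` (hashed word counts), so it only matches shared words. A real embedding model is what makes "meaning, not keywords" work. The chunk dicts and helper names are illustrative, not kit's internals.

```python
import math
import re
import zlib


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Hypothetical embed_fn: hash each lowercase word into one of `dims` buckets and count."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode()) % dims] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[dict], embed_fn) -> list[tuple[dict, list[float]]]:
    """Embed every chunk once, up front: this is the indexing cost."""
    return [(chunk, embed_fn(chunk["code"])) for chunk in chunks]


def search(index, query: str, embed_fn, top_k: int = 5) -> list[dict]:
    """Embed the query and return the top_k chunks by cosine similarity."""
    q = embed_fn(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    {"file": "auth.py", "code": "def verify_password(user, password): compare the password hash for a user login"},
    {"file": "cache.py", "code": "def get_cached(key): return a value from the in-memory cache"},
]
index = build_index(chunks, toy_embed)
results = search(index, "user login password", toy_embed, top_k=1)
```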

Full Semantic Search Guide

Docstring Search (LLM + Embeddings)

What it is: Search LLM-generated summaries of code

Best for:

  • Understanding what code does at a high level
  • Finding code by intent (“retry with exponential backoff”)
  • Queries about purpose rather than implementation
  • When code lacks comments

Speed: ⚡ Slower (LLM summarization required)

Setup: Requires LLM API key, embedding model, and index building

Example:

from kit import DocstringIndexer
summarizer = repo.get_summarizer()
indexer = DocstringIndexer(repo, summarizer)
indexer.build()
searcher = indexer.get_searcher()
results = searcher.search("retry logic with backoff")
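Summaries help because code often never uses the words a person would search for. In the sketch below, a retry loop written with `attempt` and `delay *= 2` shares no words with the query, while a one-line summary of it does. The summaries are hand-written stand-ins for LLM output, and word-set overlap replaces embedding similarity. The example therefore illustrates the idea, not how `DocstringIndexer` works internally.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase alphabetic words; underscores split identifiers like fetch_resilient."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query: str, text: str) -> float:
    """Jaccard overlap of word sets, a crude stand-in for embedding similarity."""
    q, t = words(query), words(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


# Source code, keyed by a hypothetical "module.function" id
CODE = {
    "net.fetch_resilient": (
        "def fetch_resilient(url):\n"
        "    delay = 1\n"
        "    for attempt in range(5):\n"
        "        try:\n"
        "            return get(url)\n"
        "        except IOError:\n"
        "            sleep(delay)\n"
        "            delay *= 2\n"
    ),
    "util.slugify": "def slugify(s):\n    return s.lower().replace(' ', '-')\n",
}

# Hand-written stand-ins for what an LLM summarizer might return for each function
SUMMARIES = {
    "net.fetch_resilient": "Fetches a URL, retrying failed requests with exponential backoff.",
    "util.slugify": "Converts a string into a lowercase, hyphen-separated slug.",
}

query = "retry logic with backoff"
code_scores = {key: overlap(query, src) for key, src in CODE.items()}
summary_scores = {key: overlap(query, text) for key, text in SUMMARIES.items()}
best_by_summary = max(summary_scores, key=summary_scores.get)
```

Searching the raw code scores every function at zero. Searching the summaries surfaces the retry function, which is the gap docstring search is meant to close.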

Full Docstring Search Guide

Combining Approaches

The methods work best in combination: use one to narrow down candidates, then another to inspect them.

# Find all functions with "validate" in the name
symbols = repo.extract_symbols()
validate_funcs = [s for s in symbols if "validate" in s["name"].lower()]
# Then search for their usage
for func in validate_funcs:
    usages = repo.search_text(f"{func['name']}\\(", file_pattern="**/*.py")
    print(f"{func['name']}: {len(usages)} usages")

# Use semantic search to find relevant code
semantic_results = repo.search_semantic("user authentication", embed_fn=embed_fn, top_k=10)
# Extract symbols from those files (each file once, even if it matched several times)
auth_symbols = []
for file in sorted({result["file"] for result in semantic_results}):
    auth_symbols.extend(repo.extract_symbols(file))
# Now you have structured symbol data from semantically relevant files
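One caveat about the first example: the pattern `name\(` also matches the `def name(` line itself, as well as longer identifiers that end in the same name (`revalidate_email(` when searching for `validate_email`). If you want call-site counts rather than raw matches, add a word boundary and filter out the definition. The stdlib-only sketch below works over in-memory file contents; the `files` dict and the `count_call_sites` helper are hypothetical, not part of kit.

```python
import re


def count_call_sites(files: dict[str, str], func_name: str) -> int:
    """Count `func_name(` occurrences, skipping lines that define the function."""
    name = re.escape(func_name)
    # \b stops longer identifiers such as revalidate_email( from matching
    call = re.compile(rf"\b{name}\s*\(")
    definition = re.compile(rf"^\s*(async\s+)?def\s+{name}\s*\(")
    count = 0
    for text in files.values():
        for line in text.splitlines():
            if definition.match(line):
                continue
            count += len(call.findall(line))
    return count


files = {
    "validators.py": "def validate_email(value):\n    return '@' in value\n",
    "api.py": (
        "from validators import validate_email\n"
        "\n"
        "if validate_email(a) and validate_email(b):\n"
        "    pass\n"
        "revalidate_email(c)\n"
    ),
}
calls = count_call_sites(files, "validate_email")
```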

Text Search → Context Assembly

from kit import ContextAssembler

# Find candidate files with text search
matches = repo.search_text("database connection", file_pattern="**/*.py")
# Get full file content for matches (sorted so the file order is stable)
assembler = ContextAssembler(max_chars=10000)
files = sorted({m["file"] for m in matches})
context = assembler.from_files(files)
# Feed to LLM
# llm_response = my_llm.chat(f"Explain the database connection: {context}")
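The `max_chars` argument suggests a character budget that keeps the assembled context small enough for an LLM prompt. The sketch below illustrates that idea only, and `ContextAssembler`'s actual behavior may differ. The hypothetical `assemble_context` puts a header before each file's contents and truncates once the budget is reached.

```python
import tempfile
from pathlib import Path


def assemble_context(root: str, rel_paths: list[str], max_chars: int = 10_000) -> str:
    """Join files under a '## path' header each, truncating once max_chars is reached."""
    parts: list[str] = []
    used = 0
    for rel in rel_paths:
        text = (Path(root) / rel).read_text(encoding="utf-8", errors="ignore")
        block = f"## {rel}\n{text}\n"
        remaining = max_chars - used
        if len(block) > remaining:
            # Keep whatever fits of this file, then stop adding more
            parts.append(block[:remaining])
            break
        parts.append(block)
        used += len(block)
    return "".join(parts)


# Example with two throwaway files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "db.py").write_text("CONN = 'postgres://localhost'\n")
    Path(tmp, "pool.py").write_text("POOL_SIZE = 5\n" * 100)
    context = assemble_context(tmp, ["db.py", "pool.py"], max_chars=200)
```

Files listed first win the budget, so ordering the files by relevance (for example, by number of matches) matters more than it might seem.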

Performance Characteristics

| Method | Speed | Setup | Index Required | Best Query Type |
|---|---|---|---|---|
| Text Search | Very fast | None | No | Exact strings, patterns |
| Symbol Search | Very fast | None | No (cached) | Names, structure |
| Semantic Search | Medium | Embedding model | Yes | Natural language |
| Docstring Search | Slow | LLM + embedding model | Yes | Intent, purpose |

Common Workflows

Code Review Preparation

# 1. Find changed files (external tool)
changed_files = ["src/api/auth.py", "src/models/user.py"]
# 2. Extract symbols to see what each changed file defines
symbols_by_file = {}
for file in changed_files:
    symbols = repo.extract_symbols(file)
    symbols_by_file[file] = symbols  # reuse in step 3 instead of re-extracting
    print(f"\n{file}:")
    for s in symbols:
        print(f"  {s['type']}: {s['name']}")
# 3. Search for usage of changed symbols
for file, symbols in symbols_by_file.items():
    for s in symbols:
        usages = repo.search_text(s["name"], file_pattern="**/*.py")
        if len(usages) > 1:  # More than just the definition
            print(f"{s['name']} used in {len(usages)} places")
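Step 1 leaves finding the changed files to an external tool. If that tool is git, something like the sketch below could fill in `changed_files`. It assumes git is on your PATH and that `main` is the branch under review. The `--diff-filter=d` option drops deleted files, since they have no symbols left to extract. Only the parsing helper runs without a repository.

```python
import subprocess


def parse_changed_files(diff_name_only: str, suffix: str = ".py") -> list[str]:
    """Keep paths from `git diff --name-only` output that end in `suffix`."""
    return [line.strip() for line in diff_name_only.splitlines()
            if line.strip().endswith(suffix)]


def get_changed_files(base: str = "main", suffix: str = ".py") -> list[str]:
    """Files changed between `base` and the working tree, excluding deletions."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_changed_files(out, suffix)


# changed_files = get_changed_files("main")  # run inside a git checkout
sample_output = "src/api/auth.py\nREADME.md\nsrc/models/user.py\n"
parsed = parse_changed_files(sample_output)
```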

Codebase Exploration

# 1. Get overview with symbols
symbols = repo.extract_symbols()
print(f"Total: {len(symbols)} symbols")
# 2. Find interesting patterns with text search
tests = repo.search_text(r"def test_", file_pattern="tests/**/*.py")
print(f"Tests: {len(tests)}")
# 3. Use semantic search to understand specific areas
results = repo.search_semantic("how is caching implemented", embed_fn=embed_fn)
print("Caching-related code:")
for r in results[:5]:
    print(f"  {r['file']}")
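Step 1 prints only a total. Breaking the same list down by symbol type and by file is a cheap next step for seeing where the code is concentrated. The sketch below assumes each symbol dict carries `type` and `file` keys; check what `extract_symbols()` returns in your version. It uses made-up sample symbols so it runs standalone. In a real session, you would pass the list from `repo.extract_symbols()` instead.

```python
from collections import Counter

# Made-up sample in the shape assumed here: each symbol has "name", "type", and "file"
symbols = [
    {"name": "Cache", "type": "class", "file": "cache.py"},
    {"name": "get", "type": "method", "file": "cache.py"},
    {"name": "set", "type": "method", "file": "cache.py"},
    {"name": "main", "type": "function", "file": "cli.py"},
]

by_type = Counter(s["type"] for s in symbols)   # how many classes, methods, functions
per_file = Counter(s["file"] for s in symbols)  # which files define the most symbols

for kind, count in by_type.most_common():
    print(f"{kind}: {count}")
print("Densest files:", per_file.most_common(3))
```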

See Also