Build a Docstring Search Engine
In this tutorial you’ll build a semantic search tool on top of kit
using docstring-based indexing.
Why docstrings? Summaries distill intent rather than syntax. Embedding these
short natural-language strings lets the vector DB focus on meaning, giving you
relevant hits even when the literal code differs (e.g., retry()
vs
attempt_again()
). It also keeps the index small (one embedding per file or
symbol instead of dozens of raw-code chunks).
1. Install dependencies
Section titled “1. Install dependencies”uv pip install kit[openai] sentence-transformers chromadb
2. Initialise a repo and summarizer
Section titled “2. Initialise a repo and summarizer”import kitfrom kit import Repository, DocstringIndexer, Summarizer, SummarySearcherfrom sentence_transformers import SentenceTransformer
REPO_PATH = "/path/to/your/project"repo = Repository(REPO_PATH)
summarizer = repo.get_summarizer() # defaults to OpenAIConfig
3. Build the docstring index
Section titled “3. Build the docstring index”embed_model = SentenceTransformer("all-MiniLM-L6-v2")embed_fn = lambda txt: embed_model.encode(txt).tolist()
indexer = DocstringIndexer(repo, summarizer, embed_fn)indexer.build() # writes REPO_PATH/.kit_cache/docstring_db
The first run will take time depending on repo size and LLM latency. Summaries are cached inside the vector DB (and in a meta.json within the persist_dir), so subsequent runs are cheap if code hasn’t changed.
4. Query the index
Section titled “4. Query the index”searcher = indexer.get_searcher()
results = searcher.search("How is the retry back-off implemented?", top_k=3)for hit in results: print(f"→ File: {hit.get('file_path', 'N/A')}\n Summary: {hit.get('summary', 'N/A')}")
You now have a semantic code searcher, using powerful docstring summaries, as easy as that.