DocstringIndexer API
The DocstringIndexer class is responsible for building a vector index of AI-generated code summaries (docstrings). It processes files in a repository, generates summaries for code symbols (or entire files), embeds these summaries, and stores them in a configurable vector database backend. Once an index is built, it can be queried using the SummarySearcher class.
Constructor
Class: DocstringIndexer
(defined in kit/docstring_indexer.py)
```python
from kit import Repository, Summarizer
from kit.docstring_indexer import DocstringIndexer, EmbedFn  # EmbedFn is Optional[Callable[[str], List[float]]]
from kit.vector_searcher import VectorDBBackend  # Optional

# Example basic initialization
repo = Repository("/path/to/your/repo")
summarizer = Summarizer()  # Assumes OPENAI_API_KEY is set or local model configured
indexer = DocstringIndexer(repo=repo, summarizer=summarizer)

# Example with custom embedding function and backend
# def my_custom_embed_fn(text: str) -> List[float]:
#     # ... your embedding logic ...
#     return [0.1, 0.2, ...]
#
# from kit.vector_searcher import ChromaDBBackend
# custom_backend = ChromaDBBackend(collection_name="my_custom_index", persist_dir="./my_chroma_db")
#
# indexer_custom = DocstringIndexer(
#     repo=repo,
#     summarizer=summarizer,
#     embed_fn=my_custom_embed_fn,
#     backend=custom_backend,
#     persist_dir="./my_custom_index_explicit_persist"  # Can also be set directly on backend
# )
```

Parameters:
- repo (Repository, required): An instance of kit.Repository pointing to the codebase to be indexed.
- summarizer (Summarizer, required): An instance of kit.Summarizer used to generate summaries for code symbols or files.
- embed_fn (Optional[Callable[[str], List[float]]], default: SentenceTransformer('all-MiniLM-L6-v2')): A function that takes a string and returns its embedding (a list of floats). If None, a default embedding function using sentence-transformers (all-MiniLM-L6-v2 model) is used. The sentence-transformers package must be installed for the default to work (pip install sentence-transformers).
- backend (Optional[VectorDBBackend], default: ChromaDBBackend): The vector database backend to use for storing and querying embeddings. If None, a ChromaDBBackend instance is created. The default collection name is kit_docstring_index.
- persist_dir (Optional[str], default: './.kit_index/' + repo_name_slug + '/docstrings'): The directory where the vector database (e.g., ChromaDB) should persist its data. If None, a default path is constructed from the repository name, inside a .kit_index directory in the current working directory. If a custom backend is provided, this parameter might be ignored if the backend itself has persistence configured. It is primarily used for the default ChromaDBBackend, either when no explicit backend is given or when the default backend needs a specific persistence path.
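The embed_fn contract is only a callable from a string to a list of floats, so any function with that shape can be supplied. As a minimal sketch, assuming no embedding model is available (for example in tests), the function below builds a deterministic vector by hashing tokens into a fixed number of buckets. The name hashed_embed_fn and the dimension of 64 are illustrative choices, not part of kit, and such vectors only reflect token overlap rather than meaning.

```python
import hashlib
import math
from typing import List

DIM = 64  # illustrative vector size; not a kit default


def hashed_embed_fn(text: str) -> List[float]:
    """Map text to a fixed-size, L2-normalized vector via feature hashing."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % DIM  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # Leave an all-zero vector untouched (empty text, or tokens that cancel out)
    return [v / norm for v in vec] if norm else vec


vector = hashed_embed_fn("parse the config file")
print(len(vector))
```

A function like this could then be passed as embed_fn=hashed_embed_fn when constructing the DocstringIndexer, in the same position as my_custom_embed_fn in the commented example.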
Methods
build
Method: DocstringIndexer.build
(defined in kit/docstring_indexer.py)
Builds or rebuilds the docstring index. It iterates through files in the repository (respecting .gitignore and file_extensions), extracts symbols or uses whole file content based on the level, generates summaries, embeds them, and adds them to the vector database. It also handles caching to avoid re-processing unchanged symbols/files.
```python
# Build the index (symbol-level by default for .py files)
indexer.build()

# Force a rebuild, ignoring any existing cache
indexer.build(force=True)

# Index at file level instead of symbol level
indexer.build(level="file")

# Index only specific file extensions
indexer.build(file_extensions=[".py", ".mdx"])
```

Parameters:
- force (bool, default: False): If True, the entire index is rebuilt, ignoring any existing cache and potentially overwriting existing data in the backend. If False, uses cached summaries/embeddings for unchanged code and only processes new/modified code. It also avoids re-initializing the backend if it already contains data, unless changes are detected.
- level (str, default: 'symbol'): The granularity of indexing.
  - 'symbol': Extracts and summarizes individual symbols (functions, classes, methods) from files.
  - 'file': Summarizes the entire content of each file.
- file_extensions (Optional[List[str]], default: None (uses Repository's default, typically .py)): A list of file extensions (e.g., ['.py', '.md']) to include in the indexing process. If None, uses the default behavior of the Repository instance, which typically focuses on Python files but can be configured.
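To make the two level values concrete, the sketch below shows which units would each receive one summary. It uses Python's standard ast module as a stand-in for kit's own symbol extraction, which this page does not describe. The helper units_to_summarize and the SOURCE sample are hypothetical.

```python
import ast
from typing import List, Tuple

SOURCE = '''
def load(path):
    return open(path).read()

class Parser:
    def parse(self, text):
        return text.split()
'''


def units_to_summarize(source: str, level: str = "symbol") -> List[Tuple[str, str]]:
    """Return (name, code) pairs; each pair would get one summary."""
    if level == "file":
        # One unit covering the whole file
        return [("<file>", source)]
    if level != "symbol":
        raise ValueError(f"unknown level: {level!r}")
    units = []
    for node in ast.walk(ast.parse(source)):
        # Functions, async functions, classes, and methods nested inside classes
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            units.append((node.name, ast.get_source_segment(source, node)))
    return units


print([name for name, _ in units_to_summarize(SOURCE, "symbol")])
print([name for name, _ in units_to_summarize(SOURCE, "file")])
```

Symbol-level indexing produces more, smaller entries, which tends to make search results point at a specific function or class; file-level indexing produces one entry per file and needs fewer summarizer calls.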
Returns: None
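The effect of force on caching can be pictured as a content-hash check per indexed unit. The sketch below illustrates that general technique only; it is not kit's actual cache format, and plan_build, content_hash, and the dict-based cache are hypothetical names.

```python
import hashlib
from typing import Dict, List


def content_hash(code: str) -> str:
    """Stable fingerprint of a unit's source text."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def plan_build(units: Dict[str, str], cache: Dict[str, str], force: bool = False) -> List[str]:
    """Return ids of units that need a fresh summary and embedding.

    The cache maps unit id -> hash of the code seen at the last build
    and is updated in place.
    """
    to_process = []
    for unit_id, code in units.items():
        digest = content_hash(code)
        if force or cache.get(unit_id) != digest:
            to_process.append(unit_id)
            cache[unit_id] = digest
    return to_process


cache: Dict[str, str] = {}
units = {"utils.py::load": "def load(): ...", "utils.py::save": "def save(): ..."}
print(plan_build(units, cache))              # first build: every unit is new
units["utils.py::save"] = "def save(path): ..."
print(plan_build(units, cache))              # only the modified unit
print(plan_build(units, cache, force=True))  # force: every unit again
```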
get_searcher
Method: DocstringIndexer.get_searcher
(defined in kit/docstring_indexer.py)
Returns a SummarySearcher instance that is configured to query the index managed by this DocstringIndexer.
This provides a convenient way to obtain a search interface after the indexer has been built or loaded, without needing to manually instantiate SummarySearcher.
```python
# Assuming 'indexer' is an initialized DocstringIndexer instance
# indexer.build()  # or it has been loaded with a pre-built index

search_interface = indexer.get_searcher()
results = search_interface.search("my search query", top_k=3)

for result in results:
    print(result)
```

Parameters: None
Returns: SummarySearcher
An instance of SummarySearcher linked to this indexer.
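Conceptually, a search over the index embeds the query and returns the stored summaries whose vectors lie closest to it. The toy in-memory sketch below shows that idea using cosine similarity. It is not SummarySearcher's implementation: toy_embed, toy_search, and the tiny VOCAB are hypothetical, and word counts stand in for a real embedding model.

```python
import math
from typing import Dict, List, Tuple

VOCAB = ["parse", "config", "file", "http", "request", "retry"]


def toy_embed(text: str) -> List[float]:
    """Count occurrences of each vocabulary term (a stand-in for a real model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_search(query: str, index: Dict[str, List[float]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored summary vectors by similarity to the embedded query."""
    query_vec = toy_embed(query)
    scored = [(summary_id, cosine(query_vec, vec)) for summary_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


summaries = {
    "config.py::load_config": "parse the config file",
    "client.py::fetch": "send an http request with retry",
}
index = {summary_id: toy_embed(text) for summary_id, text in summaries.items()}
print(toy_search("retry http request", index, top_k=1))
```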