Repository API
This page details the methods and properties of the central kit.Repository class.
Initialization
from kit import Repository

# Initialize with a local path
local_repo = Repository("/path/to/your/local/project")

# Initialize with a remote URL (requires git)
# remote_repo = Repository("https://github.com/user/repo.git")
Core Methods
index(): Analyzes the repository.
get_file_tree(): Returns the file tree structure.
get_file_content(file_path): Reads and returns the content of a specified file.
extract_symbols(file_path): Extracts code symbols.
search_semantic(query): Performs semantic search.
get_summarizer(): Gets the code summarizer.
Creating a Repository Instance
To start using kit, first create an instance of the Repository class. This points kit to the codebase you want to analyze.
from kit import Repository

# For a local directory
repository_instance = Repository(path_or_url="/path/to/local/project")

# For a remote Git repository (public or private)
# repository_instance = Repository(
#     path_or_url="https://github.com/owner/repo-name",
#     github_token="YOUR_GITHUB_TOKEN",  # Optional: For private repos
#     cache_dir="/path/to/cache"  # Optional: For caching clones
# )
Parameters:
path_or_url (str): The path to a local directory or the URL of a remote Git repository.
github_token (Optional[str]): A GitHub personal access token required for cloning private repositories. Defaults to None.
cache_dir (Optional[str]): Path to a directory for caching cloned repositories. Defaults to a system temporary directory.
Once you have a repository object, you can call the following methods on it:
repository.get_file_tree()
Returns the file tree structure of the repository.
repository.get_file_tree() -> List[Dict[str, Any]]
Returns:
List[Dict[str, Any]]: A list of dictionaries, where each dictionary represents a file or directory with keys like path, name, is_dir, size.
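To make the shape of that list concrete, here is a minimal standard-library sketch that builds entries with the same four keys for a local directory. It does not call kit: `build_file_tree` is a hypothetical helper, reporting a size of 0 for directories is an assumption, and the exact fields kit returns may differ.

```python
import os
import tempfile
from typing import Any, Dict, List


def build_file_tree(root: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: list directories and files under root as dicts."""
    entries: List[Dict[str, Any]] = []
    for current_dir, dir_names, file_names in os.walk(root):
        dir_names.sort()  # sort in place so traversal order is deterministic
        for name in dir_names:
            full = os.path.join(current_dir, name)
            entries.append({
                "path": os.path.relpath(full, root),
                "name": name,
                "is_dir": True,
                "size": 0,  # assumption: directories report no size
            })
        for name in sorted(file_names):
            full = os.path.join(current_dir, name)
            entries.append({
                "path": os.path.relpath(full, root),
                "name": name,
                "is_dir": False,
                "size": os.path.getsize(full),
            })
    return entries


# Build a tiny throwaway project and list it.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "pkg"))
    with open(os.path.join(demo_root, "pkg", "a.py"), "wb") as handle:
        handle.write(b"x = 1\n")
    demo_tree = build_file_tree(demo_root)

# Typical consumer code: keep only the files.
demo_files_only = [entry["path"] for entry in demo_tree if not entry["is_dir"]]
```

Filtering on is_dir, as in the last line, is usually the first step before passing paths on to the per-file methods.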
repository.get_file_content()
Reads and returns the content of a specified file within the repository as a string.
repository.get_file_content(file_path: str) -> str
Parameters:
file_path (str): The path to the file, relative to the repository root.
Returns:
str: The content of the file.
Raises:
FileNotFoundError: If the file does not exist at the specified path.
IOError: If any other I/O error occurs during file reading.
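The two documented exceptions are related in Python 3: IOError is an alias of OSError, and FileNotFoundError is a subclass of it, so the order of the except clauses matters. The sketch below shows that ordering with a hypothetical stand-in, `read_repo_file`, which reads from a plain directory instead of calling kit.

```python
import os
import tempfile


def read_repo_file(root: str, file_path: str) -> str:
    """Hypothetical stand-in: read a file given a path relative to root."""
    with open(os.path.join(root, file_path), "r", encoding="utf-8") as handle:
        return handle.read()


def safe_read(root: str, file_path: str) -> str:
    """Return the file content, or a short marker string when reading fails."""
    try:
        return read_repo_file(root, file_path)
    except FileNotFoundError:
        # Must come first: FileNotFoundError is a subclass of OSError (IOError).
        return "<missing>"
    except IOError:
        return "<unreadable>"


with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "README.md"), "w", encoding="utf-8") as f:
        f.write("hello")
    found = safe_read(demo_root, "README.md")
    missing = safe_read(demo_root, "does_not_exist.txt")
```

With the clauses reversed, the IOError branch would also swallow missing-file errors and the more specific handler would never run.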
repository.extract_symbols()
Extracts code symbols (functions, classes, variables, etc.) from the repository.
repository.extract_symbols(file_path: Optional[str] = None) -> List[Dict[str, Any]]
Parameters:
file_path (Optional[str]): If provided, extracts symbols only from this specific file path relative to the repo root. If None, extracts symbols from all supported files in the repository. Defaults to None.
Returns:
List[Dict[str, Any]]: A list of dictionaries, each representing a symbol with keys like name, type, file, line_start, line_end, code_snippet.
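As a rough illustration of what a symbol record can look like, the sketch below extracts functions and classes from Python source with the standard ast module. This is a stand-in technique, not kit's extractor: `extract_python_symbols` is a hypothetical name, it handles Python only, and the 1-based line numbers are what ast reports rather than a statement about kit's numbering.

```python
import ast
from typing import Any, Dict, List


def extract_python_symbols(source: str, file: str = "<memory>") -> List[Dict[str, Any]]:
    """Hypothetical stand-in: collect function and class definitions via ast."""
    lines = source.splitlines()
    symbols: List[Dict[str, Any]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        symbols.append({
            "name": node.name,
            "type": kind,
            "file": file,
            "line_start": node.lineno,    # 1-based, as reported by ast
            "line_end": node.end_lineno,  # available on Python 3.8+
            "code_snippet": "\n".join(lines[node.lineno - 1:node.end_lineno]),
        })
    # ast.walk gives no ordering guarantee, so sort by position.
    symbols.sort(key=lambda s: (s["line_start"], s["name"]))
    return symbols


SAMPLE_SOURCE = (
    "class Greeter:\n"
    "    def greet(self):\n"
    "        return 'hi'\n"
    "\n"
    "def main():\n"
    "    return Greeter().greet()\n"
)
sample_symbols = extract_python_symbols(SAMPLE_SOURCE, file="greeter.py")
symbol_names = [s["name"] for s in sample_symbols]
```

Note that this sketch reports the nested method greet as a plain function; whether a real extractor distinguishes methods from functions is implementation-specific.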
repository.search_text()
Searches for literal text or regex patterns within files.
repository.search_text(query: str, file_pattern: str = "*.py") -> List[Dict[str, Any]]
Parameters:
query (str): The text or regex pattern to search for.
file_pattern (str): A glob pattern to filter files to search within. Defaults to "*.py".
Returns:
List[Dict[str, Any]]: A list of dictionaries representing search matches, with keys like file, line_number, line_content.
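The interaction between the regex query and the glob filter can be sketched with re and fnmatch over an in-memory mapping of paths to contents. `search_text_in_files` is a hypothetical helper, and the 1-based line_number is an assumption; kit's own matching and numbering rules may differ.

```python
import fnmatch
import re
from typing import Any, Dict, List


def search_text_in_files(
    files: Dict[str, str], query: str, file_pattern: str = "*.py"
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: regex search over files whose path matches a glob."""
    pattern = re.compile(query)
    matches: List[Dict[str, Any]] = []
    for path in sorted(files):
        if not fnmatch.fnmatchcase(path, file_pattern):
            continue  # skip files excluded by the glob
        for line_number, line in enumerate(files[path].splitlines(), start=1):
            if pattern.search(line):
                matches.append({
                    "file": path,
                    "line_number": line_number,  # assumption: 1-based
                    "line_content": line,
                })
    return matches


demo_files = {
    "src/app.py": "import os\n\ndef load_config():\n    return {}\n",
    "docs/notes.md": "def load_config is documented here\n",
}
demo_matches = search_text_in_files(demo_files, r"def \w+_config")
```

Because the default pattern is "*.py", the Markdown file is never scanned even though its text would match; pass a broader glob such as "*" to search everything.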
repository.chunk_file_by_lines()
Chunks a file’s content based on line count.
repository.chunk_file_by_lines(file_path: str, max_lines: int = 50) -> List[str]
Parameters:
file_path (str): The path to the file (relative to repo root) to chunk.
max_lines (int): The maximum number of lines per chunk. Defaults to 50.
Returns:
List[str]: A list of strings, where each string is a chunk of the file content.
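Line-based chunking is simple enough to sketch in a few lines. The helper below, `chunk_by_lines`, is a hypothetical illustration that works on a string rather than a file path; it shows the general technique, and kit's handling of edge cases such as trailing newlines may differ.

```python
from typing import List


def chunk_by_lines(text: str, max_lines: int = 50) -> List[str]:
    """Hypothetical stand-in: split text into chunks of at most max_lines lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    lines = text.splitlines(keepends=True)  # keep newlines so chunks rejoin exactly
    return [
        "".join(lines[start:start + max_lines])
        for start in range(0, len(lines), max_lines)
    ]


# A 120-line sample splits into chunks of 50, 50, and 20 lines.
sample_text = "".join(f"line {n}\n" for n in range(1, 121))
chunks = chunk_by_lines(sample_text, max_lines=50)
lines_per_chunk = [chunk.count("\n") for chunk in chunks]
```

Fixed-size chunks are predictable but can cut a function in half, which is the usual reason to prefer symbol-based chunking for code.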
repository.chunk_file_by_symbols()
Chunks a file’s content based on its top-level symbols (functions, classes).
repository.chunk_file_by_symbols(file_path: str) -> List[Dict[str, Any]]
Parameters:
file_path (str): The path to the file (relative to repo root) to chunk.
Returns:
List[Dict[str, Any]]: A list of dictionaries, each representing a symbol chunk with keys like name, type, code_snippet.
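The phrase "top-level symbols" means nested definitions stay inside their parent's chunk. A minimal sketch of that idea for Python source, using the standard ast module, is shown here. `chunk_python_by_symbols` is a hypothetical helper and not how kit is implemented; including decorators in the chunk is a design choice of this sketch.

```python
import ast
from typing import Any, Dict, List


def chunk_python_by_symbols(source: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks: List[Dict[str, Any]] = []
    for node in ast.parse(source).body:  # module body = top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        # Start at the first decorator, if any, so the chunk is complete.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        chunks.append({
            "name": node.name,
            "type": kind,
            "code_snippet": "\n".join(lines[start - 1:node.end_lineno]),
        })
    return chunks


SAMPLE_MODULE = (
    "import functools\n"
    "\n"
    "@functools.lru_cache\n"
    "def cached():\n"
    "    return 1\n"
    "\n"
    "class Box:\n"
    "    def get(self):\n"
    "        return cached()\n"
)
module_chunks = chunk_python_by_symbols(SAMPLE_MODULE)
chunk_names = [chunk["name"] for chunk in module_chunks]
```

The method get does not appear as its own chunk; it is carried inside the Box chunk.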
repository.extract_context_around_line()
Extracts the surrounding code context (typically the containing function or class) for a specific line number.
repository.extract_context_around_line(file_path: str, line: int) -> Optional[Dict[str, Any]]
Parameters:
file_path (str): The path to the file (relative to repo root).
line (int): The (0-based) line number to find context for.
Returns:
Optional[Dict[str, Any]]: A dictionary representing the symbol context (with keys like name, type, code_snippet), or None if no context is found.
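One plausible way to resolve a line to its context is to pick the smallest symbol whose line range contains it. The sketch below does that over hand-written symbol records. Everything here is an assumption for illustration: `context_for_line` is a hypothetical helper, the records use 0-based inclusive line_start and line_end values, and choosing the innermost symbol is this sketch's rule, not documented kit behavior.

```python
from typing import Any, Dict, List, Optional


def context_for_line(
    symbols: List[Dict[str, Any]], line: int
) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in: return the innermost symbol containing line."""
    enclosing = [s for s in symbols if s["line_start"] <= line <= s["line_end"]]
    if not enclosing:
        return None
    # The innermost symbol is the one spanning the fewest lines.
    return min(enclosing, key=lambda s: s["line_end"] - s["line_start"])


# Assumed 0-based, inclusive ranges for a small file.
demo_symbols = [
    {"name": "Greeter", "type": "class", "line_start": 0, "line_end": 9},
    {"name": "greet", "type": "function", "line_start": 2, "line_end": 5},
    {"name": "main", "type": "function", "line_start": 12, "line_end": 14},
]

inside_method = context_for_line(demo_symbols, 3)
inside_class_only = context_for_line(demo_symbols, 7)
between_symbols = context_for_line(demo_symbols, 11)
```

Since the line argument is 0-based, subtract 1 from line numbers taken from editors or tracebacks, which usually count from 1.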
repository.index()
Builds and returns a comprehensive index of the repository, including both the file tree and all extracted symbols.
repository.index() -> Dict[str, Any]
Returns:
Dict[str, Any]: A dictionary containing the full index, typically with keys like file_tree and symbols.
repository.get_vector_searcher()
Initializes and returns the VectorSearcher instance for performing semantic search.
repository.get_vector_searcher(embed_fn=None, backend=None, persist_dir=None) -> VectorSearcher
Parameters:
embed_fn (Callable): Required on first call. A function that takes a list of strings and returns a list of embedding vectors.
backend (Optional[Any]): Specifies the vector database backend. If None, kit defaults to using ChromaDBBackend.
persist_dir (Optional[str]): Path to a directory to persist the vector index. If None, the VectorSearcher uses the default ChromaDB persistence path, YOUR_REPO_PATH/.kit/vector_db/.
Returns:
VectorSearcher: An instance of the vector searcher configured for this repository.
(See Configuring Semantic Search for more details.)
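The embed_fn contract (a list of strings in, a list of equal-length vectors out) can be illustrated with a toy function that needs no model. `toy_embed_fn` below is a hypothetical placeholder based on token hashing; it produces vectors of the right shape for wiring and testing, but it carries no real semantic meaning and should be replaced by an actual embedding model in practice.

```python
import hashlib
import math
from typing import List

DIMENSIONS = 64  # arbitrary size chosen for this sketch


def toy_embed_fn(texts: List[str]) -> List[List[float]]:
    """Hypothetical placeholder: hash tokens into a fixed-size, unit-length vector."""
    vectors: List[List[float]] = []
    for text in texts:
        vector = [0.0] * DIMENSIONS
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % DIMENSIONS
            vector[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # Normalize non-empty vectors; leave all-zero vectors untouched.
        vectors.append([v / norm for v in vector] if norm else vector)
    return vectors


demo_vectors = toy_embed_fn(["def add(a, b): return a + b", ""])
```

Following the signature above, a function with this shape would be passed as repository.get_vector_searcher(embed_fn=toy_embed_fn).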
repository.search_semantic()
Performs a semantic search query over the indexed codebase.
repository.search_semantic(query: str, top_k: int = 5, embed_fn=None) -> List[Dict[str, Any]]
Parameters:
query (str): The natural language query to search for.
top_k (int): The maximum number of results to return. Defaults to 5.
embed_fn (Callable): Required if the vector searcher hasn’t been initialized yet.
Returns:
List[Dict[str, Any]]: A list of dictionaries representing the search results, typically including matched code snippets and relevance scores.
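The ranking step behind a top_k query can be sketched as: embed the query, score each snippet by cosine similarity, sort, and truncate. The code below uses bag-of-words counts as a stand-in for real embeddings, so it matches on shared words rather than meaning. `rank_snippets` and the result keys file, snippet, and score are hypothetical and not kit's result format.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding: count lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_snippets(
    query: str, snippets: Dict[str, str], top_k: int = 5
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: score every snippet and keep the best top_k."""
    query_vector = bag_of_words(query)
    scored = [
        {"file": path, "snippet": code, "score": cosine(query_vector, bag_of_words(code))}
        for path, code in snippets.items()
    ]
    scored.sort(key=lambda result: result["score"], reverse=True)
    return scored[:top_k]


demo_snippets = {
    "auth.py": "def login(user, password): return check_password(user, password)",
    "math_utils.py": "def add(a, b): return a + b",
    "config.py": "def load_config(path): return read_file(path)",
}
results = rank_snippets("check user password on login", demo_snippets, top_k=2)
```

top_k is an upper bound: the list is shorter when fewer snippets are indexed, and low-scoring entries can still appear if nothing better exists.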
repository.find_symbol_usages()
Finds definitions and references of a specific symbol across the repository.
repository.find_symbol_usages(symbol_name: str, symbol_type: Optional[str] = None) -> List[Dict[str, Any]]
Parameters:
symbol_name (str): The name of the symbol to find usages for.
symbol_type (Optional[str]): Optionally restrict the search to a specific symbol type (e.g., ‘function’, ‘class’). Defaults to None (search all types).
Returns:
List[Dict[str, Any]]: A list of dictionaries representing symbol usages, including file, line number, and context/snippet.
repository.write_index()
Writes the full repository index (file tree and symbols) to a JSON file.
repository.write_index(file_path: str) -> None
Parameters:
file_path (str): The path to the output JSON file.
repository.write_symbols()
Writes extracted symbols to a JSON file.
repository.write_symbols(file_path: str, symbols: Optional[list] = None) -> None
Parameters:
file_path (str): The path to the output JSON file.
symbols (Optional[list]): An optional list of symbol dictionaries to write. If None, writes all symbols extracted from the repository. Defaults to None.
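Because the write_* methods emit JSON, their output can be read back with the standard json module. The round trip below uses a hypothetical `write_json` helper and hand-written symbol records to show the idea; the exact layout of the files kit writes is not specified here, so treat the structure as an assumption.

```python
import json
import os
import tempfile
from typing import Any


def write_json(file_path: str, payload: Any) -> None:
    """Hypothetical stand-in: serialize a payload to a JSON file."""
    with open(file_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


# Hand-written records in the documented symbol shape.
symbols_to_write = [
    {"name": "main", "type": "function", "file": "app.py"},
    {"name": "Greeter", "type": "class", "file": "greeter.py"},
]

with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "symbols.json")
    write_json(out_path, symbols_to_write)
    with open(out_path, "r", encoding="utf-8") as handle:
        loaded_symbols = json.load(handle)
```

Passing an explicit symbols list, as the parameter above allows, is useful for writing a filtered subset such as only the classes.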
repository.write_file_tree()
Writes the repository file tree to a JSON file.
repository.write_file_tree(file_path: str) -> None
Parameters:
file_path (str): The path to the output JSON file.
repository.write_symbol_usages()
Writes the found usages of a specific symbol to a JSON file.
repository.write_symbol_usages(symbol_name: str, file_path: str, symbol_type: Optional[str] = None) -> None
Parameters:
symbol_name (str): The name of the symbol whose usages were found.
file_path (str): The path to the output JSON file.
symbol_type (Optional[str]): The symbol type filter used when finding usages. Defaults to None.
repository.get_context_assembler()
Convenience helper that returns a fresh ContextAssembler bound to this repository.
Use it instead of importing the class directly:
assembler = repository.get_context_assembler()
assembler.add_diff(my_diff)
context_blob = assembler.format_context()
Returns:
ContextAssembler: Ready-to-use assembler instance.