Usage Guide
This guide provides practical examples of how to use the core Repository object in kit to interact with your codebase.
Initializing a Repository
First, create an instance of the Repository class, pointing it to your code. kit can work with local directories or clone remote Git repositories. This is the starting point for any analysis, giving kit access to the codebase.
Local Directory
If your code is already on your machine:
```python
from kit import Repository

repo = Repository("/path/to/your/local/project")
```
Remote Git Repository
kit can clone a public or private Git repository. For private repos, provide a GitHub token.
```python
# Public repo
repo = Repository("https://github.com/owner/repo-name")

# Private repo (requires token)
# Ensure the token has appropriate permissions
github_token = "your_github_pat_here"
repo = Repository("https://github.com/owner/private-repo-name", github_token=github_token)
```
Caching
When cloning remote repositories, kit caches them locally to speed up subsequent initializations. By default, caches are stored in a temporary directory. You can specify a persistent cache directory:
```python
repo = Repository(
    "https://github.com/owner/repo-name",
    cache_dir="/path/to/persistent/cache",
)
```
Basic Exploration
Once initialized, you can explore the codebase. Use these methods to get a high-level overview of the repository’s structure and key code elements, or to gather foundational context for an LLM.
Getting the File Tree
List all files and directories:
```python
file_tree = repo.get_file_tree()
# Returns a list of dicts: [{'path': '...', 'is_dir': False, ...}, ...]
```
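Given the list-of-dicts shape shown above, downstream filtering is plain Python. A minimal sketch, using illustrative sample entries in place of a real `get_file_tree()` result:

```python
# Illustrative entries in the documented shape; with a real repo you
# would use: file_tree = repo.get_file_tree()
file_tree = [
    {"path": "src", "is_dir": True},
    {"path": "src/app.py", "is_dir": False},
    {"path": "src/utils.py", "is_dir": False},
    {"path": "README.md", "is_dir": False},
]

# Keep only Python source files
py_files = [
    entry["path"]
    for entry in file_tree
    if not entry["is_dir"] and entry["path"].endswith(".py")
]
print(py_files)  # → ['src/app.py', 'src/utils.py']
```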
Extracting Symbols
Identify functions, classes, etc., across the whole repo or in a specific file:
```python
# All symbols
all_symbols = repo.extract_symbols()

# Symbols in a specific file
specific_symbols = repo.extract_symbols("src/my_module.py")
# Returns a list of dicts: [{'name': '...', 'type': 'function', ...}, ...]
```
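The returned records are easy to aggregate, for example to tally symbols by kind. A sketch with hypothetical sample records standing in for a real `extract_symbols()` result:

```python
from collections import Counter

# Illustrative symbol records in the documented shape; with a real repo:
# all_symbols = repo.extract_symbols()
all_symbols = [
    {"name": "load_config", "type": "function", "file": "src/config.py"},
    {"name": "Config", "type": "class", "file": "src/config.py"},
    {"name": "main", "type": "function", "file": "src/app.py"},
]

# Tally symbols by kind for a quick structural overview
counts = Counter(sym["type"] for sym in all_symbols)
print(counts)  # → Counter({'function': 2, 'class': 1})
```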
Searching Text
Perform simple text or regex searches:
```python
matches = repo.search_text("my_function_call", file_pattern="*.py")
# Returns a list of dicts: [{'file': '...', 'line_number': 10, ...}, ...]
```
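Grouping matches by file often makes the results easier to scan. A sketch using illustrative sample matches in the documented shape:

```python
from collections import defaultdict

# Illustrative matches in the documented shape; with a real repo:
# matches = repo.search_text("my_function_call", file_pattern="*.py")
matches = [
    {"file": "src/app.py", "line_number": 10, "line": "my_function_call(x)"},
    {"file": "src/app.py", "line_number": 42, "line": "y = my_function_call(z)"},
    {"file": "src/utils.py", "line_number": 7, "line": "my_function_call()"},
]

# Collect line numbers per file
by_file = defaultdict(list)
for m in matches:
    by_file[m["file"]].append(m["line_number"])

print(dict(by_file))  # → {'src/app.py': [10, 42], 'src/utils.py': [7]}
```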
Preparing Code for LLMs
kit provides utilities to prepare code snippets for large language models. These methods help break down large codebases into manageable pieces suitable for LLM context windows or specific analysis tasks.
Chunking
Split files into manageable chunks, either by line count or by symbol definition:
```python
# Chunk by lines
line_chunks = repo.chunk_file_by_lines("src/long_file.py", max_lines=100)

# Chunk by symbols (functions, classes)
symbol_chunks = repo.chunk_file_by_symbols("src/long_file.py")
```
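Conceptually, line-based chunking just slices a file into fixed-size windows. A minimal standalone sketch of the idea (not kit’s actual implementation):

```python
def chunk_by_lines(text: str, max_lines: int) -> list[str]:
    """Split text into consecutive chunks of at most max_lines lines each."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i : i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]

# A 7-line toy "file" chunked into windows of 3 lines
source = "\n".join(f"line {i}" for i in range(1, 8))
chunks = chunk_by_lines(source, max_lines=3)
print(len(chunks))  # → 3 (lines 1-3, 4-6, and 7)
```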
Extracting Context
Get the specific function or class definition surrounding a given line number:
```python
context = repo.extract_context_around_line("src/my_module.py", line=42)
# Returns a dict like {'name': 'my_function', 'type': 'function', 'code': 'def my_function(...): ...'}
```
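One common use of the returned dict is assembling an LLM prompt around the enclosing definition. A sketch with an illustrative context dict in the documented shape:

```python
# Illustrative context dict in the documented shape; with a real repo:
# context = repo.extract_context_around_line("src/my_module.py", line=42)
context = {
    "name": "my_function",
    "type": "function",
    "code": "def my_function(x):\n    return x * 2",
}

# Wrap the enclosing definition in a simple explanation prompt
prompt = (
    f"Explain what this {context['type']} `{context['name']}` does:\n\n"
    f"{context['code']}"
)
print(prompt)
```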
Generating Code Summaries (Alpha)
kit includes an alpha feature for generating natural language summaries (like dynamic docstrings) for code elements (files, functions, classes) using a configured Large Language Model (LLM). This can be useful for:
- Quickly understanding the purpose of a piece of code.
- Providing context to other LLM-powered tools.
- Powering semantic search based on generated summaries rather than just raw code.
Note: This feature is currently in alpha. The API may change, and it requires an LLM (e.g., via OpenAI, Anthropic) to be configured for kit to use for summarization.
Using the DocstringIndexer
The DocstringIndexer is responsible for managing the summarization process and storing/retrieving these generated “docstrings.”
```python
from kit import Repository
from kit.docstring_indexer import DocstringIndexer
from kit.summaries import Summarizer, OpenAIConfig
from sentence_transformers import SentenceTransformer  # or any embedder of your choice

# 1. Initialize your Repository
repo = Repository("tests/fixtures/realistic_repo")  # Or your project path

# 2. Configure the LLM-powered summarizer
# Make sure the relevant API key (e.g., OPENAI_API_KEY) is set in your environment
summarizer = Summarizer(repo, OpenAIConfig(model="gpt-4o"))

# 3. Provide an embedding function (str -> list[float]) for the vector index
st_model = SentenceTransformer("all-MiniLM-L6-v2")
embed_fn = lambda text: st_model.encode(text).tolist()

# 4. Create the DocstringIndexer
# You can specify where on disk to persist the vector DB via `persist_dir`.
indexer = DocstringIndexer(
    repo,
    summarizer,
    embed_fn,
    persist_dir="kit_docstring_cache",
)

# 5. Build the index (generates summaries for new/changed files/symbols)
# This may take some time depending on repository size and LLM speed.
indexer.build(force=True)  # `level="symbol"` by default

# 6. Retrieve a summary – use the built-in SummarySearcher
searcher = indexer.get_searcher()
hits = searcher.search("utils.greet", top_k=1)  # Search by symbol or natural language
if hits:
    print("Summary:", hits[0]["summary"])
else:
    print("No summary found (yet).")
```
This generated summary can then be used for various purposes, including enhancing semantic search or providing contextual information for code generation tasks. Refer to the Core Concepts: Docstring Indexing page for more details on configuration and advanced usage.
Semantic Code Search
Perform vector-based semantic search (requires configuration). Go beyond keyword search to find code related by meaning or concept, useful for discovery and understanding.
```python
# NOTE: Requires prior setup - see Core Concepts > Configuring Semantic Search
results = repo.search_semantic("find code related to database connections", top_k=3)
```
(See Configuring Semantic Search for setup details.)
Finding Symbol Usages
Locate all definitions and references of a specific symbol. Track down where functions or classes are defined and used throughout the codebase for impact analysis or refactoring.
```python
usages = repo.find_symbol_usages("MyClass", symbol_type="class")
# Returns a list of dicts showing definitions and text matches across the repo.
```
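For impact analysis, a first useful summary is simply which files touch the symbol. A sketch using hypothetical usage records in the documented list-of-dicts shape:

```python
# Illustrative usage records; with a real repo:
# usages = repo.find_symbol_usages("MyClass", symbol_type="class")
usages = [
    {"file": "src/models.py", "line": 3, "type": "class"},  # definition
    {"file": "src/app.py", "line": 12},                     # reference
    {"file": "tests/test_models.py", "line": 5},            # reference
]

# Deduplicated, sorted list of files that would be affected by a change
files_touched = sorted({u["file"] for u in usages})
print(files_touched)
# → ['src/app.py', 'src/models.py', 'tests/test_models.py']
```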
Exporting Data
kit can export the gathered information (file tree, symbols, index, usages) to JSON files for use in other tools or offline analysis. Persist the results of your analysis or integrate kit’s findings into other development workflows.
```python
# Export the full index (files + symbols)
repo.write_index("repo_index.json")

# Export only symbols
repo.write_symbols("symbols.json")

# Export file tree
repo.write_file_tree("file_tree.json")

# Export usages of a symbol
repo.write_symbol_usages("MyClass", "my_class_usages.json", symbol_type="class")
```
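Because the exports are plain JSON, downstream tools need nothing but the standard library to consume them. A sketch that simulates an exported symbols file (illustrative data stands in for a real `write_symbols` output):

```python
import json
import tempfile
from pathlib import Path

# Simulate an exported symbols file; with a real repo you would first call
# repo.write_symbols("symbols.json")
symbols = [
    {"name": "main", "type": "function"},
    {"name": "Config", "type": "class"},
]
out = Path(tempfile.mkdtemp()) / "symbols.json"
out.write_text(json.dumps(symbols))

# Any other tool can now load the export with the stdlib alone
loaded = json.loads(out.read_text())
print(len(loaded))  # → 2
```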