MultiRepo API
The MultiRepo class provides a unified interface for analyzing multiple repositories together. Search, extract symbols, and audit dependencies across your entire codebase ecosystem as if it were a single repository.
Use Cases
Microservices Architecture
When your application is split across multiple services, MultiRepo lets you search and analyze them together:
```python
from kit import MultiRepo

services = MultiRepo([
    "~/code/user-service",
    "~/code/order-service",
    "~/code/payment-service",
    "~/code/notification-service",
])

# Find all authentication-related code across services
auth_code = services.search("authenticate")

# Find where UserModel is defined across services
user_models = services.find_symbol("UserModel", symbol_type="class")
```

Frontend + Backend Projects
Analyze your full-stack application across language boundaries:
```python
repos = MultiRepo([
    "~/code/web-app",       # React/TypeScript
    "~/code/api-server",    # Python/FastAPI
    "~/code/shared-types",  # Shared schemas
])

# Search for API endpoint patterns across the stack
endpoints = repos.search("handleAuth")

# Audit all dependencies
deps = repos.audit_dependencies()
# Returns: {"web-app": {"javascript": {"react": "^18.0"}},
#           "api-server": {"python": {"fastapi": "0.100.0"}}, ...}
```

Team Repositories
Find patterns and inconsistencies across team projects:
```python
team_repos = MultiRepo([
    "~/code/project-alpha",
    "~/code/project-beta",
    "~/code/project-gamma",
])

# Find all TODO comments across projects
todos = team_repos.search("TODO")

# Get a summary of all repositories
summaries = team_repos.summarize()
for name, info in summaries.items():
    print(f"{name}: {info['file_count']} files, {list(info['languages'].keys())}")
```

Semantic Search Across Repos
Find code by meaning, not just keywords—across all your repositories:
```python
from kit import MultiRepo
from sentence_transformers import SentenceTransformer

repos = MultiRepo([
    "~/code/auth-service",
    "~/code/user-service",
    "~/code/api-gateway",
])

# Build indexes once
model = SentenceTransformer("all-MiniLM-L6-v2")
repos.index_all(embed_fn=model.encode)

# Search by concept - finds relevant code even with different naming conventions
results = repos.search_semantic("user authentication and session management", top_k=10)

# Results ranked by relevance across ALL repos
for r in results:
    print(f"[{r['repo']}] {r['file']} (score: {r['score']:.2f})")

# More examples:
repos.search_semantic("error handling and retry logic")
repos.search_semantic("database connection pooling")
repos.search_semantic("rate limiting implementation")
```

This is especially useful when:
- Different teams use different naming conventions
- You’re exploring an unfamiliar codebase ecosystem
- You need to find similar patterns implemented differently across services
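Because every result carries repo attribution, you can compare how each service expresses the same concept. A minimal sketch, using hand-written results in the documented shape (in practice they would come from `repos.search_semantic(...)`):

```python
# Hypothetical results in the documented shape; real ones would come from
# repos.search_semantic("user authentication and session management")
results = [
    {"repo": "auth-service", "file": "session.py", "score": 0.91},
    {"repo": "user-service", "file": "login_handler.py", "score": 0.87},
    {"repo": "auth-service", "file": "tokens.py", "score": 0.85},
]

# Keep each repo's highest-scoring hit; results are already sorted by score,
# so the first hit seen per repo wins
best_per_repo = {}
for r in results:
    best_per_repo.setdefault(r["repo"], r)

for repo, hit in best_per_repo.items():
    print(f"{repo}: {hit['file']} ({hit['score']:.2f})")
```

This gives one representative file per service, a quick way to see how differently-named implementations of the same idea line up across repos.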
Dependency Auditing
Spot version drift across your services - a common source of subtle bugs in microservices:
```python
from collections import defaultdict

from kit import MultiRepo

repos = MultiRepo(["~/code/service-a", "~/code/service-b", "~/code/service-c"])
audit = repos.audit_dependencies()

# Detect version drift: find packages with different versions across repos
packages = defaultdict(dict)  # {package: {repo: version}}
for repo, deps in audit.items():
    for lang, pkgs in deps.items():
        for pkg, version in pkgs.items():
            packages[f"{lang}:{pkg}"][repo] = version

# Show packages with version mismatches
for pkg, versions in packages.items():
    unique_versions = set(versions.values())
    if len(unique_versions) > 1:  # Different versions detected!
        print(f"⚠️ {pkg} has version drift:")
        for repo, ver in versions.items():
            print(f"  {repo}: {ver}")

# Output:
# ⚠️ python:requests has version drift:
#   service-a: 2.28.0
#   service-b: 2.31.0
# ⚠️ javascript:axios has version drift:
#   service-a: ^1.4.0
#   service-c: ^0.27.0
```

Quick Start
```python
from kit import MultiRepo

# Initialize with repository paths (local or remote)
repos = MultiRepo([
    "/path/to/repo-a",
    "~/code/repo-b",                        # Tilde expansion
    "https://github.com/owner/repo-c",      # Remote URL
    "https://github.com/owner/repo-d.git",  # With .git suffix
])

# Access individual repos
print(repos.names)  # ['repo-a', 'repo-b', 'repo-c', 'repo-d']
single_repo = repos["repo-a"]  # Returns a Repository instance

# Iterate over all repos
for name, repo in repos:
    print(f"{name}: {len(repo.get_file_tree())} files")
```

Initialization
```python
MultiRepo(
    paths: Sequence[str | Path],
    names: Optional[Dict[str, str]] = None,
)
```

Parameters:

- paths: List of paths to repositories. Supports:
  - Local paths: `/path/to/repo`
  - Tilde expansion: `~/code/repo`
  - Remote URLs: `https://github.com/owner/repo`
- names: Optional dict mapping path strings to friendly names. If not provided, directory/repo names are extracted automatically.
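To build intuition for the automatic naming, here is a deliberately simplified sketch of deriving a friendly name from a path or URL. This is an illustration only; the library's actual extraction rules may differ in edge cases:

```python
def default_name(path: str) -> str:
    # Take the last path component and strip a trailing .git suffix;
    # a simplified stand-in for the library's own name extraction
    tail = path.rstrip("/").rsplit("/", 1)[-1]
    return tail[:-4] if tail.endswith(".git") else tail

print(default_name("/path/to/repo-a"))                      # repo-a
print(default_name("https://github.com/owner/repo-d.git"))  # repo-d
```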
Name Collision Handling:
When multiple repositories have the same directory name, MultiRepo automatically appends a counter:
```python
repos = MultiRepo([
    "~/org-a/utils",  # Named "utils"
    "~/org-b/utils",  # Named "utils_1"
    "~/org-c/utils",  # Named "utils_2"
])
```

Methods
search(query, file_pattern="*", max_results_per_repo=None)
Search for text/regex across all repositories.
```python
# Basic search
results = repos.search("handleAuth")

# Search only Python files
results = repos.search("TODO", file_pattern="*.py")

# Limit results per repo
results = repos.search("import", max_results_per_repo=10)
```

Returns: List of matches with repo attribution:

```python
[
    {"repo": "frontend", "file": "src/auth.js", "line_number": 42, "line_content": "..."},
    {"repo": "backend", "file": "auth/views.py", "line_number": 15, "line_content": "..."},
]
```

search_semantic(query, top_k=10, top_k_per_repo=None, embed_fn=None)
Semantic search across all repositories, returning results ranked by relevance.
```python
# Search by meaning, not just keywords
results = repos.search_semantic("error handling patterns", top_k=20)

# Custom embedding function
results = repos.search_semantic("authentication flow", embed_fn=my_embed_fn)
```

Returns: Results sorted by relevance score with repo attribution.
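Since every match carries a "repo" key, grouping results for per-service display is straightforward. A sketch over hand-written results in the documented shape (real ones would come from `repos.search_semantic(...)`):

```python
from collections import defaultdict

# Hypothetical matches in the documented shape
matches = [
    {"repo": "frontend", "file": "src/auth.js", "score": 0.88},
    {"repo": "backend", "file": "auth/views.py", "score": 0.83},
    {"repo": "frontend", "file": "src/session.ts", "score": 0.79},
]

# Bucket matches by their originating repository
by_repo = defaultdict(list)
for m in matches:
    by_repo[m["repo"]].append(m)

for repo, hits in by_repo.items():
    print(f"{repo}: {len(hits)} hit(s)")
```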
find_symbol(symbol_name, symbol_type=None)
Find a specific symbol across all repositories.
```python
# Find a function by name
funcs = repos.find_symbol("process_data", symbol_type="function")

# Find a class by name
classes = repos.find_symbol("UserModel", symbol_type="class")

# Find any symbol with that name
all_matches = repos.find_symbol("Config")
```

Returns:

```python
[
    {"repo": "service-a", "name": "process_data", "type": "function", "file": "main.py", "line": 10},
    {"repo": "service-b", "name": "process_data", "type": "function", "file": "utils.py", "line": 25},
]
```

extract_all_symbols(symbol_type=None)
Extract all symbols from all repositories.
```python
# Get all symbols, organized by repo
all_symbols = repos.extract_all_symbols()

# Get only functions
functions = repos.extract_all_symbols(symbol_type="function")

# Get only classes
classes = repos.extract_all_symbols(symbol_type="class")
```

Returns: Dict mapping repo names to symbol lists:

```python
{
    "frontend": [{"name": "App", "type": "class", ...}, ...],
    "backend": [{"name": "main", "type": "function", ...}, ...],
}
```

audit_dependencies()
Audit dependencies across all repositories. Parses:

- Python: `requirements.txt`, `pyproject.toml`
- JavaScript: `package.json`
- Rust: `Cargo.toml`
- Go: `go.mod`
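For intuition on the manifest-to-dict mapping, here is a deliberately simplified `requirements.txt` parser producing the documented `{package: version}` shape. This is a hypothetical sketch; the library's real parser handles far more of the requirements syntax (extras, markers, URLs, and so on):

```python
def parse_requirements(text: str) -> dict:
    """Map requirements.txt lines to {package: version} (simplified sketch)."""
    deps = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Split on the first recognized version specifier
        for sep in ("==", ">=", "<=", "~=", ">", "<"):
            if sep in line:
                pkg, ver = line.split(sep, 1)
                deps[pkg.strip()] = ver.strip()
                break
        else:
            deps[line] = "*"  # unpinned dependency
    return deps

print(parse_requirements("requests==2.31.0\nfastapi>=0.100.0\nnumpy\n"))
# {'requests': '2.31.0', 'fastapi': '0.100.0', 'numpy': '*'}
```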
```python
audit = repos.audit_dependencies()
```

Returns:

```python
{
    "frontend": {
        "javascript": {"react": "^18.0.0", "axios": "^1.0.0"}
    },
    "backend": {
        "python": {"fastapi": "0.100.0", "sqlalchemy": "2.0.0"}
    },
    "rust-service": {
        "rust": {"tokio": "1.0", "serde": "1.0"}
    }
}
```

summarize()
Generate summaries of all repositories.
```python
summaries = repos.summarize()
```

Returns:

```python
{
    "frontend": {
        "path": "/home/user/code/frontend",
        "file_count": 150,
        "extensions": {".ts": 80, ".tsx": 50, ".css": 20},
        "languages": {"TypeScript": 130, "CSS": 20}
    },
    "backend": {
        "path": "/home/user/code/backend",
        "file_count": 75,
        "extensions": {".py": 70, ".sql": 5},
        "languages": {"Python": 70}
    }
}
```

get_file_content(repo_name, file_path)
Read a file from a specific repository.
```python
content = repos.get_file_content("frontend", "src/App.tsx")
config = repos.get_file_content("backend", "config.py")
```

index_all(embed_fn=None)
Build or refresh vector indexes for all repositories (for semantic search).
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
repos.index_all(embed_fn=model.encode)
```

Properties
repos
Dict mapping repo names to Repository instances.
names
List of repository names.
CLI Commands
MultiRepo is also available via CLI:
```bash
# Search across repos
kit multi search ~/frontend ~/backend -q "handleAuth"

# With a file pattern
kit multi search ~/api ~/web -q "TODO" -p "*.py"

# Limit results
kit multi search . ../other -q "import" --max-per-repo 5

# Find a specific symbol
kit multi symbols ~/frontend ~/backend -n "UserModel" -t class

# List all functions
kit multi symbols ~/service-a ~/service-b -t function

# Audit all dependencies
kit multi deps ~/frontend ~/backend ~/shared

# Output to JSON
kit multi deps ~/service-* -o deps.json

# Get summaries
kit multi summary ~/frontend ~/backend ~/mobile

# Output to JSON
kit multi summary ~/project-* -o summary.json
```

Best Practices
Repository Organization
For best results, ensure your repositories follow consistent patterns:
```python
# Good: clear, descriptive paths
repos = MultiRepo([
    "~/code/myapp-frontend",
    "~/code/myapp-backend",
    "~/code/myapp-shared",
])

# Custom names for clarity
from pathlib import Path

repos = MultiRepo(
    ["~/code/client", "~/code/server"],
    names={
        str(Path("~/code/client").expanduser().resolve()): "web-frontend",
        str(Path("~/code/server").expanduser().resolve()): "api-backend",
    },
)
```

Performance Tips
- Limit results: Use `max_results_per_repo` for large codebases
- File patterns: Filter with `file_pattern` to search specific file types
- Build indexes once: Call `index_all()` once, then reuse for semantic searches
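To see what a `file_pattern` filter buys you, here is a rough illustration using `fnmatch`. The glob semantics here are an assumption for illustration; the library's matching rules may differ:

```python
from fnmatch import fnmatch

files = ["src/app.py", "src/app.test.ts", "docs/guide.md", "tools/ci.py"]

# Restricting candidates before searching keeps large multi-repo scans cheap
python_files = [f for f in files if fnmatch(f, "*.py")]
print(python_files)  # ['src/app.py', 'tools/ci.py']
```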
Integration Patterns
```python
# Use with single-repo operations
repos = MultiRepo([...])

# Get a single repo for detailed analysis
frontend = repos["frontend"]
frontend_symbols = frontend.extract_symbols()

# Or iterate for batch operations
for name, repo in repos:
    print(f"Analyzing {name}...")
    # Use the full Repository API on each repo
```