
MultiRepo API

The MultiRepo class provides a unified interface for analyzing multiple repositories together. Search, extract symbols, and audit dependencies across your entire codebase ecosystem as if it were a single repository.

Use Cases

Microservices Architecture

When your application is split across multiple services, MultiRepo lets you search and analyze them together:

from kit import MultiRepo
services = MultiRepo([
    "~/code/user-service",
    "~/code/order-service",
    "~/code/payment-service",
    "~/code/notification-service",
])
# Find all authentication-related code across services
auth_code = services.search("authenticate")
# Find where UserModel is defined across services
user_models = services.find_symbol("UserModel", symbol_type="class")

Frontend + Backend Projects

Analyze your full-stack application across language boundaries:

repos = MultiRepo([
    "~/code/web-app",       # React/TypeScript
    "~/code/api-server",    # Python/FastAPI
    "~/code/shared-types",  # Shared schemas
])
# Search for an auth handler across the whole stack
endpoints = repos.search("handleAuth")
# Audit all dependencies
deps = repos.audit_dependencies()
# Returns: {"web-app": {"javascript": {"react": "^18.0"}},
# "api-server": {"python": {"fastapi": "0.100.0"}}, ...}

Team Repositories

Find patterns and inconsistencies across team projects:

team_repos = MultiRepo([
    "~/code/project-alpha",
    "~/code/project-beta",
    "~/code/project-gamma",
])
# Find all TODO comments across projects
todos = team_repos.search("TODO")
# Get summary of all repositories
summaries = team_repos.summarize()
for name, info in summaries.items():
    print(f"{name}: {info['file_count']} files, {list(info['languages'].keys())}")

Semantic Search Across Repos

Find code by meaning rather than by keywords alone, across all your repositories:

from sentence_transformers import SentenceTransformer
repos = MultiRepo([
    "~/code/auth-service",
    "~/code/user-service",
    "~/code/api-gateway",
])
# Build indexes once
model = SentenceTransformer("all-MiniLM-L6-v2")
repos.index_all(embed_fn=model.encode)
# Search by concept: finds relevant code even with different naming conventions
results = repos.search_semantic("user authentication and session management", top_k=10)
# Results ranked by relevance across ALL repos
for r in results:
    print(f"[{r['repo']}] {r['file']} (score: {r['score']:.2f})")
# More examples:
repos.search_semantic("error handling and retry logic")
repos.search_semantic("database connection pooling")
repos.search_semantic("rate limiting implementation")

This is especially useful when:

  • Different teams use different naming conventions
  • You’re exploring an unfamiliar codebase ecosystem
  • You need to find similar patterns implemented differently across services

Dependency Auditing

Spot version drift across your services, a common source of subtle bugs in microservice architectures:

repos = MultiRepo(["~/code/service-a", "~/code/service-b", "~/code/service-c"])
audit = repos.audit_dependencies()
# Detect version drift: find packages with different versions across repos
from collections import defaultdict
packages = defaultdict(dict)  # {"lang:package": {repo: version}}
for repo, deps in audit.items():
    for lang, pkgs in deps.items():
        for pkg, version in pkgs.items():
            packages[f"{lang}:{pkg}"][repo] = version
# Show packages with version mismatches
for pkg, versions in packages.items():
    unique_versions = set(versions.values())
    if len(unique_versions) > 1:  # Different versions detected!
        print(f"⚠️ {pkg} has version drift:")
        for repo, ver in versions.items():
            print(f"  {repo}: {ver}")
# Example output (illustrative):
# ⚠️ python:requests has version drift:
#   service-a: 2.28.0
#   service-b: 2.31.0
# ⚠️ javascript:axios has version drift:
#   service-b: ^1.4.0
#   service-c: ^0.27.0

Quick Start

from kit import MultiRepo
# Initialize with repository paths (local or remote)
repos = MultiRepo([
    "/path/to/repo-a",
    "~/code/repo-b",                        # Tilde expansion
    "https://github.com/owner/repo-c",      # Remote URL
    "https://github.com/owner/repo-d.git",  # With .git suffix
])
# Access individual repos
print(repos.names) # ['repo-a', 'repo-b', 'repo-c', 'repo-d']
single_repo = repos["repo-a"] # Returns Repository instance
# Iterate over all repos
for name, repo in repos:
    print(f"{name}: {len(repo.get_file_tree())} files")

Initialization

MultiRepo(
paths: Sequence[str | Path],
names: Optional[Dict[str, str]] = None,
)

Parameters:

  • paths: Sequence of repository locations. Supported forms:
    • Local paths: /path/to/repo
    • Tilde expansion: ~/code/repo
    • Remote URLs: https://github.com/owner/repo
  • names: Optional dict mapping path strings to friendly names (for local repositories, the Best Practices example keys it by the expanded, resolved absolute path). If omitted, names are derived automatically from each directory or repository name.
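To make the automatic naming concrete, here is a hypothetical sketch of how a default name could be derived. derive_repo_name is illustrative only and is not kit's implementation; it assumes the name is the last path or URL segment with any .git suffix removed, which matches the repos.names output shown in the Quick Start.

```python
from pathlib import Path
from urllib.parse import urlparse


def derive_repo_name(path: str) -> str:
    """Hypothetical sketch: derive a default repo name from a path or URL."""
    if path.startswith(("http://", "https://")):
        # Take the last segment of the URL path, e.g. "/owner/repo-d.git" -> "repo-d.git"
        last = urlparse(path).path.rstrip("/").split("/")[-1]
        # Drop a trailing ".git" suffix if present
        return last[: -len(".git")] if last.endswith(".git") else last
    # Local paths: expand "~", resolve "." and "..", then take the final component
    return Path(path).expanduser().resolve().name


for p in ["/path/to/repo-a", "~/code/repo-b",
          "https://github.com/owner/repo-c", "https://github.com/owner/repo-d.git"]:
    print(p, "->", derive_repo_name(p))
```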

Name Collision Handling:

When multiple repositories have the same directory name, MultiRepo automatically appends a counter:

repos = MultiRepo([
    "~/org-a/utils",  # Named "utils"
    "~/org-b/utils",  # Named "utils_1"
    "~/org-c/utils",  # Named "utils_2"
])
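The counter behavior can be sketched as follows. dedupe_names is a hypothetical helper, not part of kit; it assumes the first occurrence keeps its name and later duplicates receive _1, _2, and so on, as in the example above. It also keeps incrementing if a suffixed name is already taken.

```python
from typing import List


def dedupe_names(names: List[str]) -> List[str]:
    """Hypothetical sketch: make repo names unique by appending _1, _2, ..."""
    used = set()
    result = []
    for name in names:
        candidate, counter = name, 1
        # Keep incrementing the suffix until the candidate name is unused
        while candidate in used:
            candidate = f"{name}_{counter}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(dedupe_names(["utils", "utils", "utils"]))
```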

Methods

search(query, file_pattern="*", max_results_per_repo=None)

Search for text/regex across all repositories.

# Basic search
results = repos.search("handleAuth")
# Search only Python files
results = repos.search("TODO", file_pattern="*.py")
# Limit results per repo
results = repos.search("import", max_results_per_repo=10)

Returns: List of matches with repo attribution:

[
{"repo": "frontend", "file": "src/auth.js", "line_number": 42, "line_content": "..."},
{"repo": "backend", "file": "auth/views.py", "line_number": 15, "line_content": "..."},
]
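Because every match carries a repo key, results are easy to regroup per repository for reporting. The helper below is a plain-Python sketch (group_by_repo is hypothetical, not a kit API) that works on a list in the shape shown above; the third sample match is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_repo(matches: List[dict]) -> Dict[str, List[dict]]:
    """Group search matches by the repo that produced them."""
    grouped = defaultdict(list)
    for m in matches:
        grouped[m["repo"]].append(m)
    return dict(grouped)


# Sample data in the documented shape (values are illustrative)
results = [
    {"repo": "frontend", "file": "src/auth.js", "line_number": 42, "line_content": "..."},
    {"repo": "backend", "file": "auth/views.py", "line_number": 15, "line_content": "..."},
    {"repo": "frontend", "file": "src/login.js", "line_number": 7, "line_content": "..."},
]

for repo, hits in group_by_repo(results).items():
    print(f"{repo}: {len(hits)} match(es)")
    for h in hits:
        print(f"  {h['file']}:{h['line_number']}")
```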

search_semantic(query, top_k=10, top_k_per_repo=None, embed_fn=None)

Semantic search across all repositories, returning results ranked by relevance.

# Search by meaning, not just keywords
results = repos.search_semantic("error handling patterns", top_k=20)
# Custom embedding function
results = repos.search_semantic("authentication flow", embed_fn=my_embed_fn)

Returns: A list of results ranked by relevance score across all repositories, each attributed to its source repo (fields include repo, file, and score).
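One way to picture ranking across all repositories is a global top-k merge of per-repo hits. This is a hypothetical sketch, not kit's actual implementation: merge_top_k and the sample scores are illustrative, and it assumes a higher score means more relevant.

```python
import heapq
from typing import Dict, List


def merge_top_k(per_repo: Dict[str, List[dict]], top_k: int = 10) -> List[dict]:
    """Hypothetical sketch: merge per-repo hits into one global ranking."""
    # Tag each hit with its repo so attribution survives the merge
    tagged = (
        {**hit, "repo": repo}
        for repo, hits in per_repo.items()
        for hit in hits
    )
    # Keep the top_k highest-scoring hits overall, sorted best first
    return heapq.nlargest(top_k, tagged, key=lambda h: h["score"])


per_repo = {
    "auth-service": [
        {"file": "auth/session.py", "score": 0.82},
        {"file": "auth/tokens.py", "score": 0.40},
    ],
    "api-gateway": [{"file": "gateway/middleware.py", "score": 0.67}],
}

for r in merge_top_k(per_repo, top_k=2):
    print(f"[{r['repo']}] {r['file']} (score: {r['score']:.2f})")
```

A merge like this is only meaningful if every repository was indexed with the same embedding function, so that scores share one scale.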

find_symbol(symbol_name, symbol_type=None)

Find a specific symbol across all repositories.

# Find function by name
funcs = repos.find_symbol("process_data", symbol_type="function")
# Find class by name
classes = repos.find_symbol("UserModel", symbol_type="class")
# Find any symbol with that name
all_matches = repos.find_symbol("Config")

Returns:

[
{"repo": "service-a", "name": "process_data", "type": "function", "file": "main.py", "line": 10},
{"repo": "service-b", "name": "process_data", "type": "function", "file": "utils.py", "line": 25},
]
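Since each match records repo, file, and line, a short plain-Python pass over the returned list can flag symbols defined in more than one repository, often a sign of duplicated code worth consolidating. The sample data mirrors the return value above; symbol_locations is a hypothetical helper.

```python
from typing import List


def symbol_locations(matches: List[dict]) -> List[str]:
    """Format each find_symbol match as "repo:file:line" for quick scanning."""
    return [f"{m['repo']}:{m['file']}:{m['line']}" for m in matches]


# Sample data copied from the documented return value
matches = [
    {"repo": "service-a", "name": "process_data", "type": "function", "file": "main.py", "line": 10},
    {"repo": "service-b", "name": "process_data", "type": "function", "file": "utils.py", "line": 25},
]

locations = symbol_locations(matches)
defining_repos = {m["repo"] for m in matches}
if len(defining_repos) > 1:
    print(f"process_data is defined in {len(defining_repos)} repos:")
for loc in locations:
    print(" ", loc)
```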

extract_all_symbols(symbol_type=None)

Extract all symbols from all repositories.

# Get all symbols, organized by repo
all_symbols = repos.extract_all_symbols()
# Get only functions
functions = repos.extract_all_symbols(symbol_type="function")
# Get only classes
classes = repos.extract_all_symbols(symbol_type="class")

Returns: Dict mapping repo names to symbol lists:

{
"frontend": [{"name": "App", "type": "class", ...}, ...],
"backend": [{"name": "main", "type": "function", ...}, ...],
}
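The per-repo dict makes cross-repo comparisons straightforward. This sketch (plain Python, not a kit API) finds symbol names that appear in more than one repository, optionally filtered by type; the sample input only mimics the shape shown above, and shared_symbol_names is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def shared_symbol_names(
    all_symbols: Dict[str, List[dict]], symbol_type: Optional[str] = None
) -> Dict[str, Set[str]]:
    """Return {symbol name: repos defining it} for names found in 2+ repos."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for repo, symbols in all_symbols.items():
        for sym in symbols:
            # Optionally restrict to one symbol type, e.g. "class"
            if symbol_type is None or sym.get("type") == symbol_type:
                owners[sym["name"]].add(repo)
    return {name: repos for name, repos in owners.items() if len(repos) > 1}


# Illustrative input in the documented {repo: [symbol, ...]} shape
all_symbols = {
    "frontend": [{"name": "App", "type": "class"}, {"name": "Config", "type": "class"}],
    "backend": [{"name": "main", "type": "function"}, {"name": "Config", "type": "class"}],
}

print(shared_symbol_names(all_symbols))
```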

audit_dependencies()

Audit dependencies across all repositories. Parses:

  • Python: requirements.txt, pyproject.toml
  • JavaScript: package.json
  • Rust: Cargo.toml
  • Go: go.mod

audit = repos.audit_dependencies()

Returns:

{
    "frontend": {
        "javascript": {"react": "^18.0.0", "axios": "^1.0.0"}
    },
    "backend": {
        "python": {"fastapi": "0.100.0", "sqlalchemy": "2.0.0"}
    },
    "rust-service": {
        "rust": {"tokio": "1.0", "serde": "1.0"}
    }
}
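To find which repositories use a given package, walk the nested {repo: {language: {package: version}}} structure. find_package is a hypothetical plain-Python helper; it runs here on the sample return value above.

```python
from typing import Dict, List, Tuple

Audit = Dict[str, Dict[str, Dict[str, str]]]


def find_package(audit: Audit, package: str) -> List[Tuple[str, str, str]]:
    """Return (repo, language, version) for every repo that declares `package`."""
    return [
        (repo, lang, pkgs[package])
        for repo, by_lang in audit.items()
        for lang, pkgs in by_lang.items()
        if package in pkgs
    ]


# Sample data copied from the documented return value
audit = {
    "frontend": {"javascript": {"react": "^18.0.0", "axios": "^1.0.0"}},
    "backend": {"python": {"fastapi": "0.100.0", "sqlalchemy": "2.0.0"}},
    "rust-service": {"rust": {"tokio": "1.0", "serde": "1.0"}},
}

for repo, lang, version in find_package(audit, "serde"):
    print(f"{repo} ({lang}): serde {version}")
```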

summarize()

Generate summaries of all repositories.

summaries = repos.summarize()

Returns:

{
    "frontend": {
        "path": "/home/user/code/frontend",
        "file_count": 150,
        "extensions": {".ts": 80, ".tsx": 50, ".css": 20},
        "languages": {"TypeScript": 130, "CSS": 20}
    },
    "backend": {
        "path": "/home/user/code/backend",
        "file_count": 75,
        "extensions": {".py": 70, ".sql": 5},
        "languages": {"Python": 70}
    }
}
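For ecosystem-wide totals, the per-repo summaries can be folded together with collections.Counter. combine_summaries is a hypothetical plain-Python helper, and it assumes the languages values are file counts, as the sample numbers suggest.

```python
from collections import Counter
from typing import Dict


def combine_summaries(summaries: Dict[str, dict]) -> dict:
    """Aggregate file counts and per-language counts across all repos."""
    languages: Counter = Counter()
    total_files = 0
    for info in summaries.values():
        total_files += info["file_count"]
        # Counter.update adds counts rather than replacing them
        languages.update(info["languages"])
    return {
        "repo_count": len(summaries),
        "file_count": total_files,
        "languages": dict(languages.most_common()),  # largest first
    }


# Sample data copied from the documented return value
summaries = {
    "frontend": {"file_count": 150, "languages": {"TypeScript": 130, "CSS": 20}},
    "backend": {"file_count": 75, "languages": {"Python": 70}},
}

print(combine_summaries(summaries))
```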

get_file_content(repo_name, file_path)

Read a file from a specific repository.

content = repos.get_file_content("frontend", "src/App.tsx")
config = repos.get_file_content("backend", "config.py")

index_all(embed_fn=None)

Build or refresh vector indexes for all repositories (for semantic search).

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
repos.index_all(embed_fn=model.encode)

Properties

repos

Dict mapping repo names to Repository instances.

names

List of repository names.

CLI Commands

MultiRepo is also available via CLI:

# Search across repos
kit multi search ~/frontend ~/backend -q "handleAuth"
# With file pattern
kit multi search ~/api ~/web -q "TODO" -p "*.py"
# Limit results
kit multi search . ../other -q "import" --max-per-repo 5

Best Practices

Repository Organization

For readable, unambiguous results, use clear, consistent repository paths and names:

# Good: Clear, descriptive paths
from pathlib import Path
from kit import MultiRepo

repos = MultiRepo([
    "~/code/myapp-frontend",
    "~/code/myapp-backend",
    "~/code/myapp-shared",
])
# Custom names for clarity (keys are expanded, resolved absolute paths)
repos = MultiRepo(
    ["~/code/client", "~/code/server"],
    names={
        str(Path("~/code/client").expanduser().resolve()): "web-frontend",
        str(Path("~/code/server").expanduser().resolve()): "api-backend",
    },
)
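If you set custom names for many repositories, a small helper can build the names dict with the same expanduser().resolve() keys used in the example above. friendly_names is hypothetical and not part of kit; the commented MultiRepo call simply reuses the constructor documented on this page.

```python
from pathlib import Path
from typing import Dict


def friendly_names(mapping: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: key each friendly name by its resolved absolute path."""
    return {str(Path(p).expanduser().resolve()): name for p, name in mapping.items()}


mapping = {"~/code/client": "web-frontend", "~/code/server": "api-backend"}
names = friendly_names(mapping)
# repos = MultiRepo(list(mapping), names=names)
print(names)
```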

Performance Tips

  1. Limit results: Use max_results_per_repo for large codebases
  2. File patterns: Filter with file_pattern to search specific file types
  3. Build indexes once: Call index_all() once, then reuse for semantic searches

Integration Patterns

# Use with single-repo operations
repos = MultiRepo([...])
# Get a single repo for detailed analysis
frontend = repos["frontend"]
frontend_symbols = frontend.extract_symbols()
# Or iterate for batch operations
for name, repo in repos:
    print(f"Analyzing {name}...")
    # Use the full Repository API on each repo