MultiRepo API

The MultiRepo class provides a unified interface for analyzing multiple repositories together. Search, extract symbols, and audit dependencies across your entire codebase ecosystem as if it were a single repository.

Use Cases

Microservices Architecture

When your application is split across multiple services, MultiRepo lets you search and analyze them together:

from kit import MultiRepo

services = MultiRepo([
    "~/code/user-service",
    "~/code/order-service",
    "~/code/payment-service",
    "~/code/notification-service",
])

# Find all authentication-related code across services
auth_code = services.search("authenticate")

# Find where UserModel is defined across services
user_models = services.find_symbol("UserModel", symbol_type="class")

Frontend + Backend Projects

Analyze your full-stack application across language boundaries:

repos = MultiRepo([
    "~/code/web-app",      # React/TypeScript
    "~/code/api-server",   # Python/FastAPI
    "~/code/shared-types", # Shared schemas
])

# Search for an auth handler name across the whole stack
endpoints = repos.search("handleAuth")

# Audit all dependencies
deps = repos.audit_dependencies()
# Returns: {"web-app": {"javascript": {"react": "^18.0"}},
#           "api-server": {"python": {"fastapi": "0.100.0"}}, ...}

Team Repositories

Find patterns and inconsistencies across team projects:

team_repos = MultiRepo([
    "~/code/project-alpha",
    "~/code/project-beta",
    "~/code/project-gamma",
])

# Find all TODO comments across projects
todos = team_repos.search("TODO")

# Get summary of all repositories
summaries = team_repos.summarize()
for name, info in summaries.items():
    print(f"{name}: {info['file_count']} files, {list(info['languages'].keys())}")

Semantic Search Across Repos

Find code by meaning, not just keywords, across all your repositories:

from kit import MultiRepo
from sentence_transformers import SentenceTransformer

repos = MultiRepo([
    "~/code/auth-service",
    "~/code/user-service",
    "~/code/api-gateway",
])

# Build indexes once
model = SentenceTransformer("all-MiniLM-L6-v2")
repos.index_all(embed_fn=model.encode)

# Search by concept - finds relevant code even with different naming conventions
results = repos.search_semantic("user authentication and session management", top_k=10)

# Results ranked by relevance across ALL repos
for r in results:
    print(f"[{r['repo']}] {r['file']} (score: {r['score']:.2f})")

# More examples:
repos.search_semantic("error handling and retry logic")
repos.search_semantic("database connection pooling")
repos.search_semantic("rate limiting implementation")

This is especially useful when:

- Different teams use different naming conventions
- You’re exploring an unfamiliar codebase ecosystem
- You need to find similar patterns implemented differently across services

Dependency Auditing

Spot version drift across your services - a common source of subtle bugs in microservices:

repos = MultiRepo(["~/code/service-a", "~/code/service-b", "~/code/service-c"])
audit = repos.audit_dependencies()

# Detect version drift: find packages with different versions across repos
from collections import defaultdict

packages = defaultdict(dict)  # {package: {repo: version}}
for repo, deps in audit.items():
    for lang, pkgs in deps.items():
        for pkg, version in pkgs.items():
            packages[f"{lang}:{pkg}"][repo] = version

# Show packages with version mismatches
for pkg, versions in packages.items():
    unique_versions = set(versions.values())
    if len(unique_versions) > 1:  # Different versions detected!
        print(f"⚠️  {pkg} has version drift:")
        for repo, ver in versions.items():
            print(f"    {repo}: {ver}")

# Example output (illustrative):
# ⚠️  python:requests has version drift:
#     service-a: 2.28.0
#     service-b: 2.31.0
# ⚠️  javascript:axios has version drift:
#     service-b: ^1.4.0
#     service-c: ^0.27.0

Quick Start

from kit import MultiRepo

# Initialize with repository paths (local or remote)
repos = MultiRepo([
    "/path/to/repo-a",
    "~/code/repo-b",                              # Tilde expansion
    "https://github.com/owner/repo-c",            # Remote URL
    "https://github.com/owner/repo-d.git",        # With .git suffix
])

# Access individual repos
print(repos.names)  # ['repo-a', 'repo-b', 'repo-c', 'repo-d']
single_repo = repos["repo-a"]  # Returns Repository instance

# Iterate over all repos
for name, repo in repos:
    print(f"{name}: {len(repo.get_file_tree())} files")

Initialization

MultiRepo(
    paths: Sequence[str | Path],
    names: Optional[Dict[str, str]] = None,
)

Parameters:

- paths: List of repository paths. Supported forms:
  - Local paths: /path/to/repo
  - Tilde expansion: ~/code/repo
  - Remote URLs: https://github.com/owner/repo
- names: Optional dict mapping path strings to friendly names. If not provided, directory or repository names are extracted automatically.
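As a rough illustration of what "extracted automatically" could mean, the sketch below takes the last path component and drops a trailing `.git`. The helper `derive_repo_name` is hypothetical and is not part of kit; the library's actual rules may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def derive_repo_name(path_or_url: str) -> str:
    """Hypothetical helper: last path component, minus a trailing '.git'."""
    # For URLs, only the path part (e.g. "/owner/repo-c") matters.
    tail = urlparse(path_or_url).path if "://" in path_or_url else path_or_url
    name = PurePosixPath(tail.rstrip("/")).name
    return name[:-4] if name.endswith(".git") else name


for source in [
    "/path/to/repo-a",
    "~/code/repo-b",
    "https://github.com/owner/repo-c",
    "https://github.com/owner/repo-d.git",
]:
    print(f"{source} -> {derive_repo_name(source)}")
```

With the Quick Start inputs, this yields the same names shown there (repo-a through repo-d).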

Name Collision Handling:

When multiple repositories have the same directory name, MultiRepo automatically appends a counter:

repos = MultiRepo([
    "~/org-a/utils",  # Named "utils"
    "~/org-b/utils",  # Named "utils_1"
    "~/org-c/utils",  # Named "utils_2"
])
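The counter scheme above can be sketched in a few lines. This is a minimal illustration under the assumption that the first occurrence keeps the bare name and later ones get `_1`, `_2`, and so on; `assign_names` is a hypothetical helper, not kit's implementation.

```python
from pathlib import PurePosixPath


def assign_names(paths):
    """Hypothetical helper: give each path a unique name, suffixing repeats."""
    seen = {}   # base name -> how many times it has appeared so far
    names = []
    for path in paths:
        base = PurePosixPath(path).name
        count = seen.get(base, 0)
        # First occurrence keeps the bare name; later ones get _1, _2, ...
        names.append(base if count == 0 else f"{base}_{count}")
        seen[base] = count + 1
    return names


print(assign_names(["~/org-a/utils", "~/org-b/utils", "~/org-c/utils"]))
# ['utils', 'utils_1', 'utils_2']
```

If the generated names are hard to tell apart, passing explicit `names` at initialization is likely the clearer option.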

Methods

`search(query, file_pattern="*", max_results_per_repo=None)`

Search for text/regex across all repositories.

# Basic search
results = repos.search("handleAuth")

# Search only Python files
results = repos.search("TODO", file_pattern="*.py")

# Limit results per repo
results = repos.search("import", max_results_per_repo=10)

Returns: List of matches with repo attribution:

[
    {"repo": "frontend", "file": "src/auth.js", "line_number": 42, "line_content": "..."},
    {"repo": "backend", "file": "auth/views.py", "line_number": 15, "line_content": "..."},
]
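Because every match carries a `repo` key, the flat list is easy to regroup. The sketch below works on sample data in the shape shown above rather than on a live `MultiRepo`; `group_by_repo` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_repo(results):
    """Hypothetical helper: bucket search matches by their 'repo' field."""
    grouped = defaultdict(list)
    for match in results:
        grouped[match["repo"]].append(match)
    return dict(grouped)


# Sample data mirroring the documented return shape.
sample = [
    {"repo": "frontend", "file": "src/auth.js", "line_number": 42, "line_content": "..."},
    {"repo": "backend", "file": "auth/views.py", "line_number": 15, "line_content": "..."},
    {"repo": "backend", "file": "auth/tokens.py", "line_number": 7, "line_content": "..."},
]

for repo, matches in group_by_repo(sample).items():
    print(f"{repo}: {len(matches)} match(es)")
```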

`search_semantic(query, top_k=10, top_k_per_repo=None, embed_fn=None)`

Semantic search across all repositories, returning results ranked by relevance.

# Search by meaning, not just keywords
results = repos.search_semantic("error handling patterns", top_k=20)

# Custom embedding function
results = repos.search_semantic("authentication flow", embed_fn=my_embed_fn)

Returns: Results sorted by relevance score with repo attribution.
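One way to picture how `top_k` and `top_k_per_repo` might interact: cap each repository's hits first, then keep the best `top_k` overall. The sketch below is an assumption about that behavior, not kit's implementation; `merge_ranked` and the sample scores are hypothetical.

```python
import heapq


def merge_ranked(per_repo_hits, top_k=10, top_k_per_repo=None):
    """Hypothetical merge: optional per-repo cap, then a global top_k by score."""
    merged = []
    for repo, hits in per_repo_hits.items():
        ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
        if top_k_per_repo is not None:
            ranked = ranked[:top_k_per_repo]
        # Tag each hit with the repository it came from.
        merged.extend({**hit, "repo": repo} for hit in ranked)
    return heapq.nlargest(top_k, merged, key=lambda h: h["score"])


# Made-up scores for illustration only.
hits = {
    "auth-service": [{"file": "login.py", "score": 0.91}, {"file": "util.py", "score": 0.40}],
    "user-service": [{"file": "session.py", "score": 0.73}],
}

for r in merge_ranked(hits, top_k=2):
    print(f"[{r['repo']}] {r['file']} (score: {r['score']:.2f})")
```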

`find_symbol(symbol_name, symbol_type=None)`

Find a specific symbol across all repositories.

# Find function by name
funcs = repos.find_symbol("process_data", symbol_type="function")

# Find class by name
classes = repos.find_symbol("UserModel", symbol_type="class")

# Find any symbol with that name
all_matches = repos.find_symbol("Config")

Returns:

[
    {"repo": "service-a", "name": "process_data", "type": "function", "file": "main.py", "line": 10},
    {"repo": "service-b", "name": "process_data", "type": "function", "file": "utils.py", "line": 25},
]
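A common follow-up is checking whether the same symbol is defined in more than one repository. The sketch below works on sample data in the shape shown above; `duplicated_symbols` is a hypothetical helper.

```python
from collections import defaultdict


def duplicated_symbols(matches):
    """Hypothetical helper: map (name, type) to repos when defined in 2+ repos."""
    locations = defaultdict(set)
    for match in matches:
        locations[(match["name"], match["type"])].add(match["repo"])
    return {key: sorted(repos) for key, repos in locations.items() if len(repos) > 1}


# Sample data mirroring the documented return shape.
sample = [
    {"repo": "service-a", "name": "process_data", "type": "function", "file": "main.py", "line": 10},
    {"repo": "service-b", "name": "process_data", "type": "function", "file": "utils.py", "line": 25},
    {"repo": "service-a", "name": "Config", "type": "class", "file": "config.py", "line": 3},
]

for (name, kind), repos in duplicated_symbols(sample).items():
    print(f"{kind} {name} is defined in: {', '.join(repos)}")
```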

`extract_all_symbols(symbol_type=None)`

Extract all symbols from all repositories.

# Get all symbols, organized by repo
all_symbols = repos.extract_all_symbols()

# Get only functions
functions = repos.extract_all_symbols(symbol_type="function")

# Get only classes
classes = repos.extract_all_symbols(symbol_type="class")

Returns: Dict mapping repo names to symbol lists:

{
    "frontend": [{"name": "App", "type": "class", ...}, ...],
    "backend": [{"name": "main", "type": "function", ...}, ...],
}
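The per-repo dict lends itself to quick tallies, for example counting symbols by type. The sketch below uses sample data in the shape shown above and assumes each symbol dict has a `type` key; `count_symbol_types` is a hypothetical helper.

```python
from collections import Counter


def count_symbol_types(all_symbols):
    """Hypothetical helper: per-repo count of symbols by their 'type' field."""
    return {
        repo: Counter(symbol["type"] for symbol in symbols)
        for repo, symbols in all_symbols.items()
    }


# Sample data mirroring the documented return shape.
sample = {
    "frontend": [{"name": "App", "type": "class"}, {"name": "render", "type": "function"}],
    "backend": [{"name": "main", "type": "function"}],
}

for repo, counts in count_symbol_types(sample).items():
    print(repo, dict(counts))
```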

`audit_dependencies()`

Audit dependencies across all repositories. Parses:

- Python: requirements.txt, pyproject.toml
- JavaScript: package.json
- Rust: Cargo.toml
- Go: go.mod

audit = repos.audit_dependencies()

Returns:

{
    "frontend": {
        "javascript": {"react": "^18.0.0", "axios": "^1.0.0"}
    },
    "backend": {
        "python": {"fastapi": "0.100.0", "sqlalchemy": "2.0.0"}
    },
    "rust-service": {
        "rust": {"tokio": "1.0", "serde": "1.0"}
    }
}
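The nested repo → language → package layout makes it straightforward to ask which repositories depend on a given package. The sketch below uses sample data in the shape shown above; `repos_using` is a hypothetical helper.

```python
def repos_using(audit, language, package):
    """Hypothetical helper: map repo name to the version it pins for a package."""
    return {
        repo: deps[language][package]
        for repo, deps in audit.items()
        if package in deps.get(language, {})
    }


# Sample data mirroring the documented return shape.
sample = {
    "frontend": {"javascript": {"react": "^18.0.0", "axios": "^1.0.0"}},
    "backend": {"python": {"fastapi": "0.100.0", "sqlalchemy": "2.0.0"}},
    "rust-service": {"rust": {"tokio": "1.0", "serde": "1.0"}},
}

print(repos_using(sample, "python", "fastapi"))
# {'backend': '0.100.0'}
```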

`summarize()`

Generate summaries of all repositories.

summaries = repos.summarize()

Returns:

{
    "frontend": {
        "path": "/home/user/code/frontend",
        "file_count": 150,
        "extensions": {".ts": 80, ".tsx": 50, ".css": 20},
        "languages": {"TypeScript": 130, "CSS": 20}
    },
    "backend": {
        "path": "/home/user/code/backend",
        "file_count": 75,
        "extensions": {".py": 70, ".sql": 5},
        "languages": {"Python": 70}
    }
}
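Per-repo summaries can be rolled up into ecosystem-wide totals. The sketch below uses sample data in the shape shown above; `aggregate_summaries` is a hypothetical helper.

```python
from collections import Counter


def aggregate_summaries(summaries):
    """Hypothetical helper: total file count and per-language totals."""
    total_files = sum(info["file_count"] for info in summaries.values())
    languages = Counter()
    for info in summaries.values():
        languages.update(info["languages"])  # adds counts per language
    return total_files, dict(languages)


# Sample data mirroring the documented return shape.
sample = {
    "frontend": {"file_count": 150, "languages": {"TypeScript": 130, "CSS": 20}},
    "backend": {"file_count": 75, "languages": {"Python": 70}},
}

total, by_language = aggregate_summaries(sample)
print(total)        # 225
print(by_language)  # {'TypeScript': 130, 'CSS': 20, 'Python': 70}
```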

`get_file_content(repo_name, file_path)`

Read a file from a specific repository.

content = repos.get_file_content("frontend", "src/App.tsx")
config = repos.get_file_content("backend", "config.py")

`index_all(embed_fn=None)`

Build or refresh vector indexes for all repositories (for semantic search).

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
repos.index_all(embed_fn=model.encode)

Properties

`repos`

Dict mapping repo names to Repository instances.

`names`

List of repository names.

CLI Commands

MultiRepo is also available via CLI:

# Search across repos
kit multi search ~/frontend ~/backend -q "handleAuth"

# With file pattern
kit multi search ~/api ~/web -q "TODO" -p "*.py"

# Limit results
kit multi search . ../other -q "import" --max-per-repo 5

# Find specific symbol
kit multi symbols ~/frontend ~/backend -n "UserModel" -t class

# List all functions
kit multi symbols ~/service-a ~/service-b -t function

# Audit all dependencies
kit multi deps ~/frontend ~/backend ~/shared

# Output to JSON
kit multi deps ~/service-* -o deps.json

# Get summaries
kit multi summary ~/frontend ~/backend ~/mobile

# Output to JSON
kit multi summary ~/project-* -o summary.json

Best Practices

Repository Organization

For best results, ensure your repositories follow consistent patterns:

from pathlib import Path

# Good: Clear, descriptive paths
repos = MultiRepo([
    "~/code/myapp-frontend",
    "~/code/myapp-backend",
    "~/code/myapp-shared",
])

# Custom names for clarity
repos = MultiRepo(
    ["~/code/client", "~/code/server"],
    names={
        str(Path("~/code/client").expanduser().resolve()): "web-frontend",
        str(Path("~/code/server").expanduser().resolve()): "api-backend",
    }
)
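Since the `names` keys in the example above are resolved absolute paths, a small helper can build that mapping from the short forms you would normally type. `resolved_names` is a hypothetical convenience function, not part of kit.

```python
from pathlib import Path


def resolved_names(friendly):
    """Hypothetical helper: expand and resolve each path key, keep the names."""
    return {
        str(Path(path).expanduser().resolve()): name
        for path, name in friendly.items()
    }


names = resolved_names({
    "~/code/client": "web-frontend",
    "~/code/server": "api-backend",
})

for path, name in names.items():
    print(f"{path} -> {name}")
```

The resulting dict could then be passed as `names=` alongside the original path list.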

Performance Tips

- Limit results: Use max_results_per_repo for large codebases
- File patterns: Filter with file_pattern to search specific file types
- Build indexes once: Call index_all() once, then reuse for semantic searches

Integration Patterns

# Use with single-repo operations
repos = MultiRepo([...])

# Get a single repo for detailed analysis
frontend = repos["frontend"]
frontend_symbols = frontend.extract_symbols()

# Or iterate for batch operations
for name, repo in repos:
    print(f"Analyzing {name}...")
    # Use full Repository API on each repo