Metadata-Version: 2.4
Name: codebase-md
Version: 0.1.0
Summary: The universal project brain — scan any codebase, generate context files for every AI coding tool.
Project-URL: Homepage, https://github.com/sauravanand542/codebase-md
Project-URL: Repository, https://github.com/sauravanand542/codebase-md
Project-URL: Issues, https://github.com/sauravanand542/codebase-md/issues
Project-URL: Changelog, https://github.com/sauravanand542/codebase-md/blob/main/CHANGELOG.md
Author-email: Saurav <anandsaurav668@gmail.com>
License: MIT
License-File: LICENSE
Keywords: ai,claude,codebase,codex,context,cursor,developer-tools,vibe-coding,windsurf
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: httpx>=0.25.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Provides-Extra: ast
Requires-Dist: tree-sitter-javascript>=0.21.0; extra == 'ast'
Requires-Dist: tree-sitter-python>=0.21.0; extra == 'ast'
Requires-Dist: tree-sitter-typescript>=0.21.0; extra == 'ast'
Requires-Dist: tree-sitter>=0.21.0; extra == 'ast'
Provides-Extra: dev
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0; extra == 'dev'
Description-Content-Type: text/markdown

# codebase-md

**The universal project brain that works with every AI coding tool.**

[![PyPI](https://img.shields.io/pypi/v/codebase-md)](https://pypi.org/project/codebase-md/)
[![CI](https://github.com/sauravanand542/codebase-md/actions/workflows/ci.yml/badge.svg)](https://github.com/sauravanand542/codebase-md/actions/workflows/ci.yml)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
![Tests](https://img.shields.io/badge/tests-354%20passed-brightgreen.svg)

One command scans your codebase and generates context files for **Claude Code, Cursor, Codex, Windsurf**, and more — auto-detected conventions, dependency health, architecture maps, and smart context routing. Stays fresh via git hooks.

---

## Why?

Every AI coding tool needs project context to work well. But each tool has its own format:
- Claude Code wants `CLAUDE.md`
- Cursor wants `.cursorrules`
- Codex wants `codex.md`
- Windsurf wants `.windsurfrules`

Writing and maintaining these manually is tedious. **codebase-md** scans your project once and generates all of them from a single source of truth.

## Features

- **Universal output** — generates 6 formats from one scan (CLAUDE.md, .cursorrules, AGENTS.md, codex.md, .windsurfrules, PROJECT_CONTEXT.md)
- **Auto-detected conventions** — naming style, import patterns, file organization, design patterns (powered by tree-sitter AST)
- **Dependency intelligence** — health scores, version diffs, breaking change detection, migration plans with code impact
- **Architecture mapping** — detects monolith/monorepo/microservice/library/CLI patterns, entry points, modules
- **Smart context routing** — query-based context retrieval with TF-IDF relevance scoring
- **Git integration** — hooks for auto-regeneration on commit, contributor analysis, file hotspots
- **Multi-language** — Python, JavaScript, and TypeScript with AST analysis; Go and Rust via heuristics (50+ file extensions recognized)

---

## v0.1.0 — Current Status

> **Alpha release** — this is the first public release of codebase-md. Core functionality is working and tested, but APIs and output formats may change between minor versions. Please pin your version (`pip install codebase-md==0.1.0`) and [report issues](https://github.com/sauravanand542/codebase-md/issues).

### What Works Well

- **Single-command scan** — `codebase scan .` analyzes your entire project in seconds
- **6 output formats** — CLAUDE.md, AGENTS.md, .cursorrules, codex.md, .windsurfrules, and the generic PROJECT_CONTEXT.md
- **Language detection** — Python, TypeScript, JavaScript, Go, Rust and 50+ file extensions
- **Dependency parsing** — requirements.txt, pyproject.toml, package.json, go.mod, Cargo.toml, Gemfile
- **Convention inference** — naming style, import patterns, file organization, design patterns (via tree-sitter AST)
- **Architecture detection** — monolith, monorepo, microservice, library, CLI tool
- **Git insights** — commit history, contributor analysis, file change hotspots
- **Dependency health** — live registry queries (PyPI, npm) with health scoring and breaking change detection
- **Smart context routing** — TF-IDF relevance scoring for query-based context retrieval

### Known Limitations

- **AST grammars** — tree-sitter support is limited to Python, JavaScript, and TypeScript; Go and Rust are parsed via heuristics
- **No incremental mode** — every scan re-analyzes the full project (no watch/diff mode yet)
- **Large monorepos** — projects with >10,000 files may experience slower scan times
- **Network dependency** — DepShift registry queries (PyPI/npm health checks) require network access; use `--offline` to skip
- **No Windows CI** — tested on Linux and macOS; Windows should work but is not yet part of CI

### Tested Against

The test suite (354 tests) validates against these project archetypes:

| Fixture | Type | Languages |
|---------|------|-----------|
| Python CLI | CLI tool | Python |
| FastAPI App | Web API | Python |
| Next.js App | Full-stack | TypeScript, JavaScript |
| Go CLI | CLI tool | Go |
| Rust CLI | CLI tool | Rust |
| Mixed Language | Multi-lang | Python, JS, Go |
| Monorepo | Monorepo | Multiple |
| Empty Repo | Edge case | — |

Integration tests also run against real-world repositories (see [test_real_repos.py](tests/integration/test_real_repos.py)).

---

## Installation

### From PyPI

```bash
pip install codebase-md
```

### With AST support (recommended)

```bash
pip install "codebase-md[ast]"
```

### From GitHub (latest dev)

```bash
pip install git+https://github.com/sauravanand542/codebase-md.git
```

### For development

```bash
git clone https://github.com/sauravanand542/codebase-md.git
cd codebase-md
pip install -e ".[dev,ast]"
```

---

## Quick Start

```bash
# Initialize config in your project
cd your-project/
codebase init

# Scan your codebase (builds internal project model)
codebase scan .

# Generate context files for all AI tools
codebase generate .
```

That's it. You now have `CLAUDE.md`, `.cursorrules`, `AGENTS.md`, `codex.md`, `.windsurfrules`, and `PROJECT_CONTEXT.md` in your project root.

---

## Commands

### `codebase scan`

Scans your project and builds a complete model: languages, architecture, dependencies, conventions, modules, git history.

```bash
codebase scan .                    # Scan current directory
codebase scan /path/to/project     # Scan a specific project
```

### `codebase generate`

Generates context files from the last scan.

```bash
codebase generate .                # Generate all formats
codebase generate . --format claude  # Generate only CLAUDE.md
```

### `codebase deps`

Dependency health dashboard — checks versions against registries, computes health scores.

```bash
codebase deps .                    # Health dashboard (queries PyPI/npm)
codebase deps . --offline          # Offline mode (no network)
codebase deps . --upgrade typer    # Migration plan for a specific package
```
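To illustrate the kind of version comparison a health dashboard relies on, here is a minimal sketch that classifies an upgrade by which version component changed. It is not DepShift's actual algorithm: the `parse_version` and `classify_upgrade` helpers and the category names are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn '1.4' or '2.0.0rc1' into a (major, minor, patch) tuple of ints."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return (numbers[0], numbers[1], numbers[2])


def classify_upgrade(current: str, latest: str) -> str:
    """Label an upgrade as up-to-date, patch, minor, or breaking-risk."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "up-to-date"
    if new[0] != cur[0]:
        return "breaking-risk"  # major bump
    if cur[0] == 0 and new[1] != cur[1]:
        # SemVer makes no stability promise for 0.y.z releases.
        return "breaking-risk"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


print(classify_upgrade("0.9.0", "0.12.3"))
print(classify_upgrade("2.5.1", "2.6.0"))
```

A real check would also need to handle pre-releases, epochs, and non-SemVer schemes, which this sketch deliberately ignores.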

### `codebase context`

Query relevant project context with smart ranking.

```bash
codebase context "architecture"              # Find architecture info
codebase context "dependencies" --max 3      # Top 3 relevant chunks
codebase context "how to test" --compact     # Content-only output
```
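The ranking idea behind query-based retrieval can be sketched with plain TF-IDF. This is only an illustration under stated assumptions: the `tokenize` and `rank_chunks` helpers and the sample chunks are hypothetical, and the project's own ranker combines more signals than the single one shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_chunks(query: str, chunks: dict[str, str], max_results: int = 3) -> list[tuple[str, float]]:
    """Rank named chunks against a query using a summed TF-IDF score."""
    tokenized = {name: tokenize(body) for name, body in chunks.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many chunks does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = []
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if not tokens:
                break
            tf = counts[term] / len(tokens)
            # Smoothed IDF, so a term found in every chunk still counts a little.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += tf * idf
        scores.append((name, score))

    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:max_results]


# Hypothetical chunks, standing in for sections of a generated context file.
chunks = {
    "architecture": "CLI tool with a scanner engine and pluggable generators",
    "dependencies": "typer pydantic httpx rich pyyaml",
    "testing": "run pytest to test the scanner and generators",
}
print(rank_chunks("how to test", chunks, max_results=2))
```

Here the `max_results` parameter plays the role that `--max` plays on the command line.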

### `codebase hooks`

Install git hooks for automatic regeneration.

```bash
codebase hooks install .           # Install post-commit hooks
codebase hooks status .            # Show installed hooks
codebase hooks remove .            # Remove hooks
```
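For readers curious about what installing a hook involves, the sketch below writes an executable `post-commit` script into `.git/hooks`. The hook body and the `install_post_commit_hook` helper are assumptions made for illustration; the script that `codebase hooks install` actually writes may differ.

```python
import stat
import tempfile
from pathlib import Path

# Assumed hook body: the script codebase-md actually installs may differ.
HOOK_SCRIPT = """#!/bin/sh
codebase scan . && codebase generate .
"""


def install_post_commit_hook(repo_root: Path) -> Path:
    """Write an executable post-commit hook into repo_root/.git/hooks."""
    hooks_dir = repo_root / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; is this a git repository?")
    hook_path = hooks_dir / "post-commit"
    hook_path.write_text(HOOK_SCRIPT)
    # Git skips hook files that are not executable.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path


# Demo against a throwaway directory that only mimics a repository layout.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / ".git" / "hooks").mkdir(parents=True)
    print(install_post_commit_hook(repo).name)
```

A production installer would also want to preserve any existing hook rather than overwrite it.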

### `codebase init`

Initializes the `.codebase/` configuration directory.

```bash
codebase init                      # Creates .codebase/config.yaml
```

---

## Output Formats

| Format | File | AI Tool | Description |
|---|---|---|---|
| `claude` | `CLAUDE.md` | Claude Code | Structured markdown with project summary, architecture, conventions |
| `cursor` | `.cursorrules` | Cursor | Coding rules, language-specific guidance, tech stack |
| `agents` | `AGENTS.md` | Multi-agent | Compact entry points, commands, architecture flow |
| `codex` | `codex.md` | Codex CLI | Overview, setup, project structure, conventions |
| `windsurf` | `.windsurfrules` | Windsurf | Rules-based format with architecture and file map |
| `generic` | `PROJECT_CONTEXT.md` | Any tool | Complete markdown with all sections + metadata |
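The table above maps naturally onto a small lookup from format name to output file. The following sketch shows that mapping and how a list of selected formats could be resolved to paths; `OUTPUT_FILES` and `resolve_outputs` are hypothetical names, not part of the codebase-md API.

```python
from pathlib import Path

# Format name -> file name, copied from the table above.
OUTPUT_FILES = {
    "claude": "CLAUDE.md",
    "cursor": ".cursorrules",
    "agents": "AGENTS.md",
    "codex": "codex.md",
    "windsurf": ".windsurfrules",
    "generic": "PROJECT_CONTEXT.md",
}


def resolve_outputs(project_root, formats=None):
    """Return the output paths for the selected formats (all formats if None)."""
    selected = list(OUTPUT_FILES) if formats is None else list(formats)
    unknown = [name for name in selected if name not in OUTPUT_FILES]
    if unknown:
        raise ValueError(f"unknown format(s): {', '.join(unknown)}")
    return [Path(project_root) / OUTPUT_FILES[name] for name in selected]


print([path.name for path in resolve_outputs(".", ["claude", "cursor"])])
```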

---

## What Gets Detected

### Languages & Frameworks
50+ file extensions are recognized. Framework detection covers Python (Django, FastAPI, Flask) and JavaScript/TypeScript (React, Next.js, Express, Vue).
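Extension-based language detection can be sketched as a lookup table plus a counter. The mapping below is a small, assumed subset for illustration, and `language_share` is a hypothetical helper rather than the project's detector.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed subset of an extension table; the real tool recognizes many more.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
}


def language_share(paths):
    """Return each detected language's share of recognized files, in percent."""
    counts = Counter()
    for path in paths:
        language = EXTENSION_LANGUAGES.get(PurePosixPath(path).suffix.lower())
        if language:
            counts[language] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: round(100 * n / total, 1) for lang, n in counts.most_common()}


print(language_share(["src/cli.py", "src/engine.py", "web/app.tsx", "README.md"]))
```

Files with unrecognized extensions, such as `README.md` here, are left out of the totals.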

### Architecture Patterns
Monolith, monorepo, microservice, library, CLI tool — detected from folder structure, entry points, and package layout.

### Conventions
- **Naming**: snake_case, camelCase, PascalCase, kebab-case
- **Imports**: absolute, relative, mixed
- **File organization**: modular, layer-based, feature-based, flat
- **Design patterns**: model, view, controller, service, repository, etc.
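As a rough illustration of naming-style inference, the sketch below classifies identifiers with regular expressions and reports the most common style. The patterns and the `classify` and `dominant_style` helpers are assumptions for this example; the project itself works from tree-sitter ASTs rather than raw strings.

```python
import re
from collections import Counter

# One pattern per naming style listed above. Single lowercase words match
# none of them, because they are ambiguous between styles.
PATTERNS = {
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$"),
    "camelCase": re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]+)+$"),
    "kebab-case": re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)+$"),
}


def classify(identifier):
    """Return the naming style of one identifier, or None if ambiguous."""
    for style, pattern in PATTERNS.items():
        if pattern.match(identifier):
            return style
    return None


def dominant_style(identifiers):
    """Return the most frequent naming style among the identifiers."""
    counts = Counter(style for style in map(classify, identifiers) if style)
    return counts.most_common(1)[0][0] if counts else None


names = ["scan_project", "load_config", "ProjectModel", "engine"]
print(dominant_style(names))
```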

### Dependencies
Parses `package.json`, `requirements.txt`, `pyproject.toml`, `go.mod`, `Cargo.toml`, `Gemfile`. Health scoring via live registry queries (PyPI, npm).
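To show what manifest parsing involves, here is a minimal sketch for the simplest case, `requirements.txt`. The `parse_requirements` helper is hypothetical and handles only plain `name[extras]specifier` lines; it skips option lines such as `-r base.txt` and ignores environment markers.

```python
import re

# Package name, optional [extras], optional version specifier.
REQ_LINE = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*([<>=!~]=?[^;#]*)?"
)


def parse_requirements(text):
    """Map each package name (lowercased) to its version specifier string."""
    deps = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = REQ_LINE.match(line)
        if match:
            deps[match.group(1).lower()] = (match.group(2) or "").strip()
    return deps


sample = """\
httpx>=0.25.0
pydantic>=2.0.0  # models
-r base.txt

typer[all]>=0.9.0 ; python_version >= '3.11'
rich
"""
print(parse_requirements(sample))
```

Direct URLs, editable installs, and hash options need a real requirements parser and are out of scope for this sketch.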

---

## Project Structure

```
src/codebase_md/
├── cli.py                  # Typer CLI — all commands
├── model/                  # Pydantic v2 data models (frozen, validated)
├── scanner/                # Codebase analysis engine
│   ├── engine.py           # Orchestrates all scanners
│   ├── language_detector.py
│   ├── structure_analyzer.py
│   ├── dependency_parser.py
│   ├── convention_inferrer.py  # tree-sitter powered
│   ├── ast_analyzer.py        # tree-sitter AST
│   └── git_analyzer.py
├── generators/             # Output format generators (plugin-style)
├── depshift/               # Dependency intelligence engine
│   ├── analyzer.py         # Health scoring
│   ├── version_differ.py   # Breaking change detection
│   ├── usage_mapper.py     # Import → source location mapping
│   └── registries/         # PyPI + npm clients
├── context/                # Smart context routing
│   ├── chunker.py          # 12 topic-based chunks
│   ├── ranker.py           # 6-signal TF-IDF scoring
│   └── router.py           # Query pipeline
├── persistence/            # .codebase/ state management
└── integrations/           # Git hooks, GitHub Actions
```

---

## Configuration

After `codebase init`, edit `.codebase/config.yaml`:

```yaml
version: 1
generators:
  - claude
  - cursor
  - agents
  - codex
  - windsurf
  - generic
scan:
  exclude:
    - node_modules
    - .venv
    - dist
    - build
hooks:
  post_commit: true
  pre_push: false
```
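One plausible reading of `scan.exclude` is that a file is skipped when any directory in its path matches an entry. The sketch below implements that interpretation; the matching rule and the `is_excluded` and `filter_paths` helpers are assumptions, so check the tool's behavior before relying on it.

```python
from pathlib import PurePosixPath

# Mirrors the scan.exclude list from the example config above.
EXCLUDE = {"node_modules", ".venv", "dist", "build"}


def is_excluded(path, exclude=EXCLUDE):
    """True if any component of the path exactly matches an excluded name."""
    return any(part in exclude for part in PurePosixPath(path).parts)


def filter_paths(paths, exclude=EXCLUDE):
    """Keep only the paths that are not excluded."""
    return [path for path in paths if not is_excluded(path, exclude)]


paths = [
    "src/cli.py",
    "node_modules/react/index.js",
    "web/dist/app.js",
    "docs/build.md",
]
print(filter_paths(paths))
```

Note that `docs/build.md` is kept because only whole path components are compared, not substrings.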

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, coding conventions, and PR guidelines.

## License

[MIT](LICENSE)
