# ck

**Repository Path**: mirrors_trending/ck

## Basic Information

- **Project Name**: ck
- **Description**: Local first semantic and hybrid BM25 grep / search tool for use by AI and humans!
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-28
- **Last Updated**: 2026-02-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ck - Semantic Code Search

[![CI](https://github.com/BeaconBay/ck/actions/workflows/ci.yaml/badge.svg)](https://github.com/BeaconBay/ck/actions/workflows/ci.yaml)
[![Crates.io](https://img.shields.io/crates/v/ck-search.svg)](https://crates.io/crates/ck-search)
[![Downloads](https://img.shields.io/crates/d/ck-search.svg)](https://crates.io/crates/ck-search)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE-MIT)
[![MSRV](https://img.shields.io/badge/rust-1.88%2B-blue.svg)](https://www.rust-lang.org)
[![Documentation](https://img.shields.io/badge/docs-beaconbay.github.io%2Fck-blue)](https://beaconbay.github.io/ck/)

**ck (seek)** finds code by meaning, not just keywords. It's grep that understands what you're looking for: search for "error handling" and find try/catch blocks, error returns, and exception handling code even when those exact words aren't present.

## 🚀 Quick Start

```bash
# Install from crates.io
cargo install ck-search

# Just search: ck builds and updates indexes automatically
ck --sem "error handling" src/
ck --sem "authentication logic" src/
ck --sem "database connection pooling" src/

# Traditional grep-compatible search still works
ck -n "TODO" *.rs
ck -R "TODO|FIXME" .

# Combine both: semantic relevance + keyword filtering
ck --hybrid "connection timeout" src/
```

> **📚 [Full Documentation](https://beaconbay.github.io/ck/)**: Installation guides, tutorials, feature deep-dives, and API reference

## ✨ Headline Features

### 🤖 **AI Agent Integration (MCP Server)**

Connect ck directly to Claude Desktop, Cursor, or any MCP-compatible AI client for seamless code search integration:

```bash
# Start MCP server for AI agent integration
ck --serve
```

**Claude Desktop Setup:**

```bash
# Install via Claude Code CLI (recommended)
claude mcp add ck-search -s user -- ck --serve

# Note: You may need to restart Claude Code after installation
# Verify installation with:
claude mcp list
# or use /mcp in Claude Code
```

**Manual Configuration (alternative):**

```json
{
  "mcpServers": {
    "ck": {
      "command": "ck",
      "args": ["--serve"],
      "cwd": "/path/to/your/codebase"
    }
  }
}
```

**Tool Permissions:** When prompted by Claude Code, approve permissions for ck-search tools (semantic_search, regex_search, hybrid_search, etc.)

**Available MCP Tools:**

- `semantic_search` - Find code by meaning using embeddings
- `regex_search` - Traditional grep-style pattern matching
- `hybrid_search` - Combined semantic and keyword search
- `index_status` - Check indexing status and metadata
- `reindex` - Force rebuild of search index
- `health_check` - Server status and diagnostics

**Built-in Pagination:** Handles large result sets gracefully with page_size controls, cursors, and snippet length management.
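As a rough illustration of the cursor-based pagination described above, the sketch below keeps requesting pages until no cursor is returned. It is not ck's own client code: `call_tool` is a hypothetical synchronous callable standing in for whatever MCP client you use, the `results` key is an assumed name for the list of hits, and `make_fake_tool` exists only so the loop can be exercised without a running server.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_results(call_tool: ToolCall, query: str, path: str,
                 page_size: int = 25) -> Iterator[Any]:
    """Yield hits from every page by following pagination.next_cursor."""
    args: Dict[str, Any] = {"query": query, "path": path, "page_size": page_size}
    while True:
        response = call_tool("semantic_search", args)
        # "results" is an assumed key; adapt it to the real response shape.
        yield from response.get("results", [])
        cursor: Optional[str] = response.get("pagination", {}).get("next_cursor")
        if not cursor:
            break  # no further pages
        # Follow-up requests pass the cursor back to the server.
        args = {"query": query, "path": path, "cursor": cursor}


def make_fake_tool(pages: List[List[Any]]) -> ToolCall:
    """Build an in-memory stand-in for an MCP client (hypothetical helper)."""
    def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        index = int(args.get("cursor", 0))
        has_next = index + 1 < len(pages)
        return {
            "results": pages[index],
            "pagination": {"next_cursor": str(index + 1) if has_next else None},
        }
    return call_tool
```

With a real client the same loop applies; only the `call_tool` callable changes (for an async client, the call would be awaited inside an `async def`).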
### 🎨 **Interactive TUI (Terminal User Interface)**

Launch an interactive search interface with real-time results and multiple preview modes:

```bash
# Start TUI for current directory
ck --tui

# Start with initial query
ck --tui "error handling"
```

**Features:**

- **Multiple Search Modes**: Toggle between Semantic, Regex, and Hybrid search with `Tab`
- **Preview Modes**: Switch between Heatmap, Syntax highlighting, and Chunk view with `Ctrl+V`
- **View Options**: Toggle between snippet and full-file view with `Ctrl+F`
- **Multi-select**: Select multiple files with `Ctrl+Space`, open all in editor with `Enter`
- **Search History**: Navigate with `Ctrl+Up/Down`
- **Editor Integration**: Opens files in `$EDITOR` with line numbers (Vim, VS Code, Cursor, etc.)
- **Progress Tracking**: Live indexing progress with file and chunk counts
- **Config Persistence**: Preferences saved to `~/.config/ck/tui.json`

See [TUI.md](TUI.md) for keyboard shortcuts and detailed usage.

### 🔍 **Semantic Search**

Find code by concept, not keywords. Understands synonyms, related terms, and conceptual similarity:

```bash
# These find related code even without exact keywords:
ck --sem "retry logic"           # finds backoff, circuit breakers
ck --sem "user authentication"   # finds login, auth, credentials
ck --sem "data validation"       # finds sanitization, type checking

# Get complete functions/classes containing matches
ck --sem --full-section "error handling"  # returns entire functions
```

### ⚡ **Drop-in grep Compatibility**

All your muscle memory works.
Same flags, same behavior, same output format:

```bash
ck -i "warning" *.log              # Case-insensitive
ck -n -A 3 -B 1 "error" src/       # Line numbers + context
ck -l "error" src/                 # List files with matches only
ck -L "TODO" src/                  # List files without matches
ck -R --exclude "*.test.js" "bug"  # Recursive with exclusions
```

### 🎯 **Hybrid Search**

Combine keyword precision with semantic understanding using Reciprocal Rank Fusion:

```bash
ck --hybrid "async timeout" src/     # Best of both worlds
ck --hybrid --scores "cache" src/    # Show relevance scores with color highlighting
ck --hybrid --threshold 0.02 query   # Filter by minimum relevance
```

### ⚙️ **Automatic Delta Indexing with Chunk-Level Caching**

Semantic and hybrid searches transparently create and refresh their indexes before running. The first search builds what it needs; subsequent searches reuse cached embeddings:

- **Chunk-level incremental indexing**: Only changed chunks are re-embedded (80-90% cache hit rate for typical code changes)
- **Content-aware invalidation**: Doc comments and whitespace changes properly invalidate cache
- **Model consistency**: Prevents silent embedding corruption when switching models
- **Smart caching**: Hash-based invalidation using blake3(text + trivia) for reliable change detection

### 📁 **Smart File Filtering**

Automatically excludes cache directories and build artifacts, and respects `.gitignore` and `.ckignore` files:

```bash
# ck respects multiple exclusion layers (all are additive):
ck "pattern" .                           # Uses .gitignore + .ckignore + defaults
ck --no-ignore "pattern" .               # Skip .gitignore (still uses .ckignore)
ck --no-ckignore "pattern" .             # Skip .ckignore (still uses .gitignore)
ck --exclude "dist" --exclude "logs" .   # Add custom exclusions

# .ckignore file (created automatically on first index):
# - Excludes images, videos, audio, binaries, archives by default
# - Excludes JSON/YAML config files (issue #27)
# - Uses same syntax as .gitignore (glob patterns, ! for negation)
# - Persists across searches (issue #67)
# - Located at repository root, editable for custom patterns

# Exclusion patterns use .gitignore syntax:
ck --exclude "node_modules" .               # Exclude directory and all contents
ck --exclude "*.test.js" .                  # Exclude files matching pattern
ck --exclude "build/" --exclude "*.log" .   # Multiple exclusions
# Note: Patterns are relative to the search root
```

**Why .ckignore?** While `.gitignore` handles version control exclusions, many files that *should* be in your repo aren't ideal for semantic search. Config files (`package.json`, `tsconfig.json`), images, videos, and data files add noise to search results and slow down indexing. `.ckignore` lets you focus semantic search on actual code while keeping everything else in git. Think of it as "what should I search" vs "what should I commit".

## 🛠 Advanced Usage

### AI Agent Integration

#### MCP Server (Recommended)

```python
# Example usage in AI agents
# (`client` is an already-connected MCP client session; run inside an async function)
response = await client.call_tool("semantic_search", {
    "query": "authentication logic",
    "path": "/path/to/code",
    "page_size": 25,
    "top_k": 50,  # Limit total results (default: 100 for MCP)
    "snippet_length": 200
})

# Handle pagination
if response["pagination"]["next_cursor"]:
    next_response = await client.call_tool("semantic_search", {
        "query": "authentication logic",
        "path": "/path/to/code",
        "cursor": response["pagination"]["next_cursor"]
    })
```

#### JSONL Output (Custom Workflows)

Structured output for LLMs, scripts, and automation:

```bash
# JSONL format - one JSON object per line (recommended for agents)
ck --jsonl --sem "error handling" src/
ck --jsonl --no-snippet "function" .        # Metadata only
ck --jsonl --topk 5 --threshold 0.7 "auth"  # High-confidence results

# Traditional JSON (single array); iterate the array to read each result's file
ck --json --sem "error handling" src/ | jq '.[].file'
```

**Why JSONL for AI agents?**

- ✅ **Streaming friendly**: Process results as they arrive
- ✅ **Memory efficient**: Parse one result at a time
- ✅ **Error resilient**: One malformed line doesn't break the entire response
- ✅ **Standard format**: Used by OpenAI API, Anthropic API, and modern ML pipelines

### Search & Filter Options

```bash
# Threshold filtering
ck --sem --threshold 0.7 "query"          # Only high-confidence matches
ck --hybrid --threshold 0.01 "concept"    # Low-confidence (exploration)

# Limit results
ck --sem --topk 5 "authentication patterns"

# Complete code sections
ck --sem --full-section "database queries"   # Complete functions
ck --full-section "class.*Error" src/        # Complete classes (works with regex too)

# Relevance scoring
ck --sem --scores "machine learning" docs/
# [0.847] ./ai_guide.txt: Machine learning introduction...
# [0.732] ./statistics.txt: Statistical learning methods...
```

### Language Coverage

| Language | Indexing | Chunking | AST-aware | Notes |
|----------|----------|----------|-----------|-------|
| Zig | ✅ | ✅ | ✅ | contributed by [@Nevon](https://github.com/Nevon) (PR #72) |

### Model Selection

Choose the right embedding model for your needs:

```bash
# Default: BGE-Small (fast, precise chunking)
ck --index .

# Mixedbread xsmall: Optimized for local semantic search (4K context, 384 dims)
ck --index --model mxbai-xsmall .

# Enhanced: Nomic V1.5 (8K context, optimal for large functions)
ck --index --model nomic-v1.5 .

# Code-specialized: Jina Code (optimized for programming languages)
ck --index --model jina-code .
```

**Model Comparison:**

- **`bge-small`** (default): 400-token chunks, fast indexing, good for most code
- **`mxbai-xsmall`**: 4K context window, 384 dimensions, optimized for local inference (Mixedbread)
- **`nomic-v1.5`**: 1024-token chunks with 8K model capacity, better for large functions
- **`jina-code`**: 1024-token chunks with 8K model capacity, specialized for code understanding

### Index Management

```bash
# Check index status
ck --status .

# Clean up and rebuild / switch models
ck --clean .
ck --switch-model mxbai-xsmall .
ck --switch-model nomic-v1.5 .
ck --switch-model nomic-v1.5 --force .   # Force rebuild

# Add single file to index
ck --add new_file.rs

# File inspection (analyze chunking and token usage)
ck --inspect src/main.rs
ck --inspect --model bge-small src/main.rs   # Test different models
```

**Interrupting Operations:** Indexing can be safely interrupted with Ctrl+C. The partial index is saved, and the next operation will resume from where it stopped, only processing new or changed files.

## 📚 Language Support

| Language | Indexing | Tree-sitter Parsing | Semantic Chunking |
|----------|----------|-------------------|------------------|
| Python | ✅ | ✅ | ✅ Functions, classes |
| JavaScript/TypeScript | ✅ | ✅ | ✅ Functions, classes, methods |
| Rust | ✅ | ✅ | ✅ Functions, structs, traits |
| Go | ✅ | ✅ | ✅ Functions, types, methods |
| Ruby | ✅ | ✅ | ✅ Classes, methods, modules |
| Haskell | ✅ | ✅ | ✅ Functions, types, instances |
| C# | ✅ | ✅ | ✅ Classes, interfaces, methods |
| Dart | ✅ | ✅ | ✅ Classes, mixins, methods |

**Text Formats:** Markdown, JSON, YAML, TOML, XML, HTML, CSS, shell scripts, SQL, log files, config files, and any other text format.

**Smart Binary Detection:** Uses ripgrep-style content analysis, automatically indexing any text file while correctly excluding binary files.

**Unsupported File Types:** Text files with unrecognized extensions (like `.org`, `.adoc`, etc.)
are automatically indexed as plain text. ck detects text vs binary based on file contents, not extensions.

## 🏗 Installation

### From crates.io

```bash
cargo install ck-search
```

### From Source

```bash
git clone https://github.com/BeaconBay/ck
cd ck
cargo install --path ck-cli
```

### Package Managers

```bash
# Currently available:
cargo install ck-search    # ✅ Available now via crates.io

# Coming soon:
brew install ck-search     # 🚧 In development (use cargo for now)
apt install ck-search      # 🚧 In development
```

## 💡 Examples

### Finding Code Patterns

```bash
# Find authentication/authorization code
ck --sem "user permissions" src/
ck --sem "access control" src/
ck --sem "login validation" src/

# Find error handling strategies
ck --sem "exception handling" src/
ck --sem "error recovery" src/
ck --sem "fallback mechanisms" src/

# Find performance-related code
ck --sem "caching strategies" src/
ck --sem "database optimization" src/
ck --sem "memory management" src/
```

### Team Workflows

```bash
# Find related test files
ck --sem "unit tests for authentication" tests/
ck -l --sem "test" tests/   # List test files by semantic content

# Identify refactoring candidates
ck --sem "duplicate logic" src/
ck --sem "code complexity" src/
ck -L "test" src/           # Find source files without tests

# Security audit
ck --hybrid "password|credential|secret" src/
ck --sem "input validation" src/
```

### Integration Examples

```bash
# Git hooks
git diff --name-only | xargs ck --sem "TODO"

# CI/CD pipeline
ck --json --sem "security vulnerability" . | security_scanner.py

# Code review prep
ck --hybrid --scores "performance" src/ > review_notes.txt

# Documentation generation
ck --json --sem "public API" src/ | generate_docs.py
```

## ⚡ Performance

**Field-tested on real codebases:**

- **Indexing:** ~1M LOC in under 2 minutes
- **Incremental indexing:** 80-90% cache hit rate for typical code changes (only changed chunks re-embedded)
- **Search:** Sub-500ms queries on typical codebases
- **Index size:** ~2x source code size with compression
- **Memory:** Efficient streaming for large repositories
- **Token precision:** HuggingFace tokenizers for exact model-specific token counting

## 🔧 Architecture

ck uses a modular Rust workspace:

- **`ck-cli`** - Command-line interface and MCP server
- **`ck-tui`** - Interactive terminal user interface (ratatui-based)
- **`ck-core`** - Shared types, configuration, and utilities
- **`ck-engine`** - Search engine implementations (regex, semantic, hybrid)
- **`ck-index`** - File indexing, hashing, and sidecar management
- **`ck-embed`** - Text embedding providers (FastEmbed, API backends)
- **`ck-ann`** - Approximate nearest neighbor search indices
- **`ck-chunk`** - Text segmentation and language-aware parsing ([query-based chunking](docs/explanation/query-based-chunking.md))
- **`ck-models`** - Model registry and configuration management

### Index Storage

Indexes are stored in `.ck/` directories alongside your code:

```
project/
├── src/
├── docs/
└── .ck/                  # Semantic index (can be safely deleted)
    ├── embeddings.json
    ├── ann_index.bin
    └── tantivy_index/
```

The `.ck/` directory is a cache: safe to delete and rebuild anytime.

## 🧪 Testing

```bash
# Run the full test suite
cargo test --workspace

# Test with each feature combination
cargo hack test --each-feature --workspace
```

## 🤝 Contributing

ck is actively developed and welcomes contributions:

1. **Issues:** Report bugs, request features
2. **Code:** Submit PRs for bug fixes, new features
3. **Documentation:** Improve examples, guides, tutorials
4. **Testing:** Help test on different codebases and languages

### Development Setup

```bash
git clone https://github.com/BeaconBay/ck
cd ck
cargo build --workspace
cargo test --workspace
./target/debug/ck --index test_files/
./target/debug/ck --sem "test query" test_files/
```

### CI Requirements

Before submitting a PR, ensure your code passes all CI checks:

```bash
# Format code (required)
cargo fmt --all

# Run clippy linter (required - must have no warnings)
cargo clippy --workspace --all-features --all-targets -- -D warnings

# Run tests (required)
cargo test --workspace

# Check minimum supported Rust version (MSRV)
cargo hack check --each-feature --locked --rust-version --workspace
```

The CI pipeline runs on Ubuntu, Windows, and macOS to ensure cross-platform compatibility.

## 🗺 Roadmap

### Current (v0.7+)

- ✅ MCP (Model Context Protocol) server for AI agent integration
- ✅ Chunk-level incremental indexing with smart embedding reuse
- ✅ grep-compatible CLI with semantic search and file listing flags
- ✅ FastEmbed integration with BGE models and enhanced model selection
- ✅ File exclusion patterns and glob support
- ✅ Threshold filtering and relevance scoring with visual highlighting
- ✅ Tree-sitter parsing and intelligent chunking for 7+ languages
- ✅ Complete code section extraction (`--full-section`)
- ✅ Clean stdout/stderr separation for reliable scripting
- ✅ Token-aware chunking with HuggingFace tokenizers
- ✅ Published to crates.io (`cargo install ck-search`)

### Next (v0.6+)

- 🚧 Configuration file support
- 🚧 Package manager distributions (brew, apt)
- 🚧 Enhanced MCP tools (file writing, refactoring assistance)
- 🚧 VS Code extension
- 🚧 JetBrains plugin
- 🚧 Additional language chunkers (Java, PHP, Swift)

## ❓ FAQ

**Q: How is this different from grep/ripgrep/silver-searcher?**
A: ck includes all the
features of traditional search tools, but adds semantic understanding. Search for "error handling" and find relevant code even when those exact words aren't used.

**Q: Does it work offline?**
A: Yes. Once the embedding model has been downloaded and cached (a one-time step), it runs locally with no network calls.

**Q: How big are the indexes?**
A: Typically 1-3x the size of your source code. The `.ck/` directory can be safely deleted to reclaim space.

**Q: Is it fast enough for large codebases?**
A: Yes. The first semantic search builds the index automatically; after that only changed files are reprocessed, keeping searches sub-second even on large projects.

**Q: Can I use it in scripts/automation?**
A: Yes. The `--json` and `--jsonl` flags provide structured output suited to automated processing and AI agent integration.

**Q: What about privacy/security?**
A: Everything runs locally. No code or queries are sent to external services. The embedding model is downloaded once and cached locally.

**Q: Where are the embedding models cached?**
A: Models are cached in platform-specific directories:

- Linux/macOS: `~/.cache/ck/models/`
- Windows: `%LOCALAPPDATA%\ck\cache\models\`
- Fallback: `.ck_models/models/` in current directory

## 📄 License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT License ([LICENSE-MIT](LICENSE-MIT))

at your option.

## 🙏 Credits

Built with:

- [Rust](https://rust-lang.org) - Systems programming language
- [FastEmbed](https://github.com/Anush008/fastembed-rs) - Fast text embeddings
- [Tantivy](https://github.com/quickwit-oss/tantivy) - Full-text search engine
- [clap](https://github.com/clap-rs/clap) - Command line argument parsing

Inspired by the need for better code search tools in the age of AI-assisted development.

---

**Start finding code by what it does, not what it says.**

```bash
cargo install ck-search
ck --sem "the code you're looking for"
```
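For the scripting use case mentioned in the FAQ, here is a minimal sketch of consuming `--jsonl` output from Python. It assumes only that each output line is one JSON object; the `file` field in the sample data is a guess and should be checked against real output. `parse_jsonl` and `search_jsonl` are illustrative helper names, not part of ck.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL text line by line, skipping blank or malformed lines."""
    records: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # one bad line should not discard the rest
        if isinstance(obj, dict):
            records.append(obj)
    return records


def search_jsonl(query: str, path: str) -> List[Dict[str, Any]]:
    """Run `ck --jsonl --sem QUERY PATH` and return the parsed records."""
    proc = subprocess.run(
        ["ck", "--jsonl", "--sem", query, path],
        capture_output=True, text=True, check=False,
    )
    return parse_jsonl(proc.stdout.splitlines())


# Sample lines with a hypothetical "file" field, including one malformed line.
sample = ['{"file": "src/auth.rs"}', "not json", "", '{"file": "src/db.rs"}']
parsed = parse_jsonl(sample)
```

Skipping unparseable lines mirrors the "error resilient" property claimed for JSONL: each record is handled independently, so a partial or corrupted line affects only itself.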