# insights

**Repository Path**: cncf/insights

## Basic Information

- **Project Name**: insights
- **Description**: This is an engine that sources insights about project health from the github archive
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-16
- **Last Updated**: 2025-05-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# insights

This is an engine that sources insights about project health from the GitHub archive.

[![Coverage](https://img.shields.io/badge/coverage-21.21%25-red)](coverage/index.html)

## Quick Start

To get started with the insights tool:

1. Ensure you have Ruby installed (2.7 or higher recommended).
2. Clone this repository.
3. Install dependencies:
   ```
   bundle install
   ```
4. Set up your GitHub API token (required for API access):
   ```
   export GITHUB_TOKEN=your_github_personal_access_token
   ```
5. Run the tool against an example repository:
   ```
   ruby build.rb -r etcd-io/etcd
   ```
   This generates a report for the etcd-io/etcd repository covering the last 30 days (the default). The data is stored in `repo.db` by default.
6. View the generated report in the `reports` directory (`open` is macOS-specific; on most Linux desktops use `xdg-open` instead):
   ```
   open reports/etcd-io-etcd-lottery.html
   ```

## Project Structure

```
build.rb                  # Main script to build reports
Gemfile                   # Ruby dependencies
repo.db                   # SQLite database for storing repository data
lib/
  data_fetcher.rb         # Fetches data from the GitHub API
  database.rb             # Database operations
  github_client.rb        # GitHub API client
  html_generator.rb       # Generates HTML reports
  lottery_factor_tool.rb  # Calculates lottery factor metrics
reports/                  # Generated HTML reports
```

## Plan

Working with [@jpmcb](https://github.com/jpmcb) to prototype a tool that sources project health insights from a selection of CNCF projects.
This is still a WIP, but we will follow this path:

- [x] POC for sourcing the archive for a selection of repositories (probably a YAML list)
- [x] Store results in ~~a Postgres DB or DuckDB~~ SQLite
- [x] Generate [lottery factor](https://opensauced.pizza/changelog/repository-pages-lottery-factor) as the first insight
- [ ] Nice-to-haves or stretch goals: [Contributor Confidence](https://opensauced.pizza/changelog/contributor-confidence), [YOLO coder](https://opensauced.pizza/changelog/yolo-coders-in-repository-pages), and [Outside Contributor](https://opensauced.pizza/changelog/contributor-filters)

## Testing

The project uses Minitest for testing, with mocks and stubs so that API interactions can be tested without making real API calls.

### Running Tests

To run the test suite:

```bash
# Run all tests
bundle exec rake test

# Run tests with coverage report
bundle exec rake coverage

# Run a single test file
bundle exec ruby -Ilib:test test/path/to/test_file.rb

# Run a specific test within a file
bundle exec ruby -Ilib:test test/path/to/test_file.rb -n test_method_name
```

### Test Coverage

Test coverage is tracked with SimpleCov. After running the tests with coverage enabled, you can:

1. View the detailed HTML coverage report:
   ```bash
   open coverage/index.html
   ```
2. Generate a coverage badge for the README:
   ```bash
   bundle exec rake coverage_badge
   ```
3. See the per-file coverage breakdown in the terminal:
   ```bash
   bundle exec rake coverage
   ```

### Continuous Integration

Tests run automatically on GitHub Actions for every pull request. The workflow:

- Runs all tests
- Checks code style with RuboCop
- Generates and reports test coverage metrics

## Available Insights

### Lottery Factor

The lottery factor indicates how dependent a project is on a small number of contributors. A high lottery factor suggests that if those key contributors left (say, by winning the lottery), the project might struggle.
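The README does not spell out the formula behind `lib/lottery_factor_tool.rb`. The following is a minimal Ruby sketch, assuming the common definition of the lottery factor as the smallest number of top contributors who together account for at least half of the merged pull requests. The 50% threshold, the helper names `lottery_factor` and `contribution_shares`, and the sample authors are assumptions for illustration, not the tool's actual API or data.

```ruby
# Hypothetical sketch, not the code in lib/lottery_factor_tool.rb.
# pr_counts maps author => merged PR count (for example, the author/pr_count
# rows returned by the per-author query in the Debugging section).

# Smallest number of top contributors whose merged PRs together reach
# `threshold` (assumed 50%) of all merged PRs. A smaller result means the
# project depends on fewer people.
def lottery_factor(pr_counts, threshold: 0.5)
  total = pr_counts.values.sum
  return 0 if total.zero?

  covered = 0
  sorted = pr_counts.values.sort.reverse
  sorted.each_with_index do |count, index|
    covered += count
    return index + 1 if covered >= threshold * total
  end
  sorted.size
end

# Percentage share of the top `top` contributors (similar in spirit to
# --top-display), with everyone else grouped under "others".
def contribution_shares(pr_counts, top: 6)
  total = pr_counts.values.sum.to_f
  return [] if total.zero?

  ranked = pr_counts.sort_by { |author, count| [-count, author] }
  shares = ranked.first(top).map { |author, count| [author, (100 * count / total).round(1)] }
  rest = ranked.drop(top).sum { |_author, count| count }
  shares << ['others', (100 * rest / total).round(1)] if rest.positive?
  shares
end

# Hypothetical sample data
counts = { 'alice' => 10, 'bob' => 6, 'carol' => 2, 'dave' => 2 }
lottery_factor(counts)              # => 1
contribution_shares(counts, top: 2) # => [["alice", 50.0], ["bob", 30.0], ["others", 20.0]]
```

Counting merged PRs is only one way to measure contribution; the real tool may weigh activity differently, so treat this as an illustration of the idea rather than a reproduction of the report's numbers.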
## Usage

```
ruby build.rb -r OWNER/REPO [options]
```

Required parameters:

- `-r, --repo REPO`: GitHub repository in the format `owner/repo` (e.g., `etcd-io/etcd`)

Optional parameters:

- `-d, --database FILENAME`: SQLite database filename (default: `repo.db`)
- `-t, --time-range DAYS`: Time range in days to analyze (default: 30)
- `-o, --output FILENAME`: Output HTML filename (default: `owner-repo-lottery.html`)
- `-f, --force`: Force fetching new data even if recent data exists
- `--top-display COUNT`: Number of top contributors to display individually (default: 6)

Examples:

```
ruby build.rb -r etcd-io/etcd
ruby build.rb -r etcd-io/etcd -f  # Force fetch new data
```

## Debugging

### SQLite Database Inspection

To debug issues with contributor data, inspect the SQLite database directly. First, open it from your shell:

```bash
sqlite3 repo.db
```

Then run the following at the `sqlite>` prompt:

```sql
-- List all tables
.tables

-- Show the schema of the pull_requests table
.schema pull_requests

-- Count merged pull requests per author over the last 30 days
SELECT pr.author, COUNT(*) AS pr_count
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
WHERE pr.merged_at >= datetime('now', '-30 days')
GROUP BY pr.author
ORDER BY pr_count DESC;

-- View the ten most recently merged pull requests
SELECT pr.author, pr.merged_at
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
ORDER BY pr.merged_at DESC
LIMIT 10;

-- Exit SQLite
.quit
```

Common issues to check:

- Verify that pull request data exists in the database
- Check that dates are stored in ISO 8601 format (`YYYY-MM-DDTHH:MM:SSZ`)
- Confirm that the repository owner and name are correct in the `repositories` table
- Ensure merged pull requests have valid `merged_at` timestamps

If the data exists in the database but does not appear in the HTML report, check:

1. The date filtering in the database queries
2. The data transformation in the HTML generator
3. The GitHub API token permissions and rate limits
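To help with the date-format check listed above, a small Ruby sketch can flag `merged_at` values that would silently drop out of a date-filtered query. It assumes the values are stored as strings in the `YYYY-MM-DDTHH:MM:SSZ` form and uses only Ruby's standard `time` library; `parse_merged_at`, `within_window?`, and `malformed_merged_at` are hypothetical helpers, not part of `lib/database.rb`.

```ruby
require 'time'

# Strict pattern for the assumed storage format: YYYY-MM-DDTHH:MM:SSZ
# (UTC "Z" suffix, no fractional seconds).
ISO8601_UTC = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/

# Hypothetical helper: returns a UTC Time for a well-formed merged_at value,
# or nil when the value is missing, in another format, or not a real date.
def parse_merged_at(value)
  return nil unless value.is_a?(String) && ISO8601_UTC.match?(value)

  Time.iso8601(value)
rescue ArgumentError
  nil # e.g. "2025-13-01T00:00:00Z" matches the pattern but has month 13
end

# Hypothetical helper: true when merged_at falls within the last `days` days
# before `now` (30 mirrors the default --time-range).
def within_window?(value, days: 30, now: Time.now.utc)
  merged_at = parse_merged_at(value)
  return false if merged_at.nil?

  merged_at >= now - (days * 24 * 60 * 60)
end

# Hypothetical helper: returns the values that fail to parse.
def malformed_merged_at(values)
  values.reject { |value| parse_merged_at(value) }
end

malformed_merged_at(['2025-05-01T10:00:00Z', '2025-05-01 10:00:00', nil])
# => ["2025-05-01 10:00:00", nil]
```

One related detail when checking the SQL side: `datetime('now', '-30 days')` returns text such as `2025-04-06 12:00:00` with a space separator. If `merged_at` is stored as ISO 8601 text with a `T`, SQLite compares the two as strings, so rows from the cutoff day itself may be included regardless of their time of day.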