# insights

**Repository Path**: cncf/insights

## Basic Information

- **Project Name**: insights
- **Description**: An engine that sources insights about project health from the GitHub archive
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-16
- **Last Updated**: 2025-05-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# insights
This is an engine that sources insights about project health from the GitHub archive.

[![Coverage](https://img.shields.io/badge/coverage-21.21%25-red)](coverage/index.html)

## Quick Start

To get started with the insights tool:

1. Ensure you have Ruby installed (2.7 or higher recommended)
2. Clone this repository
3. Install dependencies:
   ```
   bundle install
   ```
4. Set up your GitHub API token (required for API access):
   ```
   export GITHUB_TOKEN=your_github_personal_access_token
   ```
5. Run the tool with an example repository:
   ```
   ruby build.rb -r etcd-io/etcd
   ```
   This generates a report for the etcd-io/etcd repository covering the last 30 days (the default time range).
   The fetched data is stored in `repo.db` by default.
6. View the generated report in the `reports` directory (`open` is the macOS command; on Linux, `xdg-open` is the usual equivalent):
   ```
   open reports/etcd-io-etcd-lottery.html
   ```

## Project Structure

```
build.rb                # Main script to build reports
Gemfile                 # Ruby dependencies
repo.db                 # SQLite database for storing repository data
lib/
  data_fetcher.rb       # Fetches data from GitHub API
  database.rb           # Database operations
  github_client.rb      # GitHub API client
  html_generator.rb     # Generates HTML reports
  lottery_factor_tool.rb # Calculate lottery factor metrics
reports/                # Generated HTML reports
```

## Plan

Working with [@jpmcb](https://github.com/jpmcb) to prototype a tool that sources project health insights from a selection of CNCF projects. This is still a work in progress, and the plan is:

- [x] POC for sourcing archive for a selection of repositories (probably a yml list)
- [x] Store results in ~~postgres DB or duckdb~~ sqlite
- [x] Generate [lottery factor](https://opensauced.pizza/changelog/repository-pages-lottery-factor) as the first insight.
- [ ] Nice-to-haves or stretch goals: [Contributor Confidence](https://opensauced.pizza/changelog/contributor-confidence), [YOLO coder](https://opensauced.pizza/changelog/yolo-coders-in-repository-pages), and [Outside Contributor](https://opensauced.pizza/changelog/contributor-filters).

## Testing

The project uses Minitest with mocks and stubs, so that code which talks to the GitHub API can be tested without making real API calls.
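
As an illustration of testing without real API calls, here is a minimal sketch. It uses a hand-written fake object rather than `Minitest::Mock`, and every name in it (`FakeGithubClient`, `merged_pull_requests`, `count_by_author`) is hypothetical and not taken from `lib/`:

```ruby
require "minitest/autorun"

# Hypothetical stand-in for a GitHub client: returns canned data
# instead of calling the API. Names are illustrative only.
class FakeGithubClient
  def merged_pull_requests(_repo)
    [{ author: "alice" }, { author: "bob" }, { author: "alice" }]
  end
end

# Hypothetical function under test: counts merged PRs per author.
def count_by_author(client, repo)
  client.merged_pull_requests(repo).map { |pr| pr[:author] }.tally
end

class CountByAuthorTest < Minitest::Test
  def test_counts_without_calling_github
    counts = count_by_author(FakeGithubClient.new, "etcd-io/etcd")
    assert_equal({ "alice" => 2, "bob" => 1 }, counts)
  end
end
```

Passing the client in as an argument is the design choice that makes this work: the test can substitute any object that responds to the same method.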

### Running Tests

To run the test suite:

```bash
# Run all tests
bundle exec rake test

# Run tests with coverage report
bundle exec rake coverage

# Run a single test file
bundle exec ruby -Ilib:test test/path/to/test_file.rb

# Run a specific test within a file
bundle exec ruby -Ilib:test test/path/to/test_file.rb -n test_method_name
```

### Test Coverage

Test coverage is tracked using SimpleCov. After running the tests with coverage enabled, you can:

1. View the detailed HTML coverage report:
   ```bash
   open coverage/index.html
   ```

2. Generate a coverage badge for your README:
   ```bash
   bundle exec rake coverage_badge
   ```

3. See the coverage breakdown by file in the terminal:
   ```bash
   bundle exec rake coverage
   ```

### Continuous Integration

Tests automatically run on GitHub Actions for all pull requests to ensure code quality. The workflow:
- Runs all tests
- Checks code style with RuboCop
- Generates and reports test coverage metrics

## Available Insights

### Lottery Factor
The lottery factor indicates how dependent a project is on a small number of contributors. A high lottery factor suggests that if those key contributors left (for example, because they won the lottery), the project might struggle.
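
To make the idea concrete, here is a minimal sketch of one way such a metric could be computed from per-author merged pull request counts. It is not the code in `lib/lottery_factor_tool.rb`; the helper name, the choice of the top 2 contributors, and the 50% threshold are all assumptions made for illustration:

```ruby
# Hypothetical helper, not taken from lib/lottery_factor_tool.rb.
# counts: Hash of author login => number of merged pull requests.
# top_n and threshold are illustrative assumptions.
def lottery_factor(counts, top_n: 2, threshold: 0.5)
  total = counts.values.sum
  return { share: 0.0, risk: :unknown, top: [] } if total.zero?

  # Take the top_n authors by merged PR count, largest first.
  top = counts.sort_by { |_author, n| -n }.first(top_n)
  share = top.sum { |_author, n| n }.to_f / total

  {
    share: share.round(3),
    risk: share > threshold ? :high : :low,
    top: top.map(&:first)
  }
end

counts = { "alice" => 12, "bob" => 8, "carol" => 3, "dave" => 2 }
result = lottery_factor(counts)
puts "Top contributors: #{result[:top].join(', ')}"
puts "Share of merged PRs: #{(result[:share] * 100).round(1)}% (#{result[:risk]})"
```

With these sample counts, the top two authors account for 20 of 25 merged pull requests, so the sketch reports a share of 80% and a high risk.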

## Usage

```
ruby build.rb -r <owner/repo> [options]
```

Required parameters:
- `-r, --repo REPO`: GitHub repository in the format 'owner/repo' (e.g., etcd-io/etcd)

Optional parameters:
- `-d, --database FILENAME`: SQLite database filename (default: `repo.db`)
- `-t, --time-range DAYS`: Time range in days to analyze (default: 30)
- `-o, --output FILENAME`: Output HTML filename (default: `owner-repo-lottery.html`)
- `-f, --force`: Force fetching new data even if recent data exists
- `--top-display COUNT`: Number of top contributors to display individually (default: 6)

Examples:
```
ruby build.rb -r etcd-io/etcd
ruby build.rb -r etcd-io/etcd -f  # Force fetch new data
```
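
For readers who want to see how the flags and defaults listed above fit together, here is a sketch using Ruby's standard `optparse` library. It mirrors the documented options only; `build.rb` may be implemented differently, and the `parse_options` helper and the `owner/repo` validation are assumptions:

```ruby
require "optparse"

# Illustrative parser mirroring the documented flags; build.rb may differ.
def parse_options(argv)
  options = { database: "repo.db", time_range: 30, force: false, top_display: 6 }

  OptionParser.new do |opts|
    opts.banner = "Usage: ruby build.rb -r <owner/repo> [options]"
    opts.on("-r", "--repo REPO", "GitHub repository (owner/repo)") { |v| options[:repo] = v }
    opts.on("-d", "--database FILENAME", "SQLite database filename") { |v| options[:database] = v }
    opts.on("-t", "--time-range DAYS", Integer, "Days to analyze") { |v| options[:time_range] = v }
    opts.on("-o", "--output FILENAME", "Output HTML filename") { |v| options[:output] = v }
    opts.on("-f", "--force", "Fetch new data even if recent data exists") { options[:force] = true }
    opts.on("--top-display COUNT", Integer, "Top contributors shown individually") { |v| options[:top_display] = v }
  end.parse!(argv.dup)

  # Assumed validation: require exactly one slash between owner and repo.
  unless options[:repo].to_s.match?(%r{\A[^/\s]+/[^/\s]+\z})
    raise ArgumentError, "--repo must look like owner/repo"
  end

  # Documented default output name: owner-repo-lottery.html
  options[:output] ||= "#{options[:repo].tr('/', '-')}-lottery.html"
  options
end

p parse_options(%w[-r etcd-io/etcd -t 90 -f])
```

The default output name is derived by replacing the slash in `owner/repo` with a hyphen, which matches the `etcd-io-etcd-lottery.html` file named in the Quick Start.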

## Debugging

### SQLite Database Inspection

To debug issues with contributor data, you can inspect the SQLite database directly. The first command below runs in your shell; the remaining commands are entered at the `sqlite3` prompt:

```bash
# Open the SQLite database
sqlite3 repo.db

# List all tables
.tables

# Show schema for pull_requests table
.schema pull_requests

# Count merged pull requests per author (last 30 days).
# datetime() normalizes ISO 8601 values ('...T...Z') before comparing.
SELECT pr.author, COUNT(*) AS pr_count
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
WHERE datetime(pr.merged_at) >= datetime('now', '-30 days')
GROUP BY pr.author
ORDER BY pr_count DESC;

# View raw pull request data with dates
SELECT pr.author, pr.merged_at
FROM pull_requests pr
JOIN repositories r ON pr.repository_id = r.id
ORDER BY pr.merged_at DESC
LIMIT 10;

# Exit SQLite
.quit
```

Common issues to check:
- Verify pull request data exists in the database
- Check if dates are stored in correct ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ)
- Confirm the repository owner and name are correct in the repositories table
- Ensure pull requests have proper merged_at timestamps
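
The timestamp format check can also be scripted. The sketch below assumes `merged_at` values are read from the database as strings; the helper name `valid_merged_at?` is hypothetical:

```ruby
require "time"

# Strict pattern for the documented format: YYYY-MM-DDTHH:MM:SSZ
ISO8601_UTC = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/

# Returns true only for strings such as "2025-05-06T12:34:56Z".
def valid_merged_at?(value)
  return false unless value.is_a?(String) && value.match?(ISO8601_UTC)

  Time.iso8601(value) # raises ArgumentError for out-of-range fields, e.g. month 13
  true
rescue ArgumentError
  false
end

samples = ["2025-05-06T12:34:56Z", "2025-05-06 12:34:56", nil]
samples.each do |value|
  status = valid_merged_at?(value) ? "ok" : "unexpected format"
  puts "#{value.inspect}: #{status}"
end
```

A `nil` value here corresponds to a pull request with no `merged_at` timestamp, which is the last item in the list above.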

If no data appears in the HTML report but exists in the database, check:
1. The date filtering in the database queries
2. The data transformation in the HTML generator
3. The GitHub API token permissions and rate limits
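
To check the token and its remaining quota, one option is GitHub's REST `GET /rate_limit` endpoint. The following is a minimal sketch under that assumption, with the token read from `GITHUB_TOKEN`; the helper names are hypothetical and not part of this repository:

```ruby
require "json"
require "net/http"
require "uri"

# Summarize the JSON body returned by GET https://api.github.com/rate_limit.
def summarize_rate_limit(body)
  core = JSON.parse(body).fetch("resources").fetch("core")
  {
    limit: core["limit"],
    remaining: core["remaining"],
    resets_at: Time.at(core["reset"]).utc # "reset" is a Unix timestamp
  }
end

# Performs the real request; needs network access and a valid token.
def fetch_rate_limit(token = ENV["GITHUB_TOKEN"])
  uri = URI("https://api.github.com/rate_limit")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Accept"] = "application/vnd.github+json"

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "GitHub returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  summarize_rate_limit(response.body)
end

# Only hit the network when run directly with a token configured.
if $PROGRAM_NAME == __FILE__ && ENV["GITHUB_TOKEN"]
  p fetch_rate_limit
end
```

An HTTP error from this request usually points to a token problem, while a `remaining` value of 0 means requests will fail until the time in `resets_at`.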