# tap-github-search

**Repository Path**: mirrors_Automattic/tap-github-search

## Basic Information

- **Project Name**: tap-github-search
- **Description**: A Singer tap for extracting data from GitHub, with added GitHub GraphQL Search. Powered by the Meltano SDK for Singer Taps: https://sdk.meltano.com
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-22
- **Last Updated**: 2026-02-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# tap-github

`tap-github` is a Singer tap for GitHub.

Built with the [Singer SDK](https://gitlab.com/meltano/singer-sdk).

## Installation

```bash
# use uv (https://docs.astral.sh/uv/)
uv tool install meltanolabs-tap-github

# or pipx (https://pipx.pypa.io/stable/)
pipx install meltanolabs-tap-github

# or Meltano
meltano add extractor tap-github
```

A list of release versions is available at https://github.com/MeltanoLabs/tap-github/releases

## Configuration

### Accepted Config Options

This tap accepts the following configuration options:

- Required: One and only one of the following modes:
  1. `repositories`: An array of strings specifying the GitHub repositories to be included. Each element of the array should be of the form `<org>/<repository>`, e.g. `MeltanoLabs/tap-github`.
  2. `organizations`: An array of strings containing the GitHub organizations to be included.
  3. `searches`: An array of search descriptor objects with the following properties:
     - `name`: A human-readable name for the search query.
     - `query`: A GitHub search string (generally the same as would come after `?q=` in the URL).
  4. `user_usernames`: A list of GitHub usernames.
  5. `user_ids`: A list of GitHub user IDs [int].
- Highly recommended:
  - Personal access tokens (PATs) for authentication can be provided in 3 ways:
    - `auth_token` - Takes a single token.
    - `additional_auth_tokens` - Takes a list of tokens.
Can be used together with `auth_token` or as the sole source of PATs.
    - Any environment variables beginning with `GITHUB_TOKEN` will be assumed to be PATs. These tokens will be used in addition to `auth_token` (if provided), but will not be used if `additional_auth_tokens` is provided.
  - GitHub App keys are another option for authentication, and can be used in combination with PATs if desired. App IDs and keys should be assembled into the format `:app_id:;;-----BEGIN RSA PRIVATE KEY-----\n_YOUR_P_KEY_\n-----END RSA PRIVATE KEY-----` where the key can be generated from the `Private keys` section on https://github.com/organizations/:organization_name/settings/apps/:app_name. Read more about GitHub App quotas [here](https://docs.github.com/en/enterprise-server@3.3/developers/apps/building-github-apps/rate-limits-for-github-apps#server-to-server-requests). Formatted app keys can be provided in 2 ways:
    - `auth_app_keys` - List of GitHub App keys in the prescribed format.
    - If `auth_app_keys` is not provided but there is an environment variable with the name `GITHUB_APP_PRIVATE_KEY`, it will be assumed to be an App key in the prescribed format.
- Optional:
  - `user_agent`
  - `start_date`
  - `metrics_log_level`
  - `stream_maps`
  - `stream_maps_config`
  - `stream_options`: Options which can change the behaviour of a specific stream are nested within.
    - `milestones`: Valid options for the `milestones` stream are nested within.
      - `state`: Determines which milestones will be extracted. One of `open` (default), `closed`, `all`.
  - `rate_limit_buffer`: A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
  - `expiry_time_buffer`: A buffer used when determining when to refresh GitHub app tokens. Only relevant when authenticating as a GitHub app. Defaults to 10 minutes. Tokens generated by GitHub apps expire 1 hour after creation, and will be refreshed once fewer than `expiry_time_buffer` minutes remain until the anticipated expiry time.

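
The "one and only one mode" rule above can be checked before handing a config to the tap. The following is a minimal sketch, not part of `tap-github` itself: `MODE_KEYS` and `selected_mode` are hypothetical helpers introduced here for illustration, the search query and token are placeholder values, and the tap may perform its own validation differently.

```python
import json

# Mode keys as listed in this README; exactly one should be set.
MODE_KEYS = ("repositories", "organizations", "searches", "user_usernames", "user_ids")


def selected_mode(config: dict) -> str:
    """Return the single mode key set in config, or raise ValueError.

    Hypothetical pre-flight check; not a function provided by the tap.
    """
    present = [key for key in MODE_KEYS if config.get(key)]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {MODE_KEYS}, found {present}")
    return present[0]


# Illustrative values only; "<YOUR_PAT>" is a placeholder, not a real token.
config = {
    "searches": [
        {"name": "tap_github_repos", "query": "tap-github in:name"},
    ],
    "auth_token": "<YOUR_PAT>",
    "rate_limit_buffer": 1000,
    "stream_options": {"milestones": {"state": "all"}},
}

print(selected_mode(config))
# Serialize the config; the result could be saved to a file and passed via --config.
print(json.dumps(config, indent=2))
```

Checking the mode up front gives a clearer error than discovering mid-run that two mutually exclusive modes were configured.
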

Note that modes 1-3 are `repository` modes and 4-5 are `user` modes and will not run the same set of streams.

A full list of supported settings and capabilities for this tap is available by running:

```bash
tap-github --about
```

### Source Authentication and Authorization

A small number of records may be pulled without an auth token. However, a GitHub auth token should generally be considered "required" since it gives more realistic rate limits. (See GitHub API docs for more info.)

## Usage

### API Limitation - Pagination

The GitHub API is limited for some resources such as `/events`. For some resources, users might encounter the following error:

```
In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse.
```

To avoid this, the GitHub streams will exit early, i.e. as soon as no further `next page` link is available. If you are fetching `/events` at the repository level, beware of leaving the tap disabled for longer than a few days or you will have gaps in your data.

You can easily run `tap-github` by itself or in a pipeline using [Meltano](www.meltano.com).

### Notes regarding permissions

* For the `traffic_*` streams, [you will need write access to the repository](https://docs.github.com/en/rest/metrics/traffic?apiVersion=2022-11-28). You can enable extraction for these streams by [selecting them in the catalog](https://hub.meltano.com/singer/spec/#metadata).

### Executing the Tap Directly

```bash
tap-github --version
tap-github --help
tap-github --config CONFIG --discover > ./catalog.json
```

## Contributing

This project uses parent-child streams.
Learn more about them [here.](https://gitlab.com/meltano/sdk/-/blob/main/docs/parent_streams.md)

### Initialize your Development Environment

```bash
pipx install poetry
poetry install
```

### Create and Run Tests

Create tests within the `tap_github/tests` subfolder and then run:

```bash
poetry run pytest
```

You can also test the `tap-github` CLI interface directly using `poetry run`:

```bash
poetry run tap-github --help
```

### `constraints.txt` generation:

```bash
poetry self add poetry-plugin-export
poetry export --output constraints.txt --without-hashes
```

### Testing with [Meltano](meltano.com)

_**Note:** This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios._

Your project comes with a custom `meltano.yml` project file already created. Open the `meltano.yml` and follow any _"TODO"_ items listed in the file.

Next, install Meltano (if you haven't already) and any needed plugins:

```bash
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-github
meltano install
```

Now you can test and orchestrate using Meltano:

```bash
# Test invocation:
meltano invoke tap-github --version
# OR run a test `elt` pipeline:
meltano elt tap-github target-jsonl
```

One-liner to recreate output directory, run elt, and write out state file:

```bash
# Update this when you want a fresh state file:
TESTJOB=testjob1

# Run everything in one line
mkdir -p .output && meltano elt tap-github target-jsonl --job_id $TESTJOB && meltano elt tap-github target-jsonl --job_id $TESTJOB --dump=state > .output/state.json
```

### Singer SDK Dev Guide

See the [dev guide](../../docs/dev_guide.md) for more instructions on how to use the Singer SDK to develop your own taps and targets.
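
While developing, it can help to sanity-check what the tap emits when run directly. The sketch below assumes only the Singer message convention of one JSON object per line with a `type` field (`SCHEMA`, `RECORD`, `STATE`); `count_records` is a hypothetical helper, and the sample lines are made up for illustration rather than captured from a real run.

```python
import json
from collections import Counter


def count_records(lines):
    """Count RECORD messages per stream in an iterable of Singer JSON lines.

    Hypothetical helper for local debugging; not part of the tap or the SDK.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return counts


# Made-up sample messages; the stream name and records are illustrative only.
sample = [
    '{"type": "SCHEMA", "stream": "repositories", "schema": {"properties": {}}, "key_properties": []}',
    '{"type": "RECORD", "stream": "repositories", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "repositories", "record": {"id": 2}}',
    '{"type": "STATE", "value": {}}',
]

for stream, total in count_records(sample).items():
    print(f"{stream}: {total}")
```

To use it on real output, pass `sys.stdin` (or an open file of captured tap output) in place of `sample`.
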