Provenance API Data Dictionary
Repository Health Data Dictionary
Use this guide to understand every metric provided by the NetRise Provenance API. You can use these fields to create automated policies (e.g., in your CI/CD pipeline) to block or flag dependencies that don't meet your organization's risk tolerance.
Endpoint: /repo/health
Returns a comprehensive health snapshot for a source code repository. The response wraps a RepositoryHealth object inside RepositoryHealthResponse.
Top-level response:
| Field | Type | Description |
|---|---|---|
| repo_url | string | The repository URL that was queried. |
| data | RepositoryHealth | The health snapshot (see sub-objects below). |
Activity (data.activity)
Does the project have "signs of life"? Unmaintained code is a primary target for supply chain attacks.
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| last_commit_date | string (datetime) | Timestamp of the most recent code change on the default branch. | Identify zombie projects: code that hasn't changed in >1 year is unlikely to receive security patches. |
| commit_frequency | CommitFrequency | Commit counts across rolling windows (see below). | Detect abandonment: a sudden frequency drop signals a project is losing its maintainers. |
| commit_frequency.days_90 | integer | Number of commits in the last 90 days. | Short-term pulse check. |
| commit_frequency.days_180 | integer | Number of commits in the last 180 days. | Medium-term trend; useful for seasonal projects. |
| commit_frequency.days_365 | integer | Number of commits in the last 365 days. | Gauge for maintenance health. |
| last_release_date | string (datetime) | Timestamp of the most recent official release. Null if the repository has never published a release. | Stale releases delay delivery of security fixes to downstream consumers. |
| release_cadence_days | number (float) | Average days between releases over trailing 12 months. Null if fewer than 2 releases. | Predictability: reliable projects release on a regular schedule. |
| is_archived | boolean | Official archived flag from GitHub. | Hard block: archived repositories are officially dead and will never receive patches. |
| is_deprecated | boolean | True if repository topics or README indicate unmaintained/deprecated status. | Soft block: projects signaling end-of-life should be replaced even if not yet archived. |
| open_issues_count | integer | Number of currently open issues. | A ballooning count may indicate maintainers can't keep up. |
| issue_close_rate_180d | number (0.0–1.0) | Ratio of issues closed to issues opened in the last 180 days. | Responsiveness: a high close rate suggests vulnerabilities will be triaged quickly. |
| open_pr_count | integer | Number of currently open pull requests. | Contribution health: many stale PRs suggest maintainers aren't reviewing code. |
| pr_merge_rate_180d | number (0.0–1.0) | Ratio of PRs merged to PRs opened in the last 180 days. | Merge velocity: slow merges delay both features and security fixes. |
| has_readme | boolean | True if README exists and is non-trivial (>100 bytes). | Quality signal: projects without documentation are higher-risk and harder to evaluate. |
| has_changelog | boolean | Presence of a CHANGELOG.md or equivalent. | Transparency: changelogs make it easier to audit what changed between releases. |
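As a rough illustration of how the Activity fields might feed a CI/CD policy, the sketch below flags archived, deprecated, and stale repositories. The function name `evaluate_activity`, the finding strings, and the one-year cutoff are assumptions made for this example and are not part of the API. It also assumes timestamps arrive as ISO 8601 strings with an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone


def evaluate_activity(activity: dict, now: datetime) -> list:
    """Return policy findings for a data.activity object (hypothetical helper)."""
    findings = []
    if activity.get("is_archived"):
        findings.append("block: repository is archived")
    if activity.get("is_deprecated"):
        findings.append("warn: repository is marked deprecated")
    last_commit = activity.get("last_commit_date")
    if last_commit is not None:
        # Assumption: the timestamp carries an explicit offset such as +00:00.
        committed = datetime.fromisoformat(last_commit)
        if now - committed > timedelta(days=365):
            findings.append("warn: no commits in over a year")
    return findings


# Made-up sample payload for illustration only.
sample_activity = {
    "is_archived": False,
    "is_deprecated": True,
    "last_commit_date": "2023-06-01T00:00:00+00:00",
}
reference_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
activity_findings = evaluate_activity(sample_activity, reference_time)
```

Passing `now` explicitly keeps the check deterministic, which makes the policy easier to test in a pipeline.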
Popularity (data.popularity)
How widely used and trusted is the project?
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| star_count | integer | GitHub stargazers count. | Popularity proxy: widely-starred projects tend to have more eyes on the code. |
| fork_count | integer | GitHub fork count. | Ecosystem signal: high forks indicate active derivative work and community investment. |
| watcher_count | integer | GitHub subscribers (watchers) count. | Engagement: watchers receive notifications, indicating sustained interest. |
| dependent_repo_count | integer | Number of other repositories that depend on this one. | Blast radius: high dependents mean a compromise has wide downstream impact. Also signals community trust. |
| contributor_count | integer | All-time distinct contributor count. | Community breadth: more contributors means more review and shared ownership. |
| active_contributor_count_12mo | integer | Contributors with commits in the last 12 months. | Current health: a project may have many historical contributors but no one active. |
Scorecard (data.scorecard)
Automated security heuristics based on industry best practices. Each check is scored 0–10. The entire object is null if the repository has not been scanned by OpenSSF Scorecard.
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| aggregate_score | number (0.0–10.0) | Overall Scorecard score across all checks. | Top-level gate: set a minimum threshold (e.g., ≥5) for production dependencies. |
| scan_date | string (datetime) | When the scorecard was last computed. | Staleness: scores older than 90 days may not reflect current state. |
| version | string | Scorecard tool version used for the scan. | Comparability: score semantics can shift between scorecard versions. |
| critical_checks_failed | array of string | Names of checks rated High/Critical risk that scored below 5. | Quick filter: surface the most urgent issues without parsing every check. |
| checks | array of ScorecardCheck | Individual check results (see below). | Granular policy: set per-check thresholds based on your risk tolerance. |
ScorecardCheck object (data.scorecard.checks[]):
| Field | Type | Description |
|---|---|---|
| name | string | Check name (e.g., Code-Review, Branch-Protection). |
| score | integer (0–10) | Score for this check. |
| reason | string | Human-readable explanation of the score. |
| risk | string | Risk level: Critical, High, Medium, or Low. |
Available checks and their risk levels:
| Check | Risk | What It Measures |
|---|---|---|
| Binary-Artifacts | High | Whether compiled binaries are stored in source control. |
| Branch-Protection | High | Whether the default branch requires reviews/tests before merging. |
| CI-Tests | Low | Whether CI runs tests on pull requests. |
| CII-Best-Practices | Low | Whether the project has earned the OpenSSF Best Practices badge. |
| Code-Review | High | Whether human code review is required before merge. |
| Contributors | Low | Whether recent contributors come from multiple organizations. |
| Dangerous-Workflow | Critical | Whether GitHub Actions contain dangerous patterns (e.g., pull_request_target with untrusted code). |
| Dependency-Update-Tool | High | Whether Dependabot, Renovate, or similar tools are in use. |
| Fuzzing | Medium | Whether fuzz testing is in use (e.g., OSS-Fuzz). |
| License | Low | Whether a published license is detected. |
| Maintained | High | Composite signal of active maintenance (commits, issues, releases). |
| Packaging | Medium | Whether the project builds and publishes official packages. |
| Pinned-Dependencies | Medium | Whether dependencies are pinned to specific hashes or versions in CI. |
| SAST | Medium | Whether static analysis tools are in use. |
| Security-Policy | Medium | Whether a SECURITY.md or equivalent exists. |
| Signed-Releases | High | Whether release artifacts are cryptographically signed. |
| Token-Permissions | High | Whether GitHub Actions use least-privilege token permissions. |
| Vulnerabilities | High | Whether the repo has known unfixed vulnerabilities (via OSV). |
| Webhooks | Low | Whether webhooks use secret-based authentication. (Requires admin token for full check.) |
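The policy guidance for the Scorecard fields can be combined into a single gate. The following is a minimal sketch, assuming the `data.scorecard` object has been parsed into a Python dict (or `None` when the repository has not been scanned). The function name `scorecard_gate`, the default threshold, and the decision to fail unscanned repositories are illustrative choices, not behavior defined by the API.

```python
def scorecard_gate(scorecard, min_score=5.0, per_check_min=None):
    """Return (passed, reasons) for a data.scorecard object (hypothetical helper)."""
    if scorecard is None:
        # Policy choice for this sketch: treat unscanned repositories as failing.
        return False, ["no Scorecard data available"]
    reasons = []
    aggregate = scorecard["aggregate_score"]
    if aggregate < min_score:
        reasons.append(f"aggregate_score {aggregate} is below {min_score}")
    # Any High/Critical check that scored below 5 is reported by the API here.
    for name in scorecard.get("critical_checks_failed") or []:
        reasons.append(f"critical check failed: {name}")
    # Optional per-check minimums, e.g. {"Code-Review": 5}.
    thresholds = per_check_min or {}
    for check in scorecard.get("checks") or []:
        minimum = thresholds.get(check["name"])
        if minimum is not None and check["score"] < minimum:
            reasons.append(f"{check['name']} scored {check['score']}, below {minimum}")
    return not reasons, reasons


# Made-up sample payload for illustration only.
sample_scorecard = {
    "aggregate_score": 6.2,
    "critical_checks_failed": ["Dangerous-Workflow"],
    "checks": [
        {"name": "Code-Review", "score": 3, "reason": "example", "risk": "High"},
    ],
}
gate_passed, gate_reasons = scorecard_gate(sample_scorecard, per_check_min={"Code-Review": 5})
```

Returning the reasons alongside the verdict lets a pipeline print why a dependency was blocked instead of only failing the build.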
Security Config (data.security_config)
Direct settings and files that protect the integrity of the repository and its build pipeline.
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| security_md_exists | boolean | Presence of a formal security reporting policy (SECURITY.md). | Compliance: essential for regulated industries to ensure responsible disclosure. |
| has_security_advisories | boolean | Whether the repo has published GitHub Security Advisories. | Transparency: projects that publish advisories demonstrate mature vulnerability handling. |
| security_advisory_count | integer | Number of published security advisories. | Track record: a high count with timely fixes shows healthy incident response. |
| dependabot_alerts_enabled | boolean | Whether Dependabot configuration or vulnerability alerts are active. | Hygiene: projects without automated dependency scanning miss known vulnerabilities. |
| has_ci_workflows | boolean | Presence of .github/workflows directory. | Build confidence: projects without CI are more likely to ship broken or untested code. |
Contributor Risk (data.contributor_risk)
Who is writing the code? Detects human risks in the supply chain — from key-person dependency to compromised accounts.
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| total_contributors | integer | Distinct contributors who have ever committed. | Baseline: extremely low counts indicate a personal/hobby project. |
| active_contributors_12mo | integer | Distinct contributors with commits in the last 12 months. | Current health: a project may have many historical contributors but no one active. |
| bus_factor | integer | Minimum number of contributors responsible for ≥80% of recent commits. | Resilience: a bus factor of 1 means a single departure could kill the project. |
| single_maintainer_risk | boolean | True if bus factor = 1 or active contributors ≤ 1. | Quick-filter flag for the highest-risk key-person dependency. |
| maintainer_org_diversity | integer | Number of distinct organizations among top contributors. | Stability: projects backed by multiple companies survive if one cuts funding. |
| new_maintainer_flag | boolean | True if the top committer has less than 6 months tenure. | Repo-jacking signal: sudden leadership changes are a classic attack vector (cf. XZ Utils). |
| signed_commit_ratio | number (0.0–1.0) | Ratio of signed to total commits. | Identity assurance: high ratios confirm code provenance. |
| contributors_with_breached_creds | integer | Count of contributors with known breached credentials. | Account takeover risk: leaked passwords could be used to push malicious code. |
| contributor_breach_ratio | number (0.0–1.0) | Ratio of contributors with breached credentials to total contributors. | Normalized risk: a small project where 50% of contributors are breached is worse than a large project with 5%. |
| most_recent_breach_date | string (date) | Most recent breach date affecting any contributor. Null if no breaches. | Recency: recent breaches are more actionable than decade-old ones. |
| unique_breach_sources | array of string | Unique breach sources affecting repo contributors (e.g., "LinkedIn", "Adobe"). | Severity context: helps assess the nature and severity of credential exposure. |
| maintainer_geo_distribution | object (country→count) | Map of country name to contributor count (e.g., {"Finland": 3, "Germany": 2}). | Jurisdictional risk: concentrated geo may indicate regulatory or geopolitical exposure. |
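To make the `bus_factor` definition concrete, the sketch below computes the smallest number of contributors whose commits together reach 80% of a commit total. It only illustrates the definition given in the table: how the API selects "recent" commits is not specified here, and the function name and the per-contributor counts are invented for the example.

```python
def bus_factor(commit_counts: dict, threshold: float = 0.8) -> int:
    """Smallest number of contributors covering `threshold` of all commits."""
    total = sum(commit_counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk contributors from most to least active, accumulating their commits.
    ranked = sorted(commit_counts.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered >= threshold * total:
            return rank
    return len(ranked)


# Made-up commit counts for illustration only.
shared_project = {"alice": 70, "bob": 20, "carol": 10}
solo_project = {"dave": 95, "erin": 5}
shared_bus_factor = bus_factor(shared_project)
solo_bus_factor = bus_factor(solo_project)
```

In the first sample, the top contributor alone covers 70% and the top two cover 90%, so two people are needed to reach the 80% mark. In the second, one person already exceeds it, which is the situation `single_maintainer_risk` is meant to flag.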
Code Hygiene (data.code_hygiene)
Structural and legal attributes that affect build determinism, legal compliance, and code quality.
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| license_spdx | string | SPDX license identifier. Null if no license detected. | Legal risk: block viral licenses (e.g., GPL) or non-commercial licenses that conflict with your business. |
| has_lockfile | boolean | Presence of package-lock.json, yarn.lock, go.sum, poetry.lock, Cargo.lock, etc. | Determinism: ensures what you audit is what you build. Missing lockfiles lead to unreproducible builds. |
| has_gitignore | boolean | Presence of a .gitignore file. | Hygiene: missing .gitignore risks committing secrets, build artifacts, or IDE config. |
| repo_size_kb | integer | Total disk usage of the repository in kilobytes. | Bloat detection: abnormally large repos may contain vendored binaries that should be audited. |
| default_branch | string | Name of the default branch (e.g., main, master). | Informational: useful for automation and branch-protection policy alignment. |
| is_fork | boolean | Whether the repository is a fork. | Provenance: forks may diverge from upstream security patches. |
| parent_repo | string | Parent repository full name (e.g., "original-org/repo"). Null if not a fork. | Enables automated drift analysis between fork and upstream. |
| topics | array of string | GitHub repository topics. | Classification: also used to derive the deprecated/unmaintained flag. |
Top-Level Metadata
| Field | Type | Description |
|---|---|---|
| data.repo_health_last_checked | string (datetime) | When the GitHub scraper last refreshed repository metadata for this health snapshot. Null if never looked up. |
| data.metadata.compiled_at | string (datetime) | When this response was assembled. |
| data.metadata.attributions | array of DataSourceAttribution | Data source attributions required by licensing agreements (see Cross-Cutting: Data Source Attributions). |
Endpoint: /repo
Returns package associations, repository details, and advisory attributions for a given repository URL.
Top-level response:
| Field | Type | Description |
|---|---|---|
| repo | string | The repository identifier that was queried. |
| data | RepositoryData | The repository data (see sub-objects below). |
Repository Info (data.repository_details)
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| created_at | string (datetime) | When the repository was created on GitHub. | Age: very new repositories carrying critical functionality deserve extra scrutiny. |
| updated_at | string (datetime) | When the repository was last updated on GitHub. | Complements last_commit_date from the health endpoint with GitHub's own update signal. |
| description | string | Repository description from GitHub. | Context: useful for automated classification and human review. |
| fork_count | number | GitHub fork count. | (Also available in /repo/health popularity.) |
| star_count | number | GitHub star count. | (Also available in /repo/health popularity.) |
| languages | array of string | Top languages by size. | Context: language mix affects which vulnerability scanners and SAST tools apply. |
| has_signed_commits | boolean | True if any contributor has at least one signed commit to this repository. | Baseline: if no signed commits exist at all, the project has no cryptographic provenance. |
| has_unsigned_commits | boolean | True if any contributor has at least one unsigned commit. | Complement to has_signed_commits: both true means mixed signing discipline. |
| health_available | boolean | True if health data is available via /repo/health. | Routing: use this to know whether to make a follow-up health call. |
Contributors (data.repository_details.contributors)
An array of contributor objects, one per contributor to the repository.
| Field | Type | Description |
|---|---|---|
| email | string | Contributor email address. |
| has_signed_commits | boolean | True if this contributor has at least one signed commit to this repository. |
| has_unsigned_commits | boolean | True if this contributor has at least one unsigned commit to this repository. |
Packages (data.packages)
An array of PackageAffiliation objects linking the repository to its published packages. Each object includes confidence scoring (see Cross-Cutting: Confidence Scoring).
| Field | Type | Description |
|---|---|---|
| purl | string | Package URL (PURL) of the associated package. |
| confidence | integer (0–100) | Confidence rating of this repo-to-package attribution. |
| methods | array of string | Method names that attributed this package to the repository. |
Advisories (data.advisories)
An array of AdvisoryAttribution objects linking the repository to known advisories.
| Field | Type | Description |
|---|---|---|
| name | string | Name of the advisory. |
| relationship | string | How the advisory is associated with the repository: "direct" or "indirect". |
Metadata
| Field | Type | Description |
|---|---|---|
| data.metadata.compiled_at | string (datetime) | When this response was assembled. |
| data.metadata.attributions | array of DataSourceAttribution | Data source attributions (see Cross-Cutting: Data Source Attributions). |
Endpoint: /contributor
Returns identity, organizational, geographic, and advisory information for a contributor, looked up by email or GitHub username.
Top-level response:
| Field | Type | Description |
|---|---|---|
| email | string | The email address queried (if applicable). |
| username | string | The GitHub username queried (if applicable). |
| data | ContributorData | The contributor data (see sub-objects below). |
Identity (data.identity)
| Field | Type | Description |
|---|---|---|
| emails | array of string | All known email addresses for this contributor. |
| usernames | array of string | All known GitHub usernames. |
| declared_names | array of string | Names declared across git commits and GitHub profiles. |
Summary (data.summary)
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| purls | array of string | Package PURLs this contributor has been associated with. | Blast radius: understand how many packages a single contributor touches. |
| repos_contributed_to | array of object | Repositories this contributor has committed to. | Scope: identify contributors with broad access across many repos. |
Organizations (data.organizations)
An array of organization objects, each with confidence scoring.
| Field | Type | Description |
|---|---|---|
| name | string | The display name of the organization. |
| repository_url | string | The URL for the organization's repository. |
| confidence | integer (0–100) | Confidence rating of this organization-to-contributor attribution. |
| methods | array of string | Method names that attributed this organization to the contributor. |
Locations (data.locations)
An array of location objects, each with confidence scoring.
| Field | Type | Description |
|---|---|---|
| country | string | Country name. |
| confidence | integer (0–100) | Confidence rating of this location-to-contributor attribution. |
| methods | array of string | Method names that attributed this location to the contributor. |
Advisories (data.advisories)
An array of AdvisoryAttribution objects.
| Field | Type | Description |
|---|---|---|
| name | string | Name of the advisory. |
| relationship | string | How the advisory is associated with the contributor: "direct" or "indirect". |
Other Fields
| Field | Type | Description |
|---|---|---|
| data.security_available | boolean | True if security data is available via /contributor/security. |
| data.metadata.compiled_at | string (datetime) | When this response was assembled. |
| data.metadata.attributions | array of DataSourceAttribution | Data source attributions (see Cross-Cutting: Data Source Attributions). |
Endpoint: /contributor/security
Returns breach exposure, signing key information, and commit signing ratios for a contributor.
Top-level response:
| Field | Type | Description |
|---|---|---|
| email | string | The email address queried (if applicable). |
| username | string | The GitHub username queried (if applicable). |
| data | ContributorSecurity | The security data (see fields below). |
Core Fields
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| has_breached_credentials | boolean | Whether any of this contributor's emails appear in known data breaches. | Account takeover risk: breached contributors are higher priority for access review. |
| has_password_exposure | boolean | Whether any email was exposed in a breach containing passwords specifically. | Elevated risk: password exposure is significantly more dangerous than email-only leaks. |
| signed_commit_ratio | number (0.0–1.0) | Ratio of signed to total commits across all repositories. | Identity discipline: contributors who consistently sign are lower-risk. |
| breach_last_refreshed_at | string (datetime) | When the breach data was last refreshed. | Staleness: old breach data may miss recent exposures. |
Signing Key Info (data.signing_key_info)
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| has_signing_key | boolean | Whether this contributor has a known signing key. | Baseline: contributors without keys cannot provide cryptographic provenance. |
| key_age_days | integer | Days since the current signing key was first seen. Null if no key. | Trust: brand-new keys on established accounts may warrant investigation. |
| key_change_detected | boolean | True if the contributor's signing key has changed. | Compromise signal: unexpected key rotation may indicate account takeover. |
| key_changes | array of KeyChange | History of key changes (see below). | Audit trail: review the timeline and frequency of key rotations. |
KeyChange object (data.signing_key_info.key_changes[]):
| Field | Type | Description |
|---|---|---|
| old_key_id | string | The previous signing key identifier. |
| new_key_id | string | The new signing key identifier. |
| detected_at | string (datetime) | When the key change was detected. |
Breach Details (data.breach_details)
An array of BreachDetail objects, one per email address.
| Field | Type | Description |
|---|---|---|
| email | string | The contributor email address. |
| has_breach | boolean | Whether this email has appeared in any known breaches. |
| breach_count | integer | Number of breaches this email appeared in. |
| breaches | array of BreachInfo | Individual breach records (see below). |
BreachInfo object (data.breach_details[].breaches[]):
| Field | Type | Description |
|---|---|---|
| name | string | Name of the breach (e.g., "LinkedIn"). |
| date | string (YYYY-MM-DD) | Date the breach occurred. |
| is_verified | boolean | Whether the breach has been verified by Have I Been Pwned (HIBP). |
| exposed_data | array of string | Types of data exposed (e.g., "Passwords", "Email addresses"). |
| source | string | Attribution source for the breach data. |
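One way to work with the nested breach structure is to pull out the email addresses whose breaches exposed passwords. The sketch below assumes `data.breach_details` has been parsed into a list of dicts and that the exposed-data label is exactly `"Passwords"`, as in the example value in the table; the function name and the sample records are invented for illustration.

```python
def password_exposed_emails(breach_details: list) -> list:
    """Return emails with at least one breach whose exposed_data includes passwords."""
    exposed = []
    for detail in breach_details:
        for breach in detail.get("breaches") or []:
            # Assumption: the label matches the "Passwords" example exactly.
            if "Passwords" in (breach.get("exposed_data") or []):
                exposed.append(detail["email"])
                break  # one qualifying breach is enough for this email
    return exposed


# Made-up sample records for illustration only.
sample_breach_details = [
    {
        "email": "dev@example.com",
        "has_breach": True,
        "breach_count": 1,
        "breaches": [
            {"name": "ExampleBreach", "date": "2016-05-01", "is_verified": True,
             "exposed_data": ["Email addresses", "Passwords"], "source": "example"},
        ],
    },
    {"email": "ops@example.com", "has_breach": False, "breach_count": 0, "breaches": []},
]
exposed_emails = password_exposed_emails(sample_breach_details)
```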
Metadata
| Field | Type | Description |
|---|---|---|
| data.metadata.compiled_at | string (datetime) | When this response was assembled. |
| data.metadata.attributions | array of DataSourceAttribution | Data source attributions (see Cross-Cutting: Data Source Attributions). |
Endpoint: /advisory
Returns details about a specific NetRise advisory, including which repositories, packages, and contributors are impacted.
Top-level response (AdvisoryResponse):
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| name | string | Name/identifier of the advisory. | Reference: use for tracking and cross-referencing with other vulnerability databases. |
| description | string | Text description of the advisory. | Context: understand the nature and severity of the issue. |
| urls | array of string | Reference URLs with further information. | Research: link to original disclosures, CVE entries, or vendor patches. |
| created_at | string (datetime, ISO 8601) | When the advisory was added to Provenance. | Timeliness: compare against your SLA for vulnerability response. |
Impacted Entities
Each of the following fields is an AdvisoryAttributions object containing direct and indirect arrays:
| Field | Sub-field | Type | Description |
|---|---|---|---|
| repositories | direct | array of string | Repositories directly impacted by the advisory (contain the vulnerable code). |
| repositories | indirect | array of string | Repositories indirectly impacted (via dependencies). |
| packages | direct | array of string | Packages directly impacted. |
| packages | indirect | array of string | Packages indirectly impacted. |
| contributor_emails | direct | array of string | Contributor emails directly associated with the advisory. |
| contributor_emails | indirect | array of string | Contributor emails indirectly associated. |
| contributor_usernames | direct | array of string | GitHub usernames directly associated. |
| contributor_usernames | indirect | array of string | GitHub usernames indirectly associated. |
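Because every impacted-entity field shares the same direct/indirect shape, a single lookup helper can cover all of them. This is a minimal sketch, assuming the advisory response has been parsed into a dict; the helper name `advisory_relationship` and the sample values are hypothetical.

```python
def advisory_relationship(advisory: dict, kind: str, identifier: str):
    """Return "direct", "indirect", or None for an entity in an advisory response.

    `kind` is one of the impacted-entity fields, e.g. "repositories" or "packages".
    """
    attributions = advisory.get(kind) or {}
    # Check direct first so an entity listed in both arrays is reported as direct.
    if identifier in (attributions.get("direct") or []):
        return "direct"
    if identifier in (attributions.get("indirect") or []):
        return "indirect"
    return None


# Made-up advisory payload for illustration only.
sample_advisory = {
    "name": "EXAMPLE-0001",
    "repositories": {
        "direct": ["https://github.com/example-org/lib"],
        "indirect": ["https://github.com/example-org/app"],
    },
}
lib_relationship = advisory_relationship(
    sample_advisory, "repositories", "https://github.com/example-org/lib"
)
```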
Metadata
| Field | Type | Description |
|---|---|---|
| metadata.compiled_at | string (datetime) | When this response was assembled. |
| metadata.attributions | array of DataSourceAttribution | Data source attributions (see Cross-Cutting: Data Source Attributions). |
Endpoint: /package
Returns detailed metadata for a specific package identified by a full PURL (all 6 fields required: type, namespace, name, version, arch, distro).
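Before calling this endpoint, a client may want to confirm that a PURL carries all six fields. The sketch below is a simplified check, not a full PURL parser: it assumes `arch` and `distro` are supplied as PURL qualifiers (as the Package URL specification does for deb packages), which this guide does not state explicitly. The function name is hypothetical.

```python
from urllib.parse import parse_qs


def missing_purl_fields(purl: str) -> list:
    """List which of the six required fields are absent from a PURL string."""
    if not purl.startswith("pkg:"):
        raise ValueError("not a PURL: missing 'pkg:' scheme")
    rest = purl[len("pkg:"):]
    rest, _, _subpath = rest.partition("#")   # drop any subpath
    path, _, query = rest.partition("?")      # split off qualifiers
    path, _, version = path.partition("@")    # split off version
    parts = path.split("/")
    fields = {
        "type": parts[0],
        "namespace": "/".join(parts[1:-1]) if len(parts) > 2 else "",
        "name": parts[-1] if len(parts) > 1 else "",
        "version": version,
    }
    # Assumption: arch and distro travel as qualifiers (?arch=...&distro=...).
    qualifiers = parse_qs(query)
    fields["arch"] = qualifiers.get("arch", [""])[0]
    fields["distro"] = qualifiers.get("distro", [""])[0]
    return [name for name, value in fields.items() if not value]


complete = missing_purl_fields("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie")
partial = missing_purl_fields("pkg:deb/debian/curl@7.50.3-1")
```

A partial PURL that fails this check may still be usable with /package/search, which accepts fewer fields.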
Top-level response:
| Field | Type | Description |
|---|---|---|
| purl | string | The PURL that was queried. |
| data | PackageData | The package data (see fields below). |
Package Fields
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| package_type | string | Package format (deb, rpm, apk). | Classification: determines which distro-specific policies apply. |
| vendor | string | Package vendor/namespace. | Provenance: confirms the distribution source. |
| product | string | Package name. | Identification. |
| version | string | Full version string. | Version pinning and range checks. |
| arch | string | CPU architecture. | Build matrix: ensure you're evaluating the correct architecture. |
| distro | string | OS distribution and version. | Environment match: confirm the package applies to your target OS. |
| package_details | object | Additional package-specific metadata. | Extended attributes from the package registry. |
| repository_details | RepositoryInfo (nullable) | Source repository information, if attributed. | Link to source: use to cross-reference with /repo/health. |
| advisories | array of AdvisoryAttribution | Advisories impacting this package (each with name and relationship). | Vulnerability exposure: shows which advisories affect this package. |
| dependencies | array of object | Dependencies of this package. | Dependency tree: input for transitive risk analysis. |
Metadata
| Field | Type | Description |
|---|---|---|
| data.metadata.compiled_at | string (datetime) | When this response was assembled. |
| data.metadata.attributions | array of DataSourceAttribution | Data source attributions (see Cross-Cutting: Data Source Attributions). |
Endpoint: /package/search
Searches for packages matching partial PURL criteria (type, namespace, name required; version with operators, arch, and distro optional).
Response (PackageSearchResponse):
| Field | Type | Description |
|---|---|---|
| purls | array of string | Array of PURLs matching the search criteria. |
Endpoint: /package/dependents
Returns reverse dependencies — other packages that depend on the queried package (full PURL required).
Response (PackageDependentResponse):
| Field | Type | Description | Policy Guidance |
|---|---|---|---|
| purls | array of string | Array of PURLs that depend on the queried package. | Blast radius: understand how many packages are affected if this one is compromised. |
Cross-Cutting: Confidence Scoring
Many responses include an AttributionMetadata object for fields where attribution is inferred rather than directly observed.
| Field | Type | Description |
|---|---|---|
| confidence | integer (0–100) | Confidence rating of this attribution as a percentage. |
| methods | array of string | List of method names that attributed this quality to the entity. |
Attribution kinds and their methods:
| Kind | Methods | Description |
|---|---|---|
| repo_to_package | file_association, readme_github_search, tarball_url | Links a repository to a package. |
| organization_to_contributor | email_domain, github_profile, repo_member | Links an organization to a contributor. |
| location_to_contributor | email_country_code, github_profile, organization_location, website_country_code, llm_geolocation | Links a location to a contributor. |
Confidence is calculated using a naive Bayes–style approach: each candidate starts with a uniform prior and is updated multiplicatively by every supporting method, weighted by that method's reliability score. When multiple methods agree, their contributions compound. The top candidate's normalized score is returned as the confidence percentage.
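The description above can be sketched roughly as follows. This is an illustration of the general idea only: the actual reliability scores and update formula are not documented here, so the odds-style multiplier `r / (1 - r)`, the reliability values, and the function name are all assumptions made for the example.

```python
def attribution_confidence(candidates: list, evidence: list, reliability: dict):
    """Return (top_candidate, confidence_percent) from method-level evidence.

    `evidence` is a list of (method_name, candidate) pairs; `reliability` maps
    each method name to a hypothetical score strictly between 0 and 1.
    """
    # Every candidate starts from the same (uniform) prior.
    scores = {candidate: 1.0 / len(candidates) for candidate in candidates}
    for method, candidate in evidence:
        r = reliability[method]
        # Multiplicative update: more reliable methods boost a candidate more,
        # and agreeing methods compound because their factors multiply.
        scores[candidate] *= r / (1.0 - r)
    total = sum(scores.values())
    top = max(scores, key=scores.get)
    # Normalize so the scores sum to 1, then report the winner as a percentage.
    return top, round(100 * scores[top] / total)


# Made-up reliability scores and evidence for illustration only.
sample_reliability = {
    "email_country_code": 0.6,
    "github_profile": 0.8,
    "website_country_code": 0.7,
}
sample_evidence = [
    ("email_country_code", "Finland"),
    ("github_profile", "Finland"),
    ("website_country_code", "Germany"),
]
top_location, location_confidence = attribution_confidence(
    ["Finland", "Germany"], sample_evidence, sample_reliability
)
```

In this sample, two methods agree on one country and a single method points to the other, so the first country wins with a confidence well above 50 but below 100, reflecting the dissenting signal.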
Cross-Cutting: AdvisoryAttribution
Used across /repo, /contributor, and /package responses to link entities to advisories.
| Field | Type | Description |
|---|---|---|
| name | string | Name of the advisory. |
| relationship | string | "direct" if the entity is directly impacted, "indirect" if impacted via a dependency. |
Cross-Cutting: Data Source Attributions
Every response includes a metadata object.
| Field | Type | Description |
|---|---|---|
| compiled_at | string (datetime) | When this response was assembled. |
| attributions | array of DataSourceAttribution | Data source attributions required by licensing agreements. |
DataSourceAttribution object:
| Field | Type | Description |
|---|---|---|
| provider | string | Name of the data provider. |
| url | string | URL for the data provider. |
| license | string | License under which the data is provided. |
| reason | string | Reason for including this attribution. |