

Repository Health Data Dictionary

Use this guide to understand every metric provided by the NetRise Provenance API. You can use these fields to create automated policies (e.g., in your CI/CD pipeline) to block or flag dependencies that don't meet your organization's risk tolerance.


Endpoint: /repo/health

Returns a comprehensive health snapshot for a source code repository. The response is a RepositoryHealthResponse object that wraps a RepositoryHealth object in its data field.

Top-level response:

Field Type Description
repo_url string The repository URL that was queried.
data RepositoryHealth The health snapshot (see sub-objects below).

Activity (data.activity)

Does the project have "signs of life"? Unmaintained code is a primary target for supply chain attacks.

Field Type Description Policy Guidance
last_commit_date string (datetime) Timestamp of the most recent code change on the default branch. Identify zombie projects: code that hasn't changed in >1 year is unlikely to receive security patches.
commit_frequency CommitFrequency Commit counts across rolling windows (see below). Detect abandonment: a sudden frequency drop signals a project is losing its maintainers.
commit_frequency.days_90 integer Number of commits in the last 90 days. Short-term pulse check.
commit_frequency.days_180 integer Number of commits in the last 180 days. Medium-term trend; useful for seasonal projects.
commit_frequency.days_365 integer Number of commits in the last 365 days. Gauge for maintenance health.
last_release_date string (datetime) Timestamp of the most recent official release. Null if the repository has never published a release. Stale releases delay delivery of security fixes to downstream consumers.
release_cadence_days number (float) Average days between releases over trailing 12 months. Null if fewer than 2 releases. Predictability: reliable projects release on a regular schedule.
is_archived boolean Official archived flag from GitHub. Hard block: archived repositories are read-only and will not receive patches unless they are unarchived.
is_deprecated boolean True if repository topics or README indicate unmaintained/deprecated status. Soft block: projects signaling end-of-life should be replaced even if not yet archived.
open_issues_count integer Number of currently open issues. A ballooning count may indicate maintainers can't keep up.
issue_close_rate_180d number (0.0–1.0) Ratio of issues closed to issues opened in the last 180 days. Responsiveness: a high close rate suggests vulnerabilities will be triaged quickly.
open_pr_count integer Number of currently open pull requests. Contribution health: many stale PRs suggest maintainers aren't reviewing code.
pr_merge_rate_180d number (0.0–1.0) Ratio of PRs merged to PRs opened in the last 180 days. Merge velocity: slow merges delay both features and security fixes.
has_readme boolean True if README exists and is non-trivial (>100 bytes). Quality signal: projects without documentation are higher-risk and harder to evaluate.
has_changelog boolean Presence of a CHANGELOG.md or equivalent. Transparency: changelogs make it easier to audit what changed between releases.
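
As a rough illustration of how the Activity fields could feed a CI/CD policy, the sketch below evaluates a dict shaped like data.activity. The function names, the 365-day threshold, and the assumption that datetime strings are ISO 8601 are illustrative choices, not part of the API.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp, assuming ISO 8601; naive values are treated as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def evaluate_activity(activity, now=None, max_commit_age_days=365):
    """Return (decision, reasons); decision is 'block', 'flag', or 'pass'."""
    now = now or datetime.now(timezone.utc)

    # Hard block: archived repositories.
    if activity.get("is_archived"):
        return "block", ["repository is archived"]

    reasons = []
    # Soft block: the project signals deprecated/unmaintained status.
    if activity.get("is_deprecated"):
        reasons.append("repository signals deprecated/unmaintained status")

    # Zombie check: no commits on the default branch for too long.
    last_commit = activity.get("last_commit_date")
    if last_commit is not None:
        age_days = (now - parse_ts(last_commit)).days
        if age_days > max_commit_age_days:
            reasons.append(f"last commit was {age_days} days ago")

    return ("flag" if reasons else "pass"), reasons
```

A pipeline step could fail the build on "block" and post a warning on "flag"; the threshold would be tuned to your own risk tolerance.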

Popularity (data.popularity)

How widely used and trusted is the project?

Field Type Description Policy Guidance
star_count integer GitHub stargazers count. Popularity proxy: widely-starred projects tend to have more eyes on the code.
fork_count integer GitHub fork count. Ecosystem signal: high forks indicate active derivative work and community investment.
watcher_count integer GitHub subscribers (watchers) count. Engagement: watchers receive notifications, indicating sustained interest.
dependent_repo_count integer Number of other repositories that depend on this one. Blast radius: high dependents mean a compromise has wide downstream impact. Also signals community trust.
contributor_count integer All-time distinct contributor count. Community breadth: more contributors means more review and shared ownership.
active_contributor_count_12mo integer Contributors with commits in the last 12 months. Current health: a project may have many historical contributors but no one active.

Scorecard (data.scorecard)

Automated security heuristics based on industry best practices. Each check is scored 0–10. The entire object is null if the repository has not been scanned by OpenSSF Scorecard.

Field Type Description Policy Guidance
aggregate_score number (0.0–10.0) Overall Scorecard score across all checks. Top-level gate: set a minimum threshold (e.g., ≥5) for production dependencies.
scan_date string (datetime) When the scorecard was last computed. Staleness: scores older than 90 days may not reflect current state.
version string Scorecard tool version used for the scan. Comparability: score semantics can shift between scorecard versions.
critical_checks_failed array of string Names of checks rated High/Critical risk that scored below 5. Quick filter: surface the most urgent issues without parsing every check.
checks array of ScorecardCheck Individual check results (see below). Granular policy: set per-check thresholds based on your risk tolerance.

ScorecardCheck object (data.scorecard.checks[]):

Field Type Description
name string Check name (e.g., Code-Review, Branch-Protection).
score integer (0–10) Score for this check.
reason string Human-readable explanation of the score.
risk string Risk level: Critical, High, Medium, or Low.

Available checks and their risk levels:

Check Risk What It Measures
Binary-Artifacts High Whether compiled binaries are stored in source control.
Branch-Protection High Whether the default branch requires reviews/tests before merging.
CI-Tests Low Whether CI runs tests on pull requests.
CII-Best-Practices Low Whether the project has earned the OpenSSF Best Practices badge.
Code-Review High Whether human code review is required before merge.
Contributors Low Whether recent contributors come from multiple organizations.
Dangerous-Workflow Critical Whether GitHub Actions contain dangerous patterns (e.g., pull_request_target with untrusted code).
Dependency-Update-Tool High Whether Dependabot, Renovate, or similar tools are in use.
Fuzzing Medium Whether fuzz testing is in use (e.g., OSS-Fuzz).
License Low Whether a published license is detected.
Maintained High Composite signal of active maintenance (commits, issues, releases).
Packaging Medium Whether the project builds and publishes official packages.
Pinned-Dependencies Medium Whether dependencies are pinned to specific hashes or versions in CI.
SAST Medium Whether static analysis tools are in use.
Security-Policy Medium Whether a SECURITY.md or equivalent exists.
Signed-Releases High Whether release artifacts are cryptographically signed.
Token-Permissions High Whether GitHub Actions use least-privilege token permissions.
Vulnerabilities High Whether the repo has known unfixed vulnerabilities (via OSV).
Webhooks Low Whether webhooks use secret-based authentication. (Requires admin token for full check.)
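
The Scorecard policy guidance (minimum aggregate score, scan staleness, failed high-risk checks) can be combined into a single gate. This is a minimal sketch that operates on a dict shaped like data.scorecard; the function name, the default thresholds, and the ISO 8601 timestamp format are assumptions.

```python
from datetime import datetime, timezone


def scorecard_gate(scorecard, now=None, min_score=5.0, max_scan_age_days=90):
    """Return a list of policy findings; an empty list means the gate passes."""
    # The whole object is null when the repository has not been scanned.
    if scorecard is None:
        return ["no Scorecard data available"]

    now = now or datetime.now(timezone.utc)
    findings = []

    score = scorecard["aggregate_score"]
    if score < min_score:
        findings.append(f"aggregate_score {score} is below {min_score}")

    # Assumes scan_date is an ISO 8601 string; naive values are treated as UTC.
    scanned = datetime.fromisoformat(scorecard["scan_date"].replace("Z", "+00:00"))
    if scanned.tzinfo is None:
        scanned = scanned.replace(tzinfo=timezone.utc)
    if (now - scanned).days > max_scan_age_days:
        findings.append("scan is older than the staleness threshold")

    for name in scorecard.get("critical_checks_failed", []):
        findings.append(f"high/critical check failed: {name}")

    return findings
```

Whether a missing Scorecard object should block or merely flag a dependency is a policy decision; the sketch reports it as a finding and leaves that choice to the caller.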

Security Config (data.security_config)

Direct settings and files that protect the integrity of the repository and its build pipeline.

Field Type Description Policy Guidance
security_md_exists boolean Presence of a formal security reporting policy (SECURITY.md). Compliance: essential for regulated industries to ensure responsible disclosure.
has_security_advisories boolean Whether the repo has published GitHub Security Advisories. Transparency: projects that publish advisories demonstrate mature vulnerability handling.
security_advisory_count integer Number of published security advisories. Track record: a high count with timely fixes shows healthy incident response.
dependabot_alerts_enabled boolean Whether Dependabot configuration or vulnerability alerts are active. Hygiene: projects without automated dependency scanning miss known vulnerabilities.
has_ci_workflows boolean Presence of .github/workflows directory. Build confidence: projects without CI are more likely to ship broken or untested code.

Contributor Risk (data.contributor_risk)

Who is writing the code? Detects human risks in the supply chain — from key-person dependency to compromised accounts.

Field Type Description Policy Guidance
total_contributors integer Distinct contributors who have ever committed. Baseline: extremely low counts indicate a personal/hobby project.
active_contributors_12mo integer Distinct contributors with commits in the last 12 months. Current health: a project may have many historical contributors but no one active.
bus_factor integer Minimum number of contributors responsible for ≥80% of recent commits. Resilience: a bus factor of 1 means a single departure could kill the project.
single_maintainer_risk boolean True if bus factor = 1 or active contributors ≤ 1. Quick-filter flag for the highest-risk key-person dependency.
maintainer_org_diversity integer Number of distinct organizations among top contributors. Stability: projects backed by multiple companies survive if one cuts funding.
new_maintainer_flag boolean True if the top committer has less than 6 months of tenure. Maintainer-takeover signal: sudden leadership changes are a known attack vector (cf. XZ Utils).
signed_commit_ratio number (0.0–1.0) Ratio of signed to total commits. Identity assurance: high ratios confirm code provenance.
contributors_with_breached_creds integer Count of contributors with known breached credentials. Account takeover risk: leaked passwords could be used to push malicious code.
contributor_breach_ratio number (0.0–1.0) Ratio of contributors with breached credentials to total contributors. Normalized risk: a small project where 50% of contributors are breached is worse than a large project with 5%.
most_recent_breach_date string (date) Most recent breach date affecting any contributor. Null if no breaches. Recency: recent breaches are more actionable than decade-old ones.
unique_breach_sources array of string Unique breach sources affecting repo contributors (e.g., "LinkedIn", "Adobe"). Severity context: helps assess the nature and severity of credential exposure.
maintainer_geo_distribution object (country→count) Map of country name to contributor count (e.g., {"Finland": 3, "Germany": 2}). Jurisdictional risk: concentrated geo may indicate regulatory or geopolitical exposure.
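
One way to turn the Contributor Risk fields into review flags is sketched below. It takes a dict shaped like data.contributor_risk; the flag names and both thresholds are hypothetical and would need to reflect your own policy.

```python
def contributor_risk_flags(risk, min_bus_factor=2, max_breach_ratio=0.25):
    """Return a list of flag names raised by the contributor-risk fields."""
    flags = []
    bus_factor = risk.get("bus_factor")
    breach_ratio = risk.get("contributor_breach_ratio")

    # Key-person dependency: prefer the precomputed flag, fall back to bus_factor.
    if risk.get("single_maintainer_risk"):
        flags.append("single_maintainer_risk")
    elif bus_factor is not None and bus_factor < min_bus_factor:
        flags.append("low_bus_factor")

    # Recent change of top committer.
    if risk.get("new_maintainer_flag"):
        flags.append("new_maintainer")

    # Normalized credential exposure across contributors.
    if breach_ratio is not None and breach_ratio > max_breach_ratio:
        flags.append("high_breach_ratio")

    return flags
```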

Code Hygiene (data.code_hygiene)

Structural and legal attributes that affect build determinism, legal compliance, and code quality.

Field Type Description Policy Guidance
license_spdx string SPDX license identifier. Null if no license detected. Legal risk: block copyleft licenses (e.g., GPL) or non-commercial licenses that conflict with your business.
has_lockfile boolean Presence of package-lock.json, yarn.lock, go.sum, poetry.lock, Cargo.lock, etc. Determinism: ensures what you audit is what you build. Missing lockfiles lead to unreproducible builds.
has_gitignore boolean Presence of a .gitignore file. Hygiene: missing .gitignore risks committing secrets, build artifacts, or IDE config.
repo_size_kb integer Total disk usage of the repository in kilobytes. Bloat detection: abnormally large repos may contain vendored binaries that should be audited.
default_branch string Name of the default branch (e.g., main, master). Informational: useful for automation and branch-protection policy alignment.
is_fork boolean Whether the repository is a fork. Provenance: forks may diverge from upstream security patches.
parent_repo string Parent repository full name (e.g., "original-org/repo"). Null if not a fork. Enables automated drift analysis between fork and upstream.
topics array of string GitHub repository topics. Classification: also used to derive the deprecated/unmaintained flag.
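
A license policy on license_spdx might look like the following sketch. The blocked prefixes are a hypothetical example policy, and treating a null license as "review" rather than "block" is likewise an assumption.

```python
# Hypothetical policy: block identifiers that start with these SPDX prefixes.
BLOCKED_LICENSE_PREFIXES = ("GPL-", "AGPL-")


def license_decision(code_hygiene):
    """Return 'block', 'review', or 'allow' for a data.code_hygiene dict."""
    spdx = code_hygiene.get("license_spdx")
    if spdx is None:
        # No license detected: send to a human rather than guessing.
        return "review"
    if spdx.upper().startswith(BLOCKED_LICENSE_PREFIXES):
        return "block"
    return "allow"
```

Prefix matching keeps LGPL identifiers out of the blocked set, since they start with "LGPL-" rather than "GPL-".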

Top-Level Metadata

Field Type Description
data.repo_health_last_checked string (datetime) When the GitHub scraper last refreshed repository metadata for this health snapshot. Null if never looked up.
data.metadata.compiled_at string (datetime) When this response was assembled.
data.metadata.attributions array of DataSourceAttribution Data source attributions required by licensing agreements (see Cross-Cutting: Data Source Attributions).

Endpoint: /repo

Returns package associations, repository details, and advisory attributions for a given repository URL.

Top-level response:

Field Type Description
repo string The repository identifier that was queried.
data RepositoryData The repository data (see sub-objects below).

Repository Info (data.repository_details)

Field Type Description Policy Guidance
created_at string (datetime) When the repository was created on GitHub. Age: very new repositories carrying critical functionality deserve extra scrutiny.
updated_at string (datetime) When the repository was last updated on GitHub. Complements last_commit_date from the health endpoint with GitHub's own update signal.
description string Repository description from GitHub. Context: useful for automated classification and human review.
fork_count number GitHub fork count. (Also available in /repo/health popularity.)
star_count number GitHub star count. (Also available in /repo/health popularity.)
languages array of string Top languages by size. Context: language mix affects which vulnerability scanners and SAST tools apply.
has_signed_commits boolean True if any contributor has at least one signed commit to this repository. Baseline: if no signed commits exist at all, the project has no cryptographic provenance.
has_unsigned_commits boolean True if any contributor has at least one unsigned commit. Complement to has_signed_commits: both true means mixed signing discipline.
health_available boolean True if health data is available via /repo/health. Routing: use this to know whether to make a follow-up health call.

Contributors (data.repository_details.contributors)

An array of contributor objects, one per contributor to the repository.

Field Type Description
email string Contributor email address.
has_signed_commits boolean True if this contributor has at least one signed commit to this repository.
has_unsigned_commits boolean True if this contributor has at least one unsigned commit to this repository.

Packages (data.packages)

An array of PackageAffiliation objects linking the repository to its published packages. Each object includes confidence scoring (see Cross-Cutting: Confidence Scoring).

Field Type Description
purl string Package URL (PURL) of the associated package.
confidence integer (0–100) Confidence rating of this repo-to-package attribution.
methods array of string Method names that attributed this package to the repository.
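
Because repo-to-package links are inferred, a consumer may want to keep only attributions above a confidence floor. The sketch below filters a list shaped like data.packages; the function name and the default thresholds are assumptions.

```python
def confident_purls(packages, min_confidence=80, min_methods=1):
    """Return PURLs whose attribution meets the confidence and method-count floors."""
    return [
        pkg["purl"]
        for pkg in packages
        if pkg["confidence"] >= min_confidence and len(pkg["methods"]) >= min_methods
    ]
```

Raising min_methods to 2 would keep only packages that more than one attribution method agrees on.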

Advisories (data.advisories)

An array of AdvisoryAttribution objects linking the repository to known advisories.

Field Type Description
name string Name of the advisory.
relationship string How the advisory relates to the repository: "direct" or "indirect".

Metadata

Field Type Description
data.metadata.compiled_at string (datetime) When this response was assembled.
data.metadata.attributions array of DataSourceAttribution Data source attributions (see Cross-Cutting: Data Source Attributions).

Endpoint: /contributor

Returns identity, organizational, geographic, and advisory information for a contributor, looked up by email or GitHub username.

Top-level response:

Field Type Description
email string The email address queried (if applicable).
username string The GitHub username queried (if applicable).
data ContributorData The contributor data (see sub-objects below).

Identity (data.identity)

Field Type Description
emails array of string All known email addresses for this contributor.
usernames array of string All known GitHub usernames.
declared_names array of string Names declared across git commits and GitHub profiles.

Summary (data.summary)

Field Type Description Policy Guidance
purls array of string Package PURLs this contributor has been associated with. Blast radius: understand how many packages a single contributor touches.
repos_contributed_to array of object Repositories this contributor has committed to. Scope: identify contributors with broad access across many repos.

Organizations (data.organizations)

An array of organization objects, each with confidence scoring.

Field Type Description
name string The display name of the organization.
repository_url string The URL for the organization's repository.
confidence integer (0–100) Confidence rating of this organization-to-contributor attribution.
methods array of string Method names that attributed this organization to the contributor.

Locations (data.locations)

An array of location objects, each with confidence scoring.

Field Type Description
country string Country name.
confidence integer (0–100) Confidence rating of this location-to-contributor attribution.
methods array of string Method names that attributed this location to the contributor.

Advisories (data.advisories)

An array of AdvisoryAttribution objects.

Field Type Description
name string Name of the advisory.
relationship string How the advisory relates to the contributor: "direct" or "indirect".

Other Fields

Field Type Description
data.security_available boolean True if security data is available via /contributor/security.
data.metadata.compiled_at string (datetime) When this response was assembled.
data.metadata.attributions array of DataSourceAttribution Data source attributions (see Cross-Cutting: Data Source Attributions).

Endpoint: /contributor/security

Returns breach exposure, signing key information, and commit signing ratios for a contributor.

Top-level response:

Field Type Description
email string The email address queried (if applicable).
username string The GitHub username queried (if applicable).
data ContributorSecurity The security data (see fields below).

Core Fields

Field Type Description Policy Guidance
has_breached_credentials boolean Whether any of this contributor's emails appear in known data breaches. Account takeover risk: breached contributors are higher priority for access review.
has_password_exposure boolean Whether any email was exposed in a breach containing passwords specifically. Elevated risk: password exposure is significantly more dangerous than email-only leaks.
signed_commit_ratio number (0.0–1.0) Ratio of signed to total commits across all repositories. Identity discipline: contributors who consistently sign are lower-risk.
breach_last_refreshed_at string (datetime) When the breach data was last refreshed. Staleness: old breach data may miss recent exposures.

Signing Key Info (data.signing_key_info)

Field Type Description Policy Guidance
has_signing_key boolean Whether this contributor has a known signing key. Baseline: contributors without keys cannot provide cryptographic provenance.
key_age_days integer Days since the current signing key was first seen. Null if no key. Trust: brand-new keys on established accounts may warrant investigation.
key_change_detected boolean True if the contributor's signing key has changed. Compromise signal: unexpected key rotation may indicate account takeover.
key_changes array of KeyChange History of key changes (see below). Audit trail: review the timeline and frequency of key rotations.

KeyChange object (data.signing_key_info.key_changes[]):

Field Type Description
old_key_id string The previous signing key identifier.
new_key_id string The new signing key identifier.
detected_at string (datetime) When the key change was detected.

Breach Details (data.breach_details)

An array of BreachDetail objects, one per email address.

Field Type Description
email string The contributor email address.
has_breach boolean Whether this email has appeared in any known breaches.
breach_count integer Number of breaches this email appeared in.
breaches array of BreachInfo Individual breach records (see below).

BreachInfo object (data.breach_details[].breaches[]):

Field Type Description
name string Name of the breach (e.g., "LinkedIn").
date string (YYYY-MM-DD) Date the breach occurred.
is_verified boolean Whether the breach has been verified by Have I Been Pwned (HIBP).
exposed_data array of string Types of data exposed (e.g., "Passwords", "Email addresses").
source string Attribution source for the breach data.
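
To see which specific addresses drive password exposure, the breach records can be walked per email. This sketch takes a list shaped like data.breach_details; matching on the literal value "Passwords" (case-insensitively) follows the example value shown above and is otherwise an assumption about how exposed_data is populated.

```python
def emails_with_password_exposure(breach_details):
    """Return emails that appear in at least one breach exposing passwords."""
    exposed = []
    for detail in breach_details:
        for breach in detail.get("breaches", []):
            data_types = [item.lower() for item in breach.get("exposed_data", [])]
            if "passwords" in data_types:
                exposed.append(detail["email"])
                break  # one qualifying breach is enough for this email
    return exposed
```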

Metadata

Field Type Description
data.metadata.compiled_at string (datetime) When this response was assembled.
data.metadata.attributions array of DataSourceAttribution Data source attributions (see Cross-Cutting: Data Source Attributions).

Endpoint: /advisory

Returns details about a specific NetRise advisory, including which repositories, packages, and contributors are impacted.

Top-level response (AdvisoryResponse):

Field Type Description Policy Guidance
name string Name/identifier of the advisory. Reference: use for tracking and cross-referencing with other vulnerability databases.
description string Text description of the advisory. Context: understand the nature and severity of the issue.
urls array of string Reference URLs with further information. Research: link to original disclosures, CVE entries, or vendor patches.
created_at string (datetime, ISO 8601) When the advisory was added to Provenance. Timeliness: compare against your SLA for vulnerability response.

Impacted Entities

Each of the following fields is an AdvisoryAttributions object containing direct and indirect arrays:

Field Sub-field Type Description
repositories direct array of string Repositories directly impacted by the advisory (contain the vulnerable code).
repositories indirect array of string Repositories indirectly impacted (via dependencies).
packages direct array of string Packages directly impacted.
packages indirect array of string Packages indirectly impacted.
contributor_emails direct array of string Contributor emails directly associated with the advisory.
contributor_emails indirect array of string Contributor emails indirectly associated.
contributor_usernames direct array of string GitHub usernames directly associated.
contributor_usernames indirect array of string GitHub usernames indirectly associated.
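
Since every impacted-entity field shares the same direct/indirect shape, one helper can read any of them. The sketch below works on a dict shaped like the /advisory response; the helper name and its include_indirect parameter are hypothetical.

```python
def impacted(advisory, kind, include_indirect=True):
    """Return impacted entities of one kind, direct first, without duplicates.

    kind is one of: repositories, packages, contributor_emails,
    contributor_usernames.
    """
    attributions = advisory.get(kind) or {}
    result = list(attributions.get("direct", []))
    if include_indirect:
        for entity in attributions.get("indirect", []):
            if entity not in result:
                result.append(entity)
    return result
```

Passing include_indirect=False narrows the result to entities that contain the vulnerable code themselves, which may be the more useful set for an initial triage.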

Metadata

Field Type Description
metadata.compiled_at string (datetime) When this response was assembled.
metadata.attributions array of DataSourceAttribution Data source attributions (see Cross-Cutting: Data Source Attributions).

Endpoint: /package

Returns detailed metadata for a specific package identified by a full PURL (all 6 fields required: type, namespace, name, version, arch, distro).

Top-level response:

Field Type Description
purl string The PURL that was queried.
data PackageData The package data (see fields below).

Package Fields

Field Type Description Policy Guidance
package_type string Package format (deb, rpm, apk). Classification: determines which distro-specific policies apply.
vendor string Package vendor/namespace. Provenance: confirms the distribution source.
product string Package name. Identification.
version string Full version string. Version pinning and range checks.
arch string CPU architecture. Build matrix: ensure you're evaluating the correct architecture.
distro string OS distribution and version. Environment match: confirm the package applies to your target OS.
package_details object Additional package-specific metadata. Extended attributes from the package registry.
repository_details RepositoryInfo (nullable) Source repository information, if attributed. Link to source: use to cross-reference with /repo/health.
advisories array of AdvisoryAttribution Advisories impacting this package (each with name and relationship). Vulnerability exposure: shows which advisories affect this package.
dependencies array of object Dependencies of this package. Dependency tree: input for transitive risk analysis.

Metadata

Field Type Description
data.metadata.compiled_at string (datetime) When this response was assembled.
data.metadata.attributions array of DataSourceAttribution Data source attributions (see Cross-Cutting: Data Source Attributions).

Endpoint: /package/search

Searches for packages matching partial PURL criteria (type, namespace, name required; version with operators, arch, and distro optional).

Response (PackageSearchResponse):

Field Type Description
purls array of string Array of PURLs matching the search criteria.

Endpoint: /package/dependents

Returns reverse dependencies — other packages that depend on the queried package (full PURL required).

Response (PackageDependentResponse):

Field Type Description Policy Guidance
purls array of string Array of PURLs that depend on the queried package. Blast radius: understand how many packages are affected if this one is compromised.

Cross-Cutting: Confidence Scoring

Many responses include an AttributionMetadata object for fields where attribution is inferred rather than directly observed.

Field Type Description
confidence integer (0–100) Confidence rating of this attribution as a percentage.
methods array of string List of method names that contributed to this attribution.

Attribution kinds and their methods:

Kind Methods Description
repo_to_package file_association, readme_github_search, tarball_url Links a repository to a package.
organization_to_contributor email_domain, github_profile, repo_member Links an organization to a contributor.
location_to_contributor email_country_code, github_profile, organization_location, website_country_code, llm_geolocation Links a location to a contributor.

Confidence is calculated using a naive Bayes–style approach: each candidate starts with a uniform prior and is updated multiplicatively by every supporting method, weighted by that method's reliability score. When multiple methods agree, their contributions compound. The top candidate's normalized score is returned as the confidence percentage.
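
The calculation described above can be sketched roughly as follows. This is an illustration of the general technique only, not the NetRise implementation: the reliability weights are invented, and treating each weight as a multiplicative factor greater than 1 is an assumption about how "weighted by reliability" is applied.

```python
def attribution_confidence(evidence, reliability):
    """Pick the top candidate and its normalized score as a percentage.

    evidence:    {candidate: [method names supporting it]}
    reliability: {method name: multiplicative weight} (hypothetical values)
    """
    if not evidence:
        return None, 0

    prior = 1.0 / len(evidence)  # uniform prior over candidates
    scores = {}
    for candidate, methods in evidence.items():
        score = prior
        for method in methods:
            score *= reliability[method]  # agreeing methods compound
        scores[candidate] = score

    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, round(100 * scores[best] / total)
```

For example, with two location candidates where one is supported by two methods (weights 2.0 and 3.0) and the other by a single method (weight 2.0), the unnormalized scores are 3.0 and 1.0, so the first candidate is returned with a confidence of 75.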


Cross-Cutting: AdvisoryAttribution

Used across /repo, /contributor, and /package responses to link entities to advisories.

Field Type Description
name string Name of the advisory.
relationship string "direct" if the entity is directly impacted, "indirect" if impacted via a dependency.

Cross-Cutting: Data Source Attributions

Every response includes a metadata object.

Field Type Description
compiled_at string (datetime) When this response was assembled.
attributions array of DataSourceAttribution Data source attributions required by licensing agreements.

DataSourceAttribution object:

Field Type Description
provider string Name of the data provider.
url string URL for the data provider.
license string License under which the data is provided.
reason string Reason for including this attribution.