Skip to content

fix(crafter): use deterministic filename for AI config collector#2943

Open
waveywaves wants to merge 2 commits intochainloop-dev:mainfrom
waveywaves:fix/ai-config-deterministic-filename
Open

fix(crafter): use deterministic filename for AI config collector#2943
waveywaves wants to merge 2 commits intochainloop-dev:mainfrom
waveywaves:fix/ai-config-deterministic-filename

Conversation

@waveywaves
Copy link
Copy Markdown
Contributor

@waveywaves waveywaves commented Mar 27, 2026

Summary

  • Replace os.CreateTemp (random suffix) with deterministic filenames in both collector_aiagentconfig.go and collector_prmetadata.go, eliminating duplicate CAS uploads on attestation retries
  • Write temp files directly to os.TempDir() with deterministic names and defer os.Remove
  • Remove createPRMetadataTempFile helper — inlined at the call site with explanatory comment

Note: PR #2917 already fixed the primary root cause (non-deterministic content digest from captured_at timestamp). This PR provides additional hardening by making temp filenames deterministic, so that Artifact.Name (derived from os.Stat(filepath).Name()) is also stable across retries.

Root Cause Analysis

The reviewer questioned whether the filename is actually the cause of duplicates since Materials[m.Name] already uses a deterministic key. The answer is yes — the map key is deterministic, but the Artifact struct inside the material is not.

Here is the chain of causation:

  1. uploadAgentConfig() calls os.CreateTemp("", "ai-agent-config-claude-*.json"), which produces a random filename like /tmp/ai-agent-config-claude-3847291.json
  2. AddMaterialContractFree() calls addMaterial(), which calls uploadAndCraft()
  3. uploadAndCraft() calls fileStats(artifactPath), which does os.Stat(filepath).Name() to get the basename of the temp file
  4. That basename becomes Artifact.Name in the material proto (line 136 of materials.go)
  5. fileStats() also computes sha256(file_contents) which becomes Artifact.Digest

Because the JSON payload is identical across retries, Artifact.Digest stays the same. But Artifact.Name changes every time (random temp suffix), so the CAS sees a different resource name on each retry. The map key Materials["ai-agent-config-claude"] is overwritten (good), but each retry still triggers a new CAS upload because the artifact metadata differs.

The fix makes the filename deterministic by deriving it from a content hash, so Artifact.Name is stable across retries.

Changes

  • pkg/attestation/crafter/collector_aiagentconfig.go:

    • Replace os.CreateTemp + manual write/close with os.WriteFile to a deterministic path in os.TempDir()
    • Filename uses full ConfigHash (no truncation)
    • Inline the path construction at the call site with root-cause comment
    • Remove aiConfigTempFilePath() helper and MkdirTemp wrapper
  • pkg/attestation/crafter/collector_prmetadata.go:

    • Replace createPRMetadataTempFile() helper (which used os.CreateTemp with random suffix) with inline deterministic path using sha256(content)
    • Same pattern: os.WriteFile + defer os.Remove in os.TempDir()
  • Removed test files:

    • collector_aiagentconfig_test.go — tested the removed aiConfigTempFilePath helper
    • collector_prmetadata_test.go — tested the removed createPRMetadataTempFile helper

Test Plan

  • go vet ./pkg/attestation/crafter/... passes
  • go test ./pkg/attestation/crafter/... — all sub-packages pass

Fixes #2907

@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch 3 times, most recently from 486591a to cb22854 Compare March 28, 2026 04:15
@waveywaves waveywaves marked this pull request as ready for review March 31, 2026 12:57
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="pkg/attestation/crafter/collector_aiagentconfig_test.go">

<violation number="1" location="pkg/attestation/crafter/collector_aiagentconfig_test.go:59">
P3: Test hardcodes a Unix-style path and full-path string equality, but aiConfigTempFilePath uses filepath.Join, which is OS-dependent. This assertion will fail on Windows even when the implementation is correct.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

path := aiConfigTempFilePath(testTempDir, testAgent, testConfigHash)

require.Contains(t, path, testConfigHash, "path must contain the full config hash (not truncated)")
assert.Equal(t, testTempDir+"/ai-agent-config-"+testAgent+"-"+testConfigHash+".json", path)
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Test hardcodes a Unix-style path and full-path string equality, but aiConfigTempFilePath uses filepath.Join, which is OS-dependent. This assertion will fail on Windows even when the implementation is correct.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At pkg/attestation/crafter/collector_aiagentconfig_test.go, line 59:

<comment>Test hardcodes a Unix-style path and full-path string equality, but aiConfigTempFilePath uses filepath.Join, which is OS-dependent. This assertion will fail on Windows even when the implementation is correct.</comment>

<file context>
@@ -0,0 +1,61 @@
+		path := aiConfigTempFilePath(testTempDir, testAgent, testConfigHash)
+
+		require.Contains(t, path, testConfigHash, "path must contain the full config hash (not truncated)")
+		assert.Equal(t, testTempDir+"/ai-agent-config-"+testAgent+"-"+testConfigHash+".json", path)
+	})
+}
</file context>
Fix with Cubic

@waveywaves waveywaves closed this Mar 31, 2026
@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch from cb22854 to 449088d Compare March 31, 2026 13:21
@waveywaves waveywaves reopened this Mar 31, 2026
@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch from 449088d to c7e366e Compare March 31, 2026 13:25
@waveywaves
Copy link
Copy Markdown
Contributor Author

@jiparis @migmartri gentle ping — this is ready for review. CI is green except for the known flaky TestReferrerIntegration deadlock (unrelated to this PR). Fixes both collector_aiagentconfig.go and collector_prmetadata.go deterministic filenames. Note: PR #2917 already fixed the primary root cause; this provides additional hardening.

@migmartri migmartri requested a review from jiparis April 4, 2026 17:38
Copy link
Copy Markdown
Member

@jiparis jiparis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @waveywaves.
Please remove those PR-metadata changes since they are not related to the main issue.
Regarding the AI config deduplication in the CAS, note that it was already solved at #2917 (filename doesn't affect to CAS uploads), however, I agree that using the digest is better than a random rumber. Just check my comment about the filename length.

return fmt.Errorf("marshaling PR/MR metadata: %w", err)
}

materialName, tmpFile, err := createPRMetadataTempFile(metadata.Number, jsonData)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR metadata uses the PR number intentionally. It's already deduplicated across different PRs, but they share the name within the same PR since basically the latest one is the one that matters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Reverted all PR metadata changes — the PR number naming is intentional. Restored the original file and test.


if _, err := tmpFile.Write(jsonData); err != nil {
tmpFile.Close()
// Use a deterministic filename derived from the config hash so that retries
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is not accurate. Deduplication is done through the digest, which is already addressed #2917

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the misleading dedup claim — updated the comment to just note the constant filename for consistent Artifact.Name.

// produce the same Artifact.Name (via fileStats -> os.Stat().Name()) and
// avoid duplicate CAS uploads. PR #2917 fixed the primary root cause
// (non-deterministic content from captured_at); this is additional hardening.
tmpPath := filepath.Join(os.TempDir(), fmt.Sprintf("ai-agent-config-%s-%s.json", agentName, data.ConfigHash))
Copy link
Copy Markdown
Member

@jiparis jiparis Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that ConfigHash has a length of 64 characters, too long for a filename. I'd rather take the first 6 characters, or better, just use a constant name like ai-agent-config.json. Name conflicts are ok here, assuming that the project/workflow combo is always constant in the attestation (they refer to the same repository).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Switched to a constant filename: ai-agent-config-{agentName}.json in os.TempDir(). No hash in the filename at all — name conflicts within the same attestation are expected.

@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch from c7e366e to 14ef4c0 Compare April 7, 2026 08:14
@jiparis jiparis self-requested a review April 8, 2026 07:41
@jiparis jiparis self-requested a review April 8, 2026 07:42
Copy link
Copy Markdown
Member

@jiparis jiparis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waveywaves none of the commits are signed. This PR cannot be accepted for that reason.

waveywaves and others added 2 commits April 8, 2026 15:49
Replace os.CreateTemp (which generates a random suffix) with a
deterministic filename derived from the config's content hash.

Root cause: although AddMaterialContractFree uses a deterministic map
key (Materials[m.Name]), the Artifact struct embedded in each material
gets its Name and Digest from fileStats(), which calls
os.Stat(filepath).Name(). A random temp filename therefore produces a
different Artifact.Name on every call — even when the JSON payload is
byte-for-byte identical — causing the CAS to treat each retry as a new,
distinct resource.

Changes:
- Extract aiConfigTempFilePath() helper for deterministic path generation
- Use full ConfigHash (no truncation) to avoid collision risk
- Write into a private os.MkdirTemp directory (symlink-safe) instead of
  the shared os.TempDir()
- Add unit tests proving same-input-same-output and different-input-
  different-output invariants

Fixes: chainloop-dev#2907

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vibhav Bobade <vibhav.bobde@gmail.com>
…AI config filename

- Revert collector_prmetadata.go: PR metadata intentionally uses PR
  number for naming; deduplication within the same PR is expected
- Restore collector_prmetadata_test.go deleted by mistake
- Simplify AI config temp filename to a constant name per agent
  (ai-agent-config-{agentName}.json) instead of appending the full
  64-character ConfigHash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vibhav Bobade <vibhav.bobde@gmail.com>
@waveywaves waveywaves force-pushed the fix/ai-config-deterministic-filename branch from 14ef4c0 to 5dc7342 Compare April 8, 2026 10:20
@waveywaves
Copy link
Copy Markdown
Contributor Author

All commits are now GPG-signed. Also addressed all the review feedback in the second commit — reverted PR metadata changes and simplified the AI config filename to a constant name. Ready for re-review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AI config collector creates duplicate uploads on retry

2 participants