From 9e143d1d84817ec7e6d139d234f0fff07749621c Mon Sep 17 00:00:00 2001
From: bdunahu
Date: Mon, 27 Apr 2026 22:16:12 -0400
Subject: initial commit

---
 README.org | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 README.org

(limited to 'README.org')

diff --git a/README.org b/README.org
new file mode 100644
index 0000000..c815b12
--- /dev/null
+++ b/README.org
@@ -0,0 +1,36 @@
+I wrote some blog posts on this...
+
+- [[https://operationnull.com/posts//notes-on-github-actions-reproduciblity-and-trusting-trust.html][notes on github actions reproducibility and trusting trust]]
+- [[https://operationnull.com/posts//verifying-github-actions-artifacts-is-not-easy.html][verifying github actions artifacts is not easy]]
+
+Essentially, this aims to "discover" GitHub Actions on GitHub through a form of web crawling, sort them by type, and try to build a subset in a container to see which ones are reproducible. "[[https://reproducible-builds.org/][Reproducibility]]" here means a specific set of development practices that lets others verify that the artifacts (e.g., minified JavaScript) one invites into their CI/CD pipelines actually match the source code.
+
+Some prior literature that inspired this idea (PDFs):
+
+- [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9240695][Investigating the Reproducibility of NPM Packages]]
+- [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10061526][Ambush From All Sides: Understanding Security Threats in Open-Source Software CI/CD Pipelines]]
+
+* Codebase
+
+This codebase was originally written purely in bash: a series of disjoint scripts that I tried to pipe into one another. It was clunky to use, since network downloads made each step slow, and because the GitHub REST API kept cutting me off, I had to stream results and cache them.
+
+I kept a few shell scripts I didn't want to rewrite, but used Scheme to glue everything together. There is still some cleanup to do, and more scripts may eventually be created, converted, or streamlined.
+
+These scripts use [[https://guix.gnu.org/][Guix]] to create the build containers. The packages used in the containers are pinned with the Guix time machine, so the build environments themselves are reproducible later, barring some tragedy. Of course, running this experiment on the same actions later may not produce the same results; which is the whooole point of this.
+
+* Crawler
+
+TODO
+
+* Builder
+
+TODO
+
+This directory contains the actions builder scripts. Namely, it includes:
+
+- ~build-all.sh~: reads from stdin a list of ~OWNER/REPO_NAME sha path/to/package-lock.json~ entries. This input is generated by the ~github scraper~ scripts in an adjacent directory. The script produces a set of log files recorded by the helper script, ~build-repo.sh~.
+- ~build-repo.sh~: given a GitHub repository identifier, a commit, and a path to the ~package-lock.json~ (all assumed to exist), performs the steps needed to build the repository and diff the result to see whether it is reproducible. There are a lot of gotchas here and the process is by no means perfect (in fact, it's quite messy). See the comment at the top of the file.
+
+* Aggregator
+
+TODO
--
cgit v1.2.3
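As a reader's aid, the stdin protocol described for ~build-all.sh~ (one ~OWNER/REPO_NAME sha path/to/package-lock.json~ entry per line) could be consumed with a loop along these lines. This is a minimal sketch under stated assumptions, not the actual script: the field names, the sample entry, and the commented-out ~build-repo.sh~ call are all hypothetical.

```shell
#!/bin/sh
# Sketch of a driver loop for the "OWNER/REPO_NAME sha lockfile-path" stdin
# format described in the README. The sample entry below is hypothetical.
while read -r repo sha lockfile; do
  [ -z "$repo" ] && continue   # skip blank lines
  printf 'building %s at %s (lockfile: %s)\n' "$repo" "$sha" "$lockfile"
  # a real driver would dispatch to something like:
  #   ./build-repo.sh "$repo" "$sha" "$lockfile"
done <<'EOF'
octocat/hello-world deadbeef package-lock.json
EOF
# prints: building octocat/hello-world at deadbeef (lockfile: package-lock.json)
```

Whitespace-delimited fields keep the format trivially composable with standard tools (~grep~, ~awk~, shell ~read~), at the cost of forbidding spaces in the lockfile path.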