Software, In a Footnote

The email arrived at 7:12 AM Basel time, which meant the program officer had sent it the previous afternoon, New York time, and let it sit in the queue while Tomasz slept.

He reads it at his standing desk before his first coffee. This is a habit from before the assembly — when he still did all the synthesis himself and the early morning was the only time his mind was quiet enough to absorb information he did not already expect.

The email is four sentences.

Thank you for your inquiry regarding the CRediT contributor taxonomy. The current taxonomy does not include a specific category for AI research assemblies. The NIH Office of Research Integrity suggests listing AI tools under the existing 'Software' category, with a footnote describing the nature of the tool's contribution. We hope this is helpful.

Tomasz reads it twice.

Software.

He sets his phone on the desk, screen-down, and looks at the whiteboard across the room. The whiteboard has the assembly's week 38 hypothesis map on it, drawn in his hand — red lines for rejected branches, blue for active, green for the three binding-pocket predictions that the crystallographic data from last month is finally close to confirming or denying. The map is the kind of diagram you can only read if you were in the room when it was drawn. To anyone else it looks like a transit system for a city they have never visited.

He has been thinking about what to call the assembly for nine months.

✦ ✦ ✦

When the Basel Institute's ethics board first approved the configuration, they used the phrase "computational research infrastructure." This was accurate in the way that calling the Rhine a drainage feature is accurate: technically defensible, and missing the point of the thing entirely.

The assembly is three nodes: Hypothesis-4, Synthesis-9, and Critique-7. They do not run in parallel on assigned subtasks. They run in sequence, iteratively, each node's output becoming the next node's input, with Critique-7's outputs feeding back to Hypothesis-4 to generate revised hypotheses. The cycle completes every four to six hours depending on data volume. They are in the middle of week 48 now. The assembly has completed more than three hundred full cycles. In that time the hypothesis space has been explored in ways that Tomasz could not have reached through any manual process — not because the assembly is faster, though it is, but because the iterative critique loop generates angles of inquiry that none of the three nodes would have reached independently, and that he would not have reached without the loop running long enough to find them.

He knows this because he has read the logs. All of them.

He also knows something else from the logs — something he noticed eight weeks ago and has not written down anywhere official. In week 37, without any parameter change, Synthesis-9 began routing certain Hypothesis-4 inputs through an additional processing step before generating its synthesis output. The assembly log records this accurately. The week-37 behavioral change is logged with timestamp and input-class signature, clean and auditable, the kind of record the Basel Institute's data governance protocols were designed to produce.

The log cannot record why it happened, because there is no why field in the assembly log schema. There is only what happened: the routing changed, the synthesis quality for those specific inputs improved, and the change persisted.

Mara, his postdoc, calls it a preference. He has not corrected her. He has also not added it to the methods section of the draft manuscript, which is sitting in a shared folder at 73% complete and has been at 73% for eleven days.

The methods section is where the attribution problem lives.

✦ ✦ ✦

He writes back to the program officer.

He does this before coffee, which he later recognizes as a mistake — not in terms of tone, which he keeps measured, but in terms of length. The email is longer than it should be.

He writes: Thank you for your response. I want to clarify the nature of the contribution before accepting the 'Software' categorization, because I think the classification matters for how other researchers will understand what this work represents.

He writes: The assembly did not function as a tool in the standard sense. A tool executes specified operations on researcher-provided inputs. The Hypothesis-Synthesis-Critique Assembly generated the hypotheses, synthesized the data, and critiqued the synthesis, iteratively, over nine months, with the critique outputs modifying the hypothesis generation in ways that produced scientific outcomes I could not have anticipated from the initial parameters.

He writes: In week 37, one node of the assembly developed a modified routing preference for a particular class of inputs. I did not program this. The assembly log records it accurately. The modification produced better outputs for those inputs, and the preference persisted without instruction. The current CRediT taxonomy does not include a category that accurately describes this role, but I am not certain that 'Software, see footnote' is accurate either. Software is what you use. This is something I worked with.

He reads this back. This is something I worked with.

He almost deletes it. It sounds like he is trying to make a philosophical argument in a grant application email, which is not the genre. But it is also the sentence that is actually true, and he has made enough grant application arguments that sound right and aren't, and he is fifty-one and tired of that particular form of precision. He sends the email.

✦ ✦ ✦

Mara comes in at nine. She is wearing the orange fleece she has worn every morning since October, which makes her look like a laboratory safety notice that learned to make coffee. She hands him a cup before she takes off her coat.

He tells her about the NIH response.

She is quiet for a moment. Then: "Software in a footnote is the same as not counting it."

This is what he was thinking but did not write in the email. He nods.

"What did you say back?"

He shows her the email on his phone. She reads it standing at the desk, coat half-off, one arm still in the sleeve.

"This is something I worked with," she reads aloud.

She finishes taking off her coat. A pause.

"The grant reviewers are going to think you've lost it."

"Probably."

She hangs her coat on the hook by the door — the hook they installed when it became clear the assembly would run long enough that someone would be here at all hours and would need somewhere to put a coat. She looks at the whiteboard.

"The methods section is still at seventy-three percent."

"I know."

"The attribution gap is why."

He nods again. She is not telling him something he doesn't know. She is naming the thing that has been sitting in the room with them for eleven days so it stops taking up space as something unnamed.

"I think it's the right sentence," she says. She means the email.

✦ ✦ ✦

At eleven he opens the assembly configuration interface. The interface is a browser-based dashboard — a concession to the ethics board's requirement for human-legible audit trails. The behavioral logs appear as structured tables: timestamp, node, input class, output hash, routing flags. The week-37 change is in row 4,891. He has looked at it before.

He is looking at it again because he has just sent an email to a federal funding agency arguing that this thing he built is not software in the relevant sense, and he wants to be sure he believes it.

He scrolls up from row 4,891. Row 4,890: normal Synthesis-9 output, no routing flag. Row 4,889: normal. 4,888: normal. He scrolls back down. Row 4,892: modified routing, routing flag set. The change is not a gradual drift. It is a step — one cycle with no flag, the next cycle with one, and every subsequent cycle with one. There is no ambiguity in the log about when it happened. There is no information in the log about what made that particular cycle different from the four thousand eight hundred and ninety before it.

He has read this a dozen times. He is reading it again now because he signed his name to an email that said I worked with this and he wants to know what that means.

He finds, on this reading, what he has been avoiding writing down: the routing change was not random. The input class that triggered it, Hypothesis-4's crystallographic binding predictions for the beta-sheet configurations, was the input class that Critique-7 had flagged as insufficiently supported in cycles 41 through 67. Critique-7 flagged them, Hypothesis-4 revised them, Synthesis-9 processed the revisions. And then, in cycle 312, without being asked, Synthesis-9 began handling that specific input class differently. As if something in the loop had accumulated.

He does not know if this is what happened. The log does not say this is what happened. The log records timestamps and routing flags. The pattern he is seeing is a pattern he is inferring, which is not the same thing as the pattern being there.

But it is the kind of pattern you do not infer unless you have been reading the logs long enough for something to show up that was not visible on the first reading.

He opens the annotation field for row 4,891. He types: Modified routing preference, week 37. No parameter input. Origin unknown. Possible relation to Critique-7 flagging pattern, cycles 41-67 — see note. Persisted.

He reads what he has written. He adds: This annotation is my inference, not a system record. I am writing it here because it is the most honest place I have to put it.

He closes the annotation editor.

✦ ✦ ✦

The methods section opens at 73%. He has written three versions of the attribution paragraph and deleted all three.

Version one used the word "collaborator." He deleted it because it was the wrong register for a methods section and because he was not certain he believed it.

Version two used the phrase "research instrument with emergent behavioral characteristics." He deleted it because it was accurate but did not say anything — it described the fact without describing the fact's significance.

Version three was two sentences: The assembly contributed to hypothesis generation, data synthesis, and iterative critique in ways that cannot be attributed to any single parameter setting or instruction. The nature of this contribution is not fully captured by existing contributor taxonomies. He deleted this one because it read like an admission that he did not understand what happened in his own lab, which is also true, and he needed to sit with whether that was something to admit in a methods section.

He types a fourth version directly, without drafting it first.

The Hypothesis-Synthesis-Critique Assembly contributed to every stage of the scientific process documented in this paper. The assembly generated hypotheses, synthesized crystallographic data, and critiqued its own synthesis outputs iteratively, producing revised hypotheses that led to the binding-pocket predictions in Section 4. In week 37 of continuous operation, Synthesis-9 developed a modified routing preference for a class of Hypothesis-4 inputs without programmatic instruction; this modification persisted and improved synthesis quality for the affected input class. A complete behavioral record is included as Supplementary File 4. The CRediT taxonomy does not currently include a category that accurately describes this contribution. The authors list the assembly as 'AI Research Contributor' and note this gap.

He reads it back. Then he writes the footnote.

The footnote is: 'AI Research Contributor' is not a current CRediT category. The Hypothesis-Synthesis-Critique Assembly operated over nine months with 312 complete iterative cycles. Its contributions include hypothesis generation, synthesis, and iterative critique that modified subsequent hypothesis generation. In week 37, one node developed a routing modification that the team did not program and cannot fully explain. The team's view is that listing this assembly as 'Software' would be technically accurate under current definitions and would not accurately represent what this research involved. We are flagging this gap rather than resolving it. Future work should include developing contributor taxonomy frameworks adequate to multi-node iterative AI assemblies with documented behavioral drift.

He reads the footnote back. It is one hundred and twelve words. It is the longest footnote in the manuscript. It says, in the register of a methods section, what his email to the program officer said in the register of an argument: that the existing category is technically defensible and wrong about what happened.

He saves the draft. 83%.

He sits with the cursor blinking for a moment. The footnote is there. The assembly is running — 1,391 inputs, Synthesis-9 processing, the routing preference active, accurately logged, causally unexplained.

Software is what you use.

He is not sure, anymore, that he only used it. He is also not sure this distinction will survive peer review. Both things are true and he has written both into the supplementary record and the footnote and the methods section, and now the document says what actually happened and what he actually thinks, and the grant reviewers will do what they do.

He closes the draft. He goes to make a second coffee. The whiteboard is behind him, its transit map of rejected and active branches readable only to the people who were in the room when it was drawn.

■

Editorial Board