The overwhelmingly cheap cost of storage has meant that we rarely delete things any more.

At work we come up with an idea, research it and present our findings as something that we could do. A few months pass, an evolution on that original idea comes along. We refresh our memory from the original idea, research what’s changed and then present it as a new idea, leaving the old one as it was.

Storage is cheap. The old idea is doing no harm. It’s just a harmless old document that no one would go and read.

Until we come to train LLMs on our company documents. All of a sudden, there’s important context missing. All those old (and good bad!) ideas and out of date reports all of a sudden are presented equally for an LLM to train against.

When we then come to interrogate this knowledge, how will we know what’s useful? How will we be able to trust it?