Your AI project won't fail because of the AI. It'll fail because of your data

When an AI project disappoints, suspicion falls on the model first: "Maybe we need a better one," or "Maybe the technology is not there yet." In the vast majority of cases we see, that is not the problem. The model is good enough. What is missing is usable data. That is the uncomfortable truth you will not find in any sales deck: the exciting AI is the smaller part of the work, and the bigger part is the cleanup beforehand.

Garbage in, garbage out, just more eloquent

The old principle of data processing applies to AI especially harshly: the output is only as good as the input. The difference from before is that modern models make the garbage sound good. A system working on outdated documents gives you the outdated information back in perfectly phrased prose, and that is more dangerous than an obvious error, because it looks trustworthy.

The typical data problems are rarely exotic:

Badly scanned PDFs from which the text cannot be read cleanly.
Outdated wikis and manuals where the 2021 rule sits next to the 2026 one, with no hint which applies.
Knowledge in people's heads that is documented nowhere and that even the best AI cannot guess.
Contradictions between sources that a human resolves in context but a system does not.
Missing or chaotic permissions, so the system sees either too little or too much.

None of these problems is solved by a better model. They can only be fixed at the source.
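
Some of the problems listed above can at least be made visible with a simple inventory check: outdated versions sitting next to current ones, and sources nobody maintains. The Python sketch below is illustrative only. The SourceDoc fields, the one-year staleness threshold and the function names are assumptions made for this example, not part of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceDoc:
    """One entry in a hand-made inventory of knowledge sources (hypothetical schema)."""
    topic: str               # what the document is about, e.g. "travel-expenses"
    location: str            # where it lives (wiki page, file path, ...)
    last_reviewed: date      # when someone last confirmed it is still valid
    owner: Optional[str]     # None means nobody is responsible for it


def find_stale(docs, today, max_age_days=365):
    """Return documents not reviewed within max_age_days (threshold is an assumption)."""
    return [d for d in docs if (today - d.last_reviewed).days > max_age_days]


def find_competing_versions(docs):
    """Return topics covered by more than one document, with the competing documents."""
    by_topic = defaultdict(list)
    for d in docs:
        by_topic[d.topic].append(d)
    return {topic: group for topic, group in by_topic.items() if len(group) > 1}


def find_unowned(docs):
    """Return documents that have no maintainer."""
    return [d for d in docs if not d.owner]
```

A check like this does not resolve anything by itself. It only produces the list of places where a human has to decide which version applies and who maintains it.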

Why this gets overlooked

Data quality is unsexy. No one gets applause in the boardroom for "we tidied up our manuals". AI, by contrast, sells well: it is new, visible, impressive. So the budget flows into the model and the demo, and the tedious part, cleaning up the data foundation, gets skipped or underestimated.

There is also a perception error: in the demo everything works, because clean examples were deliberately chosen. Only in production does the system meet the real, messy data, and quality drops. The drop is then wrongly blamed on the AI.

What actually helps

The good news: you do not have to clean up everything at once. A pragmatic approach looks like this:

1. Look honestly at the data foundation before building. Take a sober inventory: which sources exist, how current and how clean are they, and who maintains them? This half-week of work often saves months of frustration.

2. Start small, with your best data. Instead of "the AI should know all our knowledge", pick a bounded, well-maintained area that really holds up. Success in a clean slice is worth more than mediocrity across everything.

3. Make data maintenance part of operations. Data ages. Whoever cleans up once and then never again has the same problem a year later. Who maintains the sources, at what cadence? That belongs in the plan.

4. Let the system honestly say "I don't know". A well-built system does not invent an answer where the data has a gap; it makes the gap visible. That is not only safer, it also shows you exactly where the data is missing.
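
The fourth point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the word-overlap score is a crude stand-in for whatever retrieval a real system uses, and the names answer_or_abstain, min_score and gap_log are hypothetical. It shows only the behaviour: below a confidence threshold the system abstains and records the question as a gap.

```python
import re
from dataclasses import dataclass
from typing import Optional

IDK = "I don't know."


@dataclass
class Reply:
    answered: bool
    text: str
    source: Optional[str] = None  # which document the answer came from


def _words(text):
    """Lowercased set of word tokens, punctuation ignored."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(question, passage):
    """Share of question words found in the passage (stand-in for real retrieval)."""
    q = _words(question)
    return len(q & _words(passage)) / len(q) if q else 0.0


def answer_or_abstain(question, passages, min_score=0.5, gap_log=None):
    """Return the best-matching passage, or abstain and record the gap.

    passages maps a source name to its text. min_score is an assumed threshold.
    """
    best_name, best_score = None, 0.0
    for name, text in passages.items():
        s = overlap(question, text)
        if s > best_score:
            best_name, best_score = name, s

    if best_name is None or best_score < min_score:
        if gap_log is not None:
            gap_log.append(question)  # make the gap visible for later cleanup
        return Reply(answered=False, text=IDK)

    return Reply(answered=True, text=passages[best_name], source=best_name)
```

The questions collected in gap_log are the useful by-product: they show which topics people ask about that no source covers.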

The surprising upside

Here is an effect many underestimate: an AI project forces a company to organise its knowledge. Suddenly you notice that three departments have three versions of the same rule. That the most important procedure is documented nowhere. That no one knows which manual is current. These insights are valuable, quite independent of the AI. Often the cleanup is the real benefit, and the AI is the occasion to finally do it.

Our take

That is why we do not start projects with "which model?" but with "what does your data foundation look like?". It is less glamorous, but it is the lever that ultimately decides between success and failure. A mediocre model on good data beats a top model on bad data, every time.

If you are planning an AI project, or have one that is stalling, let us look at the data first. Often the solution is there, not in the next model.