Architecture review
Send me your system docs, architecture diagrams, and the problem you are solving. You get back a written review: what works, what is risky, what to change first. Async, no meetings.
A short guide to what I offer, how I price it, and what the first few weeks look like. Worth a five-minute read before our intro call.
I work best with engineering and product leaders in European SMEs and scale-ups who have an AI problem that is past the slide stage. You have a prototype that needs to become a product, a pipeline that is misbehaving in production, or a compliance deadline you cannot afford to miss.
I am Bhavesh Jain, a senior data scientist based in Germany. Six years building real machine learning systems. My main role is at Fraunhofer ISI, where I help build the JIH Trend Radar. On the side, I take on a small number of independent commissions each year, implementation only.
Most engagements fall into one of these six shapes. If your problem does not fit, just write, I will tell you honestly whether I am the right person.
Send me your system docs, architecture diagrams, and the problem you are solving. You get back a written review: what works, what is risky, what to change first. Async, no meetings.
Full compliance assessment against the EU AI Act. Risk classification of your AI systems (high-risk, limited-risk, minimal-risk), gap analysis against Act requirements, and a prioritized remediation roadmap. Written deliverable (~30 pages), fully asynchronous.
One short engagement to answer one question: is this AI system worth fixing, or is this new idea worth building? You get an audit or a working prototype, depending on the need. Scope is locked before kickoff — fixed price, no surprises.
Build a working evaluation harness for your RAG system on your real data. Baseline metrics (precision, recall, hallucination rate), failure-mode taxonomy across 100–200 test cases, and a reproducible test suite. Handover includes running code and documentation.
Four to eight weeks of focused build work. I design the system, write the code, set up monitoring, and hand it over working. Two weeks of post-launch support included.
Quarterly outcomes: hire your ML engineer, ship 1–2 production models, stand up an eval harness. Code review, architecture decisions, vendor evaluation. One day per week, three-month minimum.
For anything else, EU AI Act readiness, a second opinion on a vendor pitch, a one-off code review, just ask. I will quote a flat fee or hourly rate before any work starts.
Two weeks. German Mittelstand SaaS, ~80 engineers. Their retrieval-augmented support pipeline was returning correct-sounding but wrong answers in production; their internal team had been chasing the cause for two months.
What I actually did: built an evaluation set on their real support tickets, ran a failure-mode taxonomy across 200 cases, swapped the re-ranker, rebuilt the chunking strategy around their domain. The eval harness is now the standard for their other ML services. Anonymised on request; full reference available on a call.
A short list of things I will turn down or refer elsewhere, so we both save time:
Five steps. No agency layers, no proposal templates.
Thirty minutes. We talk about what you are building, what is stuck, and what success looks like. No pitch, no follow-up unless you ask.
Within 48 hours of the call, you receive a written scope: what I will build, what the deliverable is, and a fixed price or hour estimate. No hidden hours.
We set up a shared workspace (Notion, Linear, or whatever you use) and a Slack channel. I send a short written update every week.
I write the code, set up the evaluation, and deploy. You see a working demo at each milestone, no surprises at the end.
Written documentation, a training session for your team, and two weeks of post-launch fixes. After that, your team owns the system.
On a recent Production sprint, week three, the evaluation flagged that retrieval was randomly picking one of three vector indexes instead of the one we had configured. A code path nobody had touched in months. We caught it in a Tuesday review because we had set up the eval before any retrieval work started, and the metric dropped 12 points overnight.
If we had been building on impressions, that bug would have shipped. We would have spent another month chasing "why does it sometimes work" in production. Instead we shipped on time, with a written note in the handover doc explaining what we caught and how.
That is roughly how I work. Evaluation before code. Demos at every milestone, not just at the end. Written updates so you see decisions as they happen. And if my first plan does not survive contact with your data, you hear it from me before a deadline slips.
Discovery sprints have fixed scope; we agree on the question we are answering and that does not move. For Production sprints, small course corrections are baked into the buffer. Anything larger gets a written change-order with a fixed delta price. No surprise invoices.
Sometimes. If your problem is urgent and small, I can squeeze a Discovery sprint into a current week. Larger sprints are scheduled around active commitments. Ask, and I will tell you honestly.
Yes. A standard mutual NDA goes out before any work starts. For your master services agreement, I usually start from your template and flag anything that does not make sense for a one-person operation.
This is a real risk and we talk about it on the intro call. If your engineers do not have the bandwidth to ship alongside me, the Production sprint is the wrong shape. Better path: start with a Discovery sprint to identify what your team actually needs, then a smaller scoped build.
Sometimes that is the right answer, and you still get the full deliverable: the evaluation set, the written rationale, and a recommendation on what to do instead. I would rather save you the next €18k than push a build that should not ship.
The fastest way to know if we are a fit is a thirty-minute intro call.