Why building a B2B order scanner is much harder than it looks

09 June 2026
7-minute read
Antoine Lesqueren
CPO

Everyone thinks an order scanner is simple to build, until they try. A document comes in, products are identified, an order is created: on paper, it looks like copy-paste with a bit of OCR on top. The reality is fundamentally different.

The cruelest part is that building something that looks like it works is really easy, while building a production-ready order scanner is far harder. That is precisely why so many teams, product managers, and ERP vendors underestimate the true scope of the problem. Here is why.

The prototype trap: the easy 20%

Let us be honest. A working order-scanner prototype is not hard to build. Give a decent development team three to six months and it will have something that parses native PDFs, extracts references and quantities, matches them against a catalogue, and pushes structured data into an ERP. It will be impressive in a demo. Stakeholders will be convinced. The project will be declared a success.

The problem is that what has been built at this stage covers the easy 20%: orders that are already clean, complete, and well formatted. These are the orders where:

  • The document is a native PDF or a clean Excel file
  • The supplier references are stated explicitly
  • Quantities and units are unambiguous
  • The client name matches the database exactly
  • The layout is consistent, page after page

This is a perfectly real category of orders. For this share of incoming volume, a basic scanner works beautifully: exact references, standard units, a single page, processed in seconds. That is why the prototype looks so convincing. The team built it on clean data, tested it on clean data, and demoed it on clean data. Of course it works.

But in production, clean data is the exception, not the rule.

The real world: messy, inconsistent, unpredictable

Real client documents are another matter entirely. The references in incoming orders are client references, not supplier references. A client calls a product “REF-0042-BIS” internally; the catalogue knows it as “PRD-9871-A”. No exact match is possible without a mapping table, and that table does not exist yet, because every client has built its own nomenclature over the years.
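
To make the mapping problem concrete, here is a minimal sketch of a per-client reference lookup. The client identifier "client-a", the table contents, and the function names are assumptions made for illustration; a real system would back this with a database and a review workflow rather than an in-memory dictionary.

```python
from typing import Optional

# Hypothetical per-client mapping:
# (client id, client's own reference) -> supplier catalogue reference.
CLIENT_REF_MAP = {
    ("client-a", "REF-0042-BIS"): "PRD-9871-A",
}

# Hypothetical supplier catalogue, reduced to its set of references.
CATALOGUE = {"PRD-9871-A"}


def normalise(ref: str) -> str:
    """Uppercase and remove spaces so ' ref-0042-bis' equals 'REF-0042-BIS'."""
    return ref.strip().upper().replace(" ", "")


def resolve_reference(client_id: str, ref: str) -> Optional[str]:
    """Return the supplier reference, or None when a human has to decide."""
    key = normalise(ref)
    if key in CATALOGUE:
        # The client happened to use the supplier's own reference.
        return key
    # Otherwise only a mapping known for this specific client can answer.
    return CLIENT_REF_MAP.get((client_id, key))
```

The same string "REF-0042-BIS" resolves for "client-a" and returns None for any other client, which is the point: the mapping is knowledge about one client, not about the reference itself.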

Units do not line up either. A client orders in kilos; the ERP only accepts units. The conversion factor is not universal: it depends on the product, the packaging format, the negotiated contract, and sometimes on a business rule that exists only in the head of a salesperson who has been around for fifteen years.

Languages vary. Formats change without warning. Descriptions are partial, abbreviated, or simply wrong. “Belt 457J” means something very specific to someone who knows the range. To a general-purpose text extractor, it is noise. And every client has its own conventions, built up over years, that no static rules engine can anticipate.

The advanced layer: where 50 to 60% of orders really live

As soon as a team tries to go beyond the easy 20%, it hits a qualitatively different class of problems. It is no longer about adding parsing rules, but about building intelligence. This is exactly the territory of AI-powered B2B order automation.

  • Client reference management: maintaining code mapping tables per client, kept up to date as products evolve.
  • Synonyms and abbreviations: understanding that “U”, “UN”, “Unit”, “Piece”, “Pc”, “EA” and “Box” may mean the same thing, or not, depending on context.
  • Continuous learning: when an operator corrects a match, that correction must be remembered and applied automatically next time.
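
The synonym and learning points above can be sketched together. In this illustration, a few unit aliases are assumed to be safe for every client, while an ambiguous token such as "Box" stays unresolved until an operator corrects it for a given client. The class name, the alias table, and the "unit" canonical value are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Aliases assumed to be safe for every client (illustrative values only).
GLOBAL_UNIT_ALIASES: Dict[str, str] = {
    "u": "unit", "un": "unit", "unit": "unit",
    "piece": "unit", "pc": "unit", "ea": "unit",
}


class UnitResolver:
    """Resolves raw unit labels and remembers operator corrections per client."""

    def __init__(self) -> None:
        # (client id, cleaned token) -> canonical unit learned from a correction.
        self._client_aliases: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _clean(raw: str) -> str:
        return raw.strip().lower().rstrip(".")

    def resolve(self, client_id: str, raw_unit: str) -> Optional[str]:
        """Return the canonical unit, or None if the label is still ambiguous."""
        token = self._clean(raw_unit)
        learned = self._client_aliases.get((client_id, token))
        if learned is not None:
            return learned  # a client-specific correction wins
        return GLOBAL_UNIT_ALIASES.get(token)

    def record_correction(self, client_id: str, raw_unit: str, canonical: str) -> None:
        """Store what the operator decided so it applies automatically next time."""
        self._client_aliases[(client_id, self._clean(raw_unit))] = canonical


resolver = UnitResolver()
first_try = resolver.resolve("client-a", "Box")        # None: needs an operator
resolver.record_correction("client-a", "Box", "unit")  # operator decides once
second_try = resolver.resolve("client-a", " box ")     # now resolved for client-a
```

Keeping the learned alias scoped to one client is a deliberate choice in this sketch: "Box" meaning a single unit for one client says nothing about what it means for the next.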

Unit conversion looks deceptively simple: “convert kilos to units”. In reality, B2B order flows are full of commercial complexity. A client may order in kilos, boxes, packs, layers, pallets, sub-units, or in commercial units that do not even exist in the ERP. Packaging conversion rules differ by client, product, sales channel, packaging format, and negotiated contract. Some conversions are not mathematically deterministic and need business context to be interpreted correctly.
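
One way to picture rules that differ by client and packaging is a rule table where the most specific matching rule wins. The sketch below is an assumption about how such rules could be modelled, not a description of any particular product: the field names, the specificity score, and the example factors are invented for illustration, and sales channel and contract are left out to keep it short.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional


@dataclass(frozen=True)
class ConversionRule:
    product: str
    from_unit: str
    erp_units_per_from_unit: Fraction
    client: Optional[str] = None     # None means "applies to every client"
    packaging: Optional[str] = None  # None means "applies to every packaging format"

    def matches(self, product: str, from_unit: str,
                client: str, packaging: Optional[str]) -> bool:
        return (
            self.product == product
            and self.from_unit == from_unit
            and self.client in (None, client)
            and self.packaging in (None, packaging)
        )

    def specificity(self) -> int:
        # A rule scoped to a client and/or a packaging format beats a generic one.
        return (self.client is not None) + (self.packaging is not None)


def convert(rules: List[ConversionRule], product: str, quantity: Fraction,
            from_unit: str, client: str,
            packaging: Optional[str] = None) -> Optional[Fraction]:
    """Convert to ERP units with the most specific rule, or None if no rule applies."""
    candidates = [r for r in rules if r.matches(product, from_unit, client, packaging)]
    if not candidates:
        return None  # no rule known: this order line needs a human decision
    best = max(candidates, key=ConversionRule.specificity)
    return quantity * best.erp_units_per_from_unit


# Illustrative factors: 1 kg is 2 ERP units by default, 4 for client-a's contract.
RULES = [
    ConversionRule("PRD-9871-A", "kg", Fraction(2)),
    ConversionRule("PRD-9871-A", "kg", Fraction(4), client="client-a"),
]
```

With these example rules, 10 kg converts to 40 ERP units for "client-a" and to 20 for anyone else. When two rules are equally specific, this sketch simply takes the first one; a production system would need an explicit tie-break, which is exactly where business context comes in.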

This is no longer OCR. It is operational business intelligence. In concrete terms, 50 to 60% of incoming orders require this level of sophistication. And even the most advanced order scanners on the market today handle at best 70 to 80% of incoming volume. Anyone who claims otherwise has not processed enough volume yet, or is not being honest about their numbers.

The expert layer: the hardest 15 to 20%

At the top of the complexity pyramid sits a category of orders that demands capabilities well beyond document parsing. These are the orders that take the longest to process manually, the most valuable ones, and the ones that create the most friction when handled badly.

  • Blanket orders: deliveries spread over several months, with quantities varying by line and by date.
  • Multi-order documents: several distinct orders bundled into a single file.
  • Hybrid documents: structured tables mixed with free text, handwritten annotations, and attached specifications.
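
As an illustration of the blanket-order case, the sketch below groups extracted lines into one delivery batch per date, so that each batch can become its own scheduled delivery in the ERP. The ScheduledLine structure and its field names are assumptions; extracting those dates from the document in the first place is the hard part and is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class ScheduledLine:
    product_ref: str
    quantity: int
    delivery_date: date


def split_by_delivery_date(lines: List[ScheduledLine]) -> Dict[date, List[ScheduledLine]]:
    """Group blanket-order lines into one batch per delivery date, earliest first."""
    batches: Dict[date, List[ScheduledLine]] = {}
    for line in sorted(lines, key=lambda item: item.delivery_date):
        batches.setdefault(line.delivery_date, []).append(line)
    return batches


# Illustrative blanket order: the same product delivered in varying quantities.
blanket_order = [
    ScheduledLine("PRD-9871-A", 200, date(2026, 8, 1)),
    ScheduledLine("PRD-9871-A", 500, date(2026, 7, 1)),
    ScheduledLine("PRD-1001", 50, date(2026, 7, 1)),
]
batches = split_by_delivery_date(blanket_order)
```

Here the result holds two batches, 1 July 2026 with two lines and 1 August 2026 with one, in that order.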

Matching products with no reference or label, from a description alone like “6 plain spherical bearings Elges GE40”, requires semantic understanding, vector search, and domain knowledge, not string matching. Straight-through processing, where an order is ingested, matched, validated, and pushed into the ERP with no human intervention, is only achievable once the system has accumulated enough contextual knowledge about that client, its habits, its catalogue, and its edge cases to act with confidence.
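
The gating logic behind straight-through processing can be sketched as follows. To stay self-contained, this example scores candidates with plain token overlap (Jaccard similarity), which is only a stand-in for the embedding-based vector search described above; the catalogue entries, the 0.5 threshold, and the function names are all assumptions.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical catalogue: supplier reference -> product label.
CATALOGUE: Dict[str, str] = {
    "PRD-1001": "Plain spherical bearing Elges GE40",
    "PRD-1002": "Plain spherical bearing Elges GE50",
    "PRD-2001": "Timing belt 457J",
}


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(description: str) -> Tuple[Optional[str], float]:
    """Return the best-scoring catalogue reference and its score."""
    query = tokens(description)
    scored = [(jaccard(query, tokens(label)), ref) for ref, label in CATALOGUE.items()]
    score, ref = max(scored)
    return ref, score


def route(description: str, threshold: float = 0.5) -> str:
    """Push the line straight through only when the match is confident enough."""
    ref, score = best_match(description)
    return f"auto:{ref}" if score >= threshold else "review"
```

With this toy catalogue, "6 plain spherical bearings Elges GE40" is routed to "auto:PRD-1001", while a bare "bearing" falls back to "review". Token overlap cannot tell that "bearings" and "bearing" are the same word, nor weigh "GE40" against "GE50" as the decisive token, which is why real systems need semantic similarity and domain knowledge behind the same threshold.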

These 15 to 20% are not a niche edge case. They include some of the company's highest-value transactions. Not handling them does not make them disappear: it simply shifts the cost onto human operators, introduces delays, and creates systematic friction with the most demanding clients.

The distance between a demo and a production order scanner

The gap between a convincing prototype and a genuinely production-ready order scanner is not measured in weeks. It is measured in layers of intelligence.

  • Core features (~20% of orders): clean documents, exact references, standard formats.
  • Advanced features (~50 to 60% of orders): client references, unit conversion, learning, fuzzy matching.
  • Expert features (~15 to 20% of orders): semantic matching, blanket orders, straight-through processing.

Each layer requires fundamentally different engineering. And each layer is harder than the last, not incrementally, but categorically. The prototype trap is real: ERP vendors add order scanning to their roadmap, integrators present it as a quick win, and it looks convincing in a demo, because the demo always uses clean data. The remaining 80% sits just below the surface, invisible until go-live, and suddenly everywhere.

What this means in practice

Building for the full range of real orders is not a luxury: it is the difference between an order scanner and a truly industrialised system, one that can free your teams from manual data entry.

A useful test to put to any vendor: “Could an untrained intern re-key 100% of client orders?” If yes, a basic scanner is probably enough. But in reality, 99% of suppliers will say no, because they know it is simply impossible: every order requires attention, knowledge, and decision-making.

That knowledge and decision-making capability is exactly what an industrialised order scanner has to encode. Not in a rulebook, because rules cannot anticipate what clients will do next week, but in a system that can learn, reason, and improve continuously. That is the real engineering challenge, and one of the most complex automation problems in B2B distribution.

This article is based on internal research and production experience across multiple sectors and supplier types.
