Typical document-based workflows can consume hundreds of person-hours of data entry. Wissen's in-house Document Data Extractor automates manual data-entry workflows by accurately locating and extracting data from both fixed and variable locations in a PDF. The solution uses a heuristics-based approach that can learn each new format from a few examples. A layout parser is modelled from the spatial features (position on the page) and typographical information (such as font and size) of the PDF text.
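The idea of learning a field's location from a few examples using spatial and typographical features can be illustrated with a minimal sketch. It is not Wissen's implementation: the `Word`, `FieldTemplate`, `learn_field` and `extract_field` names are hypothetical, and the sketch assumes the PDF text has already been extracted into phrase-level tokens carrying a position and font size (which a PDF text-extraction library would supply). From one annotated example it learns the offset between a label (the "anchor") and its value, then uses that offset plus a font-size check to find the value in a new document, even when the block has moved.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class Word:
    """A text token with its position and font size (hypothetical schema)."""
    text: str
    x0: float    # left edge, in points
    top: float   # top edge, in points (y grows downward)
    size: float  # font size, in points


@dataclass
class FieldTemplate:
    """What is learned for one field from a single annotated example."""
    anchor: str  # label that precedes the value, e.g. "Invoice No:"
    dx: float    # horizontal offset from the anchor to the value
    dy: float    # vertical offset from the anchor to the value
    size: float  # font size of the value in the example


def find_word(words: List[Word], text: str) -> Optional[Word]:
    """Return the first token whose text matches exactly, or None."""
    return next((w for w in words if w.text == text), None)


def learn_field(words: List[Word], anchor_text: str, value_text: str) -> FieldTemplate:
    """Record where the value sits relative to its anchor label."""
    anchor = find_word(words, anchor_text)
    value = find_word(words, value_text)
    if anchor is None or value is None:
        raise ValueError("anchor or value not found in the training example")
    return FieldTemplate(anchor_text, value.x0 - anchor.x0, value.top - anchor.top, value.size)


def extract_field(words: List[Word], tpl: FieldTemplate,
                  size_tol: float = 0.5, max_dist: float = 20.0) -> Optional[str]:
    """Find the anchor, predict the value position, return the closest matching token."""
    anchor = find_word(words, tpl.anchor)
    if anchor is None:
        return None
    px, py = anchor.x0 + tpl.dx, anchor.top + tpl.dy
    # Typographical filter: keep only tokens whose font size matches the learned one.
    candidates = [w for w in words if w is not anchor and abs(w.size - tpl.size) <= size_tol]
    if not candidates:
        return None
    # Spatial score: Euclidean distance from the predicted position.
    best = min(candidates, key=lambda w: hypot(w.x0 - px, w.top - py))
    return best.text if hypot(best.x0 - px, best.top - py) <= max_dist else None


# One annotated training example...
train = [Word("Invoice No:", 50, 100, 10), Word("INV-001", 130, 100, 10),
         Word("Total", 50, 400, 10), Word("990.00", 130, 400, 10)]
tpl = learn_field(train, "Invoice No:", "INV-001")

# ...applied to a new document where the block has moved down and to the right.
new_doc = [Word("ACME Ltd", 50, 40, 14), Word("Invoice No:", 60, 150, 10),
           Word("INV-042", 141, 151, 10), Word("Date", 60, 170, 10),
           Word("2024-01-05", 141, 170, 10)]
print(extract_field(new_doc, tpl))
```

Anchoring on a label rather than on absolute page coordinates is what lets the same template handle values at variable locations; the distance and font-size tolerances are illustrative defaults that would need tuning per format.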

Use Cases

  • Purchase and sales orders, invoices, and accounts payable
  • Standardised PDF contracts (rental agreements or other standardised legal documents)
  • Fillable PDF forms
  • Scanned PDFs (the solution can easily be extended to support them)

Why Document Data Extractor?

Fast training: training takes a couple of minutes and requires minimal training data.

Extensible to any document that comes in a standardised, fixed format

Extracts complex information from both header fields (such as the order number) and line items (such as price and quantity), including multi-line values that spill over into other columns of the table

Language-agnostic

Flexibility: lets you choose which data points or line items to extract

Unique layout parser for each document type

Can be seamlessly integrated with ERP or CRM systems

Designed to scale: process documents in batches or parse them in real time. Once trained, the system can convert thousands of PDF forms to Excel or CSV within a couple of minutes.
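Line-item extraction with multi-line values, followed by CSV export, can be sketched with one simple heuristic: tokens are assigned to columns by their horizontal position, grouped into visual lines by their vertical position, and a line with no value in a key column (here, quantity) is merged into the previous row. This is an illustrative assumption, not a description of Wissen's parser; the functions `group_lines`, `parse_line_items` and `rows_to_csv`, the `(text, x0, top)` token format, and the hard-coded column start positions are all hypothetical. In practice the column boundaries would be learned from the training examples.

```python
import csv
import io
from bisect import bisect_right
from typing import Dict, List, Tuple

# A token as (text, x0, top) in page coordinates, with y growing downward.
Token = Tuple[str, float, float]


def group_lines(tokens: List[Token], y_tol: float = 2.0) -> List[List[Token]]:
    """Group tokens whose top is within y_tol of a line's first token."""
    lines: List[List[Token]] = []
    for tok in sorted(tokens, key=lambda t: (t[2], t[1])):
        if lines and abs(tok[2] - lines[-1][0][2]) <= y_tol:
            lines[-1].append(tok)
        else:
            lines.append([tok])
    return lines


def parse_line_items(tokens: List[Token], columns: List[str], col_starts: List[float],
                     key_column: str, y_tol: float = 2.0) -> List[Dict[str, str]]:
    """Assign tokens to columns by x position and merge multi-line values.

    A visual line with nothing in key_column is treated as a continuation
    of the previous row (e.g. a description that wraps onto a second line).
    """
    rows: List[Dict[str, str]] = []
    for line in group_lines(tokens, y_tol):
        cells: Dict[str, str] = {}
        for text, x0, _ in sorted(line, key=lambda t: t[1]):
            idx = bisect_right(col_starts, x0) - 1
            if idx < 0:
                continue  # token lies left of the first column; ignore it
            col = columns[idx]
            cells[col] = f"{cells[col]} {text}" if col in cells else text
        if key_column in cells or not rows:
            rows.append(cells)
        else:
            prev = rows[-1]
            for col, text in cells.items():
                prev[col] = f"{prev[col]} {text}" if col in prev else text
    return rows


def rows_to_csv(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialise parsed rows as CSV text; missing cells become empty strings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


columns = ["description", "qty", "price"]
col_starts = [50, 300, 400]  # hypothetical left edges of each column
tokens = [
    ("Steel", 50, 200), ("bolts", 90, 200), ("100", 300, 200), ("0.25", 400, 200),
    ("M8", 50, 212), ("zinc-plated", 75, 212),  # wrapped description, no qty
    ("Washers", 50, 230), ("200", 300, 230), ("0.05", 400, 230),
]
rows = parse_line_items(tokens, columns, col_starts, key_column="qty")
print(rows_to_csv(rows, columns))
```

Treating a missing key-column value as a continuation is a deliberately simple rule; real documents may need extra signals (font changes, ruling lines, or indentation) to tell wrapped text apart from a new row.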


Technology Stack