Traditional search systems based on simple keyword matching fail to identify the meaning inherent in a query and hence yield low-precision results. Wissen’s Intelligent Search can discover documents specific to a topic and pinpoint the most relevant paragraphs for a given query. Its advanced NLP pipeline, extended with statistical analysis, handles both grammatical and ungrammatical questions and also understands period queries.
Robust and extensive text pre-processing produces a deeply enriched version of the input. Unstructured content such as free-form text, HTML documents, images, and tables is converted into an intermediate form and stored as triples.
Lexical, linguistic, and semantic analysis of documents and queries, plus relevance scoring between the query and each paragraph
Leverages the hierarchical structure of document text to boost relevance scores
Modular NLP pipelines
Can also link unstructured data with both internal and external structured data to drive analysis
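To make the paragraph relevance scoring and hierarchical boost listed above more concrete, here is a minimal sketch in Python. It is not Wissen's implementation: the function names, the bag-of-words cosine similarity, and the `heading_boost` weight are all assumptions chosen for illustration. The idea is simply that a paragraph scores higher when the headings above it also match the query.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a stand-in for real lexical analysis)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_paragraph(query, paragraph, headings, heading_boost=0.5):
    """Score a paragraph against a query, boosted by matches in its enclosing headings.

    `headings` is the chain of section titles above the paragraph (hypothetical input).
    `heading_boost` is an illustrative weight, not a tuned value.
    """
    q = Counter(tokenize(query))
    base = cosine(q, Counter(tokenize(paragraph)))
    # Hierarchical signal: text in the headings above the paragraph also counts.
    head = cosine(q, Counter(tokenize(" ".join(headings))))
    return base + heading_boost * head


def rank(query, sections):
    """sections: list of (headings, paragraph) pairs; returns (score, paragraph) best-first."""
    scored = [(score_paragraph(query, p, h), p) for h, p in sections]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With this sketch, two paragraphs with similar wording are separated by where they sit in the document: for the query "revenue growth", a paragraph under a "Revenue" heading outranks a comparable one under "Risks". A production system would replace the bag-of-words similarity with the linguistic and semantic analysis described above.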
Typical document-based workflows require hundreds of man-hours for data entry. Wissen’s in-house Document Data Extractor can automate manual data entry workflows by accurately locating and extracting data from fixed and variable locations in a PDF. The solution uses a heuristics-based approach that can learn from a few examples of each format. Spatial features and typological information of PDF text are used to model a layout parser.
Training takes a couple of minutes and requires minimal training data
Extensible to documents that come in a standardised, fixed format
Extracts complex information from both the header (such as the order number) and line items (such as price, quantity, and multi-line values that span other columns in the table)
Flexibility: lets you choose which data points or line items you want to extract
Unique layout parser for each document type
Can be seamlessly integrated with ERP or CRM systems
Designed to scale: process documents in batches or parse them in real time. Once trained, our systems can convert thousands of PDF forms to Excel or CSV within a couple of minutes.
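As a rough illustration of how spatial features can drive line-item extraction, including the multi-line values mentioned above, the following Python sketch groups positioned words into rows and columns and writes the result as CSV. Everything here is an assumption made for the example: the `Word` type, the tolerance `y_tol`, and the hard-coded `col_starts` (which a trained layout parser would presumably learn from sample documents). The word coordinates are assumed to come from whatever PDF text extractor is in use.

```python
import csv
import io
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


def group_rows(words, y_tol=3.0):
    """Cluster words into visual rows: a word within y_tol of a row's first word joins it."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - w.y) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return rows


def to_cells(row, col_starts):
    """Assign each word to the right-most column whose start x is <= the word's x."""
    cells = [[] for _ in col_starts]
    for w in sorted(row, key=lambda w: w.x):
        idx = max((i for i, start in enumerate(col_starts) if start <= w.x), default=0)
        cells[idx].append(w.text)
    return [" ".join(c) for c in cells]


def parse_line_items(words, col_starts, key_col=0):
    """Build table records; a row with an empty key column continues the previous record."""
    records = []
    for row in group_rows(words):
        cells = to_cells(row, col_starts)
        if records and not cells[key_col]:
            # Continuation line: append the wrapped text to the previous record.
            prev = records[-1]
            records[-1] = [" ".join(filter(None, (p, c))) for p, c in zip(prev, cells)]
        else:
            records.append(cells)
    return records


def to_csv(records, header):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical words from one invoice table; the description of item A1 wraps onto a second line.
words = [
    Word("A1", 10, 100), Word("Steel", 60, 100), Word("bolt", 90, 101), Word("2", 200, 100),
    Word("M8", 60, 112), Word("zinc", 85, 112),
    Word("B2", 10, 130), Word("Nut", 60, 130), Word("5", 200, 130),
]
records = parse_line_items(words, col_starts=[0, 50, 180])
csv_text = to_csv(records, ["sku", "description", "qty"])
```

The continuation rule (an empty key column means "this line belongs to the record above") is one simple heuristic for wrapped cells; real layouts may need additional cues such as ruling lines or font changes.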