
Receipt Analyzer

Receipt Analyzer is the world's first automatic receipt analyzer aiming to read, extract and analyze data from receipt/invoice pictures.

On paper it is pretty simple: we take an image of a receipt and extract data from it. But we do a little more than that: we understand the data, so we create value from the text. This is not just an OCR (Optical Character Recognition) job; a lot of processing and intelligence is involved.

For example, we read a product label on the receipt and, thanks to machine learning and many custom-crafted NLP algorithms, we can infer the precise category, brand, and characteristics of the product.

What we mean by understanding a receipt:

  • Where and when did the transaction happen?
  • What has been traded in the transaction?
    • This is the hardest part, because we not only read which products have been traded but also automatically categorize them.
    • We also detect brands, quantity, price per product, and more.
  • What is the amount of the transaction?

Read this article to fully understand what result you can expect when a receipt is analysed.

Our solution implements various cutting-edge technologies. We developed machine learning solutions for image analysis and Natural Language Processing, and integrated them with traditional programming approaches.

Kweeri also provides several features, such as:

  • Endpoints
  • Campaigns
  • Fraud detection system
  • Quality check system

These features will be described in the next sections of this article.

Receipt Management Features

You can use Kweeri for many receipts coming from multiple sources, from mobile apps to servers. With a large number of receipts, it quickly becomes cumbersome to browse and retrieve them without additional organisation. So we have designed some ways to organise and manage your receipts, along with other features offered by our solution.

Endpoints

Endpoints are simply anchors for retrieving a set of receipts. You can link receipts to an endpoint, and you will be able to retrieve them easily after the analysis.

An endpoint has properties such as :

Feature use case example

You have two sets of receipt pictures from different countries: France 🇫🇷 and Belgium 🇧🇪.

You want to use Kweeri - Receipt Analyzer to analyze and extract data from these pictures.
You can use the Endpoint feature by creating two different endpoints:

  • An endpoint to analyze French receipt pictures. To analyse these receipts correctly, you have to set the dominant language to fr-FR.
  • An endpoint to analyze Belgian receipt pictures. To analyse these receipts correctly, you have to set the dominant language to *-BE.

After analysis, you can retrieve these data more easily by searching by endpoint with the User Interface or the API.
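
As a rough illustration of this use case, the sketch below models the two endpoints as plain Python objects and filters analysed receipts by endpoint. The `Endpoint` and `Receipt` classes, their field names, and the endpoint names are hypothetical and do not reflect the actual Kweeri API; `fr-BE` is just one assumed value matching the `*-BE` pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for an endpoint and its dominant language."""
    name: str
    dominant_language: str


@dataclass
class Receipt:
    """Hypothetical stand-in for an analysed receipt linked to an endpoint."""
    uid: str
    endpoint: Endpoint


def receipts_for(endpoint, receipts):
    """Return only the receipts linked to the given endpoint."""
    return [r for r in receipts if r.endpoint == endpoint]


france = Endpoint(name="france-receipts", dominant_language="fr-FR")
belgium = Endpoint(name="belgium-receipts", dominant_language="fr-BE")  # assumed value

receipts = [
    Receipt("r1", france),
    Receipt("r2", belgium),
    Receipt("r3", france),
]

french_uids = [r.uid for r in receipts_for(france, receipts)]
belgian_uids = [r.uid for r in receipts_for(belgium, receipts)]
```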

Campaigns

Campaigns are also anchors for retrieving a set of receipts, but they carry rule-validation logic.

We use the same process for receipt analysis, but we add a layer of validation that you configure via Kweeri - Receipt Analyzer.

The main difference from a simple receipt analysis is that we check the extracted data against a set of algorithms to detect specific products, validate the date, and so on.

What are validation rules?

A campaign represents a promotion campaign valid during a certain period of time and linked to a set of rules that have to be matched during receipt analysis. Any document that is processed in the context of a campaign will be checked against the set of rules.

A campaign has properties such as :

Feature use case example

You have a set of receipts from France 🇫🇷 and you want to know which receipts from this set are eligible for a promotion campaign.

A receipt is eligible for this campaign if the customer bought a Sodebo product between 01/02/2022 and 01/03/2022.

Today, many companies would check this set of receipts manually. But with the Promotion feature, we can do this job automatically! All you need to do is configure a campaign with the properties previously listed in this article and send the receipt pictures to our service.
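
To make the rule in this example concrete, here is a minimal sketch of such an eligibility check. It is not Kweeri's validation engine: the `Campaign`, `Product`, and `AnalysedReceipt` classes are hypothetical, the dates are assumed to be in day/month/year format (1 February 2022 to 1 March 2022), and both bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Product:
    label: str
    brand: str


@dataclass
class AnalysedReceipt:
    buy_date: date
    products: list


@dataclass
class Campaign:
    """Hypothetical campaign: one required brand within a validity period."""
    brand: str
    start: date
    end: date

    def is_eligible(self, receipt):
        # Rule 1: the purchase date falls inside the campaign period (inclusive).
        in_period = self.start <= receipt.buy_date <= self.end
        # Rule 2: at least one product of the required brand was bought.
        has_brand = any(
            p.brand.lower() == self.brand.lower() for p in receipt.products
        )
        return in_period and has_brand


campaign = Campaign(brand="Sodebo", start=date(2022, 2, 1), end=date(2022, 3, 1))
receipt = AnalysedReceipt(
    buy_date=date(2022, 2, 15),
    products=[Product(label="SODEBO PIZZA", brand="Sodebo")],
)
eligible = campaign.is_eligible(receipt)
```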

Additional Analysis Features

Fraud detection system

Sometimes the same receipt appears multiple times in your campaign or endpoint, or a receipt shows other anomalies, such as a product list that does not match the receipt total. So we created a solution to detect those anomalies in the receipts that you send.

What kind of anomalies can be detected?

When you send a receipt, the anomaly detection looks for potential anomalies in it. There are two types of anomalies:

  • Duplicates
  • Inconsistencies

Duplicates are receipts that were found to be identical or similar to the analyzed receipt. For example, we may detect the exact same image on another receipt, or similarities such as the same product list or the same purchase time in the same shop.

Inconsistencies are anomalies that we found in the receipts, like a product list not matching the total.
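
The inconsistency mentioned above (a product list not matching the total) can be sketched as a simple sum check. This is only an illustration under assumptions: the function name and the one-cent tolerance are hypothetical, and the real system may account for discounts, taxes, or rounding differently.

```python
from decimal import Decimal


def total_matches(line_prices, total, tolerance=Decimal("0.01")):
    """Return True when the sum of product prices matches the receipt total.

    `tolerance` is an assumed allowance for rounding differences.
    """
    computed = sum(line_prices, Decimal("0"))
    return abs(computed - total) <= tolerance


prices = [Decimal("2.50"), Decimal("1.20")]
consistent = total_matches(prices, Decimal("3.70"))
inconsistent = total_matches(prices, Decimal("4.00"))
```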

In addition, the anomaly detection only looks for duplicates within a limited scope:

  • In the document's endpoint, if the document is linked to an endpoint only
  • In the document's campaign, if the document is linked to a campaign, including when it is linked to both an endpoint and a campaign
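
The scoping rule above can be read as "the campaign takes precedence over the endpoint". A minimal sketch of that reading follows; the function and parameter names are hypothetical, and returning `None` for a document linked to neither is an assumption, since the text does not describe that case.

```python
def duplicate_search_scope(endpoint_id=None, campaign_id=None):
    """Decide where duplicates are searched for a given document."""
    # A campaign link wins, even when an endpoint is also set.
    if campaign_id is not None:
        return ("campaign", campaign_id)
    if endpoint_id is not None:
        return ("endpoint", endpoint_id)
    # Assumption: no link means no duplicate search scope.
    return None


scope_both = duplicate_search_scope(endpoint_id="ep-1", campaign_id="camp-9")
scope_endpoint = duplicate_search_scope(endpoint_id="ep-1")
```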

Anomalies flag list

| Flag | Description |
| --- | --- |
| ORIGINAL_RECEIPT | No anomaly detected |
| IDENTICAL_EXISTING_FILE | The same MD5 hash of the image was found for one or multiple other receipts |
| BARCODE_ALREADY_EXISTS | The same barcode was found in one or multiple other receipts |
| HIGH_FILE_SIMILARITY | A similar pHash was found for one or multiple other receipts |
| SAME_SHOP_SAME_MOMENT | One or multiple receipts that have the same shop and the same buyTime were found |
| EXISTING_PRODUCT_LIST | One or multiple receipts with the same product list were found |
| HIGH_GLOBAL_CONTENT_SIMILARITY | Both SAME_SHOP_SAME_MOMENT and EXISTING_PRODUCT_LIST are true |
| SOFTWARE_EDITED | The receipt has been edited with popular photo-editing software (Photofilter, Photoshop, ...) |
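
To show how some of these flags relate to each other, here is a hedged sketch that compares a new receipt with one existing receipt. The dictionary keys and function names are hypothetical, and the perceptual-hash (HIGH_FILE_SIMILARITY) and SOFTWARE_EDITED checks are deliberately left out; only the MD5 comparison uses a real library call (`hashlib.md5`).

```python
import hashlib


def md5_of(image_bytes):
    """Hex MD5 digest of the raw image bytes."""
    return hashlib.md5(image_bytes).hexdigest()


def duplicate_flags(new, other):
    """Return the duplicate flags raised when comparing two receipts."""
    flags = []
    if new["md5"] == other["md5"]:
        flags.append("IDENTICAL_EXISTING_FILE")
    if new["barcode"] is not None and new["barcode"] == other["barcode"]:
        flags.append("BARCODE_ALREADY_EXISTS")

    same_moment = (
        new["shop"] == other["shop"] and new["buy_time"] == other["buy_time"]
    )
    # Assumption: product order does not matter when comparing lists.
    same_products = sorted(new["products"]) == sorted(other["products"])

    if same_moment:
        flags.append("SAME_SHOP_SAME_MOMENT")
    if same_products:
        flags.append("EXISTING_PRODUCT_LIST")
    # Derived flag: raised only when both previous conditions hold.
    if same_moment and same_products:
        flags.append("HIGH_GLOBAL_CONTENT_SIMILARITY")
    return flags


first = {
    "md5": md5_of(b"picture-one"),
    "barcode": "123",
    "shop": "S1",
    "buy_time": "2022-02-10T12:30",
    "products": ["bread", "milk"],
}
second = {
    "md5": md5_of(b"picture-two"),
    "barcode": None,
    "shop": "S1",
    "buy_time": "2022-02-10T12:30",
    "products": ["milk", "bread"],
}
flags = duplicate_flags(first, second)
```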

All the flags that were found for this receipt will be added to a list in the anomaly report. Also, you will have a list of the receipts that are considered as duplicates of this receipt. For each receipt found, you will also have a list of the flags that were found.

For example, you have a receipt X for which the following flags were found :

IDENTICAL_EXISTING_FILE, BARCODE_ALREADY_EXISTS, SAME_IMAGE_LOCATION

You will also have a list of the potential duplicate receipts found:

| Potential duplicated receipt UID | Flags |
| --- | --- |
| e67c9571-47a7-42cd-8439-c94bc9d5e478 | IDENTICAL_EXISTING_FILE |
| 829e2309-9a36-4404-bcf7-b29580cace20 | BARCODE_ALREADY_EXISTS, IDENTICAL_EXISTING_FILE |
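
One way to picture such a report in code is a small nested structure. The key names (`flags`, `duplicates`, `uid`) and the helper below are hypothetical and are not taken from the real API response; the values are the ones from the example above.

```python
# Hypothetical shape of an anomaly report for receipt X.
report = {
    "flags": [
        "IDENTICAL_EXISTING_FILE",
        "BARCODE_ALREADY_EXISTS",
        "SAME_IMAGE_LOCATION",
    ],
    "duplicates": [
        {
            "uid": "e67c9571-47a7-42cd-8439-c94bc9d5e478",
            "flags": ["IDENTICAL_EXISTING_FILE"],
        },
        {
            "uid": "829e2309-9a36-4404-bcf7-b29580cace20",
            "flags": ["BARCODE_ALREADY_EXISTS", "IDENTICAL_EXISTING_FILE"],
        },
    ],
}


def duplicates_with(report, flag):
    """UIDs of the potential duplicates that raised the given flag."""
    return [d["uid"] for d in report["duplicates"] if flag in d["flags"]]


barcode_duplicates = duplicates_with(report, "BARCODE_ALREADY_EXISTS")
```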

How do I get it?

There are two ways to get the result of the anomaly detection: via the Kweeri interface or via the API.

Quality check system

The quality of a receipt picture directly impacts the quality of the analysis. With the experience gained over the past few years, the Aboutgoods team developed a system to analyse the quality of your pictures. This system helps you automatically comply with our quality requirements (see this article for more information).

Quality Check is a feature that will give you information about the quality of your document before sending it to the full receipt analysis process. Based on the same tools as the receipt analyzer, it's a cost-reducing feature.

Feature use case example

Assume that you have a mobile app that takes receipt pictures and sends them directly to the Receipt Analyzer service. What happens if the picture is rejected by the Receipt Analyzer service because of bad image quality? 🧐

You need to wait until the end of the analysis (which can take up to 10 seconds) and then ask your user to take another picture.

But sending the picture to the Quality Check feature before sending it to Receipt Analyzer will divide the response duration by 10! 🙌

If the picture isn't usable for the analysis, you will get a faster response.
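
The flow described in this use case can be sketched as a pre-flight check. Everything here is an assumption made for illustration: `submit_receipt`, the shape of the verdict dictionary, and the two stand-in functions are hypothetical and do not call the real Quality Check or Receipt Analyzer services.

```python
def submit_receipt(image, quality_check, analyze):
    """Run the fast quality check first; analyse only usable pictures."""
    verdict = quality_check(image)
    if not verdict["ok"]:
        # Fail fast: ask the user to retake the picture right away.
        return {"status": "retake", "reason": verdict["reason"]}
    return {"status": "analysed", "result": analyze(image)}


def fake_quality_check(image):
    """Stand-in check: treats very small payloads as unusable."""
    if len(image) < 10:
        return {"ok": False, "reason": "image too small"}
    return {"ok": True, "reason": None}


def fake_analyze(image):
    """Stand-in for the full (slower) receipt analysis."""
    return {"total": "12.40"}


rejected = submit_receipt(b"tiny", fake_quality_check, fake_analyze)
accepted = submit_receipt(b"a-large-enough-picture", fake_quality_check, fake_analyze)
```

With this ordering, the full analysis is only paid for when the picture has already passed the cheaper check.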

In a nutshell, this feature lets you improve the user experience of your app.