Receipt data extraction in detail
This article explains what you can realistically expect from Kweeri. Because our solution is completely new on the market, it can be hard to understand exactly what we offer and how it works 🧐.
First of all, only receipts from supermarkets can be analyzed. Drive-through bills and receipts from other kinds of shops have a different format that we cannot read yet. In other words, we understand receipts from Carrefour or Aldi, but not a receipt from McDonald's or a perfume shop.
To understand a receipt, we need you to tell us its language, because the system uses country-specific data to interpret some parts of the receipt. This means that a French receipt analysed as Spanish won't give good results. Today, we can analyse receipts from France, Spain, Italy and Belgium.
For a better analysis, you should always send raw pictures of receipts. Our solution is trained on raw receipts (not cropped, nothing handwritten, etc.), and a raw picture also lets us retrieve its metadata to know where and when it was taken.
Also, if a human cannot read the receipt, our solution won't be able to either. Although no human eyes are involved, receipts go through OCR (Optical Character Recognition), which extracts the words so that Kweeri can interpret them. If the words are blurry or hidden, they won't be recognized and therefore won't be analysed.
Please read our quality requirements to better understand this.
Data understood in receipts
Here is all the information we can get out of a receipt:
- Where and when the picture was taken: we get this information from the metadata of the picture, so if you modified the picture before sending it to Kweeri, you won't get this information.
- Where and when the products were bought: the full address of the shop, its phone number, its sign/sub-signs, and the date and time of the purchase.
- Payment methods: how the customer paid (cash, credit card, vouchers, a combination of these, etc.).
- Number of items bought: we count the product lines to know how many products were bought.
- Total: total amount paid by the customer (can be negative!).
- Total max: total value of the merchandise on this receipt; this can differ from what the customer paid (the Total field).
- For each product, we can get:
  - the quantity bought (e.g. you bought 2 bottles of wine: the quantity will be 2).
  - the size of a bundle (e.g. you bought 1 bundle of 4 Danette yogurts: the size of the bundle will be 4, and the quantity will be 1).
  - the category (see them all here): this is where we understand the product. You bought Danette? We know this is a yogurt. You bought a bottle of Coca-Cola? This is soda! This may sound obvious to you when reading this, but if I tell you "4 TR JB", you may not know this is processed meat (this French label stands for 4 slices of ham). This is a very powerful tool!
  - the brand: we can guess a lot of brands from very abbreviated words. However, if there is no indication of any brand at all, our system cannot figure it out from nothing. In the previous example with ham, we won't detect any brand, as there is literally no indication (TR is the abbreviation for slices and JB for ham). Our system is a receipt analyzer, not a receipt seer 😉
  - the price of the product: we return the unit price of a product and the total price spent on this product (in case several units were bought).
  - the weight: if a weight or volume is specified, we return it too. For example, if the person bought 1.13 kg of tomatoes, you'll know it!
  - other flags, such as whether the product is organic ("bio") or vegan.
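To make the relationship between these fields more concrete, here is a minimal Python sketch of how the extracted data could be modelled on your side. It is only an illustration: the class and field names (`ProductLine`, `Receipt`, `bundle_size`, `total_max`, and so on) are assumptions made for this example and are not Kweeri's actual response format, and the prices are invented.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class ProductLine:
    """One product line of a receipt (hypothetical field names)."""
    label: str                      # raw text printed on the receipt
    quantity: int                   # units (or bundles) bought
    bundle_size: int                # items per bundle, 1 if not a bundle
    unit_price: Decimal             # price of one unit (or one bundle)
    category: Optional[str] = None  # e.g. "yogurt"
    brand: Optional[str] = None     # stays None when the label gives no hint

    @property
    def total_price(self) -> Decimal:
        # Total spent on this line when several units were bought.
        return self.unit_price * self.quantity

    @property
    def item_count(self) -> int:
        # Individual items, counting what is inside each bundle.
        return self.quantity * self.bundle_size


@dataclass
class Receipt:
    """Receipt-level fields (hypothetical field names)."""
    total: Decimal       # what the customer actually paid
    total_max: Decimal   # value of the merchandise, may differ from total
    products: List[ProductLine] = field(default_factory=list)

    @property
    def number_of_items(self) -> int:
        # The item count is the number of product lines.
        return len(self.products)


# Illustrative data only: 2 bottles of wine and 1 bundle of 4 yogurts.
receipt = Receipt(
    total=Decimal("9.90"),
    total_max=Decimal("11.90"),
    products=[
        ProductLine("wine", quantity=2, bundle_size=1,
                    unit_price=Decimal("4.50")),
        ProductLine("Danette x4", quantity=1, bundle_size=4,
                    unit_price=Decimal("2.90"), category="yogurt",
                    brand="Danette"),
    ],
)
```

In this sketch, the wine line has a quantity of 2 and a bundle size of 1, while the yogurt line has a quantity of 1 and a bundle size of 4, so the receipt counts two product lines. The gap between `total_max` and `total` stands for whatever reduced the amount paid in this made-up case.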
Example of a well-processed analysis:
We can also compare some of this information with other receipts you send, to help you detect anomalies.