Sunday, February 25, 2018

How to easily store email attachments in SharePoint Online and extract data using Parserr

One of my past time hobbies is renovating my home.  I find it quite relaxing to be able to step away from the screen and engage in an activity that lets me use my mind and hands in a creative way.  This is all good, except when it comes time to submit all my receipts for my annual tax return.  Dealing with all the receipts is a daunting task.

A while back, I created my own personal digital record solution on SharePoint Online, where I’m tracking receipts, invoices, tax documents, and general statements separately (I’ll be discussing the details of this solution in another article shortly).  Each of these documents is defined by a different content type with its own metadata site columns.  For this example, I’m interested in the Total amount paid on the receipt.  As I do most of my shopping at Home Depot and there is the option to have the receipt sent to me electronically, I figured why not automate the addition of the document into my records library.  And so, I did, using Flow and Parserr.  The process is broken down into two separate flows as described below.

Flow 1:  Copy the document into the Records Library

In my case, I’m using Gmail to receive these emails.  The emails that are sent from Home Depot have two common identifiers:

  • The email title is “Your Electronic Receipt”
  • The sender is HomeDepotReceipt@homedepot.com

Using this information, I’m able to set up a trigger that will kick-off the flow whenever I completed a purchase at Home Depot.


Next, I extract the attachment, save it to the records library.  It’s important to capture the ID of the file that was created in SharePoint as that ID will be used a couple of time to update the document.  The first time I use the Id is to set the content type to be of type “Record”.



Once that’s done, I’m sending the same email I got from Home Depot to Parserr.  I’m including the file ID in the first line of the body of the email.


Flow 2:  Extract the Total from the Receipt and update the File’s metadata

Before using Parserr to extract the total amount, I had to set up the service using instructions on Extracting Text from PDF Files.  Once that was done, all I needed to do was to register the Parserr service with Microsoft Flow, setup the “When an email is received” trigger, and then update the file properties with the correct amount.  As you can see in the image below, I’m explicitly converting the document ID to an integer and the amount to a floating number to ensure that they don’t cause errors.

Further Enhancements

This solution can help you cut down time in processing many receipts that are coming from the same source.  As there may be errors in the processing of files (for example, the OCR is picking up the wrong text), it would be good to set up an action that should fire off an email with the link to the document if conversion fails.

Of course, you can set up this type of service for any source of digital receipts and even extract multiple pieces of information in a single flow.