Ever spent hours OCRing and extracting data from years of invoices? Microsoft’s Azure Form Recognizer can be a game changer for peers tasked with reconstructing charges, converting bills to LEDES CSV or supporting cost overrun scenarios. The traditional Adobe OCR approach usually requires extensive transformation and cleanup to get the data into Excel, Access or an enterprise database. Azure Form Recognizer applies advanced machine learning (AI models) to extract text, key-value pairs, tables, and structures from documents. In my recent manual testing with Form Recognizer Studio, it outperformed my desktop Adobe and Windows OCR tools.
Analyzing client invoices from recent discovery, breach response or sensitive investigations lets me identify process and cost improvements. For large collections, I have had to subcontract the transformation work or discount my rates. While Azure Form Recognizer is not perfect, it dramatically reduced that tedious formatting and QC cleanup. It is designed to run as an Azure resource that is called by Power Automate or other apps, which means you can automate processing of invoices dropped into a directory. My experiments with roughly a thousand pages of invoices cost $50 in Azure fees.
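Because that $50 figure comes from a single experiment, a rough budget for a larger collection is just linear scaling. The sketch below assumes cost grows linearly with page count at the observed rate of about $0.05 per page; actual Azure pricing varies by model, tier and region, so treat the result as a ballpark, not a quote.

```python
def estimated_cost(pages: int, sample_cost: float = 50.0, sample_pages: int = 1000) -> float:
    """Linearly scale an observed Azure bill to a new page count (rough budget only)."""
    if sample_pages <= 0:
        raise ValueError("sample_pages must be positive")
    # Assumption: cost is proportional to pages; $50 / 1,000 pages = $0.05 per page.
    per_page = sample_cost / sample_pages
    return round(pages * per_page, 2)


print(estimated_cost(25_000))  # 1250.0
```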
Here is my quick guide:
- Do you have an Azure account, or can you request one? If so, you will need to create or select an Azure subscription to run the Recognizer. BEFORE you process confidential/client PDFs, determine how to appropriately segregate and secure them in the Azure environment. The simplest way may be a completely isolated Azure tenant.
- You can create a free Azure account here.
- The Form Recognizer Quick Start walks you through creating a free subscription ($200 limit) and a resource to associate the recognizer with. Alternatively, see if your IT team can set you up with a subscription, access role and a limited budget to test with.
- In the Azure portal, create a resource (search for Form Recognizer on the resource type screen). You will need to create or select a Resource Group and give the resource a unique name. This is where you may want tech support to ensure network and access security.
- You can now play with Form Recognizer Studio to test the prebuilt models. The free tier is limited to 3 pages per document and 500 pages per month.
- The prebuilt invoice and contract models have obvious usage scenarios.
- The Studio WILL allow you to manually process large documents if it is attached to a Pay-As-You-Go billing subscription.
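For readers who want to script what the Studio does, here is a minimal Python sketch that sends one local PDF to the prebuilt invoice model and flags doubtful fields for QC. It assumes the `azure-ai-formrecognizer` package (version 3.2 or later) plus the endpoint and key of the resource created above. The file name `sample_invoice.pdf`, the environment variable names and the 0.8 confidence threshold are hypothetical placeholders, not Microsoft defaults.

```python
import os
from typing import Any, Dict, List, Optional

# Header fields from the prebuilt-invoice schema that this sketch keeps.
HEADER_FIELDS = ["VendorName", "CustomerName", "InvoiceId", "InvoiceDate", "InvoiceTotal"]


def field_value(field: Any) -> Optional[Any]:
    """Return a plain value from a DocumentField, or None if the field is missing."""
    if field is None:
        return None
    # Currency fields arrive as CurrencyValue objects (with .amount) in SDK 3.2+.
    return getattr(field.value, "amount", field.value)


def summarize_invoice(document: Any) -> Dict[str, Any]:
    """Flatten one analyzed invoice into values plus per-field confidence scores."""
    row: Dict[str, Any] = {}
    for name in HEADER_FIELDS:
        field = document.fields.get(name)
        row[name] = field_value(field)
        row[name + "_confidence"] = field.confidence if field is not None else None
    return row


def needs_review(row: Dict[str, Any], threshold: float = 0.8) -> List[str]:
    """Name the fields that are missing or below a (hypothetical) confidence threshold."""
    return [
        name for name in HEADER_FIELDS
        if row.get(name + "_confidence") is None or row[name + "_confidence"] < threshold
    ]


def analyze_invoice(path: str, endpoint: str, key: str) -> List[Dict[str, Any]]:
    """Send one local file to the prebuilt-invoice model; one row per invoice found."""
    # Imported lazily so the helpers above work without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document("prebuilt-invoice", document=f)
    result = poller.result()  # waits for the long-running analysis to finish
    return [summarize_invoice(doc) for doc in result.documents]


if __name__ == "__main__":
    endpoint = os.environ.get("FORM_RECOGNIZER_ENDPOINT")  # hypothetical variable names
    key = os.environ.get("FORM_RECOGNIZER_KEY")
    if endpoint and key:
        for row in analyze_invoice("sample_invoice.pdf", endpoint, key):
            print(row, "review:", needs_review(row))
    else:
        print("Set FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY first.")
```

Keeping the confidence score next to each value makes it easy to sort or filter the low-confidence rows in Excel before anyone relies on them.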
This gets you through testing or manual processing. To automate processing with a Power Automate flow, an app or scripts, explore some of the guides below. You will need to create or request a Storage account and a Container to serve as the extraction location. Your account or a service account will need to be assigned a Blob Reader role (Storage Blob Data Reader in the Azure role list).
- How to create a Power Automate flow that processes invoice extractions and saves them in Excel.
- An alternative Python method using the Form Recognizer resource:
- Actual code here.
- You can also play with the new AI Builder in Power Automate using the Recognizer resource endpoint and key that you created.
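A rough Python sketch of the automated path: read every PDF from a blob container, run the prebuilt invoice model, and flatten each line item into a CSV row that can be loaded into Excel or Access. It assumes the `azure-ai-formrecognizer` (3.2+) and `azure-storage-blob` packages, and that the container URL either carries a SAS token or that you pass a storage credential (for example `DefaultAzureCredential()` from `azure-identity`) for an identity holding the Blob Reader role. The column names, output file name and environment variable names are hypothetical, and the output is a generic line-item layout, not a LEDES file.

```python
import csv
import io
import os
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical output layout: one row per invoice line item.
COLUMNS = ["source", "invoice_id", "description", "quantity", "unit_price", "amount"]


def plain(field: Any) -> Any:
    """Unwrap a DocumentField; currency values expose .amount in SDK 3.2+."""
    if field is None:
        return None
    return getattr(field.value, "amount", field.value)


def line_item_rows(source: str, document: Any) -> List[Dict[str, Any]]:
    """Flatten the Items array of one analyzed invoice into CSV-ready dicts."""
    invoice_id = plain(document.fields.get("InvoiceId"))
    items = document.fields.get("Items")
    item_list = items.value if items is not None and items.value else []
    rows: List[Dict[str, Any]] = []
    for item in item_list:
        sub = item.value  # each line item is a dictionary of sub-fields
        rows.append({
            "source": source,
            "invoice_id": invoice_id,
            "description": plain(sub.get("Description")),
            "quantity": plain(sub.get("Quantity")),
            "unit_price": plain(sub.get("UnitPrice")),
            "amount": plain(sub.get("Amount")),
        })
    return rows


def write_csv(rows: Iterable[Dict[str, Any]], out: io.TextIOBase) -> None:
    """Write a header line and the rows to any text stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def process_container(container_url: str, endpoint: str, key: str,
                      storage_credential: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Analyze every PDF blob in a container and collect line-item rows."""
    # Imported lazily so the pure helpers above run without the Azure SDKs.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_container_url(container_url, credential=storage_credential)
    client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    rows: List[Dict[str, Any]] = []
    for blob in container.list_blobs():
        if not blob.name.lower().endswith(".pdf"):
            continue
        data = container.download_blob(blob.name).readall()
        result = client.begin_analyze_document("prebuilt-invoice", document=data).result()
        for document in result.documents:
            rows.extend(line_item_rows(blob.name, document))
    return rows


if __name__ == "__main__":
    url = os.environ.get("INVOICE_CONTAINER_URL")  # hypothetical variable names
    endpoint = os.environ.get("FORM_RECOGNIZER_ENDPOINT")
    key = os.environ.get("FORM_RECOGNIZER_KEY")
    if url and endpoint and key:
        with open("invoice_line_items.csv", "w", newline="", encoding="utf-8") as fh:
            write_csv(process_container(url, endpoint, key), fh)
    else:
        print("Set INVOICE_CONTAINER_URL, FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY first.")
```

Keeping the source blob name on every row preserves a trail back to the original PDF, which helps when QC turns up a questionable charge.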
Not that long ago, custom apps, document processing and analytics were something you paid a specialist provider or consultant for. Microsoft is increasingly bringing these capabilities within reach of the mildly technical ‘citizen developer’. I shared this relatively technical deep dive in the hope that peers who cling to their Excel macros will join me in exploring these kinds of pseudo-development solutions.
Greg Buckles wants your feedback, questions or project inquiries at Greg@eDJGroupInc.com. Contact him directly for a free 15-minute ‘Good Karma’ call if he has availability. He solves problems and creates eDiscovery solutions for enterprise and law firm clients.
Greg’s blog perspectives are personal opinions and should not be interpreted as a professional judgment or advice. Greg is no longer a journalist and all perspectives are based on best public information. Blog content is neither approved nor reviewed by any providers prior to being published. Do you want to share your own perspective? Greg is looking for practical, professional, informative perspectives free of marketing fluff, hidden agendas or personal/product bias. Outside blogs will clearly indicate the author, company and any relevant affiliations.