Overview

This model takes PDF documents (W-9 forms, Commercial Registration Documents, or Business Incorporation Documents) as input and returns the company name, the identification number (or its equivalent, depending on the document type and language), and the address found in the document.
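As an illustration of the output described above, the three returned fields could be held in a small container like the one below. This is only a sketch: the class name ExtractedDetails, its field names, and the missing_fields helper are hypothetical and not taken from the actual implementation, and the sample values are made up.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class ExtractedDetails:
    """Hypothetical container for the three fields the model returns."""
    company_name: Optional[str] = None
    identification_number: Optional[str] = None  # e.g. an EIN for a W-9 form
    address: Optional[str] = None

    def missing_fields(self) -> List[str]:
        # Report every field the extraction left empty, in declaration order.
        return [name for name, value in asdict(self).items() if not value]


# Made-up example: the identification number could not be extracted.
details = ExtractedDetails(
    company_name="Example GmbH",
    address="Musterstrasse 1, 10115 Berlin",
)
print(details.missing_fields())
```

Listing the empty fields makes it easy to tell the user which part of the document could not be read.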

The libraries used in this model include LlamaParse, VectorStoreIndex from LlamaIndex, OpenAI, pytesseract, and Streamlit.

How to Use

Click the Browse files button and select the document you want to extract details from. After the file is selected, click the Upload button to get the response from the model. If you are uploading a German or French document, make sure you are on the Germany or France page, respectively.

Known Issues

  • For USA forms, if the provided PDF is an image, the model fails to extract the Employer Identification Number (EIN) in most cases. Also, if the quality of the provided PDF is poor (for example, documents scanned with a mobile phone), there is a high chance of getting wrong responses.
  • For German documents, if the provided PDF is an image, the model may miss some German characters in the response.
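Because EIN extraction from image-based PDFs is unreliable, one possible safeguard is to check the returned value against the standard EIN layout (two digits, a hyphen, seven digits) before trusting it. The sketch below is not part of the model; the function name normalize_ein is hypothetical, and accepting a missing hyphen is an assumption made here to tolerate OCR output that drops it.

```python
import re
from typing import Optional

# Two digits, an optional hyphen, then seven digits, not embedded in a longer number.
EIN_PATTERN = re.compile(r"\b(\d{2})-?(\d{7})\b")


def normalize_ein(raw: str) -> Optional[str]:
    """Return the first EIN-shaped value in `raw` as NN-NNNNNNN, or None."""
    match = EIN_PATTERN.search(raw)
    if match is None:
        return None  # Nothing EIN-shaped found: treat the response as suspect.
    return f"{match.group(1)}-{match.group(2)}"


print(normalize_ein("EIN: 12-3456789"))
print(normalize_ein("12-34567"))
```

A None result only signals that the value does not look like an EIN. A well-formed value can still be wrong, so poor scans may need a manual check either way.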