
Arabic Table Extractor

Arabic Table Extractor is a computer vision and OCR-based application that uses machine learning algorithms to detect tabular data in scanned documents and extract it into digital form, where it can be edited or fed into other processes. It requires no training before use.

The application detects a bordered table, counts its rows and columns, extracts every cell, arranges the cells in their original order, and applies OCR to each cell to extract its text.
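
The arranging step can be illustrated with a minimal sketch. It assumes that cell detection has already produced one bounding box per cell as `(x, y, width, height)` in pixels. The function names `arrange_cells` and `table_shape` and the `row_tolerance` parameter are hypothetical and are not taken from the application itself. The sketch groups boxes into rows by vertical position, sorts each row left to right, and then counts rows and columns.

```python
from typing import List, Tuple

# Assumed representation of a detected cell: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def arrange_cells(boxes: List[Box], row_tolerance: float = 0.5) -> List[List[Box]]:
    """Group cell boxes into rows (top to bottom), each row sorted left to right."""
    if not boxes:
        return []
    # Sort by vertical centre so that cells of the same row end up adjacent.
    by_center = sorted(boxes, key=lambda b: b[1] + b[3] / 2)
    rows: List[List[Box]] = [[by_center[0]]]
    for box in by_center[1:]:
        anchor = rows[-1][0]
        anchor_center = anchor[1] + anchor[3] / 2
        center = box[1] + box[3] / 2
        # Same row if the centres differ by at most a fraction of the anchor's height.
        if abs(center - anchor_center) <= row_tolerance * anchor[3]:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Order the cells inside each row by their left edge.
    return [sorted(row, key=lambda b: b[0]) for row in rows]


def table_shape(rows: List[List[Box]]) -> Tuple[int, int]:
    """Return (row count, column count), using the widest row as the column count."""
    return len(rows), max((len(row) for row in rows), default=0)


# Hypothetical detections for a 2 x 3 table, listed in arbitrary order.
detected = [
    (210, 12, 100, 40), (10, 10, 100, 40), (110, 11, 100, 40),
    (110, 52, 100, 40), (10, 50, 100, 40), (210, 51, 100, 40),
]
grid = arrange_cells(detected)
shape = table_shape(grid)
```

Each box in `grid` could then be cropped from the page image and passed to the OCR engine. The sketch keeps the left-to-right image order; if the output should follow right-to-left reading order for Arabic tables, each row could be reversed instead.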

After extracting the text from each cell, it writes an output file in CSV or Excel format that reproduces the original table with its data, ready to edit or use in further processing.
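
As a rough illustration of the CSV output, the sketch below writes a table of extracted strings with Python's standard `csv` module. The helper names `write_table_csv` and `read_table_csv` and the sample data are hypothetical. The `utf-8-sig` encoding is an assumption, chosen because it adds a byte order mark that helps Excel recognise the file as UTF-8 and display Arabic text correctly; the application's actual encoding is not stated.

```python
import csv
import os
import tempfile
from typing import List


def write_table_csv(rows: List[List[str]], path: str) -> None:
    """Write one CSV line per table row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        csv.writer(handle).writerows(rows)


def read_table_csv(path: str) -> List[List[str]]:
    """Read the table back, dropping the byte order mark while decoding."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return [row for row in csv.reader(handle)]


# Hypothetical OCR output for a small table with Arabic headers.
sample = [["الاسم", "العمر"], ["Ahmed", "30"]]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "table.csv")
    write_table_csv(sample, out_path)
    round_trip = read_table_csv(out_path)
    with open(out_path, "rb") as raw:
        starts_with_bom = raw.read(3) == b"\xef\xbb\xbf"
```

Reading the file back returns the same rows, so the table structure survives the round trip.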

Technology Stack:

  • Python 3.7
  • OpenCV 3
  • Tesseract OCR
  • Python Flask Framework (for web interface)

Benefits:

  • Convert Tabular Data in Scanned Documents into Digital Form Instantly
  • No Training Required
  • Faster than Manual Table Creation and Data Entry
  • Highly Time Efficient
  • Supports Arabic and English
  • Usable as a Step in Process Automation