Optical Character Translation Using Spectacles (OCTS)


K. Thiruthanigesan and Prof. Roshan Ragel

Postgraduate Institute of Science, University of Peradeniya, Peradeniya, Sri Lanka

Optical character recognition (OCR) converts text produced by non-computerized means, such as printed pages, into digital form. Translation is equally important because language barriers limit access to written knowledge: people who read articles and books in their native language understand them more clearly and take more ideas from the content. We developed a framework that captures images of printed documents, recognizes the text with Tesseract-based OCR, and translates it using the Google API. The framework implements image capture, OCR and translation on an embedded system based on the Raspberry Pi. A set of everyday sentences was selected and captured at different development stages and under different viewpoints, angles and backgrounds. The images were then processed by the proposed system. In an experiment on 123 everyday sentences, OCR correctly detected 97% of the sentences for translation into Sinhala and Tamil. In the translation stage, 91% of the sentences were correctly translated into Tamil and 89% into Sinhala. Across both stages, from input image to translated text, the complete optical character translation process achieved about 93% accuracy.

Introduction

Languages play a major role in human life. They help us share knowledge with later generations through books, records and other means. Most scientific and other theoretical subjects are passed on to new generations through written documents in a particular language, depending on the culture or the country.

There are around 6,500 spoken languages in the world today, although about 2,000 of them have fewer than 1,000 speakers. The most widely spoken language is Mandarin Chinese, with 1,213,000,000 speakers. Half of the world population uses the 13 most popular languages: Mandarin, Spanish, English, Hindi, Arabic, Portuguese, Bengali, Russian, Japanese, Punjabi, German, French and Tamil.

Many scientific books are also published in Mandarin, but usually only Chinese speakers can read them, and people outside Asian countries are unable to read these documents even when they want the information. As a result, people are keen to learn languages other than their mother tongue. However, obstacles such as the time required and the difficulty of learning discourage many of them from learning a new language.

Our research initially focused on English, because most resources for higher studies and research are now available in English, and language barriers hold people back when they pursue them. Many learners have the technical talent and skills, but the English language barrier makes it very difficult for them to achieve their goals. Therefore, being able to read in one common, dominant language with the help of translation spectacles would greatly help further research and reading activities.

As a result, human translators and online translation systems have become essential. Although these newer methods translate into various languages more easily than traditional means such as books, they are also time-consuming.

Through this research, my main intention is to find a solution to the language barrier described above by applying a language translation mechanism to spectacles.

Overall, I propose spectacles that translate other languages into English or the reader's mother tongue. Such spectacles would allow anyone to read any document without language difficulties, without needing to learn many languages.

The same mechanisms used to implement this system can easily be applied to Mandarin. Because people in our country, Sri Lanka, use Sinhala, Tamil and English, the research was carried out on these three languages only.

Problem Statement and Research Goals

This project aims to reduce language barriers and to help people gain knowledge without language difficulties.

Previously, people relied on online translation systems or other specialized software, which are not effective translation systems for this purpose. It is also difficult to translate text without internet access. Currently, there is no direct and reliable method to translate from hard copy to soft copy. Furthermore, traditional and existing translation systems add unnecessary cost and delay.

This research is expected to benefit students, novel readers, researchers, scientists and the general public by letting them build their knowledge through their mother tongue.

Objective

The objective of this project is to develop an automated system for OCR conversion and translation. Computer vision techniques are highly significant for automatically recognizing the text in images of books and journal articles. Compared with manual methods, these techniques reduce manual effort and can improve the speed and precision of identification and translation. This motivated us to develop a system for language identification, OCR conversion and translation.

Approach:

Observing the Source-Language Text

Capture the source-language characters from the hard copy using a webcam, a camera or any other capable device.

Optical Character Recognition (OCR) Operation

The captured characters are in the form of images, so the images must be converted into machine-readable characters.

Translation Using Google API or Other API

Translate the recognized text from the source language into the destination language using a translation API.

Displaying the Destination Language from the Source Language

Display the translated output in a readable, visible format.
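The four steps above can be sketched as a small Python pipeline in which each stage is an injected function. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `OctsPipeline` and its stage names are hypothetical, and the stub stages in the usage lines stand in for the Raspberry Pi camera, Tesseract and the Google API. The helper `build_translate_url` assumes the Google Cloud Translation v2 REST endpoint with its `q`, `source`, `target`, `format` and `key` parameters; it only builds the request URL and does not send anything.

```python
import json
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlencode

# Assumed endpoint: Google Cloud Translation API v2 (Basic edition).
TRANSLATE_V2_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_url(text: str, source: str, target: str, api_key: str) -> str:
    """Build a GET URL for a v2 translate request; nothing is sent here."""
    params = {"q": text, "source": source, "target": target,
              "format": "text", "key": api_key}
    return f"{TRANSLATE_V2_URL}?{urlencode(params)}"


def parse_translate_response(body: str) -> str:
    """Return the first translatedText field from a v2 JSON response body."""
    return json.loads(body)["data"]["translations"][0]["translatedText"]


@dataclass
class OctsPipeline:
    """Hypothetical wiring of the four OCTS stages; every stage is injected."""
    capture: Callable[[], bytes]       # step 1: grab an image of the hard copy
    recognize: Callable[[bytes], str]  # step 2: OCR, e.g. a Tesseract wrapper
    translate: Callable[[str], str]    # step 3: e.g. a call to a translation API
    display: Callable[[str], None]     # step 4: show the result to the reader

    def run_once(self) -> str:
        image = self.capture()
        source_text = self.recognize(image).strip()
        if not source_text:
            return ""  # nothing recognized, so skip translation and display
        translated = self.translate(source_text)
        self.display(translated)
        return translated


# Usage with stub stages (placeholders, not real camera/OCR/translation calls).
shown = []
pipeline = OctsPipeline(
    capture=lambda: b"placeholder-image-bytes",
    recognize=lambda image: "Good morning\n",
    translate=lambda text: text.upper(),  # stand-in for a real translator
    display=shown.append,
)
result = pipeline.run_once()
```

Injecting the stages keeps the capture, OCR and translation back ends swappable, which fits the approach's note that the Google API could be replaced by another API.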

Data Set

The database should store the characters of each language in the given font style or format for character comparison. Images projected from any hard copy are compared with the existing data sets (API) already stored in the database, in order to identify the source language. The OCR comparison is then expected to support translation of the text.
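The paper identifies the source language by comparing captured characters against per-language reference data. Once OCR has produced text, a simpler check is possible, shown here as a swapped-in technique rather than the authors' method: count characters that fall in the Unicode Sinhala block (U+0D80 to U+0DFF) and Tamil block (U+0B80 to U+0BFF), count basic Latin letters as English, and take the majority. The function name `identify_script` is hypothetical. Because Tesseract is normally told which language model to use before recognition, this check complements, rather than replaces, choosing the OCR language up front.

```python
from collections import Counter

# Unicode block ranges (inclusive) for the non-Latin scripts OCTS targets.
SCRIPT_RANGES = {
    "si": (0x0D80, 0x0DFF),  # Sinhala block
    "ta": (0x0B80, 0x0BFF),  # Tamil block
}


def identify_script(text: str) -> str:
    """Return 'si', 'ta' or 'en' by majority vote over characters, else 'unknown'."""
    votes = Counter()
    for ch in text:
        code = ord(ch)
        for lang, (low, high) in SCRIPT_RANGES.items():
            if low <= code <= high:
                votes[lang] += 1
                break
        else:
            # No non-Latin block matched: count basic Latin letters as English.
            if ch.isascii() and ch.isalpha():
                votes["en"] += 1
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


print(identify_script("Good morning"))                     # en
print(identify_script("\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"))  # Tamil word for "Tamil": ta
```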

Figure: Top view of OCTS

Conclusion

This study introduced a system for optical character translation using spectacles. On our own dataset, OCR correctly detected 97% of the sentences for translation into Sinhala and Tamil. In the translation stage, 91% of the sentences were correctly translated into Tamil and 89% into Sinhala. Across both stages, from input image to translated text, the complete optical character translation process achieved about 93% accuracy.

This overall result of around 93% is comparable to the accuracy reported in contemporary work.

On the ISRI dataset, OCR correctly detected 82% of the sentences for translation into Sinhala and Tamil. In the translation stage, 81% of the sentences were correctly translated into Tamil and 78% into Sinhala, giving an overall accuracy of 80% on the ISRI dataset.
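The results above are reported as percentages of correctly handled sentences, but the paper does not state how a sentence was judged correct. One strict, assumed definition is an exact match against a reference sentence after whitespace normalization; the sketch below implements that, with hypothetical function names and made-up example data rather than the paper's test sentences. If translations were instead judged by people, only the final division (correct sentences over total sentences) applies.

```python
from typing import Sequence


def normalize(sentence: str) -> str:
    """Collapse runs of whitespace so spacing differences are not counted as errors."""
    return " ".join(sentence.split())


def sentence_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of sentences whose normalized output exactly matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("the evaluation set is empty")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return correct / len(reference)


# Made-up example: the second OCR output drops the final "!".
ocr_output = ["Good  morning", "Thank you"]
ground_truth = ["Good morning", "Thank you!"]
print(sentence_accuracy(ocr_output, ground_truth))  # 0.5
```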

The research could be improved using newer OCR technology and fully trained alternative APIs. With slight modification, this framework can be integrated into digital spectacles, and it can be extended with automatic text detection and language identification. Our model can help book readers and learners gain knowledge in their mother tongue.

 
