An Application of Image Processing in Optical Mark Recognition

Vietnam Journal of Agricultural Sciences
ISSN 2588-1299
VJAS 2020; 3(4): 864-871

Tran Vu Ha¹ & Nguyen Thi Thu²

¹Faculty of Information Technology, Vietnam National University of Agriculture, Hanoi 131000, Vietnam
²Faculty of Animal Science, Vietnam National University of Agriculture, Hanoi 131000, Vietnam

Received: April 17, 2020
Accepted: December 5, 2020
Correspondence to: ntthu@vnua.edu.vn
ORCID: Le Thanh Ha (5491)

Abstract

Optical mark recognition (OMR) is widely used by universities for reading multiple-choice questions. In this article, we present a software system, based on digital image processing, for processing surveys at the Vietnam National University of Agriculture. The software was built using MATLAB and is easy to use. The surveys were digitized using a scanner and sent to the software tool. In this study, we tested more than 170 surveys of nine different types. The software tool correctly detected all the valid answers. It was also able to detect all questions with no or multiple marks.

Keywords

Image processing, optical mark recognition, survey

Introduction

Optical mark recognition (OMR) is a form of automated data processing. Questions with multiple choices are printed on paper. Respondents then mark their answers using pens. In the next step, the sheets are scanned and sent to a computer for processing. There are many applications of OMR, including multiple-choice examinations (for students and pupils) and feedback collection (from customers, students, users, etc.). In universities (e.g., the Vietnam National University of Agriculture), collecting feedback from students plays an important role in evaluating and improving the quality of education.

Nowadays, many commercial solutions for OMR are available (e.g., OpScan Series Product from SCANTRON). These products commonly require a dedicated scanner and dedicated answer sheets, which motivates the search for cheaper solutions. Hong Duc University created a software tool named TickREC for this purpose (Hong-Duc University, 2014). The Vietnam Forestry University also has its own software solution (Mai Ha An, 2014). An increasing number of methods for mark detection have been published. Gaikwad (2015) applied a template matching algorithm after finding the region of interest to find the marked answers. Loke et al. (2018) proposed a method based on pixel counting and simple thresholding that can be used under a variety of conditions. Another method, by Belag et al. (2018), was based on the creation of template answer sheets and key point detection algorithms. Each of these methods (and the corresponding software tools) has its own advantages and disadvantages. For example, Belag's tool used a dedicated answer sheet, which also had checkmarks that helped when the scanned image was rotated. This kind of sheet is suitable for tests but not for surveys. TickREC and the tool of Mai Ha An (2014) could process sheets that contained both questions and answers. Because each software tool works with a certain type of answer sheet, designed as needed by its authors, these tools cannot be applied directly to the surveys at the Vietnam National University of Agriculture.

Hence, in this work, we created a software tool for processing surveys at the Vietnam National University of Agriculture. The surveys were scanned by an ordinary scanner and sent to the software for processing. The software was designed to be easy to use, with no special training required. The system was cost-effective because no dedicated machine or answer sheets were required.

Materials and Methods

Materials

In this project, we used nine different types of questionnaires. All of these were used by the Center for Quality Assurance, Vietnam National University of Agriculture:

(i) Employee feedback about the operation of a number of divisions
(ii) Member feedback about the support of the Ho Chi Minh Communist Youth Union
(iii) Student feedback about the support of a number of divisions
(iv) Student feedback about an advanced education program
(v) Master student feedback about a specific course
(vi) Graduate student feedback about an educational program
(vii) Student feedback about a theoretical course of an ordinary education program
(viii) Student feedback about a practical course of an ordinary education program
(ix) Student feedback about a theoretical course of a Professional Oriented to Higher Education (POHE) program

For each type of questionnaire, there were more than 30 sheets that were randomly filled. All of the sheets were scanned with an HP scanner (ScanJet Pro 3000 s3). The output file format was normally JPEG but could also be PNG, BMP, or another format supported by MATLAB (see the Methods section for more details). The width and the height of the images were 1655 and 2338 pixels, respectively (these dimensions could differ slightly depending on the scanner). Examples of the surveys are shown in Figures 1 and 2.

Figure 1. Example of surveys with one page: (a) a survey for employees; (b) a survey for students

Figure 2. Example of surveys with two pages: (a) the first page of a student survey; (b) the second page of a student survey

Methods

MATLAB - Environment for software development

MATLAB (short for matrix laboratory) was developed in the 1970s by Cleve Moler (Haigh, 2008). Most of the MATLAB code was written by Cleve Moler in FORTRAN. Jack Little and Steve Bangert then reprogrammed MATLAB in C. Together with Cleve Moler, the three of them founded MathWorks in California in 1984. MathWorks has since developed, maintained, and distributed MATLAB as a commercial product (Sandeep, 2017). Nowadays, MATLAB supports various platforms such as Linux, Windows, and macOS. With MATLAB, users write a few lines of code and obtain results instantly without involving a compiler. MATLAB is used for data analysis and visualization. It supports multiple types of data (audio, images, videos, CSV, and different databases). MATLAB also provides the App Designer tool, which allows users to build a GUI (Graphical User Interface) for their programs (Educba, 2020). For these reasons, we used MATLAB to develop our software tool for data processing.

Processing workflow

Figure 3 shows the basic steps needed for the processing of one scanned page of a questionnaire. In the first step, the scanner (ScanJet Pro 3000 s3) scanned multiple pages in a single run. After that, our software tool came into play.

Figure 3. The proposed stages for data processing

Because our questionnaires were printed in monochrome and then filled in using black or blue ink (the colors of most ballpoint pens), converting the images to binary saved memory and processing time. With the support of MATLAB, converting images to binary was straightforward. We only needed to call the im2bw function with the original image as a parameter; the function then returned a binary image.

To extract the region of interest (ROI), that is, the region in which people filled in the options, we used a special image called a mask. As shown in Figure 4a, a mask contained only filled options. Our program then found the ROI on the mask. The position and size of the ROI (the region inside the red rectangle, Figure 4b) were then used to crop the other scanned images.

Figure 4. Mask image: (a) an example of a mask image; (b) the ROI on the mask image (the area inside the red rectangle)

With the MATLAB function imfindcircles, we were able to locate all the options on the cropped images. The number of black pixels in each circle allowed us to identify the selected one.

Our software tool then output the selected options for every question on the sheet. The output was finally stored in a plain text file.

Results and Discussion

The software tool

Figure 5 shows the main graphical user interface (GUI) of the program. The user first needed to specify the directory of scanned images by clicking the Select image folder button (area 1). All images in the selected directory were then listed in the area below the button (area 2). The user then selected the mask file by clicking the Select mask button (area 3). Depending on the type of questionnaire, two masks might need to be selected if the questionnaire contained two pages. To start processing the images, the user clicked the Start button (area 4). The result was displayed at the bottom right of the window (area 5).

Figure 5. The main user interface of the program

Processing questionnaires

Table 1 shows a summary of the analysis of 179 questionnaires belonging to nine different types. Our tool correctly detected all valid questions (questions having one option filled). It correctly identified all questions that were not filled (not evaluated by students, as shown in Figure 6a). The tool could also detect the questions that had multiple options filled (the students changed their minds and chose another option) (Figure 6b).

Because the number of black pixels in each option was used to identify which options were filled, our tool might not work correctly in some cases, which are described after Table 1.

Table 1. Results of data processing

A = number of questionnaires; B = number of questions in the questionnaire; C = total number of questions; D = number of correctly detected questions; E = number of unfilled questions detected; F = number of multiple filled questions detected

Type of questionnaire                                                              A     B     C     D     E     F
Employee feedback about the operation of a number of divisions                    10    35   350   339    11     0
Member feedback about the support of the Ho Chi Minh Communist Youth Union        10    34   340   338     1     1
Student feedback about the support of a number of divisions                       10    35   350   342     5     3
Student feedback about an advanced education program                              25    35   875   866     2     7
Master student feedback about a specific course                                   23    35   805   800     3     2
Graduate student feedback about an educational program                            43    35  1505  1498     2     5
Student feedback about a theoretical course of an ordinary education program      22    35   770   769     0     1
Student feedback about a practical course of an ordinary education program        18    35   630   628     1     1
Student feedback about a theoretical course of a POHE program                     18    35   630   629     0     1
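As a rough illustration of the steps described under Processing workflow and of the three outcomes counted in Table 1 (one option filled, no option filled, several options filled), the following Python sketch binarizes a grayscale page, takes the ROI to be the bounding box of the dark pixels of a mask image, and counts dark pixels inside each option circle. It is not the authors' MATLAB code. The function names, the 0.5 binarization level, the 0.6 fill ratio, the bounding-box rule for the ROI, and the hard-coded circle positions (standing in for what imfindcircles would return) are all assumptions made for the example.

```python
# Illustrative sketch only: names, thresholds, and the toy page are assumptions,
# not taken from the authors' MATLAB tool.

def to_binary(gray, level=0.5):
    """Threshold a grayscale page (rows of 0-255 values); True marks a dark pixel."""
    cutoff = level * 255
    return [[px < cutoff for px in row] for row in gray]


def find_roi(mask_binary):
    """Inclusive bounding box (left, top, right, bottom) of the dark pixels in a mask."""
    rows = [y for y, row in enumerate(mask_binary) if any(row)]
    cols = [x for row in mask_binary for x, dark in enumerate(row) if dark]
    if not rows:
        raise ValueError("mask contains no dark pixels")
    return min(cols), rows[0], max(cols), rows[-1]


def crop(image, roi):
    """Cut the ROI out of an image stored as a list of rows."""
    left, top, right, bottom = roi
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def dark_fraction(binary, cx, cy, r):
    """Share of dark pixels inside the circle with centre (cx, cy) and radius r."""
    dark = total = 0
    for y in range(max(0, cy - r), min(len(binary), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(binary[0]), cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                total += 1
                dark += binary[y][x]
    return dark / total if total else 0.0


def classify_question(binary, circles, min_fill=0.6):
    """Return the 1-based index of the only filled option, or None (NULL)
    when no option or more than one option is filled."""
    filled = [i + 1 for i, (cx, cy, r) in enumerate(circles)
              if dark_fraction(binary, cx, cy, r) >= min_fill]
    return filled[0] if len(filled) == 1 else None


def blank_page(width, height):
    """White toy page used only for the demonstration."""
    return [[255] * width for _ in range(height)]


def fill_circle(gray, cx, cy, r, value=0):
    """Paint a disc onto the toy page (value 0 = black ink)."""
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                gray[y][x] = value


# Demonstration: three options on one row; the mask has all of them filled.
options = [(10, 10, 4), (25, 10, 4), (40, 10, 4)]      # (cx, cy, r), assumed known
mask = blank_page(60, 20)
for cx, cy, r in options:
    fill_circle(mask, cx, cy, r)
roi = find_roi(to_binary(mask))
shifted = [(cx - roi[0], cy - roi[1], r) for cx, cy, r in options]

sheet = blank_page(60, 20)
fill_circle(sheet, 25, 10, 4)                           # respondent fills option 2
print(classify_question(crop(to_binary(sheet), roi), shifted))
```

In this toy setup the filled sheet is reported as option 2, while an empty sheet or a sheet with two filled options is reported as None. A mark drawn in a light grey (for example value 200) is lost at this binarization level and is also reported as None, which is one reason a pixel-count rule of this kind needs a carefully chosen threshold.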
The tool might not work correctly in the following cases:

(i) Instead of filling in the option, the respondent used a checkmark (tick) or an x mark (cross) to mark the selected option (Figure 6c). The number of black pixels inside a checked option might not be enough for it to count as a validly filled option.
(ii) Options were not completely filled (Figure 6d). As in the previous case, the option might not be dark enough to count as marked.
(iii) The respondent used a light color to mark the selected option. In this case, filled areas might become unfilled because of the conversion from color images to binary images.

In such cases, our tool marked the question as NULL in the result area. The user could easily see this and check the answer sheet manually by selecting the corresponding image from the list of images. After checking the images, the user was able to make direct modifications in the result area before exporting the final result to the output file.

Figure 6. Problems with questionnaires and scanned images: (a) no options filled; (b) multiple options filled; (c) checkmarks used; (d) options not completely filled; (e) cropping the wrong area due to image rotation

If the scanned images were rotated as a result of the scanning or copying process, our tool might encounter a problem. In particular, when the crop area did not contain all the options, the program could not obtain enough data for analysis (Figure 6e). In a future update, we will give a warning for this kind of sheet. One possible solution to this problem is using checkmarks. Checkmarks are black-filled rectangles or squares located at the corners and the margins of the sheet. By first detecting the checkmarks, it is possible to identify that the sheet is rotated too much when one or more checkmarks at the corners are absent. If the checkmarks at all four corners are detected, we can calculate the rotation angle of the sheet and then rotate the scanned sheet by the reverse angle before detecting the options.

Conclusions

In this study, we have proposed a solution for optical mark recognition that does not require a dedicated machine or answer sheet. Instead, we used ordinary scanners and printers with A4 paper. We have built a software program that works with different image formats. It detects filled options as well as questions with no or multiple filled options. The output of the program is in plain text and can be easily opened in various software programs, including Microsoft Excel. While other tools only work with one-page questionnaires, our tool can work with surveys that contain two pages. The first results look promising, but there is still room for improvement. Most of the questionnaires contain an area for other ideas (and comments), which may contain handwritten text. In the next version, we intend our software tool to use the latest achievements of artificial intelligence to solve this problem, or at least to warn users about handwritten text on questionnaires. We also want to solve the problem with rotated images. This can be done by detecting rectangles on the questionnaires. The problem then becomes selecting the right one (the rectangle that has the options inside), which is difficult because there are multiple, overlapping rectangles on a single sheet. Another solution for the rotation problem that we plan to apply is using checkmarks (bold rectangles located at the corners and the margins of the questionnaires).

Acknowledgments

We would like to thank the Vietnam National University of Agriculture for funding this project.

References

Belag I. A., Gulpete Y. & Elmanti T. M. (2018). An Image Processing Based Optical Mark Recognition with the Help of Scanner. International Journal of Engineering Innovation and Research. 7(2): 5.

Educba W. (2020). Matlab Features [Online]. Retrieved on April 19, 2020.

Gaikwad S. B. (2015). Image Processing Based OMR Sheet Scanning. International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE). 4.

Haigh T. (2008). Cleve Moler: Mathematical Software Pioneer and Creator of Matlab. IEEE Annals of the History of Computing. 30(1): 87-91.

Hong-Duc University. (2014). An introduction to TickREC - an automatic survey processing tool [Online]. Hong-Duc University. Retrieved from vn/4/3030/Gioi-thieu-phan-mem-xu-ly-phieu-dieu-tra-tu-dong-TickREC.html on April 22.

Loke S. C., Kasmiran K. A. & Haron S. A. (2018). A new method of mark detection for software-based optical mark recognition. PLOS ONE. 13(11): e0206420.

Mai Ha An (2014). Research and applying image processing techniques to process the survey questionnaire on training of Vietnam forestry university. Journal of Forestry Science and Technology. 1(1): 6.

Sandeep N. (2017). Introduction to MATLAB for Engineers and Scientists: Solutions for Numerical Computation and Modeling. Apress. 222 pages.
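The checkmark-based rotation handling proposed in the Results and Discussion (warn when a corner checkmark is missing, otherwise compute the rotation angle and rotate back by the reverse angle) could look roughly like the Python sketch below. The paper only outlines this as future work, so everything in the sketch is an assumption: the corner keys, the rule of averaging the angles of the top and bottom edges, and the sample coordinates, which are simply placed inside a 1655 x 2338 pixel page. Detecting the checkmarks themselves is not shown.

```python
import math

# Illustrative sketch only: the corner keys, the averaging rule, and the sample
# coordinates are assumptions; detecting the marks themselves is not shown.
CORNERS = ("tl", "tr", "bl", "br")


def rotate_point(point, angle_deg, centre):
    """Rotate point about centre by angle_deg in image coordinates (x right, y down)."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (centre[0] + dx * math.cos(a) - dy * math.sin(a),
            centre[1] + dx * math.sin(a) + dy * math.cos(a))


def estimate_skew(marks):
    """Return the sheet rotation in degrees, or None if a corner mark is missing."""
    if any(marks.get(key) is None for key in CORNERS):
        return None

    def edge_angle(start, end):
        return math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))

    top = edge_angle(marks["tl"], marks["tr"])
    bottom = edge_angle(marks["bl"], marks["br"])
    return (top + bottom) / 2


def deskew(points, skew_deg, centre):
    """Rotate points by the reverse angle so the sheet becomes upright again."""
    return [rotate_point(p, -skew_deg, centre) for p in points]


# Demonstration: simulate a scan rotated by 3 degrees about the page centre.
upright = {"tl": (50, 50), "tr": (1605, 50), "bl": (50, 2288), "br": (1605, 2288)}
centre = (827.5, 1169.0)
scanned = {key: rotate_point(pt, 3.0, centre) for key, pt in upright.items()}

skew = estimate_skew(scanned)
if skew is None:
    print("warning: corner mark missing, sheet needs manual checking")
else:
    print(round(skew, 3), deskew([scanned["tl"]], skew, centre))
```

Returning None for a missing corner keeps the warning case separate from the angle estimate, so a caller can flag the sheet for manual checking instead of cropping the wrong area.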