Kurdish Handwritten Word Recognition with CRNN Using Multi-Backbone Evaluation

Kurdish Handwritten Word Recognition

DOI:

https://doi.org/10.63841/iue24539

Keywords:

Kurdish OCR, handwritten text recognition, CRNN, MobileNet, synthetic data, low-resource languages

Abstract

We introduce the Kurdish Handwritten Word Dataset (KHWD), the first large-scale collection of handwritten word images, created from printed forms filled in by 8,000 native Sorani Kurdish writers. The dataset contains around 400,000 word samples covering 10,000 unique words, captured via high-resolution scanning (1200 DPI) and processed through an automated segmentation pipeline to produce aligned image–transcription pairs (with CSV mappings). This work specifically highlights the challenges of the Sorani script (cursive joins, diacritics, dot-position distinctions, and right-to-left writing) that motivate our dataset design. We employ a Convolutional Recurrent Neural Network (CRNN) for recognition and evaluate multiple CNN backbones (ResNet18, ResNet50, GoogLeNet, and MobileNetV2). The best results were achieved by MobileNetV2 pretrained on isolated Kurdish characters (achieving 97% training and 96% validation accuracy) and then fine-tuned on a combination of KHWD and synthetically generated word images; this augmentation improves performance over the baseline. The improved MobileNetV2 achieves approximately 17.98% CER, 54.31% WER, and 45.69% word-level accuracy, reducing CER by 1.47%, WER by 3.28%, and increasing word-level accuracy by 3.28% compared to the baseline MobileNetV2.
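The abstract reports character error rate (CER), word error rate (WER), and word-level accuracy but does not say how they were computed. The sketch below uses the standard edit-distance definitions and is an assumption about the evaluation, not the authors' code. The function names and the toy Latin-script strings are hypothetical. Note that Python counts Unicode code points, so a diacritic stored as a separate combining mark would count as its own character.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def wer(refs, hyps):
    """Corpus-level WER: total word edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return edits / sum(len(r.split()) for r in refs)


def word_accuracy(refs, hyps):
    """Fraction of samples whose prediction matches the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Toy single-word samples (hypothetical placeholders, not KHWD data).
refs = ["kurd", "nan", "aw"]
hyps = ["kurd", "nam", "aw"]

print(f"CER = {cer(refs, hyps):.4f}")            # 1 edit over 9 characters
print(f"WER = {wer(refs, hyps):.4f}")            # 1 wrong word out of 3
print(f"Word accuracy = {word_accuracy(refs, hyps):.4f}")
```

When every sample is a single word, as in a word-level dataset, each sample contributes a word error of either 0 or 1, so WER and word-level accuracy sum to 1. The reported figures are consistent with this: 54.31% + 45.69% = 100%.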

Author Biographies

  • Israa Mahdi Alsaqi, Software Engineering and Informatics Department, College of Engineering, Salahaddin University, Erbil 44001, IRAQ

    Israa Mahdi is a Software Engineer at the Department of Software Engineering and Informatics, College of Engineering, Salahaddin University. She received the B.Sc. degree in Software Engineering from Salahaddin University and is an M.Sc. student in the Software Department of the College of Engineering at Salahaddin University. Israa Mahdi Alsaqi is a member of the Kurdistan Engineering Syndicate.

  • Polla Fattah, Software Engineering and Informatics Department, College of Engineering, Salahaddin University, Erbil 44001, IRAQ

    Dr. Polla Fattah is a Lecturer at the Department of Software Engineering and Informatics, College of Engineering, Salahaddin University. He received the B.Sc. degree in Software Engineering from Salahaddin University, the M.Sc. degree in Information Technology from the Electrical Engineering Department at Salahaddin University, and the Ph.D. in Data Mining and Machine Learning from the University of Nottingham. His research interests are in deep learning, OCR, classification, and Kurdish language technology. Dr. Fattah is a member of the Kurdistan Engineering Syndicate, has published numerous papers in international journals and conferences, and has participated in many national and international academic events.

Published

2025-10-25

Section

Engineering

How to Cite

Kurdish Handwritten Word Recognition with CRNN Using Multi-Backbone Evaluation: Kurdish Handwritten Word Recognition. (2025). Academic Journal of International University of Erbil, 2(04), 439-449. https://doi.org/10.63841/iue24539