
Part-of-Speech tagging of Yoruba Standard, Language of Niger-Congo family

Author Affiliations

  • (1) Laboratory of Electro-technical, Telecommunications and Applied Computing of the Polytechnic School of Abomey-Calavi (EPAC), BENIN
  • (2) Laboratory of Electro-technical, Telecommunications and Applied Computing of the Polytechnic School of Abomey-Calavi (EPAC), BENIN
  • (3) Faculty of Letters, Arts and Human Science, University of Abomey-Calavi (UAC), BENIN

Res. J. Computer & IT Sci., Volume 1, Issue 1, Pages 2-5, 20 February 2013


The use of corpora is a critical step in Natural Language Processing (NLP) systems based on statistical methods. This is especially true for under-resourced languages with few computational tools, such as most African languages. This paper describes the design of an annotated Yoruba corpus. Yoruba is an African language of the Niger-Congo family, spoken by more than thirty million people, mainly in Nigeria and Benin. The main motivation of this work was to obtain training data for part-of-speech (PoS) taggers and to provide Yoruba Language Processing (YLP) applications with a basic tool. Tagging was performed with SVMTool, a widely used PoS tagger, and text preprocessing was handled by Perl scripts. The corpus, comprising 312,562 words collected from the Web, was annotated with an accuracy of 98.04%. This annotated corpus could be used in machine translation systems.
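The abstract mentions Perl-based preprocessing and a token-level tagging accuracy but gives no details of either. The following is a minimal Python sketch of those two steps, assuming Unicode NFC normalization, a simple regular-expression tokenizer, and accuracy defined as correctly tagged tokens divided by total tokens. The function names `tokenize` and `tagging_accuracy` and the tag labels are hypothetical; the code does not reproduce the authors' Perl scripts and does not call SVMTool.

```python
import re
import unicodedata

# Assumption: the paper's Perl preprocessing rules are not given, so this
# tokenizer is a hypothetical stand-in. Combining diacritics (U+0300-U+036F)
# are kept inside words because Yoruba tone marks on dotted vowels (e.g. "ẹ́")
# have no precomposed NFC form and are not matched by \w on their own.
_TOKEN_RE = re.compile(r"[\w\u0300-\u036f]+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Return word and punctuation tokens from NFC-normalized text."""
    return _TOKEN_RE.findall(unicodedata.normalize("NFC", text))


def tagging_accuracy(gold: list[tuple[str, str]],
                     predicted: list[tuple[str, str]]) -> float:
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = 0
    for (gold_word, gold_tag), (pred_word, pred_tag) in zip(gold, predicted):
        # Both sequences must be aligned on the same tokens before scoring.
        if gold_word != pred_word:
            raise ValueError(f"token mismatch: {gold_word!r} vs {pred_word!r}")
        correct += gold_tag == pred_tag
    return correct / len(gold)


if __name__ == "__main__":
    tokens = tokenize("Mo fẹ́ràn ìwé.")
    # Hypothetical tags for illustration only; not the paper's tagset.
    gold = list(zip(tokens, ["PRON", "VERB", "NOUN", "PUNCT"]))
    predicted = list(zip(tokens, ["PRON", "VERB", "VERB", "PUNCT"]))
    print(tokens)
    print(f"accuracy = {tagging_accuracy(gold, predicted):.2%}")  # accuracy = 75.00%
```

Keeping tone marks attached to their word matters for Yoruba: with a plain `\w+` pattern, a word such as `fẹ́ràn` would be split at the acute accent, and the stray accent would be counted as a separate token.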

