Applied Technologies and Innovations

Volume 8
Issue 3
Online publication date 2012-11-15
Title Classification of text documents supervised by domain ontologies
Author Anna Rozeva
Abstract The research objective is to establish an approach that supports the classification of text documents belonging to a specified domain. The focus is on the preliminary assignment of topics to the documents used for training the model. The method uses a domain ontology as background knowledge. The idea is to extract the preliminary topics for training the classifier by unsupervised machine learning on a text corpus and then to align the document vectors with concepts of the ontology. Classification of new documents, supervised by an e-governance ontology and performed with several machine learning algorithms, showed a sufficient match between document content and the ontology concepts. The conclusion is that the approach can support the automatic extraction of documents relevant to any domain described by an ontology.
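The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy corpus, the two-concept ontology, the function names, the use of cosine k-means as the unsupervised step, the alignment of cluster centroids (rather than individual document vectors) to concepts, and the nearest-centroid classifier standing in for the several machine learning algorithms the abstract mentions.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def kmeans(vectors, k, iterations=10):
    """Tiny cosine k-means; the first k vectors are the seeds (deterministic)."""
    centers = [vectors[i] for i in range(k)]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        assign = [max(range(k), key=lambda j: cosine(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centers[j] = centroid(members)
    return assign, centers

def align(center, ontology):
    """Name of the ontology concept whose terms are most similar to the centroid."""
    return max(ontology, key=lambda c: cosine(center, vectorize(ontology[c])))

def train(vectors, labels):
    """Nearest-centroid model: one centroid per ontology concept."""
    return {c: centroid([v for v, l in zip(vectors, labels) if l == c])
            for c in set(labels)}

def classify(text, model):
    vec = vectorize(text)
    return max(model, key=lambda c: cosine(vec, model[c]))

# Hypothetical e-governance concepts, each described by a few terms.
ontology = {
    "e-service": "online service portal payment request",
    "e-participation": "voting election referendum participation",
}
# Hypothetical unlabelled training corpus.
corpus = [
    "citizen portal online service request",
    "election voting ballot citizen participation",
    "online service portal tax payment",
    "voting participation ballot referendum",
]

vectors = [vectorize(d) for d in corpus]
assign, centers = kmeans(vectors, k=2)                  # unsupervised step
labels = [align(centers[a], ontology) for a in assign]  # preliminary topics
model = train(vectors, labels)                          # supervised step
print(labels)
print(classify("pay tax through the online portal", model))
```

The point of the sketch is the order of the steps: clustering needs no labels, the ontology supplies the topic names, and only then is a classifier trained on the labelled documents.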
Keywords Text classification, Topic assignment, Supervised learning, Ontology, E-governance
DOI http://dx.doi.org/10.15208/ati.2012.11
Pages 1-12