The first step in generating a corpus that is representative of a language is determining sentence boundaries, a complicated and hard problem that is nevertheless an essential part of corpus generation. Different approaches have been tried for finding sentence boundaries in various languages. For Turkish, the best-known approaches rely on statistics and machine learning. In this study, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed to determine sentence boundaries in contemporary Turkish, taking into account the agglutinative and rule-based structure of the language. The method was tested on two test sets built from randomly selected columns from two Turkish newspapers. RBSDM determines sentence endings correctly and efficiently in terms of time and other costs, and achieves a success rate between 99.60% and 99.80%.
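To make the idea of rule-based boundary detection concrete, the following is a minimal sketch in Python. It is not the RBSDM rule set, which the abstract does not spell out. The abbreviation list, the function name `split_sentences`, and the three rules are hypothetical, chosen only to show how candidate punctuation marks can be accepted or rejected by simple rules.

```python
import re
from typing import List

# Hypothetical abbreviation list for illustration only; NOT taken from RBSDM.
ABBREVIATIONS = {"dr", "prof", "doç", "vb", "vs", "bkz"}

# Characters that may open a quoted sentence.
OPENING_QUOTES = "\"'“‘"


def split_sentences(text: str) -> List[str]:
    """Split text at '.', '!' or '?' unless a simple rule vetoes the boundary."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+", text):
        end = match.end()
        following = text[end:].lstrip()
        # Last word before the punctuation mark (empty if there is none).
        token_match = re.search(r"(\w+)$", text[start:match.start()])
        token = token_match.group(1) if token_match else ""

        if match.group() == ".":
            # Rule 1: a period after a known abbreviation is not a boundary.
            if token.lower() in ABBREVIATIONS:
                continue
            # Rule 2: a period between digits (e.g. 10.30) is not a boundary.
            if token.isdigit() and text[end:end + 1].isdigit():
                continue

        # Rule 3: what follows must look like a sentence start
        # (uppercase letter, digit, or opening quote), or the text must end.
        if following and not (
            following[0].isupper()
            or following[0].isdigit()
            or following[0] in OPENING_QUOTES
        ):
            continue

        sentences.append(text[start:end].strip())
        start = end

    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Ayşe geldi. Saat 10.30 oldu! Sen de geldin mi? Evet."
    for sentence in split_sentences(sample):
        print(sentence)
```

A rule-based splitter of this kind needs no training data, which is one reason such an approach can be cheap in time and other costs. Its accuracy depends entirely on how well the rules and word lists cover the language.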
Published in: International Journal of Language and Linguistics (Volume 1, Issue 1)
DOI: 10.11648/j.ijll.20130101.11
Page(s): 1-6
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2013. Published by Science Publishing Group.
Keywords: Linguistics, Natural Language Processing, Corpus, Turkish, Morphological Analysis, Sentence Boundary Detection
APA Style
Özlem AKTAŞ, Yalçın ÇEBİ. (2013). Rule-Based Sentence Detection Method (RBSDM) for Turkish. International Journal of Language and Linguistics, 1(1), 1-6. https://doi.org/10.11648/j.ijll.20130101.11
BibTeX
@article{10.11648/j.ijll.20130101.11,
  author = {Özlem AKTAŞ and Yalçın ÇEBİ},
  title = {Rule-Based Sentence Detection Method (RBSDM) for Turkish},
  journal = {International Journal of Language and Linguistics},
  volume = {1},
  number = {1},
  pages = {1-6},
  doi = {10.11648/j.ijll.20130101.11},
  url = {https://doi.org/10.11648/j.ijll.20130101.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijll.20130101.11},
  year = {2013}
}
RIS
TY  - JOUR
T1  - Rule-Based Sentence Detection Method (RBSDM) for Turkish
AU  - Özlem AKTAŞ
AU  - Yalçın ÇEBİ
Y1  - 2013/05/02
PY  - 2013
N1  - https://doi.org/10.11648/j.ijll.20130101.11
DO  - 10.11648/j.ijll.20130101.11
T2  - International Journal of Language and Linguistics
JF  - International Journal of Language and Linguistics
JO  - International Journal of Language and Linguistics
SP  - 1
EP  - 6
PB  - Science Publishing Group
SN  - 2330-0221
UR  - https://doi.org/10.11648/j.ijll.20130101.11
VL  - 1
IS  - 1
ER  -