Rule-Based Sentence Detection Method (RBSDM) for Turkish

Özlem AKTAŞ; Yalçın ÇEBİ

doi:10.11648/j.ijll.20130101.11

Peer-Reviewed

Published in International Journal of Language and Linguistics (Volume 1, Issue 1)

Received: 20 March 2013 Published: 2 May 2013

Abstract

The first step in generating a corpus that is representative of a language is determining sentence boundaries, which is a complicated and hard problem but an important part of corpus generation. Different approaches have been tried to find sentence boundaries in some languages. For Turkish, the best-known approaches use statistics and machine learning. In this study, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed to determine sentence boundaries in contemporary Turkish by taking into account the agglutinative and rule-based structure of the language. The method was tested on two different test sets built from randomly selected columns from two Turkish newspapers. RBSDM determines sentence endings correctly and efficiently in terms of time and other costs, with a success rate between 99.60% and 99.80%.
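To make the idea of rule-based boundary detection concrete, here is a minimal Python sketch. It is not the RBSDM rule set, which the abstract does not list. The three rules, the abbreviation list, and the names `ABBREVIATIONS`, `is_boundary` and `split_sentences` are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation list for illustration only; not taken from the paper.
ABBREVIATIONS = {"dr", "prof", "doç", "sn", "vb", "vs"}

TERMINATOR = re.compile(r"[.!?]+")   # candidate sentence-ending punctuation
LAST_WORD = re.compile(r"(\w+)$")    # word immediately before the punctuation


def is_boundary(text, match):
    """Apply a few hand-written rules to one candidate terminator."""
    end = match.end()
    if end == len(text):
        return True                  # end of text always closes a sentence
    if not text[end].isspace():
        return False                 # e.g. "3.5" or "www.irt.org"
    rest = text[end:].lstrip()
    if rest and rest[0].islower():
        return False                 # e.g. ordinal "20. yüzyıl"
    if match.group() == ".":
        word = LAST_WORD.search(text[:match.start()])
        if word and word.group(1).lower() in ABBREVIATIONS:
            return False             # e.g. "Dr. Ahmet"
    return True


def split_sentences(text):
    """Return the sentences found in text, in order."""
    sentences, start = [], 0
    for match in TERMINATOR.finditer(text):
        if is_boundary(text, match):
            sentences.append(text[start:match.end()].strip())
            start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)       # keep any unterminated trailing text
    return sentences


sample = ("Dr. Ahmet dün İzmir'e geldi. "
          "Toplantı 20. yüzyıl edebiyatı üzerineydi! "
          "Sonuç 3.5 puan arttı.")
for sentence in split_sentences(sample):
    print(sentence)
```

On the sample, the abbreviation, ordinal and decimal rules keep "Dr.", "20." and "3.5" from being treated as sentence ends, so three sentences are returned. A real system for Turkish would need a much larger rule set and Turkish-aware case handling, for example for dotted and dotless "i".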

Published in	International Journal of Language and Linguistics (Volume 1, Issue 1)
DOI	10.11648/j.ijll.20130101.11
Page(s)	1-6
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2013. Published by Science Publishing Group

Keywords

Linguistics, Natural Language Processing, Corpus, Turkish, Morphological Analysis, Sentence Boundary Detection

Cite This Article

APA Style

Özlem AKTAŞ, Yalçın ÇEBİ. (2013). Rule-Based Sentence Detection Method (RBSDM) for Turkish. International Journal of Language and Linguistics, 1(1), 1-6. https://doi.org/10.11648/j.ijll.20130101.11

ACS Style

Özlem AKTAŞ; Yalçın ÇEBİ. Rule-Based Sentence Detection Method (RBSDM) for Turkish. Int. J. Lang. Linguist. 2013, 1(1), 1-6. doi: 10.11648/j.ijll.20130101.11

AMA Style

Özlem AKTAŞ, Yalçın ÇEBİ. Rule-Based Sentence Detection Method (RBSDM) for Turkish. Int J Lang Linguist. 2013;1(1):1-6. doi: 10.11648/j.ijll.20130101.11

BibTeX

@article{10.11648/j.ijll.20130101.11,
  author = {Özlem AKTAŞ and Yalçın ÇEBİ},
  title = {Rule-Based Sentence Detection Method (RBSDM) for Turkish},
  journal = {International Journal of Language and Linguistics},
  volume = {1},
  number = {1},
  pages = {1--6},
  doi = {10.11648/j.ijll.20130101.11},
  url = {https://doi.org/10.11648/j.ijll.20130101.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijll.20130101.11},
  abstract = {The first process of generating a corpus, which is a representative of the language, is the determination of sentences, which is very complicated and hard to solve, but an important part of the corpus generation. Different approaches have been tried to find out sentence boundaries in some languages. In Turkish, the most known ways of determining sentence boundaries are using statistics and machine learning. In this study, to determine the sentence boundaries in contemporary Turkish, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed by considering the agglutinative and rule-based structure of Turkish. This method was tested on two different test sets generated by randomly selected columns from two Turkish newspapers. RBSDM determines end of sentences correctly and efficiently, about means of time and other costs, and provides success rate in a range of 99.60% and 99.80%.},
  year = {2013}
}

RIS

TY  - JOUR
T1  - Rule-Based Sentence Detection Method (RBSDM) for Turkish
AU  - Özlem AKTAŞ
AU  - Yalçın ÇEBİ
Y1  - 2013/05/02
PY  - 2013
N1  - https://doi.org/10.11648/j.ijll.20130101.11
DO  - 10.11648/j.ijll.20130101.11
T2  - International Journal of Language and Linguistics
JF  - International Journal of Language and Linguistics
JO  - International Journal of Language and Linguistics
SP  - 1
EP  - 6
PB  - Science Publishing Group
SN  - 2330-0221
UR  - https://doi.org/10.11648/j.ijll.20130101.11
AB  - The first process of generating a corpus, which is a representative of the language, is the determination of sentences, which is very complicated and hard to solve, but an important part of the corpus generation. Different approaches have been tried to find out sentence boundaries in some languages. In Turkish, the most known ways of determining sentence boundaries are using statistics and machine learning. In this study, to determine the sentence boundaries in contemporary Turkish, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed by considering the agglutinative and rule-based structure of Turkish. This method was tested on two different test sets generated by randomly selected columns from two Turkish newspapers. RBSDM determines end of sentences correctly and efficiently, about means of time and other costs, and provides success rate in a range of 99.60% and 99.80%.
VL  - 1
IS  - 1
ER  -

Author Information

Özlem AKTAŞ
Yalçın ÇEBİ
