Adaptive compression-based models of Chinese text

W.J. Teahan; P. Wu; W. Liu

doi:10.1109/ICALIP.2014.7009920

Research output: Contribution to conference › Paper

Standard

Adaptive compression-based models of Chinese text. / Teahan, W.J.; Wu, P.; Liu, W.
2014. 874-881 Paper presented at International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China.

Harvard

Teahan, WJ, Wu, P & Liu, W 2014, 'Adaptive compression-based models of Chinese text', Paper presented at International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China, pp. 874-881. https://doi.org/10.1109/ICALIP.2014.7009920

APA

Teahan, W. J., Wu, P., & Liu, W. (2014). Adaptive compression-based models of Chinese text. 874-881. Paper presented at International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China. https://doi.org/10.1109/ICALIP.2014.7009920

CBE

Teahan WJ, Wu P, Liu W. 2014. Adaptive compression-based models of Chinese text. Paper presented at International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China. https://doi.org/10.1109/ICALIP.2014.7009920

MLA

Teahan, W.J., P. Wu and W. Liu. Adaptive compression-based models of Chinese text. International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China, Paper, 2014. https://doi.org/10.1109/ICALIP.2014.7009920

Vancouver

Teahan WJ, Wu P, Liu W. Adaptive compression-based models of Chinese text. 2014. Paper presented at International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China. doi: 10.1109/ICALIP.2014.7009920

Author

Teahan, W.J. ; Wu, P. ; Liu, W. / Adaptive compression-based models of Chinese text. Paper presented at International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China.

RIS

TY - CONF

T1 - Adaptive compression-based models of Chinese text

AU - Teahan, W.J.

AU - Wu, P.

AU - Liu, W.

PY - 2014/7/7

Y1 - 2014/7/7

N2 - Large-alphabet languages such as Chinese present different problems for language modelling than small-alphabet languages such as English. In this paper, we describe adaptive models of Chinese text based on the Prediction by Partial Matching (PPM) text compression scheme, which learns the language as the text is processed sequentially. We describe several character-based, word-based and part-of-speech (POS) based variants of PPM that achieve significant improvements in compression rate over existing models. Interestingly, the results for Chinese text contrast with those achieved for English text, with character-based models outperforming the word-based and POS-based models rather than the other way round. We then explore how well these models perform at the task of Chinese word segmentation.

U2 - 10.1109/ICALIP.2014.7009920

DO - 10.1109/ICALIP.2014.7009920

M3 - Paper

SP - 874

EP - 881

T2 - International Conference on Audio, Language and Image Processing (ICALIP), 7 - 9 July 2014, Shanghai, China

ER -
