Zhon: A Python Library of Constants for Common Chinese Text Processing

jopen, 9 years ago

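Zhon is a Python library that provides constants commonly needed when processing Chinese text, such as CJK characters and radicals, Chinese punctuation, pinyin, and regular expressions for Chinese text (for example, to find the traditional characters in a text):

  • CJK characters and radicals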
  • Chinese punctuation marks
  • Chinese sentence regular expression pattern
  • Pinyin vowels, consonants, lowercase, uppercase, and punctuation
  • Pinyin syllable, word, and sentence regular expression patterns
  • Zhuyin characters and marks
  • Zhuyin syllable regular expression pattern
  • CC-CEDICT characters

>>> import re
>>> import zhon.hanzi
>>> re.findall(zhon.hanzi.sentence, '我买了一辆车。妈妈做的菜,很好吃!')
['我买了一辆车。', '妈妈做的菜,很好吃!']
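To illustrate what a sentence pattern of this kind does, here is a minimal standard-library sketch that does not use Zhon at all. The character range, the punctuation sets, and the names HANZI, NON_STOPS, STOPS, SENTENCE, and split_sentences are assumptions made for this illustration only; they cover far less than Zhon's constants and are not its implementation.

```python
import re

# Hypothetical, hand-picked constants for illustration only (not Zhon's).
HANZI = "\u4e00-\u9fff"                 # basic CJK Unified Ideographs block
NON_STOPS = "\uff0c\u3001\uff1b\uff1a"  # fullwidth comma, ideographic comma, fullwidth semicolon and colon
STOPS = "\u3002\uff01\uff1f"            # ideographic full stop, fullwidth ! and ?

# A sentence here: any run of hanzi and non-stop punctuation, then one stop mark.
SENTENCE = re.compile("[{}{}]*[{}]".format(HANZI, NON_STOPS, STOPS))


def split_sentences(text):
    """Return the Chinese sentences found in text, skipping everything else."""
    return SENTENCE.findall(text)


print(split_sentences("Some text here. 我买了一辆车。妈妈做的菜,很好吃!"))
# prints ['我买了一辆车。', '妈妈做的菜,很好吃!']
```

Because the character class contains only Chinese characters and fullwidth punctuation, Latin text and ASCII punctuation are skipped, so mixed-language input yields only the Chinese sentences.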

Project homepage: http://www.open-open.com/lib/view/home/1418367924902