Sentiment Analysis on Text: TextBlob

jopen, 10 years ago

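TextBlob is an open-source text-processing library written in Python (2 and 3). It can perform many natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and text translation. You can read about all of TextBlob's features in the official documentation.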

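Why should I care about TextBlob?

I am learning TextBlob for the following reasons:

  1. I want to build applications that need text processing. Once an application can process text, it understands people's behavior better and so feels more human. Text processing is hard to get right. TextBlob stands on the shoulders of a giant, NLTK, which is the best choice for writing Python programs that work with natural language.

  2. I want to learn how to do text processing in Python.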

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.tags           # [(u'The', u'DT'), (u'titular', u'JJ'),
                    #  (u'threat', u'NN'), (u'of', u'IN'), ...]

blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

blob.translate(to="es")  # 'La amenaza titular de The Blob...'
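The polarity values printed per sentence are floats between -1.0 (negative) and 1.0 (positive). To my understanding, TextBlob's default analyzer is lexicon-based, but the sketch below is not its implementation. It is a deliberately tiny illustration of the general idea, assuming a made-up lexicon (`TOY_LEXICON`) and a hypothetical helper (`toy_polarity`) that simply averages the scores of the words it recognizes.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0].
# The values are invented for illustration; they are not TextBlob's data.
TOY_LEXICON = {
    "good": 0.7,
    "great": 0.8,
    "bad": -0.7,
    "devastating": -1.0,
    "fearful": -0.9,
}


def toy_polarity(sentence, lexicon=TOY_LEXICON):
    """Average the polarity of the words found in the lexicon.

    Returns 0.0 (neutral) when no word of the sentence is in the lexicon.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


for sentence in ["A great movie", "Devastating and fearful", "Nothing to see"]:
    print(sentence, "->", toy_polarity(sentence))
```

A real analyzer also has to deal with negation ("not good"), intensifiers ("very"), and words missing from the lexicon, which is exactly the kind of work a library such as TextBlob saves you from.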

Features

  • Noun phrase extraction
  • Part-of-speech tagging
  • Sentiment analysis
  • Classification (Naive Bayes, Decision Tree)
  • Language translation and detection powered by Google Translate
  • Tokenization (splitting text into words and sentences)
  • Word and phrase frequencies
  • Parsing
  • n-grams
  • Word inflection (pluralization and singularization) and lemmatization
  • Spelling correction
  • Add new models or languages through extensions
  • WordNet integration
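Several items in this list are simple enough to sketch in plain Python, which helps show what the library is doing for you. The example below is a rough approximation of tokenization, n-grams, and word frequencies using only the standard library. The helper names (`tokenize`, `ngrams`, `word_counts`) are my own, and the regex tokenizer is far cruder than what TextBlob uses. In TextBlob itself, the comparable calls are `blob.words`, `blob.ngrams(n=3)`, and `blob.word_counts`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive regex)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a list of lists."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def word_counts(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


tokens = tokenize("The blob ate the town, then the blob grew.")
print(ngrams(tokens, 2)[:3])
# [['the', 'blob'], ['blob', 'ate'], ['ate', 'the']]
print(word_counts(tokens).most_common(2))
# [('the', 3), ('blob', 2)]
```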

Project home page: http://www.open-open.com/lib/view/home/1408694649350