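I am using the readChar() function to read a text into R. My goal is to test the hypothesis that the sentences of the text contain as many occurrences of the letter "a" as of the letter "b". I recently discovered the {stringr} package, which does many useful things with my text, such as counting the number of characters in the whole text and the number of occurrences of each letter. Now I need to know the number of sentences in the whole text. Does R have a function that can help me do that? Thank you very much!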
Solution
Thanks to @gui11aume for the answer. I have just found a very good package that helps with this task: {openNLP}. Here is the code to do it:
    install.packages("openNLP")            ## Installs the required natural language processing (NLP) package
    install.packages("openNLPmodels.en")   ## Installs the model files for the English language
    library(openNLP)                       ## Loads the package for use in the task
    library(openNLPmodels.en)              ## Loads the model files for the English language

    ## This text has unusual punctuation, as suggested by @gui11aume
    text = "Dr. Brown and Mrs. Theresa will be away from a very long time!!! I can't wait to see them again."

    ## sentDetect() is the function to use. It detects and separates the sentences in a text.
    ## The first argument is the string vector (or text) and the second argument is the language.
    x = sentDetect(text, language = "en")

    x           ## Displays the different sentences in the string vector (or text)
    # [1] "Dr. Brown and Mrs. Theresa will be away from a very long time!!! "
    # [2] "I can't wait to see them again."

    length(x)   ## Displays the number of sentences in the string vector (or text)
    # [1] 2
The {openNLP} package is well suited to natural language processing in R. You can read the short introduction to it, or consult the package's documentation.
Model packages for other languages:
> {openNLPmodels.es} for Spanish
> {openNLPmodels.ge} for German
> {openNLPmodels.th} for Thai
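To connect the sentence vector back to the original question, here is a minimal sketch of counting the letters "a" and "b" in each sentence. It uses base R only, so it runs without {openNLP} or {stringr}. The sentences are hard-coded from the output shown in the answer, and `count_letter` is a hypothetical helper introduced for illustration, not part of any package. Counting is case-insensitive, which is an assumption; drop `ignore.case = TRUE` if upper and lower case should be treated separately.

```r
# Sentences as produced by the answer's code, hard-coded so the sketch is self-contained.
sentences <- c(
  "Dr. Brown and Mrs. Theresa will be away from a very long time!!! ",
  "I can't wait to see them again."
)

# Hypothetical helper: number of occurrences of `letter` in each element of `x`.
# gregexpr() finds every match, regmatches() extracts them, lengths() counts them.
count_letter <- function(x, letter) {
  lengths(regmatches(x, gregexpr(letter, x, ignore.case = TRUE)))
}

# One row per sentence, with the count of each letter.
counts <- data.frame(
  sentence = seq_along(sentences),
  a = count_letter(sentences, "a"),
  b = count_letter(sentences, "b")
)

print(counts)
```

With one row per sentence, the `a` and `b` columns can be compared directly, for example by looking at `counts$a - counts$b` before choosing a statistical test.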