Java: Open Source Library for Linguistic Inquiry and Word Count (LIWC)
Original question: http://stackoverflow.com/questions/2511876/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Asked by zfranciscus
Answered by Turadg
As ealdent points out, LIWC is both software and a data set. The data set is proprietary, so there is no open-source version of it. On the software side, TAWC is a useful open-source Perl version. From the comments:
This is a semi-complicated script adapted from the one used in my CHI papers. The task of this script is to read in regular expressions from a dictionary (or if they're not REs, to make them into REs), which must be backwards-compatible with the LIWC software set (c.f. http://www.liwc.net). It then counts the number of matches for the RE in a single input row / user, and outputs it for that row / user.
You could then buy LIWClite, which costs less than half the price of LIWC. You can also use TAWC with your own dictionaries for free.
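To make the dictionary-to-regex idea concrete, here is a minimal Python sketch of that kind of counting. It is not TAWC's code and does not use the real LIWC dictionary: `TOY_DICTIONARY`, its category names, and every function name below are made up for illustration. It assumes LIWC-style entries in which a trailing `*` means "any word starting with this stem", and it treats each input string as one row.

```python
import re

# Hypothetical toy dictionary (NOT the proprietary LIWC data):
# category name -> LIWC-style entries, where a trailing "*" is a stem wildcard.
TOY_DICTIONARY = {
    "posemo": ["happy", "joy*", "love*"],
    "negemo": ["sad", "hate*", "angr*"],
}


def entry_to_regex(entry):
    """Turn one dictionary entry into a word-bounded regular expression."""
    if entry.endswith("*"):
        # Stem entry: match the stem at a word start, then any word characters.
        return r"\b" + re.escape(entry[:-1]) + r"\w*"
    # Plain entry: match the whole word only.
    return r"\b" + re.escape(entry) + r"\b"


def compile_dictionary(dictionary):
    """Compile each category into one case-insensitive alternation."""
    return {
        category: re.compile(
            "|".join(entry_to_regex(entry) for entry in entries),
            re.IGNORECASE,
        )
        for category, entries in dictionary.items()
    }


def count_categories(text, compiled):
    """Count the matches per category for a single input row."""
    return {
        category: len(pattern.findall(text))
        for category, pattern in compiled.items()
    }


if __name__ == "__main__":
    compiled = compile_dictionary(TOY_DICTIONARY)
    row = "I love my happy dog, but I hate rainy days and angry people."
    print(count_categories(row, compiled))
```

For the sample row this prints `{'posemo': 2, 'negemo': 2}`. Swapping in your own dictionary only means replacing `TOY_DICTIONARY`; entries that are already regular expressions would need to skip the `re.escape` step.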
Answered by rlotun
You may find the Natural Language Toolkit (NLTK) for Python useful: http://www.nltk.org/

