How to count the number of sentences in a paragraph in Python
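Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/15228054/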
Asked by Fiona Gaughan
This is what I have so far, but my paragraph only contains 5 full stops, and therefore only 5 sentences. Yet it keeps returning 14 as the answer. Can anyone help?
```python
file = open('words.txt', 'r')
lines = list(file)
file_contents = file.read()
print(lines)
file.close()

words_all = 0
for line in lines:
    words_all = words_all + len(line.split())
print('Total words: ', words_all)

full_stops = 0
for stop in lines:
    full_stops = full_stops + len(stop.split('.'))
print('total stops: ', full_stops)
```
Here is the txt file:
A Turning machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechaniacl computation.
Accepted answer by Pavel Anossov
If a line doesn't contain a period, split will return a single element, the line itself. More generally, str.split('.') always returns one more element than there are periods:
```python
>>> "asdasd".split('.')
['asdasd']
```
So you're counting the number of lines plus the number of periods. Why are you splitting the file into lines at all?
```python
with open('words.txt', 'r') as file:
    file_contents = file.read()

print('Total words: ', len(file_contents.split()))
print('total stops: ', file_contents.count('.'))
```
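The arithmetic behind the 14 can be checked with a small sketch. The sample text below is a made-up stand-in for words.txt (3 periods spread over 4 lines), and the function names are hypothetical: the first reproduces the question's per-line loop, the second the answer's whole-file count. Because every line yields one more piece than it has periods, the loop returns periods plus lines; with the question's 5 periods, a result of 14 would mean the file had 9 lines.

```python
import io

# Hypothetical stand-in for words.txt: 3 periods spread over 4 lines
# (the last line contains no period).
SAMPLE = "First sentence. Second\nsentence. Third\nsentence.\nno period here\n"

def split_piece_count(f):
    """The question's loop: add up how many pieces split('.') gives per line."""
    return sum(len(line.split('.')) for line in f)

def period_count(f):
    """The accepted answer's approach: count '.' in the whole text."""
    return f.read().count('.')

line_count = len(SAMPLE.splitlines())
pieces = split_piece_count(io.StringIO(SAMPLE))
periods = period_count(io.StringIO(SAMPLE))

print(f"lines={line_count} periods={periods} pieces={pieces}")
# prints: lines=4 periods=3 pieces=7

# Each line contributes (its periods + 1) pieces, so pieces == periods + lines.
assert pieces == periods + line_count
```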
Answer by ATOzTOA
Try
```python
print("total stops: ", open('words.txt', 'r').read().count("."))
```
Details:
```python
# The with block closes the file automatically once it has been read.
with open("words.txt") as f:
    data = f.read()

print("total stops: ", data.count("."))
```
Answer by reptilicus
Use regex.
```python
In [13]: import re

In [14]: par = "This is a paragraph? So it is! Ok, there are 3 sentences."

In [15]: re.split(r'[.!?]+', par)
Out[15]: ['This is a paragraph', ' So it is', ' Ok, there are 3 sentences', '']
```
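One caveat: re.split leaves an empty string after the final terminator, so taking len() of that result gives 4, not 3. A minimal sketch that turns the split into a count by discarding empty or whitespace-only pieces is shown below. The helper name count_sentences is hypothetical, and the approach still miscounts text containing abbreviations such as "e.g." or decimals such as "3.14", because every period is treated as a sentence end.

```python
import re

# A run of '.', '!' or '?' (e.g. "..." or "?!") counts as one terminator.
TERMINATORS = re.compile(r'[.!?]+')

def count_sentences(text):
    """Split on terminators and count the pieces that are not blank."""
    pieces = TERMINATORS.split(text)
    # Skip the trailing '' (and any whitespace-only piece) left after the last terminator.
    return sum(1 for piece in pieces if piece.strip())

par = "This is a paragraph? So it is! Ok, there are 3 sentences."
print(count_sentences(par))  # 3
```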
Answer by Preetkaran Singh
The easiest way to do it is:
```python
import nltk
from nltk.tokenize import sent_tokenize

# Download the Punkt sentence-tokenizer models used by sent_tokenize.
nltk.download('punkt')
# Recent NLTK releases load the 'punkt_tab' resource instead, so fetch it too.
nltk.download('punkt_tab')

paragraph = 'A Turning machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechaniacl computation.'

sentences = sent_tokenize(paragraph)  # a list of sentence strings
print(len(sentences))
```
Output:
5
