Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/21858874/
Counting the Frequencies of Words in Excel Strings
Asked by 114
Suppose I have a column of arbitrary length where each cell contains a string of text. Is there a way to determine what words appear most frequently in the column (not knowing in advance which words to check) and subsequently order these words along with their frequencies in a two column table? Would VBA be best for this task?
As an example, a cell might contain the string "This is a string, and the # of characters inthis string is>0." (errors intentional)
Answered by Gary's Student
Select a portion of column A and run this small macro (the table will be placed in columns B and C):
Sub Ftable()
    Dim BigString As String, I As Long, J As Long, K As Long
    BigString = ""
    ' Add code to sum both "All" and "all"
    ' Add code to separate "." "!" etc. from the word preceding them so that word
    ' is also counted in the total. For example: "all." should not be reported as 1
    ' "all." but "all" be added to the total count of "all" words.
    ' Would you publish this new code?

    ' Join every selected cell into one space-separated string.
    For Each r In Selection
        BigString = BigString & " " & r.Value
    Next r
    BigString = Trim(BigString)
    ary = Split(BigString, " ")

    ' Collect the distinct words; adding a duplicate key raises an error, which is skipped.
    Dim cl As Collection
    Set cl = New Collection
    For Each a In ary
        On Error Resume Next
        cl.Add a, CStr(a)
    Next a

    ' Write each distinct word to column B and its count to column C.
    For I = 1 To cl.Count
        v = cl(I)
        Cells(I, "B").Value = v
        J = 0
        For Each a In ary
            If a = v Then J = J + 1
        Next a
        Cells(I, "C") = J
    Next I
End Sub
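The comment inside the macro asks for "All" and "all" to be summed and for punctuation to be separated from the word before it. The following is a minimal sketch of one way to do that, not the answerer's code. `NormalizeWord` and `FtableNormalized` are hypothetical names, and the sketch assumes Excel on Windows (for `Scripting.Dictionary`), plain ASCII words, and a selection that contains no error values.

```vba
Option Explicit

' Hypothetical helper: lower-cases a token and strips leading and trailing
' characters that are not a-z or 0-9, so "All." becomes "all".
Function NormalizeWord(ByVal s As String) As String
    s = LCase$(Trim$(s))
    Do While Len(s) > 0
        If Left$(s, 1) Like "[a-z0-9]" Then Exit Do
        s = Mid$(s, 2)
    Loop
    Do While Len(s) > 0
        If Right$(s, 1) Like "[a-z0-9]" Then Exit Do
        s = Left$(s, Len(s) - 1)
    Loop
    NormalizeWord = s
End Function

' Counts normalized words in the selection and writes the table to columns B and C.
Sub FtableNormalized()
    Dim counts As Object, r As Range, parts() As String
    Dim i As Long, w As String, k As Variant, rowOut As Long

    Set counts = CreateObject("Scripting.Dictionary")   ' assumption: Windows only
    For Each r In Selection.Cells
        parts = Split(CStr(r.Value), " ")
        For i = LBound(parts) To UBound(parts)
            w = NormalizeWord(parts(i))
            ' A missing key reads as Empty, so Empty + 1 starts the count at 1.
            If Len(w) > 0 Then counts(w) = counts(w) + 1
        Next i
    Next r

    rowOut = 1
    For Each k In counts.Keys
        Cells(rowOut, "B").Value = k
        Cells(rowOut, "C").Value = counts(k)
        rowOut = rowOut + 1
    Next k
End Sub
```

A dictionary counts each word in one pass, whereas the macro above rescans the whole array once per distinct word. Punctuation inside a token (for example "is>0") is left alone in this sketch.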
Answered by Jerome Montino
Given this:
I'll use a pivot table to get this:
Best part is, if I have more data, it's easy to get the Top 5, 10, etc. And it'll always result in unique indices. From there, there are all manner of edits and calculations you can do. :)
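The screenshots from this answer are not available here, so the exact input layout is unknown. Assuming the pivot table is fed a single column with one word per row, a sketch like the following could prepare that column from cells that hold whole sentences. `WordsToColumn` and the "Word" header are hypothetical choices, not part of the answer.

```vba
' Writes every space-separated token of the selected cells to column A
' of a new worksheet, one token per row, under a "Word" header.
Sub WordsToColumn()
    Dim src As Range, ws As Worksheet, r As Range
    Dim parts() As String, i As Long, rowOut As Long

    Set src = Selection   ' capture first: adding a sheet changes the selection
    Set ws = Worksheets.Add(After:=Worksheets(Worksheets.Count))
    ws.Cells(1, 1).Value = "Word"   ' header usable as a pivot field name
    rowOut = 2

    For Each r In src.Cells
        parts = Split(CStr(r.Value), " ")
        For i = LBound(parts) To UBound(parts)
            If Len(parts(i)) > 0 Then   ' skip empties caused by double spaces
                ws.Cells(rowOut, 1).Value = parts(i)
                rowOut = rowOut + 1
            End If
        Next i
    Next r
End Sub
```

With that column as the pivot source, placing "Word" in both the Rows area and the Values area (summarized by Count) should give a word and frequency table.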
Answered by Marston Gould
Using Google Sheets:
index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),1,$A6+1)&":"&index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),2,$A6+1)
In the above, $B$2 contains the text string.
$A6 = 1 will give you the most used word.
$A6 = 2 will give you the second most used word, etc.
This is set to return the 20 most frequent words. If you want more, increase the limit value to whatever you want.
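For the Excel route, the effect of `order by count(Col2) desc limit 20` can be approximated by sorting a finished frequency table. This is a rough sketch under stated assumptions: the table produced by the Ftable macro sits in columns B and C of the active sheet starting at row 1 with no header, and `SortFrequencyTable` and `TOP_N` are hypothetical names.

```vba
' Sorts the word/count table in B:C by count, highest first,
' then clears everything below the first TOP_N rows.
Sub SortFrequencyTable()
    Const TOP_N As Long = 20
    Dim lastRow As Long

    lastRow = Cells(Rows.Count, "B").End(xlUp).Row
    Range("B1:C" & lastRow).Sort Key1:=Range("C1"), _
                                 Order1:=xlDescending, Header:=xlNo

    ' Destructive step: remove this line to keep the full sorted table.
    If lastRow > TOP_N Then Range("B" & (TOP_N + 1) & ":C" & lastRow).ClearContents
End Sub
```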
Answered by Stephen McNutt
Here's a tiny fix plus an enhancement to the script kindly offered by "Gary's Student". The fix is that while building the collection is apparently not case-sensitive (and this is correct--we probably don't want new items added to the collection that differ only in case from existing items), the IF statement that does the counting IS case-sensitive as written, so it doesn't count correctly. Just change that line to...
If LCase(a) = LCase(v) Then J = J + 1
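The mismatch described here can be reproduced in isolation. The sketch below (with the hypothetical name `CollectionKeyDemo`) shows that `Collection` keys ignore case while the `=` operator on strings does not under the default `Option Compare Binary`. `StrComp` with `vbTextCompare` is shown as an alternative to the `LCase` comparison.

```vba
Sub CollectionKeyDemo()
    Dim cl As New Collection

    cl.Add "All", "All"
    On Error Resume Next
    cl.Add "all", "all"          ' same key ignoring case, so Add fails
    Debug.Print Err.Number       ' 457: key already associated with an element
    On Error GoTo 0
    Debug.Print cl.Count         ' 1

    Debug.Print ("all" = "All")                              ' False
    Debug.Print (StrComp("all", "All", vbTextCompare) = 0)   ' True
End Sub
```

Because the collection keeps only the first spelling it sees, the reported word is whichever capitalization appeared first in the text.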
And here's my enhancement. To use it, you first select one or more columns but NOT their (first) header/label rows. Then run the script, and it gives results for each selected column in a new worksheet--along with that header/label row so you know what you're looking at.
I'm just a dabbler. I just hack stuff when I need to get a job done, so it's not elegant, I'm sure...
Sub FrequencyV2()
    'Modified from: https://stackoverflow.com/questions/21858874/counting-the-frequencies-of-words-in-excel-strings
    'It determines the frequency of words found in each selected column.
    'Puts results in new worksheets.
    'Before running, select one or more columns but not the header rows.
    Dim rng As Range
    Dim row As Range
    Dim col As Range
    Dim cell As Range
    Dim ws As Worksheet
    Dim wsNumber As Long 'Used to put a number in the names of the newly created worksheets
    wsNumber = 1
    Set rng = Selection
    For Each col In rng.Columns
        Dim BigString As String, I As Long, J As Long, K As Long
        BigString = ""
        For Each cell In col.Cells
            BigString = BigString & " " & cell.Value
        Next cell
        BigString = Trim(BigString)
        ary = Split(BigString, " ")
        Dim cl As Collection
        Set cl = New Collection
        For Each a In ary
            On Error Resume Next 'This works because an error occurs if item already exists in the collection.
            'Note that it's not case sensitive. Differently capitalized items will be identified as already belonging to collection.
            cl.Add a, CStr(a)
        Next a
        Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
        ws.Name = "F" & CStr(wsNumber)
        wsNumber = wsNumber + 1
        Worksheets(ws.Name).Cells(1, "A").Value = col.Cells(1, 1).Offset(-1, 0).Value 'Copies the table header text for current column to new worksheet.
        For I = 1 To cl.Count
            v = cl(I)
            Worksheets(ws.Name).Cells(I + 1, "A").Value = v 'The +1 needed because header text takes up row 1.
            J = 0
            For Each a In ary
                If LCase(a) = LCase(v) Then J = J + 1
            Next a
            Worksheets(ws.Name).Cells(I + 1, "B") = J 'The +1 needed because header text takes up row 1.
        Next I
    Next col
End Sub
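One thing to be aware of in the script above: `On Error Resume Next` stays in effect for the rest of the procedure, so a failed `ws.Name` assignment (a sheet named "F1" already exists) or a failed `Offset(-1, 0)` (the selection starts in row 1) would be skipped silently. A possible way to confine the suppression to the collection insert is sketched below. `AddUnique` is a hypothetical helper, not part of the original answer.

```vba
' Hypothetical helper: tries to add item to cl using the item itself as key.
' Returns True if it was added, False if the add failed (typically because
' the key already exists) or the item is empty.
Function AddUnique(ByVal cl As Collection, ByVal item As String) As Boolean
    If Len(item) = 0 Then Exit Function   ' ignore empties from double spaces
    On Error Resume Next                  ' suppress errors for the Add only
    cl.Add item, item
    AddUnique = (Err.Number = 0)
    On Error GoTo 0                       ' restore normal error reporting
End Function
```

Inside the loop, `cl.Add a, CStr(a)` together with its `On Error Resume Next` line would then become `AddUnique cl, CStr(a)`, and any later error in the procedure would be raised as usual.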