VBA: Counting the frequencies of words in Excel strings

Question

Asked by 114

Suppose I have a column of arbitrary length where each cell contains a string of text. Is there a way to determine what words appear most frequently in the column (not knowing in advance which words to check) and subsequently order these words along with their frequencies in a two column table? Would VBA be best for this task?

As an example, a cell might contain the string "This is a string, and the # of characters inthis string is>0." (errors intentional)

Answer 1

Answered by Gary's Student

Select a portion of column A and run this small macro (the table will be placed in columns B and C):

Sub Ftable()
    Dim BigString As String, I As Long, J As Long, K As Long
    BigString = ""

    ' Add code to sum both "All" and "all".
    ' Add code to separate ".", "!", etc. from the word preceding them so that
    ' word is also counted in the total. For example, "all." should not be
    ' reported as 1 "all." but be added to the total count of "all" words.
    ' Would you publish this new code?

    For Each r In Selection 
          BigString = BigString & " " & r.Value
    Next r
    BigString = Trim(BigString)
    ary = Split(BigString, " ")
    Dim cl As Collection
    Set cl = New Collection
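    On Error Resume Next   ' Adding a key that already exists raises an error; skip it
    For Each a In ary
        cl.Add a, CStr(a)
    Next a
    On Error GoTo 0        ' Restore normal error handling for the rest of the macro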

    For I = 1 To cl.Count
        v = cl(I)
        Cells(I, "B").Value = v
        J = 0
        For Each a In ary
            If a = v Then J = J + 1
        Next a
        Cells(I, "C") = J
    Next I

End Sub
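
The requests in the code comment above (counting "All" together with "all", and detaching "." or "!" from the preceding word), plus the ordering asked for in the question, could be handled roughly as follows. This is only a sketch, not Gary's Student's code: the names FtableDict, StripPunctuation and IsWordChar are made up for illustration, it assumes a Windows Excel where Scripting.Dictionary is available, and it assumes the selected cells contain no error values.

```vba
Option Explicit

' Sketch: count words in the selection in one pass, ignoring case and
' leading/trailing punctuation, then write a sorted table to columns B and C.
Sub FtableDict()
    Dim counts As Object
    Dim cell As Range
    Dim words() As String
    Dim w As String
    Dim i As Long, rowOut As Long
    Dim key As Variant

    Set counts = CreateObject("Scripting.Dictionary")

    For Each cell In Selection.Cells
        words = Split(CStr(cell.Value), " ")
        For i = LBound(words) To UBound(words)
            ' Lower-case so that "All" and "all" share one entry
            w = LCase$(StripPunctuation(words(i)))
            If Len(w) > 0 Then counts(w) = counts(w) + 1
        Next i
    Next cell

    ' Write one row per distinct word
    rowOut = 1
    For Each key In counts.Keys
        Cells(rowOut, "B").Value = key
        Cells(rowOut, "C").Value = counts(key)
        rowOut = rowOut + 1
    Next key

    ' Most frequent words first (only needed with two or more rows)
    If rowOut > 2 Then
        Range(Cells(1, "B"), Cells(rowOut - 1, "C")).Sort _
            Key1:=Cells(1, "C"), Order1:=xlDescending, Header:=xlNo
    End If
End Sub

' Removes characters from both ends of s until a letter or digit is reached.
Private Function StripPunctuation(ByVal s As String) As String
    Do While Len(s) > 0
        If IsWordChar(Left$(s, 1)) Then Exit Do
        s = Mid$(s, 2)
    Loop
    Do While Len(s) > 0
        If IsWordChar(Right$(s, 1)) Then Exit Do
        s = Left$(s, Len(s) - 1)
    Loop
    StripPunctuation = s
End Function

' True for ASCII letters and digits only (an assumption; extend as needed).
Private Function IsWordChar(ByVal ch As String) As Boolean
    IsWordChar = (ch Like "[A-Za-z0-9]")
End Function
```

Punctuation inside a token is left alone, so with the example string from the question "is>0." would be counted as "is>0", and a lone "#" would be skipped entirely.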

Answer 2

Answered by Jerome Montino

Given this:

[Image not available]

I'll use a pivot table to get this:

[Image not available]

Best part is, if I had more data, it's easy to get the Top 5, 10, etc. And it'll always result in unique indices. From there, there are all manner of edits and calculations you can do. :)

Answer 3

Answered by Marston Gould

Using Google Sheets:

index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),1,$A6+1)&":"&index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),2,$A6+1)

In the formula above, $B$2 contains the text string.

$A6 = 1 will give you the most used word.

$A6 = 2 will give you the second most used word, etc.

This is set to return the 20 most frequent words. If you want more, increase the limit value in the query to whatever you want.

Answer 4

Answered by Stephen McNutt

Here's a tiny fix plus an enhancement to the script kindly offered by "Gary's Student". The fix is that while building the collection is apparently not case-sensitive (and this is correct--we probably don't want new items added to the collection that differ only in case from existing items), the IF statement that does the counting IS case-sensitive as written, so it doesn't count correctly. Just change that line to...

If LCase(a) = LCase(v) Then J = J + 1
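
To see the mismatch described above in isolation, a small test along these lines may help. It is only an illustrative sketch (the name CollectionKeyDemo is made up) and it assumes the module uses the default Option Compare Binary; results are printed to the Immediate window.

```vba
' Sketch: Collection keys ignore case, but "=" on strings does not
' (under the default Option Compare Binary).
Sub CollectionKeyDemo()
    Dim cl As Collection
    Set cl = New Collection

    cl.Add "All", "All"

    On Error Resume Next
    cl.Add "all", "all"                 ' Key differs only in case
    Debug.Print Err.Number <> 0         ' True: rejected as a duplicate key
    On Error GoTo 0

    Debug.Print cl.Count                ' 1: only "All" was stored
    Debug.Print ("All" = "all")         ' False: the comparison is case-sensitive
    Debug.Print (LCase("All") = LCase("all"))   ' True: what the fixed line tests
End Sub
```

So with the original comparison, a column containing both "All" and "all" gets a single row for "All", but only the exact-case matches are counted.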

And here's my enhancement. To use it, you first select one or more columns but NOT their (first) header/label rows. Then run the script, and it gives results for each selected column in a new worksheet--along with that header/label row so you know what you're looking at.

I'm just a dabbler. I just hack stuff when I need to get a job done, so it's not elegant, I'm sure...

Sub FrequencyV2() 'Modified from: https://stackoverflow.com/questions/21858874/counting-the-frequencies-of-words-in-excel-strings
'It determines the frequency of words found in each selected column.
'Puts results in new worksheets.
'Before running, select one or more columns but not the header rows.
    Dim rng As Range
    Dim row As Range
    Dim col As Range
    Dim cell As Range
    Dim ws As Worksheet
    Dim wsNumber As Long 'Used to put a number in the names of the newly created worksheets
    wsNumber = 1
    Set rng = Selection
    For Each col In rng.Columns
        Dim BigString As String, I As Long, J As Long, K As Long
        BigString = ""
        For Each cell In col.Cells
            BigString = BigString & " " & cell.Value
        Next cell
        BigString = Trim(BigString)
        ary = Split(BigString, " ")
        Dim cl As Collection
        Set cl = New Collection
        For Each a In ary
            On Error Resume Next 'This works because an error occurs if item already exists in the collection.
            'Note that it's not case sensitive.  Differently capitalized items will be identified as already belonging to collection.
            cl.Add a, CStr(a)
        Next a
        Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
        ws.Name = "F" & CStr(wsNumber)
        wsNumber = wsNumber + 1
        Worksheets(ws.Name).Cells(1, "A").Value = col.Cells(1, 1).Offset(-1, 0).Value 'Copies the table header text for current column to new worksheet.
        For I = 1 To cl.Count
            v = cl(I)
            Worksheets(ws.Name).Cells(I + 1, "A").Value = v 'The +1 needed because header text takes up row 1.
            J = 0
            For Each a In ary
                If LCase(a) = LCase(v) Then J = J + 1
            Next a
            Worksheets(ws.Name).Cells(I + 1, "B") = J 'The +1 needed because header text takes up row 1.
        Next I
    Next col
End Sub
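
FrequencyV2 lists the words in the order they first appear, whereas the question asks for them ordered by frequency. One possible way to sort each result sheet is sketched below. SortFrequencyTable is a hypothetical helper, not part of the script above, and it assumes the layout FrequencyV2 produces: header text in A1, words from A2 down, counts from B2 down.

```vba
' Sketch: sort the word/count rows of a result sheet by count, descending.
Sub SortFrequencyTable(ByVal ws As Worksheet)
    Dim lastRow As Long

    ' Column B holds the counts, so its last used row marks the end of the table
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    If lastRow < 3 Then Exit Sub        ' Zero or one data row: nothing to sort

    ws.Range(ws.Cells(2, "A"), ws.Cells(lastRow, "B")).Sort _
        Key1:=ws.Cells(2, "B"), Order1:=xlDescending, Header:=xlNo
End Sub
```

It could be called as `SortFrequencyTable ws` just before the `Next col` line, once the counts for that column have been written.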
