Get row and column in Pandas for a cell with a certain value
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/53856763/
Get row and column in Pandas for a cell with a certain value
Asked by Gabriel
I am trying to read an unformatted Excel spreadsheet using Pandas. There are multiple tables within a single sheet, and I want to convert these tables into dataframes. Since the sheet is not already "indexed" in the traditional way, there are no meaningful column or row indices. Is there a way to search for a specific value and get the row and column where it occurs? For example, say I want to get the row and column number of every cell that contains the string "Title".
I have already tried things like DataFrame.filter, but that only works if there are meaningful row and column indices.
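For context, DataFrame.filter matches row or column labels, never cell values, which is why it does not help here. A minimal sketch with made-up data (my own illustration, not from the original question):

import pandas as pd

df = pd.DataFrame({'Title': ['a', 'b'], 'Other': ['Title', 'c']})
# filter() selects by label, so this picks the column named 'Title'...
print(df.filter(like='Title', axis=1))
# ...but nothing here locates the cell whose *value* is 'Title';
# that is the problem the answers below address.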
Accepted answer by Yo_Chris
You can do it with a long and hard-to-read list comprehension:
import pandas as pd

# assume this df and that we are looking for 'abc'
df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'], 'col2': ['asdf', 'abc', 'sdfg', 'def']})
[(df[col][df[col].eq('abc')].index[i], df.columns.get_loc(col))
 for col in df.columns
 for i in range(len(df[col][df[col].eq('abc')].index))]
out:
[(0, 0), (3, 0), (1, 1)]
Note that each tuple is (index value, column location).
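Since the tuples hold index values rather than row positions, the distinction matters once the frame has a non-default index; a small sketch of converting them with Index.get_loc (my own addition, not part of the answer):

import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'], 'col2': ['asdf', 'abc', 'sdfg', 'def']},
                  index=['w', 'x', 'y', 'z'])  # a non-default index to show the difference
labelled = [(df[col][df[col].eq('abc')].index[i], df.columns.get_loc(col))
            for col in df.columns
            for i in range(len(df[col][df[col].eq('abc')].index))]
positional = [(df.index.get_loc(i), c) for i, c in labelled]
print(labelled)    # [('w', 0), ('z', 0), ('x', 1)]
print(positional)  # [(0, 0), (3, 0), (1, 1)]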
You can also change .eq() to str.contains() if you are looking for any strings that contain a certain value:
[(df[col][df[col].str.contains('ab')].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].str.contains('ab')].index))]
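The same idea can be packaged as a small helper so the condition (exact match vs. substring) is easy to swap; the function name and structure below are my own, not from the answer:

import pandas as pd

def locate(df, cond):
    # cond maps a column Series to a boolean Series of matches
    return [(idx, df.columns.get_loc(col))
            for col in df.columns
            for idx in df.loc[cond(df[col]), col].index]

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})
print(locate(df, lambda s: s.eq('abc')))           # [(0, 0), (3, 0), (1, 1)]
print(locate(df, lambda s: s.str.contains('ab')))  # [(0, 0), (3, 0), (1, 1)]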
Answer by firefly
Create a df with NaN where your_value is not found.
Drop all rows that don't contain the value.
Drop all columns that don't contain the value.
a = df.where(df=='your_value').dropna(how='all').dropna(axis=1)
To get the row(s):
a.index
To get the column(s):
a.columns
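A quick sanity check of this approach with made-up data where the value occurs in a single cell (my own example, not from the answer):

import pandas as pd

df = pd.DataFrame({'A': ['x', 'Title', 'y'], 'B': ['p', 'z', 'w']})
a = df.where(df == 'Title').dropna(how='all').dropna(axis=1)
print(a.index.tolist())    # [1]
print(a.columns.tolist())  # ['A']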
Answer by meW
Here's an example that fetches the row and column indices of all cells containing the word 'title':
import pandas as pd

df = pd.DataFrame({'A': ['here goes the title', 'tt', 'we have title here'],
                   'B': ['ty', 'title', 'complex']})
df
+---+---------------------+---------+
| | A | B |
+---+---------------------+---------+
| 0 | here goes the title | ty |
| 1 | tt | title |
| 2 | we have title here | complex |
+---+---------------------+---------+
# boolean mask of the cells that contain 'title'
idx = df.apply(lambda x: x.str.contains('title'))

# for each column, collect the row indices where the mask is True
col_idx = []
for i in range(df.shape[1]):
    col_idx.append(df.iloc[:, i][idx.iloc[:, i]].index.tolist())

# flatten into (row index, column position) tuples
out = []
cnt = 0
for i in col_idx:
    for j in range(len(i)):
        out.append((i[j], cnt))
    cnt += 1

out
# [(0, 0), (2, 0), (1, 1)] # Expected output
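For what it's worth, the same positions can be read straight off the boolean mask with NumPy; this compact alternative is my own suggestion, not part of the original answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['here goes the title', 'tt', 'we have title here'],
                   'B': ['ty', 'title', 'complex']})
mask = df.apply(lambda x: x.str.contains('title')).to_numpy()
pairs = [tuple(p) for p in np.argwhere(mask)]
print(pairs)  # [(0, 0), (1, 1), (2, 0)] -- row-major order, unlike the answer above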
Answer by Jakob
You can simply create a mask of the same shape as your df by calling df == 'title'. You can then combine this with the df.where() method, which will set all fields that differ from your keyword to NA, and finally you can use dropna() to reduce it to the valid fields. Then you can use df.columns and df.index like you're used to.
df = pd.DataFrame({"a": [0,1,2], "b": [0, 9, 7]})
print(df.where(df == 0).dropna().index)
print(df.where(df == 0).dropna().columns)
#Int64Index([0], dtype='int64')
#Index(['a', 'b'], dtype='object')
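One caveat worth illustrating (my own note, not from the answer): dropna() by default drops any row containing NaN, so when matches are spread across different rows and columns it is usually safer to drop only rows and columns that are entirely NaN:

import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [5, 0, 7]})
masked = df.where(df == 0)
hits = masked.dropna(how="all").dropna(axis=1, how="all")
print(hits.index.tolist())    # [0, 1] -- rows containing a 0 somewhere
print(hits.columns.tolist())  # ['a', 'b'] -- columns containing a 0 somewhere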
Answer by Gabriel
Similar to what Chris said, I found this to work for me, although it's not the prettiest or shortest way. This returns all the (row, column) pairs matching a regular expression in a dataframe:
import re

def find_matches(df, pattern):
    # reconstructed as a full function; the original post showed only the loop body
    regex = re.compile(pattern)
    tuples = []
    row_count = 0
    for row in df.itertuples(index=False):  # index=False so columns start at position 0
        col_count = 0
        for col in row:
            if regex.match(str(col)):
                tuples.append((row_count, col_count))
            col_count += 1
        row_count += 1
    return tuples
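A quick usage check, reusing the example frame from the accepted answer (the helper name above is my own reconstruction):

import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})
print(find_matches(df, 'ab'))  # [(0, 0), (1, 1), (3, 0)]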