Get row and column in Pandas for a cell with a certain value
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/53856763/
Get row and column in Pandas for a cell with a certain value
Asked by Gabriel
I am trying to read an unformatted Excel spreadsheet using Pandas. There are multiple tables within a single sheet, and I want to convert these tables into dataframes. Since the sheet is not already "indexed" in the traditional way, there are no meaningful column or row indices. Is there a way to search for a specific value and get the row and column where it occurs? For example, say I want to get the row and column number of every cell that contains the string "Title".
I have already tried things like DataFrame.filter, but that only works if there are meaningful row and column indices.
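For context, DataFrame.filter matches row or column labels, never cell values, which is why it does not help here. A minimal sketch with made-up data (my own illustration, not from the original question):

import pandas as pd

df = pd.DataFrame({'Title': ['a', 'b'], 'Other': ['Title', 'c']})
# filter() selects by label, so this picks the column named 'Title'...
print(df.filter(like='Title', axis=1))
# ...but nothing here locates the cell whose *value* is 'Title';
# that is the problem the answers below address.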
Accepted answer by Yo_Chris
You can do it with a long and hard-to-read list comprehension:
import pandas as pd

# assume this df and that we are looking for 'abc'
df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'], 'col2': ['asdf', 'abc', 'sdfg', 'def']})
[(df[col][df[col].eq('abc')].index[i], df.columns.get_loc(col))
 for col in df.columns
 for i in range(len(df[col][df[col].eq('abc')].index))]
out:
[(0, 0), (3, 0), (1, 1)]
Note that each tuple is (index value, column location).
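Since the tuples hold index values rather than row positions, the distinction matters once the frame has a non-default index; a small sketch of converting them with Index.get_loc (my own addition, not part of the answer):

import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'], 'col2': ['asdf', 'abc', 'sdfg', 'def']},
                  index=['w', 'x', 'y', 'z'])  # a non-default index to show the difference
labelled = [(df[col][df[col].eq('abc')].index[i], df.columns.get_loc(col))
            for col in df.columns
            for i in range(len(df[col][df[col].eq('abc')].index))]
positional = [(df.index.get_loc(i), c) for i, c in labelled]
print(labelled)    # [('w', 0), ('z', 0), ('x', 1)]
print(positional)  # [(0, 0), (3, 0), (1, 1)]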
You can also change .eq() to str.contains() if you are looking for any strings that contain a certain value:
[(df[col][df[col].str.contains('ab')].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].str.contains('ab')].index))]
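The same idea can be packaged as a small helper so the condition (exact match vs. substring) is easy to swap; the function name and structure below are my own, not from the answer:

import pandas as pd

def locate(df, cond):
    # cond maps a column Series to a boolean Series of matches
    return [(idx, df.columns.get_loc(col))
            for col in df.columns
            for idx in df.loc[cond(df[col]), col].index]

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})
print(locate(df, lambda s: s.eq('abc')))           # [(0, 0), (3, 0), (1, 1)]
print(locate(df, lambda s: s.str.contains('ab')))  # [(0, 0), (3, 0), (1, 1)]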
Answer by firefly
Create a df with NaN where your_value is not found.
Drop all rows that don't contain the value.
Drop all columns that don't contain the value.
a = df.where(df=='your_value').dropna(how='all').dropna(axis=1)
To get the row(s):
a.index
To get the column(s):
a.columns
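A quick sanity check of this approach with made-up data where the value occurs in a single cell (my own example, not from the answer):

import pandas as pd

df = pd.DataFrame({'A': ['x', 'Title', 'y'], 'B': ['p', 'z', 'w']})
a = df.where(df == 'Title').dropna(how='all').dropna(axis=1)
print(a.index.tolist())    # [1]
print(a.columns.tolist())  # ['A']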
Answer by meW
Here's an example that fetches the row and column indices of all cells containing the word 'title':
import pandas as pd

df = pd.DataFrame({'A': ['here goes the title', 'tt', 'we have title here'],
                   'B': ['ty', 'title', 'complex']})
df
+---+---------------------+---------+
| | A | B |
+---+---------------------+---------+
| 0 | here goes the title | ty |
| 1 | tt | title |
| 2 | we have title here | complex |
+---+---------------------+---------+
# boolean mask of the cells that contain 'title'
idx = df.apply(lambda x: x.str.contains('title'))

# for each column, collect the row indices where the mask is True
col_idx = []
for i in range(df.shape[1]):
    col_idx.append(df.iloc[:, i][idx.iloc[:, i]].index.tolist())

# flatten into (row index, column position) tuples
out = []
cnt = 0
for i in col_idx:
    for j in range(len(i)):
        out.append((i[j], cnt))
    cnt += 1

out
# [(0, 0), (2, 0), (1, 1)] # Expected output
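For what it's worth, the same positions can be read straight off the boolean mask with NumPy; this compact alternative is my own suggestion, not part of the original answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['here goes the title', 'tt', 'we have title here'],
                   'B': ['ty', 'title', 'complex']})
mask = df.apply(lambda x: x.str.contains('title')).to_numpy()
pairs = [tuple(p) for p in np.argwhere(mask)]
print(pairs)  # [(0, 0), (1, 1), (2, 0)] -- row-major order, unlike the answer above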
Answer by Jakob
You can simply create a mask of the same shape as your df by calling df == 'title'. You can then combine this with the df.where() method, which will set all fields that differ from your keyword to NA, and finally you can use dropna() to reduce it to the valid fields. Then you can use df.columns and df.index like you're used to.
df = pd.DataFrame({"a": [0,1,2], "b": [0, 9, 7]})
print(df.where(df == 0).dropna().index)
print(df.where(df == 0).dropna().columns)
#Int64Index([0], dtype='int64')
#Index(['a', 'b'], dtype='object')
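One caveat worth illustrating (my own note, not from the answer): dropna() by default drops any row containing NaN, so when matches are spread across different rows and columns it is usually safer to drop only rows and columns that are entirely NaN:

import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [5, 0, 7]})
masked = df.where(df == 0)
hits = masked.dropna(how="all").dropna(axis=1, how="all")
print(hits.index.tolist())    # [0, 1] -- rows containing a 0 somewhere
print(hits.columns.tolist())  # ['a', 'b'] -- columns containing a 0 somewhere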
Answer by Gabriel
Similar to what Chris said, I found this to work for me, although it's not the prettiest or shortest way. This returns all the (row, column) pairs matching a regular expression in a dataframe:
import re

def find_matches(df, pattern):
    # reconstructed as a full function; the original post showed only the loop body
    regex = re.compile(pattern)
    tuples = []
    row_count = 0
    for row in df.itertuples(index=False):  # index=False so columns start at position 0
        col_count = 0
        for col in row:
            if regex.match(str(col)):
                tuples.append((row_count, col_count))
            col_count += 1
        row_count += 1
    return tuples
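A quick usage check, reusing the example frame from the accepted answer (the helper name above is my own reconstruction):

import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})
print(find_matches(df, 'ab'))  # [(0, 0), (1, 1), (3, 0)]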