Python: Check if a row exists in pandas
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/45636382/
Asked by Messit?zil
I want to check whether a row exists in a DataFrame. Here is my code:
df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Name','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 30957]
df1 = df1[['Format','Name','Number']]
df1 = df1.groupby(['Format','Name'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1:
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = df1.append(df2)
print(df2)
This is the output:
Name    Apr 2013  Apr 2014  Apr 2015  Apr 2016  Apr 2017  Aug 2010  Aug 2013
Format
entry          0         0         0         1         4         1         0
pdf           13        12         4        23         7         1         9
sum           13        12         4        24        11         2         9
Does if 'entry' in df1: only check whether 'entry' exists as a column? I guess that must be the case. We can see that the row 'entry' exists, but we still land in the else branch (had we landed in the if branch, the sum for Apr 2016 would be 23).
If I run it on a file that doesn't have the row 'entry', it again lands in the else branch (as I expect), so I assume it always enters the else branch.
How do I check if a row exists in pandas?
Answered by jezrael
I think you need to compare the index values. The result is a NumPy array of True and False values. To reduce it to a scalar, use any to check for at least one True, or all to check whether all values are True:
(df.index == 'entry').any()
(df.index == 'entry').all()
Another solution, from John Galt's comment:
'entry' in df.index
If you need to check for a substring:
df.index.str.contains('en').any()
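A minimal sketch of the substring check, assuming an index made up only of strings. The small DataFrame and the names has_en and has_xyz are made up for illustration. Passing regex=False is an optional choice here so that the pattern is matched literally rather than as a regular expression:

```python
import pandas as pd

# Hypothetical example data with string row labels.
df = pd.DataFrame({"Apr 2013": [1, 2, 3]}, index=["entry", "pdf", "sum"])

# regex=False treats the pattern as a literal substring, so characters
# such as '.' or '(' are not interpreted as regular-expression syntax.
has_en = bool(df.index.str.contains("en", regex=False).any())
has_xyz = bool(df.index.str.contains("xyz", regex=False).any())

print(has_en)   # 'entry' contains 'en'
print(has_xyz)  # no label contains 'xyz'
```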
Sample:
df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','pdf','sum'])
print(df)
       Apr 2013
entry         1
pdf           2
sum           3
print (df.index == 'entry')
[ True False False]
print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
False
#check columns values
print ('entry' in df)
False
#same as explicitly checking df.columns (better readability)
print ('entry' in df.columns)
False
#check index values
print ('entry' in df.index)
True
#check columns values
print ('Apr 2013' in df)
True
#check columns values
print ('Apr 2013' in df.columns)
True
df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','entry','entry'])
print(df)
       Apr 2013
entry         1
entry         2
entry         3
print (df.index == 'entry')
[ True True True]
print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
True
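To tie this back to the question, here is a minimal sketch of the asker's branch rewritten to test the index instead of the columns. The small df1 is a made-up stand-in for the pivoted table, and total is a hypothetical name. pd.concat is used in place of DataFrame.append, which was removed in pandas 2.0:

```python
import pandas as pd

# Hypothetical stand-in for the pivoted table df1 from the question.
df1 = pd.DataFrame(
    {"Apr 2016": [1, 23], "Apr 2017": [4, 7]},
    index=pd.Index(["entry", "pdf"], name="Format"),
)

# Test the row labels (the index), not the columns.
if "entry" in df1.index:
    total = df1.iloc[1:4].sum(axis=0)  # skip the 'entry' row
else:
    total = df1.iloc[0:3].sum(axis=0)
total.name = "sum"

# Turn the Series into a one-row frame labelled 'sum' and stack it below df1.
df2 = pd.concat([df1, total.to_frame().T])
print(df2)
```

With the index test in place, the 'entry' row is left out of the sum, which is the behaviour the asker expected.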
Answered by Yonatan Zax
Another way to check whether a row exists in a DataFrame is to filter with df.loc:
subDataFrame = dataFrame.loc[dataFrame[columnName] == value]
The code below checks every value in a given comma-separated line and returns True or False depending on whether that line exists in the DataFrame.
Here is a short example that uses stock data for the DataFrame:
# ***** Code for 'Check if a line exists in dataframe' using Pandas *****
import pandas as pd

# Checks if value can be converted to a number
# Return: True/False
def isfloat(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Example:
# list1 = ['D','C','B','A']
# list2 = ['OK','Good','82','Great']
# mergedList = [['D','OK'],['C','Good'],['B',82.0],['A','Great']]
def getMergedListFromTwoLists(list1, list2):
    mergedList = []
    numOfColumns = min(len(list1), len(list2))
    for col in range(0, numOfColumns):
        val1 = list1[col]
        val2 = list2[col]
        # Numeric values are stored as numbers in the dataframe, so convert them
        if isfloat(val2):
            val2 = float(val2)
        mergedList.append([val1, val2])
    return mergedList

# Returns only the rows whose column valuesAsArray[0] equals valuesAsArray[1]
# Example: valuesAsArray = ['Symbol','AAPL'], returns rows with 'AAPL'
def getSubDataFrame(dataFrame, valuesAsArray):
    subDataFrame = dataFrame.loc[dataFrame[valuesAsArray[0]] == valuesAsArray[1]]
    return subDataFrame

def createDataFrameAsExample():
    data = {
        'MarketCenter': ['T', 'T', 'T', 'T'],
        'Symbol': ['AAPL', 'FB', 'AAPL', 'FB'],
        'Date': [20190101, 20190102, 20190201, 20190301],
        'Time': ['08:00:00', '08:00:00', '09:00:00', '09:00:00'],
        'ShortType': ['S', 'S', 'S', 'S'],
        'Size': [10, 10, 20, 30],
        'Price': [100, 100, 300, 200]
    }
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size', 'Price']
    df = pd.DataFrame(data, columns=dfHeadLineAsArray)
    return df

def adapterCheckIfLineExistsInDataFrame(originalDataFrame, headlineAsArray, line):
    dfHeadLineAsArray = headlineAsArray
    # Line example: 'T,AAPL,20190101,08:00:00,S,10,100'
    lineAsArray = line.split(',')
    valuesAsArray = getMergedListFromTwoLists(dfHeadLineAsArray, lineAsArray)
    return checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray)

def checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray):
    if originalDataFrame.empty:
        return False
    subDataFrame = originalDataFrame
    # Narrow the frame one [column, value] pair at a time
    for value in valuesAsArray:
        if subDataFrame.empty:
            return False
        subDataFrame = getSubDataFrame(subDataFrame, value)
    # Any row left after all filters matches the whole line
    return not subDataFrame.empty

def testExample():
    dataFrame = createDataFrameAsExample()
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size', 'Price']
    # Three made up lines (not in df)
    lineToCheck1 = 'T,FB,20190102,13:00:00,S,10,100'
    lineToCheck2 = 'T,FB,20190102,08:00:00,S,60,100'
    lineToCheck3 = 'T,FB,20190102,08:00:00,S,10,150'
    # This line exists in the dataframe
    lineToCheck4 = 'T,FB,20190102,08:00:00,S,10,100'
    lineExists1 = adapterCheckIfLineExistsInDataFrame(dataFrame, dfHeadLineAsArray, lineToCheck1)
    lineExists2 = adapterCheckIfLineExistsInDataFrame(dataFrame, dfHeadLineAsArray, lineToCheck2)
    lineExists3 = adapterCheckIfLineExistsInDataFrame(dataFrame, dfHeadLineAsArray, lineToCheck3)
    lineExists4 = adapterCheckIfLineExistsInDataFrame(dataFrame, dfHeadLineAsArray, lineToCheck4)
    expected = 'False False False True'
    print('Expected:', expected)
    print('Method:', lineExists1, lineExists2, lineExists3, lineExists4)

testExample()
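For comparison, the same idea can be sketched more compactly with a single boolean mask. This is not the answerer's code: row_exists is a hypothetical helper, the three-column DataFrame is made up for illustration, and the row to look up is assumed to arrive as a dict of column names to already-typed values rather than as a comma-separated string.

```python
import pandas as pd

# Hypothetical example data, loosely modelled on the stock table above.
df = pd.DataFrame({
    "Symbol": ["AAPL", "FB", "AAPL"],
    "Size": [10, 10, 20],
    "Price": [100, 100, 300],
})

def row_exists(frame, row):
    """Return True if some row of `frame` matches every column/value pair in `row`."""
    mask = pd.Series(True, index=frame.index)
    for column, value in row.items():
        # Keep only the rows that also match this column's value.
        mask &= frame[column] == value
    return bool(mask.any())

found = row_exists(df, {"Symbol": "FB", "Size": 10, "Price": 100})
missing = row_exists(df, {"Symbol": "FB", "Size": 60, "Price": 100})
print(found, missing)
```

The mask starts as all True and is narrowed once per column, so an empty DataFrame yields False without a special case.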