Python 检查熊猫中是否存在一行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/45636382/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 17:11:09  来源:igfitidea点击:

Check if a row exists in pandas

pythonpandas

提问by Messit?zil

I want to check if a row exists in dataframe, following is my code:

我想检查数据框中是否存在一行,以下是我的代码:

df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Name','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 30957]
df1 = df1[['Format','Name','Number']]
df1 = df1.groupby(['Format','Name'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1:
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = df1.append(df2)
print(df2)

This is the output:

这是输出:

Name    Apr 2013  Apr 2014  Apr 2015  Apr 2016  Apr 2017  Aug 2010  Aug 2013  
Format                                                                         

entry          0         0         0         1         4         1         0   
pdf           13        12         4        23         7         1         9   
sum           13        12         4        24        11         2         9 

Does if 'entry' in df2: only check if 'entry' exists as a column? It must be the case, I guess. We can see that the row 'entry' exists but we still land in the else condition(if it had landed in if the statement sum for Apr 2016 would be 23).

if df2: 中的“条目”是否仅检查“条目”是否作为列存在?一定是这样,我猜。我们可以看到行 'entry' 存在,但我们仍然处于 else 条件中(如果 2016 年 4 月的语句总和为 23,则它已进入)。

If I check it for the file which don't have the row 'entry', it again lands in else statement(as I expect), so I assume it always enters the else condition.

如果我检查没有“条目”行的文件,它会再次出现在 else 语句中(如我所料),所以我假设它总是进入 else 条件。

How do I check if a row exists in pandas?

如何检查熊猫中是否存在一行?

回答by jezrael

I think you need compare index values - output is Trueand Falsenumpy array. And for scalar need any- check at least one Trueor allfor check if all values are Trues:

我认为您需要比较索引值 - 输出是TrueFalsenumpy 数组。对于标量需求any- 检查至少一个Trueall检查所有值是否为Trues:

(df.index == 'entry').any()

(df.index == 'entry').all()

Another solution from comment of John Galt:

John Galt评论的另一个解决方案:

'entry' in df.index

If need check substring:

如果需要检查子字符串:

df.index.str.contains('en').any()

Sample:

样品

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','pdf','sum'])
print(df)
       Apr 2013
entry         1
pdf           2
sum           3

print (df.index == 'entry')
[ True False False]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
False


#check columns values
print ('entry' in df)
False
#same as explicitely call columns (better readability)
print ('entry' in df.columns)
False
#check index values
print ('entry' in df.index)
True
#check columns values
print ('Apr 2013' in df)
True
#check columns values
print ('Apr 2013' in df.columns)
True

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','entry','entry'])
print(df)
       Apr 2013
entry         1
entry         2
entry         3

print (df.index == 'entry')
[ True  True  True]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
True

回答by Yonatan Zax

Another way to check if a row/line exists in dataframe is using df.loc:

检查数据框中是否存在行/行的另一种方法是使用 df.loc:

subDataFrame = dataFrame.loc[dataFrame[columnName] == value]

subDataFrame = dataFrame.loc[dataFrame[columnName] == value]

This code checks every 'value' in a given line(separated by comma), return True/False if a line exists in the dataframe

此代码检查给定行中的每个“值”(以逗号分隔),如果数据框中存在一行,则返回 True/False

There is a short example using Stocks for the dataframe

有一个使用 Stocks 作为数据框的简短示例

# *****     Code for 'Check if a line exists in dataframe' using Pandas     *****

# Checks if value can be converted to a number
# Return: True/False
def isfloat(value):
  try:
    float(value)
    return True
  except:
    return False


# Example:
# list1 = ['D','C','B','A']
# list2 = ['OK','Good','82','Great']
# mergedList = [['D','OK'],['C','Good'],['B',82],['A','Great']
def getMergedListFromTwoLists(list1, list2):
    mergedList = []
    numOfColumns = min(len(list1), len(list2))
    for col in range(0, numOfColumns):
        val1 = list1[col]
        val2 = list2[col]

        # In the dataframe value stored as a number
        if isfloat(val2):
            val2 = float(val2)
        mergedList.append([val1, val2])

    return mergedList


# Returns only rows that have valuesAsArray[1] in the valuesAsArray[0]
# Example: valuesAsArray = ['Symbol','AAPL'], returns rows with 'AAPL'
def getSubDataFrame(dataFrame, valuesAsArray):
    subDataFrame = dataFrame.loc[dataFrame[valuesAsArray[0]] == valuesAsArray[1]]
    return subDataFrame




def createDataFrameAsExample():
    import pandas as pd
    data = {
        'MarketCenter': ['T', 'T', 'T', 'T'],
        'Symbol': ['AAPL', 'FB', 'AAPL', 'FB'],
        'Date': [20190101, 20190102, 20190201, 20190301],
        'Time': ['08:00:00', '08:00:00', '09:00:00', '09:00:00'],
        'ShortType': ['S', 'S', 'S', 'S'],
        'Size': [10, 10, 20, 30],
        'Price': [100, 100, 300, 200]
    }
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size','Price']
    df = pd.DataFrame(data, columns=dfHeadLineAsArray)
    return df



def adapterCheckIfLineExistsInDataFrame(originalDataFrame, headlineAsArray, line):
    dfHeadLineAsArray = headlineAsArray
    # Line example: 'T,AAPL,20190101,08:00:00,S,10,100'
    lineAsArray = line.split(',')

    valuesAsArray = getMergedListFromTwoLists(dfHeadLineAsArray, lineAsArray)
    return checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray)



def checkIfLineExistsInDataFrame(originalDataFrame,  valuesAsArray):

    if not originalDataFrame.empty:


        subDateFrame = originalDataFrame
        for value in valuesAsArray:
            if subDateFrame.empty:
                return False
            subDateFrame = getSubDataFrame(subDateFrame, value)

        if subDateFrame.empty:
            False
        else:
            return True
    return False


def testExample():
    dataFrame = createDataFrameAsExample()
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size','Price']

    # Three made up lines (not in df)
    lineToCheck1 = 'T,FB,20190102,13:00:00,S,10,100'
    lineToCheck2 = 'T,FB,20190102,08:00:00,S,60,100'
    lineToCheck3 = 'T,FB,20190102,08:00:00,S,10,150'

    # This line exists in the dataframe
    lineToCheck4 = 'T,FB,20190102,08:00:00,S,10,100'

    lineExists1 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck1)
    lineExists2 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck2)
    lineExists3 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck3)
    lineExists4 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck4)

    expected = 'False False False True'
    print('Expected:',expected)
    print('Method:',lineExists1,lineExists2,lineExists3,lineExists4)



testExample()

Click to see the dataframe Dataframe from Example

单击以查看示例中的数据 框 Dataframe