Python: check if a row exists in pandas

Question

Asked by Messit?zil

I want to check whether a row exists in a DataFrame. This is my code:

df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Name','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 30957]
df1 = df1[['Format','Name','Number']]
df1 = df1.groupby(['Format','Name'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1:
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = df1.append(df2)
print(df2)

This is the output:

Name    Apr 2013  Apr 2014  Apr 2015  Apr 2016  Apr 2017  Aug 2010  Aug 2013
Format
entry          0         0         0         1         4         1         0
pdf           13        12         4        23         7         1         9
sum           13        12         4        24        11         2         9

Does if 'entry' in df1: only check whether 'entry' exists as a column? I guess it must. The row 'entry' clearly exists, yet we still land in the else branch (had we landed in the if branch, the sum for Apr 2016 would be 23).

If I run it on a file that doesn't have the row 'entry', it again lands in the else branch (as I expect), so I assume it always enters the else branch.

How do I check if a row exists in pandas?

Answer 1

Answered by jezrael

I think you need to compare the index values; the result is a NumPy array of True and False values. To reduce it to a scalar, use any to check that at least one value is True, or all to check that every value is True:

(df.index == 'entry').any()

(df.index == 'entry').all()

Another solution, from John Galt's comment:

'entry' in df.index

If you need to check for a substring:

df.index.str.contains('en').any()
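
As a small sketch of the substring check, assuming the index holds strings (the frame here is made up for illustration). Passing regex=False treats 'en' as a literal substring rather than a regular expression:

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the check
df = pd.DataFrame({'Apr 2013': [1, 2, 3]}, index=['entry', 'pdf', 'sum'])

# Element-wise test: does each index label contain the substring 'en'?
mask = df.index.str.contains('en', regex=False)
print(mask)

# Collapse to a single bool: True if at least one label matches
has_match = bool(mask.any())
print(has_match)
```

Note that the .str accessor only works on string-like indexes; a purely numeric index would need to be converted first.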

Sample:

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','pdf','sum'])
print(df)
       Apr 2013
entry         1
pdf           2
sum           3

print (df.index == 'entry')
[ True False False]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
False

# check column labels
print ('entry' in df)
False
# same as explicitly checking the columns (more readable)
print ('entry' in df.columns)
False
# check index values
print ('entry' in df.index)
True
# check column labels
print ('Apr 2013' in df)
True
# check column labels
print ('Apr 2013' in df.columns)
True

df = pd.DataFrame({'Apr 2013':[1,2,3]}, index=['entry','entry','entry'])
print(df)
       Apr 2013
entry         1
entry         2
entry         3

print (df.index == 'entry')
[ True  True  True]

print ((df.index == 'entry').any())
True
print ((df.index == 'entry').all())
True
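
Applied to the question, one possible approach is to test the index and drop the 'entry' row by label before summing, instead of relying on row positions. This is only a sketch: the data is made up, and it assumes the intent is to sum every row except 'entry'. pd.concat is used here because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Made-up data shaped like the question's output
df1 = pd.DataFrame(
    {'Apr 2016': [1, 23], 'Apr 2017': [4, 7]},
    index=pd.Index(['entry', 'pdf'], name='Format'),
)

# Row labels live in the index, so test membership there
if 'entry' in df1.index:
    print("row 'entry' found")

# Sum all rows except 'entry'; errors='ignore' makes the drop
# a no-op when the label is absent
total = df1.drop(index='entry', errors='ignore').sum(axis=0)
total.name = 'sum'

# Add the totals as a new row labelled 'sum'
df2 = pd.concat([df1, total.to_frame().T])
print(df2)
```

Dropping by label gives the same result whether or not the 'entry' row is present, so the if/else on row positions is no longer needed.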

Answer 2

Answered by Yonatan Zax

Another way to check whether a row (line) exists in a DataFrame is to use df.loc:

subDataFrame = dataFrame.loc[dataFrame[columnName] == value]
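
A minimal sketch of this filter on a made-up frame (the column names and values are illustrative assumptions). The filtered result is itself a DataFrame, so its .empty attribute tells you whether any row matched:

```python
import pandas as pd

# Made-up frame for illustration
dataFrame = pd.DataFrame({'Symbol': ['AAPL', 'FB', 'AAPL'], 'Size': [10, 10, 20]})

# Keep only the rows whose 'Symbol' column equals 'FB'
subDataFrame = dataFrame.loc[dataFrame['Symbol'] == 'FB']

# A non-empty result means at least one matching row exists
rowExists = not subDataFrame.empty
print(rowExists)

# A value that never occurs yields an empty frame
missing = dataFrame.loc[dataFrame['Symbol'] == 'MSFT']
print(missing.empty)
```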

The code below takes a comma-separated line, filters the DataFrame on each of its values in turn, and returns True if a matching row exists in the DataFrame and False otherwise.

Here is a short example that uses stock data for the DataFrame:

# *****     Code for 'Check if a line exists in dataframe' using Pandas     *****

# Checks whether value can be converted to a number
# Returns: True/False
def isfloat(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


# Example:
# list1 = ['D','C','B','A']
# list2 = ['OK','Good','82','Great']
# mergedList = [['D','OK'],['C','Good'],['B',82.0],['A','Great']]
def getMergedListFromTwoLists(list1, list2):
    mergedList = []
    numOfColumns = min(len(list1), len(list2))
    for col in range(0, numOfColumns):
        val1 = list1[col]
        val2 = list2[col]

        # Numeric values are stored as numbers in the dataframe
        if isfloat(val2):
            val2 = float(val2)
        mergedList.append([val1, val2])

    return mergedList


# Returns only the rows where column valuesAsArray[0] equals valuesAsArray[1]
# Example: valuesAsArray = ['Symbol','AAPL'], returns rows with 'AAPL'
def getSubDataFrame(dataFrame, valuesAsArray):
    subDataFrame = dataFrame.loc[dataFrame[valuesAsArray[0]] == valuesAsArray[1]]
    return subDataFrame




def createDataFrameAsExample():
    import pandas as pd
    data = {
        'MarketCenter': ['T', 'T', 'T', 'T'],
        'Symbol': ['AAPL', 'FB', 'AAPL', 'FB'],
        'Date': [20190101, 20190102, 20190201, 20190301],
        'Time': ['08:00:00', '08:00:00', '09:00:00', '09:00:00'],
        'ShortType': ['S', 'S', 'S', 'S'],
        'Size': [10, 10, 20, 30],
        'Price': [100, 100, 300, 200]
    }
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size','Price']
    df = pd.DataFrame(data, columns=dfHeadLineAsArray)
    return df



def adapterCheckIfLineExistsInDataFrame(originalDataFrame, headlineAsArray, line):
    dfHeadLineAsArray = headlineAsArray
    # Line example: 'T,AAPL,20190101,08:00:00,S,10,100'
    lineAsArray = line.split(',')

    valuesAsArray = getMergedListFromTwoLists(dfHeadLineAsArray, lineAsArray)
    return checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray)



def checkIfLineExistsInDataFrame(originalDataFrame, valuesAsArray):
    # Narrow the dataframe one [column, value] pair at a time;
    # the line exists only if rows remain after every filter
    subDataFrame = originalDataFrame
    for value in valuesAsArray:
        if subDataFrame.empty:
            return False
        subDataFrame = getSubDataFrame(subDataFrame, value)

    return not subDataFrame.empty


def testExample():
    dataFrame = createDataFrameAsExample()
    dfHeadLineAsArray = ['MarketCenter', 'Symbol', 'Date', 'Time', 'ShortType', 'Size','Price']

    # Three made up lines (not in df)
    lineToCheck1 = 'T,FB,20190102,13:00:00,S,10,100'
    lineToCheck2 = 'T,FB,20190102,08:00:00,S,60,100'
    lineToCheck3 = 'T,FB,20190102,08:00:00,S,10,150'

    # This line exists in the dataframe
    lineToCheck4 = 'T,FB,20190102,08:00:00,S,10,100'

    lineExists1 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck1)
    lineExists2 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck2)
    lineExists3 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck3)
    lineExists4 = adapterCheckIfLineExistsInDataFrame(dataFrame,dfHeadLineAsArray,lineToCheck4)

    expected = 'False False False True'
    print('Expected:',expected)
    print('Method:',lineExists1,lineExists2,lineExists3,lineExists4)



testExample()
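
For comparison, the same idea can be sketched more compactly by combining the per-column comparisons into a single boolean mask. The helper name row_exists and the sample data are hypothetical, not part of the answer above:

```python
import pandas as pd

# Made-up frame, loosely modelled on the stock example
df = pd.DataFrame({
    'Symbol': ['AAPL', 'FB', 'AAPL', 'FB'],
    'Date': [20190101, 20190102, 20190201, 20190301],
    'Size': [10, 10, 20, 30],
})


def row_exists(frame, row):
    """Return True if some row of `frame` matches every column/value pair in `row`."""
    # Start with every row as a candidate, then AND in one comparison per column
    mask = pd.Series(True, index=frame.index)
    for col, value in row.items():
        mask &= frame[col] == value
    return bool(mask.any())


print(row_exists(df, {'Symbol': 'FB', 'Date': 20190102, 'Size': 10}))
print(row_exists(df, {'Symbol': 'FB', 'Date': 20190102, 'Size': 60}))
```

Passing the row as a dict keeps the column names next to their values, which avoids the separate header list and string parsing, at the cost of the caller supplying values already in the right type.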
