Python: create multiple dataframes in a loop

Note: this page is a side-by-side Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/30635145/

Time: 2020-08-19 08:44:29  Source: igfitidea

Create multiple dataframes in loop

python pandas dataframe

Asked by Luis Ibáñez Herrera

I have a list, with each entry being a company name:

companies = ['AA', 'AAPL', 'BA', ....., 'YHOO']

I want to create a new dataframe for each entry in the list.

Something like this (pseudocode):

for c in companies:
     c = pd.DataFrame()

I have searched for a way to do this but can't find it. Any ideas?

Accepted answer by maxymoo

You can do this (although obviously use exec with extreme caution if this is going to be public-facing code):

for c in companies:
     exec('{} = pd.DataFrame()'.format(c))

Answer by holdenweb

Just to underline my comment to @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:

  1. Created names might easily conflict with variables already used by your logic.

  2. Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.
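The second point can be seen in a minimal sketch. It assumes a three-entry subset of the company list and uses a separate dict (here called `namespace`, a name chosen only for this illustration) as the target of exec, so the real module namespace is left untouched. Getting at a frame whose name is held in a variable then needs a dynamic lookup anyway:

```python
import pandas as pd

companies = ["AA", "AAPL", "BA"]  # illustrative subset, not the full list

# Run exec against an isolated namespace instead of the real globals.
namespace = {}
for c in companies:
    exec("{} = pd.DataFrame()".format(c), {"pd": pd}, namespace)

# The company name is only known at runtime, so retrieval is dynamic too.
wanted = "AAPL"
frame = namespace[wanted]
print(type(frame).__name__)
```

In this sketch exec ends up filling a dict and the lookup goes through that dict, so a plain dict would do the same job without exec.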

This is why dicts were included in the language. The correct way to proceed is:

d = {}
for name in companies:
    d[name] = pd.DataFrame()

Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:

d = {name: pd.DataFrame() for name in companies}

Once d is created, the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:

for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'
    pass

In Python 2 you are better off writing

for name, df in d.iteritems():

because this avoids instantiating a list of (name, df) tuples.
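Putting the dict approach together, here is a small sketch in Python 3. The three tickers are a subset of the question's list, and the `close` column with its prices is made-up data used only to show that each entry is an ordinary DataFrame:

```python
import pandas as pd

companies = ["AA", "AAPL", "BA"]  # illustrative subset of the ticker list

# One empty DataFrame per company, keyed by company name.
d = {name: pd.DataFrame() for name in companies}

# Replace one entry with hypothetical prices (made-up data).
d["AAPL"] = pd.DataFrame({"close": [125.0, 126.5, 124.8]})

# Operate on every company's DataFrame: here, count the rows in each.
row_counts = {name: len(df) for name, df in d.items()}
print(row_counts)
```

Looking up a single company stays a plain d["AAPL"], and no new variable names are created at runtime.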

Answer by ak3191

Adding to the great answers above: they work flawlessly if you need to create empty data frames, but you may instead need to create multiple data frames based on some filtering.

Suppose the list you have is a column of some data frame, and you want to make one data frame for each unique company from the bigger data frame:

  1. First, take the unique names of the companies:

    compuniquenames = df.company.unique()
    
  2. Create a data frame dictionary to store your data frames

    companydict = {elem : pd.DataFrame() for elem in compuniquenames}
    
The two steps above are already covered in the earlier answers. Next, fill each entry with the matching rows:

for key in companydict.keys():
    # keep only the rows of df that belong to this company
    companydict[key] = df[df.company == key]

The above gives you one data frame per unique company, each containing that company's matching records.
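For reference, a self-contained sketch of the filtering step on made-up data. The `company` column name follows the answer; the `price` column and all values are hypothetical. The second dict uses pandas groupby, an alternative that is not part of the original answer and that splits the frame in a single pass:

```python
import pandas as pd

# Hypothetical data; only the 'company' column name comes from the answer.
df = pd.DataFrame({
    "company": ["AA", "AAPL", "AA", "BA", "AAPL"],
    "price": [10.0, 125.0, 10.5, 140.0, 126.0],
})

# Boolean-mask filtering: one DataFrame per unique company.
companydict = {name: df[df["company"] == name] for name in df["company"].unique()}

# Alternative (not from the answer): let groupby do the split in one pass.
grouped = {name: group for name, group in df.groupby("company")}

print(companydict["AA"])
```

Both dicts should hold the same rows per company. The groupby version may be preferable on large frames, since the mask version scans the whole frame once per company.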