Original source: http://stackoverflow.com/questions/36601956/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How can I iterate through multiple dataframes to select a column in each in python?
Asked by DaithiOK
For my project I'm reading in a CSV file with data from every state in the US. My function converts each of these into a separate DataFrame, as I need to perform operations on each state's information.
import pandas as pd

def RanktoDF(csvFile):
    df = pd.read_csv(csvFile)
    df = df[pd.notnull(df['Index'])]  # drop rows whose Index is null
    df = df[df['Index'] != 'Index']   # drop repeated header rows
    df = df.set_index('State')        # use State as the index
    return df
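To make the effect of the two filtering steps concrete, here is a minimal sketch of the same cleaning logic applied to an in-memory CSV. The sample data and the name `rank_to_df` are invented for illustration; the real files are assumed to have `Index`, `State` and `Population` columns with the header row repeated partway through.

```python
import io
import pandas as pd

# Invented sample: the header row is repeated mid-file and one row has no Index.
sample_csv = io.StringIO(
    "Index,State,Population\n"
    "1,Alabama,100\n"
    "Index,State,Population\n"
    "2,Alaska,200\n"
    ",Arizona,300\n"
)

def rank_to_df(csv_file):
    """Hypothetical stand-in that mirrors the cleaning steps of RanktoDF."""
    df = pd.read_csv(csv_file)
    df = df[df["Index"].notna()]     # drop rows with a missing Index
    df = df[df["Index"] != "Index"]  # drop repeated header rows
    return df.set_index("State")     # use State as the index

cleaned = rank_to_df(sample_csv)
print(cleaned.index.tolist())
```

Because the repeated header rows force every column to be read as text, the remaining values are still strings after cleaning; something like `pd.to_numeric` would likely be needed before doing arithmetic on them.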
I apply this function to each of my files and store each resulting DataFrame under a name taken from my list varNames:
for name, s in zip(glob.glob('*.csv'), varNames):
    vars()["Crime" + s] = RanktoDF(name)
All of that works perfectly. My problem is that I also want to create a DataFrame that's made up of one column from each of those state DataFrames.
I have tried iterating through a list of my DataFrames, selecting the column I want (Population), and appending it to a new DataFrame:
dfNewIndex = pd.DataFrame(index=CrimeRank_1980_df.index)  # Create new DF with Index
for name in dfList:  # dfList is my list of dataframes. See image
    newIndex = name['Population']
    dfNewIndex.append(newIndex)
    #dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)
The error is always the same, and it tells me that name is being treated as a string rather than as an actual DataFrame:
TypeError                                 Traceback (most recent call last)
<ipython-input-30-5aa85b0174df> in <module>()
      3
      4 for name in dfList:
----> 5     newIndex = name['Index']
      6     dfNewIndex.append(newIndex)
      7     # dfNewIndex = pd.concat([dfNewIndex, dfList[name['Population']], axis=1)
TypeError: string indices must be integers
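The error can be reproduced in isolation. This is a minimal sketch that assumes dfList holds variable names as strings; the value 'CrimeRank_1980_df' is used here only as a plausible example entry.

```python
# Assumed entry of dfList: the *name* of a DataFrame, stored as a str.
name = "CrimeRank_1980_df"

try:
    name["Population"]  # same subscript as in the loop, applied to a str
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

A str can only be subscripted with integers or slices, so the column lookup fails before pandas is involved at all.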
I understand that my list is a list of strings rather than variables/DataFrames, so my question is: how can I correct my code to do what I want, or is there an easier way of doing this?
Every solution I've looked up concatenates DataFrames that are typed out explicitly, but I have 50 of them, so that isn't really feasible. Any help would be appreciated.
Answered by James Elderfield
One way would be to index into vars(), e.g.
for name in dfList:
    newIndex = vars()[name]["Population"]
Alternatively, I think it would be neater to store your DataFrames in a container and iterate through that, e.g.
frames = {}
for name, s in zip(glob.glob('*.csv'), varNames):
    frames["Crime" + s] = RanktoDF(name)
for name in frames:
    newIndex = frames[name]["Population"]
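Building on the dictionary approach, one possible way to assemble the combined DataFrame is to pass the selected columns to pd.concat with axis=1. This is a sketch rather than part of the original answer: the two small frames and their keys are invented stand-ins for the real per-file DataFrames. Note that DataFrame.append never modified a frame in place (it returned a new object) and was removed in pandas 2.0, so the append loop from the question would not have accumulated anything even with real DataFrames.

```python
import pandas as pd

# Invented stand-ins for the per-file DataFrames, indexed by State.
states = pd.Index(["Alabama", "Alaska"], name="State")
frames = {
    "Crime1980": pd.DataFrame({"Population": [100, 200]}, index=states),
    "Crime1981": pd.DataFrame({"Population": [110, 210]}, index=states),
}

# Pick the Population column from each frame; the dict keys become column names.
population = pd.concat(
    {name: df["Population"] for name, df in frames.items()},
    axis=1,
)
print(population)
```

Rows are aligned on the State index, so a state missing from one file would simply show up as NaN in that file's column.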