Original question: http://stackoverflow.com/questions/29072706/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
python apply function to list and return data frame
Asked by user2854008
I am new to Python. I wrote a function that returns a pandas data frame. I am trying to apply this function to a list, and I would like to merge all the results into one data frame. For example, if my function looks like:
import pandas as pd
def test(x):
    return pd.DataFrame({'a':[x],'b':['test']})
I want to apply it to list [1,2,3,4,5], and get the result as a data frame which looks like:
a b
1 test
2 test
3 test
4 test
5 test
If I do [test(x) for x in [1,2,3,4,5]], it returns a weird list. Could anyone help me with this, please? Thanks!
PS: the function I am actually using:
def cumRet(startDate, endDate=None, symbols=None, inDir=None):
    # dateRange and mktData_R are helper functions defined elsewhere (not shown)
    if endDate is None:
        endDate = startDate
    if inDir is None:
        inDir = 'E:\\python\\data\\mktData\\'  # backslashes escaped for a Windows path
    dates = dateRange(startDate, endDate)
    # Adjusted close on the first and last date, joined on symbol
    if symbols is None:
        adjClose = pd.merge(mktData_R(dates.iloc[0].strftime('%Y-%m-%d'), var=['adjClose']),
                            mktData_R(dates.iloc[-1].strftime('%Y-%m-%d'), var=['adjClose']),
                            on='symbol',
                            how='outer')
    else:
        adjClose = pd.merge(mktData_R(dates.iloc[0].strftime('%Y-%m-%d'), symbols=symbols, var=['adjClose']),
                            mktData_R(dates.iloc[-1].strftime('%Y-%m-%d'), symbols=symbols, var=['adjClose']),
                            on='symbol',
                            how='outer')
    # Treat missing prices as 1
    adjClose.loc[pd.isnull(adjClose['adjClose_x']), 'adjClose_x'] = 1
    adjClose.loc[pd.isnull(adjClose['adjClose_y']), 'adjClose_y'] = 1
    # Cumulative return between the two dates
    adjClose['cumRet'] = adjClose['adjClose_y'] / adjClose['adjClose_x'] - 1
    return adjClose[['symbol', 'cumRet']]
Answered by EdChum
Your original code produced this:
In [49]:
t = [1,2,3,4,5]
def test(x):
    return pd.DataFrame({'a':[x],'b':['test']})
[test(t) for x in [1,2,3,4,5]]
Out[49]:
[ a b
0 [1, 2, 3, 4, 5] test, a b
0 [1, 2, 3, 4, 5] test, a b
0 [1, 2, 3, 4, 5] test, a b
0 [1, 2, 3, 4, 5] test, a b
0 [1, 2, 3, 4, 5] test]
This is not what you intended: the list comprehension loops over each element and produces a list of 5 DataFrames, each of which stores the values as a list in column a.
You can avoid all this by passing the list straight to the DataFrame constructor. The values need to be list-like, but since your argument is already a list you don't need to wrap it in another list. For column b, the number of values has to match the length of column a, so you need to repeat the value once per list element:
In [4]:
t = [1,2,3,4,5]
def test(x):
    return pd.DataFrame({'a':x,'b':['test']* len(x)})
test(t)
Out[4]:
a b
0 1 test
1 2 test
2 3 test
3 4 test
4 5 test
Answered by igavriil
In your approach you are creating five dataframes, not one.
You can do this without building a list of 'test' strings the same length as your list (as suggested by @EdChum), because a scalar value is repeated for every row:
l = [1,2,3,4,5]
def test(x):
    return pd.DataFrame({'a':x, 'b':'test'})
test(l)
>>> a b
0 1 test
1 2 test
2 3 test
3 4 test
4 5 test

