pandas: Log values by SFrame column
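Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow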
原文地址: http://stackoverflow.com/questions/27013398/
Log values by SFrame column
Question by Guforu
Please, can anybody tell me how I can take the logarithm of every value in an SFrame (graphlab) or DataFrame (pandas) column without iterating through the whole length of the column? I am especially interested in something similar to the Groupby Aggregators, but for the log function. I couldn't find it myself...
Important: please, I am not interested in a for-loop iteration over the whole length of the column. I am only interested in a specific function that transforms all values of the whole column into their log values.
I'm also very sorry if this function is in the manual. In that case, please just give me a link...
Accepted answer by cel
numpy provides implementations of a wide range of basic mathematical transformations, such as np.log, which is applied element-wise. You can use them on all data structures that build on numpy's ndarray.
```python
import pandas as pd
import numpy as np

data = pd.Series([np.exp(1), np.exp(2), np.exp(3)])
# element-wise natural log of the whole Series, no explicit loop
print(np.log(data))
```
Outputs:
```
0    1
1    2
2    3
dtype: float64
```
This example is for pandas data types, but it works for all data structures that are based on numpy arrays.
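A minimal sketch of the same idea on other numpy-backed structures, using small hypothetical data (the arrays and column names `x`, `y`, `log_x` are made up for illustration): `np.log` works element-wise on a plain ndarray, on a single DataFrame column, and on a whole numeric DataFrame, and returns an object of the same type and shape.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; all values are positive, so log is defined.
arr = np.array([1.0, np.e, np.e ** 2])
df = pd.DataFrame({"x": [1.0, 10.0, 100.0], "y": [2.0, 4.0, 8.0]})

# ndarray in, ndarray out
log_arr = np.log(arr)

# Series in, Series out (index preserved); stored as a new column
df["log_x"] = np.log(df["x"])

# DataFrame in, DataFrame out (same index and column labels)
log_df = np.log(df[["x", "y"]])

print(log_arr)
print(df)
print(log_df)
```

numpy also provides `np.log10` and `np.log2`, which can be used in exactly the same way when another base is needed.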
Answer by papayawarrior
The same "apply" pattern works for SFrames as well. You could do:
```python
import graphlab
import math

sf = graphlab.SFrame({'a': [1, 2, 3]})
# SArray.apply calls math.log on every element of column 'a'
sf['b'] = sf['a'].apply(lambda x: math.log(x))
```
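For comparison, a sketch of the pandas counterpart of this apply pattern, assuming a hypothetical DataFrame with a column `a` (the column names `b_apply` and `b_numpy` are also made up). `Series.apply` calls the Python function once per element, so it is still a loop, just a hidden one; `np.log` runs the loop inside numpy's compiled code. Both produce the same values.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# "apply" pattern: math.log is called once per element from Python
df["b_apply"] = df["a"].apply(math.log)

# vectorised numpy ufunc: a single call for the whole column
df["b_numpy"] = np.log(df["a"])

print(df)
```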
Answer by Guforu
@cel
I think that in my case it would also be possible to use the following pattern.
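```python
import numpy
import pandas

# df as shown in the question (further rows were elided with "....")
df = pandas.DataFrame({'a': [1, 1, 2],
                       'b': [1, 2, 1],
                       'c': [1, 3, 3]})

# Log of column 'c', computed group by group; transform returns a result
# aligned with df's index, so it can be assigned directly as a new column
df['log c'] = df.groupby('a')['c'].transform(numpy.log)
```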
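Because the logarithm is element-wise, grouping by `a` first does not change any value. A quick check, assuming only the three rows shown above: `groupby(...).transform(numpy.log)`, which keeps the frame's index so its result can be assigned as a column, gives the same values as a plain `numpy.log` on the column, so the grouping step only adds overhead here.

```python
import numpy
import pandas

df = pandas.DataFrame({"a": [1, 1, 2], "b": [1, 2, 1], "c": [1, 3, 3]})

# Group-wise version: result is aligned with df's original index
grouped_log = df.groupby("a")["c"].transform(numpy.log)

# Plain element-wise version: no grouping needed
plain_log = numpy.log(df["c"])

print(grouped_log.equals(plain_log))
```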
For an SFrame (an sf object instead of df) it could look a little different:
```python
import numpy
import graphlab

# sf is an existing SFrame with a numeric column 'c'
logvals = numpy.log(sf['c'])
log_sf = graphlab.SFrame(logvals)
sf = sf.join(log_sf, how='outer')
```
The code fragment with numpy is probably a little bit too long, but it works...
The main problem is, of course, time performance. I had hoped I could find some specific function to minimise the running time....
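On the timing question, a rough sketch for comparing the two approaches on your own machine, using `timeit` from the standard library and a hypothetical Series of 200,000 random positive floats. Actual timings depend on hardware and library versions, so no numbers are quoted here; in general the vectorised `np.log` call avoids one Python function call per value and is therefore usually much faster than an element-wise `apply`.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 positive floats
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(1.0, 100.0, size=200_000))

# Make sure both approaches agree before timing them
assert np.allclose(s.apply(math.log).to_numpy(), np.log(s).to_numpy())

# Total time for 3 repetitions of each approach
t_apply = timeit.timeit(lambda: s.apply(math.log), number=3)
t_numpy = timeit.timeit(lambda: np.log(s), number=3)

print(f"apply(math.log): {t_apply:.3f} s for 3 runs")
print(f"np.log:          {t_numpy:.3f} s for 3 runs")
```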

