Python Pandas: Creating a new data frame from only certain columns

Notice: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36518027/

Date: 2020-08-19 17:58:50  Source: igfitidea

Pandas: Creating new data frame from only certain columns

Tags: python, csv, pandas

Asked by ValientProcess

I have a csv file with measurements, and I want to create a new csv file with the hourly averages and standard deviations. But only for certain columns.

Example:

csv1:

YY-MO-DD HH-MI-SS_SSS    |     Acceleration  |        Lumx     |    Pressure
2015-12-07 20:51:06:608  |        22.7       |        32.3     |     10
2015-12-07 20:51:07:609  |        22.5       |        47.7     |     15

to csv2 (only for the pressure and acceleration):

YY-MO-DD HH-MI-SS_SSS    | Acceleration avg  |   Pressure avg
2015-12-07 20:00:00:000  |        22.6       |        12.5
2015-12-07 21:00:00:000  |        ....       |        ....

I now have an idea (thanks to the people on this site) of how to calculate the averages, but I'm having trouble creating a new, smaller dataframe that contains the calculations for only a few columns.

Thanks !!!

Answered by su79eu7k

Create a smaller dataframe that contains only the columns you need:

csv2 = csv1[['Acceleration', 'Pressure']].copy()

You can then work with csv2 on its own. (You said you already have an idea about the average calculation.) FYI, .copy() could be omitted if you are sure about view versus copy.
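To tie the subset back to the goal in the question, here is a minimal sketch of computing hourly means and standard deviations on the smaller dataframe. The sample rows, the "timestamp" column name, and the output column naming are assumptions made for illustration; the timestamp format is inferred from the example rows in the question.

```python
import pandas as pd

# Hypothetical sample rows mirroring the layout in the question.
csv1 = pd.DataFrame({
    "timestamp": [
        "2015-12-07 20:51:06:608",
        "2015-12-07 20:51:07:609",
        "2015-12-07 21:10:00:000",
    ],
    "Acceleration": [22.7, 22.5, 23.0],
    "Lumx": [32.3, 47.7, 40.0],
    "Pressure": [10, 15, 20],
})

# Parse the timestamps (milliseconds follow a colon) and use them as the index.
csv1["timestamp"] = pd.to_datetime(csv1["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")
csv1 = csv1.set_index("timestamp")

# Keep only the columns of interest, as an independent copy.
csv2 = csv1[["Acceleration", "Pressure"]].copy()

# Group into one-hour bins and compute mean and standard deviation per column.
hourly = csv2.resample(pd.Timedelta(hours=1)).agg(["mean", "std"])

# Flatten the two-level column index into names such as "Acceleration mean".
hourly.columns = [f"{col} {stat}" for col, stat in hourly.columns]

# Passing no path returns the CSV text; pass a filename to write a file instead.
csv_text = hourly.to_csv()
```

An hour that contains a single row gets NaN for the standard deviation, since the sample standard deviation needs at least two values.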

Answered by leerssej

csv2 = csv1.loc[:, ['Acceleration', 'Pressure']]
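  • .loc[] helps keep the subsetting operation explicit and consistent.

  • .loc[] with a list of column labels, as used here, returns a new dataframe rather than a view, so changes to csv2 do not modify the original. This does not hold for every .loc[] call (a slice, for example, may return a view depending on the pandas version), so append .copy() when independence matters.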

(For further discussion and good examples of the different view vs. copy alternatives, see: Pandas: Knowing when an operation affects the original dataframe)
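A quick way to check the view vs. copy behaviour for this particular selection is to modify the subset and look at the original. This is only a small sketch with made-up sample values; the explicit .copy() is added here as a precaution and is not part of the answer's one-liner.

```python
import pandas as pd

# Hypothetical sample data with the same column names as the question.
csv1 = pd.DataFrame({
    "Acceleration": [22.7, 22.5],
    "Lumx": [32.3, 47.7],
    "Pressure": [10, 15],
})

# Select the two columns by label; .copy() makes the independence explicit.
csv2 = csv1.loc[:, ["Acceleration", "Pressure"]].copy()

# Change a value in the subset only.
csv2.loc[0, "Pressure"] = 999

# The original dataframe still holds its initial value.
original_unchanged = csv1.loc[0, "Pressure"] == 10
```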