Python pandas DataFrame: sort by date

Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/41433765/

Time: 2020-08-20 00:56:55  Source: igfitidea

pandas dataframe sort by date

Tags: python, sorting, datetime, pandas

Asked by ajax2000

I created a DataFrame by importing a CSV file, converted the date column to datetime, and made it the index. However, sorting the index doesn't produce the result I wanted:

print(df.head())
df['Date'] = pd.to_datetime(df['Date'])
df.index = df['Date']
del df['Date']
df.sort_index()
print(df.head())

Here's the result:

         Date     Last
0  2016-12-30  1.05550
1  2016-12-29  1.05275
2  2016-12-28  1.04610
3  2016-12-27  1.05015
4  2016-12-23  1.05005
               Last
Date               
2016-12-30  1.05550
2016-12-29  1.05275
2016-12-28  1.04610
2016-12-27  1.05015
2016-12-23  1.05005

The dates actually go back to 1999, so if I sort by date, it should show the data in ascending order, right?

Answered by Marjan Moderc

Just expanding on MaxU's correct answer: you used the correct method, but, as with many other pandas methods, you have to "recreate" the DataFrame for the desired changes to take effect, because sort_index() returns a new sorted DataFrame and leaves the original untouched. As MaxU already suggested, this is achieved by assigning the output of the method back to the same variable, e.g.:

df = df.sort_index()

Alternatively, pass the argument inplace=True, which replaces the content of the variable without the need to reassign it:

df.sort_index(inplace=True)
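
The difference between the two options can be checked on a small frame. This is only a minimal sketch: the rows are copied from the question, and the names `sorted_copy`, `first_in_original`, `first_in_copy`, and `returned` are illustrative, not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question, newest first.
df = pd.DataFrame(
    {"Last": [1.05550, 1.05275, 1.04610, 1.05015, 1.05005]},
    index=pd.to_datetime(
        ["2016-12-30", "2016-12-29", "2016-12-28", "2016-12-27", "2016-12-23"]
    ),
)
df.index.name = "Date"

# Option 1: sort_index() returns a new frame; df itself is not modified.
sorted_copy = df.sort_index()
first_in_original = df.index[0]  # still the newest date
first_in_copy = sorted_copy.index[0]  # the oldest date

# Option 2: inplace=True modifies df directly and returns None,
# so its result should not be assigned back to df.
returned = df.sort_index(inplace=True)

print(first_in_original, first_in_copy, returned)
print(df.equals(sorted_copy))
```

Writing `df = df.sort_index(inplace=True)` would therefore leave `df` set to `None`, which is an easy mistake to make when mixing the two styles.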

However, in my experience, I often feel "safer" using the first option. It also looks clearer and more consistent, since not all methods offer an inplace argument. But it all comes down to scripting style, I guess...
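
One practical consequence of the assignment style is that the steps can be chained, since each call returns a new DataFrame. The sketch below is an illustration only: it assumes the CSV layout shown in the question, reads it from an in-memory string instead of a file, and the names `csv_text` and `frame` are made up for the example.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file from the question (assumed layout).
csv_text = "Date,Last\n2016-12-30,1.05550\n2016-12-29,1.05275\n2016-12-28,1.04610\n"

# Each step returns a new DataFrame, so the steps can be chained
# and the final result assigned once.
df = (
    pd.read_csv(io.StringIO(csv_text))
    .assign(Date=lambda frame: pd.to_datetime(frame["Date"]))
    .set_index("Date")
    .sort_index()
)
print(df)
```

The same chain is not possible with inplace=True, because those calls return None.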

Answered by Ishan Khatri

The data looks like this:

Date,Last
2016-12-30,1.05550
2016-12-29,1.05275
2016-12-28,1.04610
2016-12-27,1.05015
2016-12-23,1.05005

Read the data with pandas:

import pandas as pd
df = pd.read_csv('data', sep=',')
# Display the data head
print(df.head())
         Date     Last
0  2016-12-30  1.05550
1  2016-12-29  1.05275
2  2016-12-28  1.04610
3  2016-12-27  1.05015
4  2016-12-23  1.05005

# Sort by the Date column (ISO-formatted date strings sort chronologically)
df = df.sort_values(by='Date')
print(df.head())
         Date     Last
4  2016-12-23  1.05005
3  2016-12-27  1.05015
2  2016-12-28  1.04610
1  2016-12-29  1.05275
0  2016-12-30  1.05550

# Reset the index (the old index is kept as a new 'index' column)
df.reset_index(inplace=True)

# Display the data head after index reset
print(df.head())
   index        Date     Last
0      4  2016-12-23  1.05005
1      3  2016-12-27  1.05015
2      2  2016-12-28  1.04610
3      1  2016-12-29  1.05275
4      0  2016-12-30  1.05550

# Delete the leftover 'index' column
del df['index']

# Display the data head
print(df.head())
         Date     Last
0  2016-12-23  1.05005
1  2016-12-27  1.05015
2  2016-12-28  1.04610
3  2016-12-29  1.05275
4  2016-12-30  1.05550
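
The sort, reset, and delete steps above can also be written more compactly. The following is a sketch rather than the answerer's own code: it assumes the same CSV layout, reads it from an in-memory string (`csv_text` is an illustrative name), parses the Date column as datetime on load, and uses `reset_index(drop=True)` so that no leftover 'index' column has to be deleted.

```python
import io

import pandas as pd

# In-memory stand-in for the 'data' file used above (assumed layout).
csv_text = """Date,Last
2016-12-30,1.05550
2016-12-29,1.05275
2016-12-28,1.04610
2016-12-27,1.05015
2016-12-23,1.05005
"""

# parse_dates converts the Date column to datetime64 while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Sort by date, then renumber the rows; drop=True discards the old index
# instead of keeping it as a column.
df = df.sort_values(by="Date").reset_index(drop=True)
print(df)
```

Parsing the dates is optional for zero-padded YYYY-MM-DD strings, which already sort chronologically as text, but it makes the sort robust to other date formats.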