Python: Sort a Pandas DataFrame by Date
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/28161356/
Sort Pandas Dataframe by Date
Asked by nicholas.reichel
I have a pandas dataframe as follows:
Symbol Date
A 02/20/2015
A 01/15/2016
A 08/21/2015
I want to sort it by Date, but the column is just an object.
I tried to make the column a date object, but I ran into an issue where that format is not the format needed. The format needed is 2015-02-20, etc.
So now I'm trying to figure out how to have numpy convert the 'American' dates into the ISO standard, so that I can make them date objects, so that I can sort by them.
How would I convert these American dates into the ISO standard, or is there a more straightforward method I'm missing within pandas?
Accepted answer by JAB
You can use pd.to_datetime() to convert to a datetime object. It takes a format parameter, but in your case I don't think you need it.
>>> import pandas as pd
>>> df = pd.DataFrame( {'Symbol':['A','A','A'] ,
'Date':['02/20/2015','01/15/2016','08/21/2015']})
>>> df
Date Symbol
0 02/20/2015 A
1 01/15/2016 A
2 08/21/2015 A
>>> df['Date'] =pd.to_datetime(df.Date)
>>> df.sort('Date') # Sorts in date order (pandas < 0.20 only; DataFrame.sort was removed in 0.20.0)
Date Symbol
0 2015-02-20 A
2 2015-08-21 A
1 2016-01-15 A
For future readers: sort has since been replaced by sort_values, so change the sort statement to:
>>> df.sort_values(by='Date') # This now sorts in date order
Date Symbol
0 2015-02-20 A
2 2015-08-21 A
1 2016-01-15 A
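The answer mentions the format parameter without showing it. As a minimal sketch, assuming the dates are always written month/day/four-digit year as in the question, the format can be passed explicitly so that pandas does not have to guess (the variable name sorted_df is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A", "A", "A"],
    "Date": ["02/20/2015", "01/15/2016", "08/21/2015"],
})

# Assumption: every value is month/day/four-digit year, as in the question.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# sort_values returns a new, sorted DataFrame; df keeps its row order.
sorted_df = df.sort_values(by="Date")
print(sorted_df)
```

An explicit format also makes pandas raise an error on a value that does not match, instead of silently parsing it some other way.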
Answer by LondonRob
@JAB's answer is fast and concise. But it changes the DataFrame you are trying to sort, which you may or may not want.
(Note: You almost certainly will want it, because your date columns should be dates, not strings!)
In the unlikely event that you don't want to change the dates into dates, you can also do it a different way.
First, get the index from your sorted Date column:
In [25]: pd.to_datetime(df.Date).sort_values().index  # Series.order() was removed in pandas 0.20
Out[25]: Int64Index([0, 2, 1], dtype='int64')
Then use it to index your original DataFrame, leaving it untouched:
In [26]: df.loc[pd.to_datetime(df.Date).sort_values().index]  # .ix and .order() no longer exist in current pandas
Out[26]:
Date Symbol
0 2015-02-20 A
2 2015-08-21 A
1 2016-01-15 A
Magic!
Note: ix was deprecated in Pandas 0.20.0 and removed in 1.0; use loc instead.
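The non-mutating approach above can be sketched with current pandas as follows. This is an illustration rather than the answer's exact code: the names original, order and sorted_view are made up here, and the key argument of sort_values used in the second variant requires pandas 1.1 or later.

```python
import pandas as pd

original = pd.DataFrame({
    "Symbol": ["A", "A", "A"],
    "Date": ["02/20/2015", "01/15/2016", "08/21/2015"],
})

# Parse a temporary copy of the column, sort it, and keep only the row labels.
order = pd.to_datetime(original["Date"], format="%m/%d/%Y").sort_values().index

# Reorder the rows by label; the Date column still holds the original strings.
sorted_view = original.loc[order]

# Alternative (assumes pandas >= 1.1): let sort_values parse the column itself.
sorted_by_key = original.sort_values(
    by="Date", key=lambda col: pd.to_datetime(col, format="%m/%d/%Y")
)
print(sorted_view)
```

In both variants the rows come back in date order while the Date values remain text such as 02/20/2015.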
Answer by Reveille
The sort method has been deprecated and replaced with sort_values. After converting the column to datetime with df['Date']=pd.to_datetime(df['Date']), sort like this:
df.sort_values(by=['Date'])
Note: to sort in place and/or in descending order (the most recent first):
df.sort_values(by=['Date'], inplace=True, ascending=False)
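If you would rather not sort in place, a descending sort can also return a new DataFrame. The sketch below assumes pandas 1.0 or later for the ignore_index argument, and the name newest_first is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A", "A", "A"],
    "Date": ["02/20/2015", "01/15/2016", "08/21/2015"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# ascending=False puts the most recent date first.
# ignore_index=True (assumes pandas >= 1.0) renumbers the result 0..n-1.
newest_first = df.sort_values(by=["Date"], ascending=False, ignore_index=True)
print(newest_first)
```

Without inplace=True the original df keeps its row order, so both versions stay available.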
Answer by Manthra
The data containing the date column can be read by using the below code:
data = pd.read_csv(file_path, parse_dates=[date_column])
Once the data has been read, the date column can also be converted explicitly with pd.to_datetime(), like:
pd.to_datetime(data[date_column], format = '%d/%m/%y')
Here format describes how the dates are written in the input (day/month/two-digit year in this example), so change it to match your data.
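As a rough end-to-end sketch of reading and then sorting, the snippet below parses the question's sample data from an in-memory string instead of a real file. The CSV text and the column name Date are assumptions made for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Symbol,Date\nA,02/20/2015\nA,01/15/2016\nA,08/21/2015\n"

# parse_dates converts the listed columns to datetime while reading.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The column is already datetime, so it can be sorted directly.
data = data.sort_values(by="Date")
print(data)
```

When the column is parsed at read time like this, a separate pd.to_datetime() call is only needed if the dates use a layout that pandas cannot infer on its own.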