Original question: http://stackoverflow.com/questions/32370402/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Giving a column multiple indexes/headers
Asked by JC_CL
I am working with pandas dataframes that are essentially time series like this:
             level
Date
1976-01-01  409.67
1976-02-01  409.58
1976-03-01  409.66
…
What I want is multiple indexes/headers for the level column, like so:
            Station1                   # Name of the data source
            43.1977317,-4.6473648,5    # Lat/Lon of the source
            Precip                     # Type of data
Date
1976-01-01  409.67
1976-02-01  409.58
1976-03-01  409.66
…
So essentially I am searching for something like Mydata.columns.level1 = ['Station1'], Mydata.columns.level2 = [Lat,Lon], Mydata.columns.level3 = ['Precip'].
The reason is that a single location can have multiple datasets, and I want to be able to pick either all data from one location, or all data of a certain type from all locations, from a subsequently merged, big dataframe.
I can set up an example dataframe from the pandas documentation, and test my selection, but with my real data, I need a different way to set the indexes as in the example.
Example:
Build a small dataframe:
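import numpy as np
import pandas as pd

# Two parallel arrays: the first holds the location level, the second the data-type level
header = [np.array(['location','location','location','location2','location2','location2']),
          np.array(['S1','S2','S3','S1','S2','S3'])]
df = pd.DataFrame(np.random.randn(5, 6), index=['a','b','c','d','e'], columns=header)
df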
   location                      location2
         S1        S2        S3         S1        S2        S3
a -1.469932 -1.544511 -1.373463  -0.317262  0.024832 -0.641000
b  0.047170 -0.339423  1.351253   0.601172 -1.607339  0.035932
c -0.257479  1.140829  0.188291  -0.242490  1.019315 -1.163429
d  0.832949  0.098170 -0.818513  -0.070383  0.557419 -0.489839
e -0.628549 -0.158419  0.366167  -2.319316 -0.474897 -0.319549
Pick datatype or location:
df.loc(axis=1)[:,'S1']
   location location2
         S1        S1
a -1.469932 -0.317262
b  0.047170  0.601172
c -0.257479 -0.242490
d  0.832949 -0.070383
e -0.628549 -2.319316
df['location']
         S1        S2        S3
a -1.469932 -1.544511 -1.373463
b  0.047170 -0.339423  1.351253
c -0.257479  1.140829  0.188291
d  0.832949  0.098170 -0.818513
e -0.628549 -0.158419  0.366167
Or am I just looking for the wrong terminology? 90% of the examples in the documentation, and of the questions here, treat only the vertical "stuff" (dates, or abcde in my case) as an index, and a quick df.index.values on my test data also returns only the vertical array(['a', 'b', 'c', 'd', 'e'], dtype=object).
Accepted answer by BKS
You can use a MultiIndex to give the columns multiple levels, with a name for each level. Use MultiIndex.from_product() to build a MultiIndex from the Cartesian product of multiple iterables.
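import numpy as np
import pandas as pd

# Every combination of location and data type becomes one column
header = pd.MultiIndex.from_product([['location1','location2'],
                                     ['S1','S2','S3']],
                                    names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6),
                  index=['a','b','c','d','e'],
                  columns=header)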
The two column levels are named loc and S.
df
loc location1                     location2
S          S1        S2        S3        S1        S2        S3
a   -1.245988  0.858071 -1.433669  0.105300 -0.630531 -0.148113
b    1.132016  0.318813  0.949564 -0.349722 -0.904325  0.443206
c   -0.017991  0.032925  0.274248  0.326454 -0.108982  0.567472
d    2.363533 -1.676141  0.562893  0.967338 -1.071719 -0.321113
e    1.921324  0.110705  0.023244 -0.432196  0.172972 -0.50368
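On the terminology question: the column labels are themselves an index object, reachable through df.columns in the same way the row labels are reachable through df.index. The following is a minimal sketch of how the two levels could be inspected. It rebuilds the same frame as above so that it runs on its own; the variable names level_names, n_levels, locations and first_label are only illustrative.

```python
import numpy as np
import pandas as pd

# Same construction as in the answer, repeated so the sketch is self-contained
header = pd.MultiIndex.from_product([['location1', 'location2'],
                                     ['S1', 'S2', 'S3']],
                                    names=['loc', 'S'])
df = pd.DataFrame(np.random.randn(5, 6),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=header)

# df.columns is a MultiIndex; df.index is still the plain row index
level_names = list(df.columns.names)   # names given to the header levels
n_levels = df.columns.nlevels          # number of header rows
locations = df.columns.get_level_values('loc').unique().tolist()
first_label = df.columns[0]            # each column label is a tuple

print(level_names, n_levels, locations, first_label)
```

So "index" in the documentation usually means the row labels, while the multi-row header asked about here is a MultiIndex on the columns.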
Now you can use xs to slice the dataframe based on levels.
df.xs('location1',level='loc',axis=1)
S         S1        S2        S3
a  -1.245988  0.858071 -1.433669
b   1.132016  0.318813  0.949564
c  -0.017991  0.032925  0.274248
d   2.363533 -1.676141  0.562893
e   1.921324  0.110705  0.023244
df.xs('S1',level='S',axis=1)
loc  location1  location2
a    -1.245988   0.105300
b     1.132016  -0.349722
c    -0.017991   0.326454
d     2.363533   0.967338
e     1.921324  -0.432196
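The same idea might carry over to the single-column time series from the question roughly as follows. This is only a sketch under several assumptions: the level names station, latlon and kind are invented for illustration, the Lat/Lon value is stored as a plain string label, and the second station (Station2, with its coordinates and values) is made-up data added so the merged frame has something to select from. MultiIndex.from_tuples() is used here instead of from_product() because each frame has exactly one column.

```python
import pandas as pd

# Shared date index, as in the question's sample
dates = pd.DatetimeIndex(['1976-01-01', '1976-02-01', '1976-03-01'], name='Date')

# Single-column frame like the one in the question
station1 = pd.DataFrame({'level': [409.67, 409.58, 409.66]}, index=dates)

# Replace the one-row header with a three-level header (level names are hypothetical)
station1.columns = pd.MultiIndex.from_tuples(
    [('Station1', '43.1977317,-4.6473648,5', 'Precip')],
    names=['station', 'latlon', 'kind'])

# Made-up second dataset, for illustration only
station2 = pd.DataFrame({'level': [1.0, 2.0, 3.0]}, index=dates)
station2.columns = pd.MultiIndex.from_tuples(
    [('Station2', '0.0,0.0,0', 'Precip')],
    names=['station', 'latlon', 'kind'])

# Merge side by side; the three header levels are kept
merged = pd.concat([station1, station2], axis=1)

# All data of one type from all locations
all_precip = merged.xs('Precip', level='kind', axis=1)

# All data from one location
only_station1 = merged.xs('Station1', level='station', axis=1)

print(merged)
```

Note that xs drops the level it selects on by default, so all_precip keeps only the station and latlon header rows.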