Original source: http://stackoverflow.com/questions/36019788/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python Pandas: pivot only certain columns in the DataFrame while keeping others
Question by naja
I am trying to rearrange a DataFrame that I read in automatically from JSON using Pandas. I've searched but have had no success.
I have the following JSON (saved as a string for copy/paste convenience), with a number of JSON objects/dictionaries under the key 'values':
json_str = '''{"preferred_timestamp": "internal_timestamp",
    "internal_timestamp": 3606765503.684,
    "stream_name": "ctdpf_j_cspp_instrument",
    "values": [{
        "value_id": "temperature",
        "value": 9.8319
    }, {
        "value_id": "conductivity",
        "value": 3.58847
    }, {
        "value_id": "pressure",
        "value": 22.963
    }, {
        "value_id": "salinity",
        "value": 32.8947
    }]
}'''
I use json_normalize to load the JSON into a flattened Pandas DataFrame:
>>> import json
>>> import pandas as pd
>>> df = pd.json_normalize(json.loads(json_str), 'values', ['preferred_timestamp', 'stream_name', 'internal_timestamp'])
>>> df
      value      value_id preferred_timestamp  internal_timestamp              stream_name
0   9.83190   temperature  internal_timestamp        3.606766e+09  ctdpf_j_cspp_instrument
1   3.58847  conductivity  internal_timestamp        3.606766e+09  ctdpf_j_cspp_instrument
2  22.96300      pressure  internal_timestamp        3.606766e+09  ctdpf_j_cspp_instrument
3  32.89470      salinity  internal_timestamp        3.606766e+09  ctdpf_j_cspp_instrument
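For reference, here is a minimal sketch of the same load using the top-level pd.json_normalize (available since pandas 1.0; the pandas.io.json location is deprecated). record_path names the list whose items become rows, and meta lists the top-level keys copied onto each of those rows. The payload assumes the salinity record implied by the printed output, and because column order can differ between pandas versions, the sketch does not rely on it.

```python
import json

import pandas as pd

# Same payload as json_str in the question, including the salinity record
# that appears in the printed output.
json_str = '''{"preferred_timestamp": "internal_timestamp",
 "internal_timestamp": 3606765503.684,
 "stream_name": "ctdpf_j_cspp_instrument",
 "values": [{"value_id": "temperature", "value": 9.8319},
            {"value_id": "conductivity", "value": 3.58847},
            {"value_id": "pressure", "value": 22.963},
            {"value_id": "salinity", "value": 32.8947}]}'''

data = json.loads(json_str)

# record_path: the list whose items become rows.
# meta: top-level keys repeated on every row built from that list.
df = pd.json_normalize(
    data,
    record_path="values",
    meta=["preferred_timestamp", "stream_name", "internal_timestamp"],
)
print(df)
```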
Here is where I am stuck. I want to take the value and value_id columns and pivot them into new columns based on value_id.
I want the dataframe to look like the following:
stream_name              preferred_timestamp  internal_timestamp  conductivity  pressure  salinity  temperature
ctdpf_j_cspp_instrument  internal_timestamp   3.606766e+09        3.58847       22.96300  32.89470  9.83190
I've tried both the pivot and pivot_table Pandas functions, and even tried to pivot the table manually using 'set_index' and 'stack', but the result isn't quite what I want.
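The set_index route needs unstack rather than stack: stack moves column labels into the index, while unstack moves an index level out into columns. A minimal sketch, with the long-format frame rebuilt by hand from the json_normalize output shown earlier:

```python
import pandas as pd

# Long-format frame rebuilt by hand from the printed json_normalize output.
df = pd.DataFrame({
    "value": [9.8319, 3.58847, 22.963, 32.8947],
    "value_id": ["temperature", "conductivity", "pressure", "salinity"],
    "preferred_timestamp": ["internal_timestamp"] * 4,
    "internal_timestamp": [3606765503.684] * 4,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 4,
})

id_cols = ["stream_name", "preferred_timestamp", "internal_timestamp"]

# Move the columns to keep, plus value_id, into the index, then unstack only
# the value_id level so each distinct value_id becomes its own column.
wide = (
    df.set_index(id_cols + ["value_id"])["value"]
    .unstack("value_id")
    .reset_index()
)
wide.columns.name = None  # drop the leftover "value_id" label on the column axis
print(wide)
```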
>>> df.pivot_table(values='value', index=['stream_name', 'preferred_timestamp', 'internal_timestamp', 'value_id'])
stream_name              preferred_timestamp  internal_timestamp  value_id
ctdpf_j_cspp_instrument  internal_timestamp   3.606766e+09        conductivity     3.58847
                                                                  pressure        22.96300
                                                                  salinity        32.89470
                                                                  temperature      9.83190
Name: value, dtype: float64
This is close, but it didn't seem to pivot the values in 'value_id' into separate columns.
and
>>> df.pivot(index='stream_name', columns='value_id', values='value')
value_id                 conductivity  pressure  salinity  temperature
stream_name
ctdpf_j_cspp_instrument       3.58847    22.963   32.8947       9.8319
Close again, but it lacks the other columns that I want associated with this row.
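DataFrame.pivot keeps the other columns only if they are part of the index. Since pandas 1.1 the index argument accepts a list, so a sketch along these lines (assuming pandas 1.1 or later, and the same long-format frame rebuilt by hand) keeps all of them. Unlike pivot_table, pivot never aggregates; it raises ValueError if two rows map to the same cell.

```python
import pandas as pd

# Long-format frame rebuilt by hand from the printed json_normalize output.
df = pd.DataFrame({
    "value": [9.8319, 3.58847, 22.963, 32.8947],
    "value_id": ["temperature", "conductivity", "pressure", "salinity"],
    "preferred_timestamp": ["internal_timestamp"] * 4,
    "internal_timestamp": [3606765503.684] * 4,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 4,
})

id_cols = ["stream_name", "preferred_timestamp", "internal_timestamp"]

# Every column that should stay on the row goes into the (list) index.
wide = df.pivot(index=id_cols, columns="value_id", values="value").reset_index()
wide.columns.name = None  # remove the "value_id" name from the column axis
print(wide)
```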
I'm stuck here. Is there an elegant way of doing this, or should I split the DataFrame and merge the pieces back together the way I want?
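The split-and-merge route works too, although it takes more steps. A sketch, assuming the metadata columns are constant for each stream_name (true for this single-record example, not guaranteed in general):

```python
import pandas as pd

# Long-format frame rebuilt by hand from the printed json_normalize output.
df = pd.DataFrame({
    "value": [9.8319, 3.58847, 22.963, 32.8947],
    "value_id": ["temperature", "conductivity", "pressure", "salinity"],
    "preferred_timestamp": ["internal_timestamp"] * 4,
    "internal_timestamp": [3606765503.684] * 4,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 4,
})

id_cols = ["stream_name", "preferred_timestamp", "internal_timestamp"]

# 1) Pivot only the measurements, keyed by stream_name.
values_wide = df.pivot(index="stream_name", columns="value_id", values="value").reset_index()
values_wide.columns.name = None

# 2) Keep one metadata row per stream (assumes metadata is constant per stream).
meta = df[id_cols].drop_duplicates(subset="stream_name")

# 3) Re-attach the metadata to the pivoted measurements.
wide = meta.merge(values_wide, on="stream_name", how="left")
print(wide)
```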
Accepted answer by root
Your first attempt was nearly correct; just pass columns='value_id' instead of including it in the index.
# Perform the pivot.
df = df.pivot_table(
    values='value',
    index=['stream_name', 'preferred_timestamp', 'internal_timestamp'],
    columns='value_id'
)
# Formatting: move the index levels back into ordinary columns and
# drop the leftover "value_id" name from the column axis.
df = df.reset_index()
df.columns.name = None
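Put together, the whole pipeline from the JSON string to the wide table might look like the sketch below. It assumes a recent pandas (top-level pd.json_normalize, available since 1.0) and a payload that includes the salinity record implied by the question's output.

```python
import json

import pandas as pd

json_str = '''{"preferred_timestamp": "internal_timestamp",
 "internal_timestamp": 3606765503.684,
 "stream_name": "ctdpf_j_cspp_instrument",
 "values": [{"value_id": "temperature", "value": 9.8319},
            {"value_id": "conductivity", "value": 3.58847},
            {"value_id": "pressure", "value": 22.963},
            {"value_id": "salinity", "value": 32.8947}]}'''

# Flatten: one row per entry in "values", with the top-level keys repeated.
df = pd.json_normalize(
    json.loads(json_str),
    record_path="values",
    meta=["preferred_timestamp", "stream_name", "internal_timestamp"],
)

# Pivot: the kept columns form the index, value_id supplies the new columns.
df = df.pivot_table(
    values="value",
    index=["stream_name", "preferred_timestamp", "internal_timestamp"],
    columns="value_id",
).reset_index()
df.columns.name = None
print(df)
```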
This isn't an issue with your example data, but keep in mind that pivot_table aggregates when more than one value lands in the same cell (it takes the mean by default).
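A small sketch of that caveat, using made-up duplicate readings (the stream "s1" and its numbers are hypothetical): the default mean collapses duplicates silently, the aggfunc parameter selects a different reduction, and DataFrame.pivot raises ValueError instead of aggregating.

```python
import pandas as pd

# Hypothetical data: stream "s1" reports "pressure" twice for one timestamp.
df = pd.DataFrame({
    "stream_name": ["s1", "s1", "s1"],
    "internal_timestamp": [1.0, 1.0, 1.0],
    "value_id": ["pressure", "pressure", "temperature"],
    "value": [10.0, 20.0, 5.0],
})

index = ["stream_name", "internal_timestamp"]

# Default aggfunc is the mean: the two pressure readings become 15.0.
mean_wide = df.pivot_table(values="value", index=index, columns="value_id")

# An explicit reduction instead, e.g. keep the first reading.
first_wide = df.pivot_table(values="value", index=index, columns="value_id", aggfunc="first")

# DataFrame.pivot refuses to aggregate duplicate cells.
pivot_raised = False
try:
    df.pivot(index=index, columns="value_id", values="value")
except ValueError as err:
    pivot_raised = True
    print("pivot failed:", err)
```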