Python Pandas: Pivot only certain columns of a DataFrame while keeping the others

Question

Question by naja

I am trying to re-arrange a DataFrame that I automatically read in from a json using Pandas. I've searched but have had no success.

I have the following JSON (saved as a string for copy/paste convenience) with a bunch of JSON objects/dictionaries under the key 'values':

json_str = '''{"preferred_timestamp": "internal_timestamp",
    "internal_timestamp": 3606765503.684,
    "stream_name": "ctdpf_j_cspp_instrument",
    "values": [{
        "value_id": "temperature",
        "value": 9.8319
    }, {
        "value_id": "conductivity",
        "value": 3.58847
    }, {
        "value_id": "pressure",
        "value": 22.963
    }, {
        "value_id": "salinity",
        "value": 32.8947
    }]
}'''

I use the function 'json_normalize' in order to load the json into a flattened Pandas dataframe.

>>> import json
>>> import pandas as pd
>>> df = pd.json_normalize(json.loads(json_str), 'values', ['preferred_timestamp', 'stream_name', 'internal_timestamp'])
>>> df
      value      value_id preferred_timestamp  internal_timestamp  \
0   9.83190   temperature  internal_timestamp        3.606766e+09   
1   3.58847  conductivity  internal_timestamp        3.606766e+09   
2  22.96300      pressure  internal_timestamp        3.606766e+09   
3  32.89470      salinity  internal_timestamp        3.606766e+09   

               stream_name  
0  ctdpf_j_cspp_instrument  
1  ctdpf_j_cspp_instrument  
2  ctdpf_j_cspp_instrument  
3  ctdpf_j_cspp_instrument
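For anyone reproducing this, a self-contained version of the loading step is sketched below. It assumes pandas 1.0 or later, where json_normalize is available as `pd.json_normalize`, and it includes all four readings from the printed output (salinity included). The keywords `record_path` and `meta` are the same two positional arguments used above; column order in the printed frame may differ from the output above depending on the pandas version.

```python
import json

import pandas as pd

# Same structure as json_str above; the four readings are taken from
# the printed json_normalize output.
json_str = '''{"preferred_timestamp": "internal_timestamp",
    "internal_timestamp": 3606765503.684,
    "stream_name": "ctdpf_j_cspp_instrument",
    "values": [
        {"value_id": "temperature", "value": 9.8319},
        {"value_id": "conductivity", "value": 3.58847},
        {"value_id": "pressure", "value": 22.963},
        {"value_id": "salinity", "value": 32.8947}
    ]
}'''

# record_path: the list whose items become rows.
# meta: top-level keys repeated on every one of those rows.
df = pd.json_normalize(
    json.loads(json_str),
    record_path="values",
    meta=["preferred_timestamp", "stream_name", "internal_timestamp"],
)
print(df)
```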

Here is where I am stuck. I want to take the value and value_id columns and pivot these into new columns based off of value_id.

I want the dataframe to look like the following:

stream_name              preferred_timestamp  internal_timestamp  conductivity  pressure  salinity  temperature    
ctdpf_j_cspp_instrument  internal_timestamp   3.606766e+09        3.58847       22.96300  32.89470  9.83190

I've tried both the pivot and pivot_table Pandas functions and even tried to manually pivot the tables by using 'set_index' and 'stack' but it's not quite how I want it.
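For comparison, a minimal sketch of the set_index route, assuming the goal is the single-row layout shown above: move every column except `value` into the index, then `unstack` the `value_id` level (`stack` goes the other way, from columns to rows). The DataFrame is rebuilt by hand from the printed readings, and `keep` is just an illustrative local name.

```python
import pandas as pd

# Long-format frame equivalent to the json_normalize output above.
df = pd.DataFrame({
    "value": [9.8319, 3.58847, 22.963, 32.8947],
    "value_id": ["temperature", "conductivity", "pressure", "salinity"],
    "preferred_timestamp": ["internal_timestamp"] * 4,
    "internal_timestamp": [3606765503.684] * 4,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 4,
})

keep = ["stream_name", "preferred_timestamp", "internal_timestamp"]

# Put the columns to keep plus value_id into a MultiIndex, select the
# "value" Series, then unstack value_id so each id becomes a column.
wide = (
    df.set_index(keep + ["value_id"])["value"]
      .unstack("value_id")
      .reset_index()
)
wide.columns.name = None  # clear the "value_id" label on the column axis
print(wide)
```

Like `pivot`, `unstack` raises a ValueError when two rows share the same index entries, so this only works if each `value_id` appears once per reading.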

>>> df.pivot_table(values='value', index=['stream_name', 'preferred_timestamp', 'internal_timestamp', 'value_id'])
stream_name              preferred_timestamp  internal_timestamp  value_id    
ctdpf_j_cspp_instrument  internal_timestamp   3.606766e+09        conductivity     3.58847
                                                                  pressure        22.96300
                                                                  salinity        32.89470
                                                                  temperature      9.83190
Name: value, dtype: float64

This is close, but it didn't seem to pivot the values in 'value_id' into separate columns.

and

>>> df.pivot(index='stream_name', columns='value_id', values='value')
value_id                 conductivity  pressure  salinity  temperature
stream_name                                                           
ctdpf_j_cspp_instrument       3.58847    22.963   32.8947       9.8319

Close again, but it lacks the other columns that I want to be associated with this line.

I'm stuck here. Is there an elegant way of doing this or should I split the DataFrames and re-merge them to how I want?
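Splitting and re-merging does work; it is just longer than a single pivot. A minimal sketch, assuming `stream_name` alone identifies a reading (so each stream has exactly one pair of timestamps); `values_wide`, `meta` and `keep` are illustrative names, and the data is rebuilt by hand from the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [9.8319, 3.58847, 22.963, 32.8947],
    "value_id": ["temperature", "conductivity", "pressure", "salinity"],
    "preferred_timestamp": ["internal_timestamp"] * 4,
    "internal_timestamp": [3606765503.684] * 4,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 4,
})

keep = ["stream_name", "preferred_timestamp", "internal_timestamp"]

# 1) Pivot only the measurements: one row per stream_name,
#    one column per value_id.
values_wide = df.pivot(index="stream_name", columns="value_id", values="value")

# 2) Keep one row of the remaining columns per stream_name.
meta = df[keep].drop_duplicates("stream_name").set_index("stream_name")

# 3) Join the two halves on their shared stream_name index.
merged = meta.join(values_wide).reset_index()
merged.columns.name = None  # clear any axis label left over from the pivot
print(merged)
```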

Answer 1

Accepted answer by root

Your first attempt was nearly correct; just pass columns='value_id' instead of including value_id in the index.

# Perform the pivot.
df = df.pivot_table(
    values='value',
    index=['stream_name', 'preferred_timestamp', 'internal_timestamp'],
    columns='value_id'
    )

# Formatting.
df.reset_index(inplace=True)
df.columns.name = None
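A self-contained run of the same call on the four readings from the question (rebuilt by hand here rather than parsed from the JSON) gives one row per reading and one column per `value_id`, matching the desired layout:

```python
import pandas as pd

df = pd.DataFrame({
    "value": [9.8319, 3.58847, 22.963, 32.8947],
    "value_id": ["temperature", "conductivity", "pressure", "salinity"],
    "preferred_timestamp": ["internal_timestamp"] * 4,
    "internal_timestamp": [3606765503.684] * 4,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 4,
})

# Columns that identify a reading stay in the index; the distinct
# entries of value_id become new columns filled from "value".
wide = df.pivot_table(
    values="value",
    index=["stream_name", "preferred_timestamp", "internal_timestamp"],
    columns="value_id",
)

# Move the index levels back into ordinary columns and drop the
# "value_id" label on the column axis.
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```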

This isn't an issue in your example data, but keep in mind that pivot_table will aggregate values if multiple values are pivoted to the same position (taking the mean by default).
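To see that aggregation in action, the sketch below uses hypothetical data with two `temperature` readings for the same stream and timestamp. It assumes pandas 1.1 or later, where `DataFrame.pivot` accepts a list for `index`; unlike `pivot_table`, `pivot` never aggregates and raises a ValueError on duplicates, which makes it a useful check when duplicates should not occur.

```python
import pandas as pd

keep = ["stream_name", "preferred_timestamp", "internal_timestamp"]

# Hypothetical data: two "temperature" readings share the same keep
# columns, so both land in the same cell of the pivoted table.
df = pd.DataFrame({
    "value": [9.8, 10.2, 22.963],
    "value_id": ["temperature", "temperature", "pressure"],
    "preferred_timestamp": ["internal_timestamp"] * 3,
    "internal_timestamp": [3606765503.684] * 3,
    "stream_name": ["ctdpf_j_cspp_instrument"] * 3,
})

# Default aggfunc is the mean: the two temperatures are averaged.
averaged = df.pivot_table(values="value", index=keep, columns="value_id")

# An explicit aggfunc picks another rule, here the first reading.
first = df.pivot_table(values="value", index=keep, columns="value_id", aggfunc="first")

# pivot() does not aggregate; duplicate entries make it raise ValueError.
try:
    df.pivot(index=keep, columns="value_id", values="value")
    pivot_raised = False
except ValueError as exc:
    pivot_raised = True
    print("pivot refused:", exc)

print(averaged)
print(first)
```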
