Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/39169718/

Time: 2020-09-14 01:54:08  Source: igfitidea

Convert string to dict, then access key:values??? How to access data in a <class 'dict'> for Python?

python pandas dictionary data-manipulation dask

Asked by Linwoodc3

I am having issues accessing data inside a dictionary.

Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.

I am working with a dask.dataframe created from a CSV.

Edit Question

How I got to this point

Assume I start out with a Pandas Series:

df.Coordinates
130      {u'type': u'Point', u'coordinates': [-43.30175...
278      {u'type': u'Point', u'coordinates': [-51.17913...
425      {u'type': u'Point', u'coordinates': [-43.17986...
440      {u'type': u'Point', u'coordinates': [-51.16376...
877      {u'type': u'Point', u'coordinates': [-43.17986...
1313     {u'type': u'Point', u'coordinates': [-49.72688...
1734     {u'type': u'Point', u'coordinates': [-43.57405...
1817     {u'type': u'Point', u'coordinates': [-43.77649...
1835     {u'type': u'Point', u'coordinates': [-43.17132...
2739     {u'type': u'Point', u'coordinates': [-43.19583...
2915     {u'type': u'Point', u'coordinates': [-43.17986...
3035     {u'type': u'Point', u'coordinates': [-51.01583...
3097     {u'type': u'Point', u'coordinates': [-43.17891...
3974     {u'type': u'Point', u'coordinates': [-8.633880...
3983     {u'type': u'Point', u'coordinates': [-46.64960...
4424     {u'type': u'Point', u'coordinates': [-43.17986...

The problem is, this is not a true dataframe of dictionaries. Instead, it's a column full of strings that LOOK like dictionaries. Running this shows it:

df.Coordinates.apply(type)
130      <class 'str'>
278      <class 'str'>
425      <class 'str'>
440      <class 'str'>
877      <class 'str'>
1313     <class 'str'>
1734     <class 'str'>
1817     <class 'str'>
1835     <class 'str'>
2739     <class 'str'>
2915     <class 'str'>
3035     <class 'str'>
3097     <class 'str'>
3974     <class 'str'>
3983     <class 'str'>
4424     <class 'str'>

My Goal: Access the coordinates key and value in the dictionary. That's it. But it's a str.

I converted the strings to dictionaries using eval.

new = df.Coordinates.apply(eval)
130      {'coordinates': [-43.301755, -22.990065], 'typ...
278      {'coordinates': [-51.17913026, -30.01201896], ...
425      {'coordinates': [-43.17986794, -22.91000096], ...
440      {'coordinates': [-51.16376782, -29.95488677], ...
877      {'coordinates': [-43.17986794, -22.91000096], ...
1313     {'coordinates': [-49.72688407, -29.33757253], ...
1734     {'coordinates': [-43.574057, -22.928059], 'typ...
1817     {'coordinates': [-43.77649254, -22.86940539], ...
1835     {'coordinates': [-43.17132318, -22.90895217], ...
2739     {'coordinates': [-43.1958313, -22.98755333], '...
2915     {'coordinates': [-43.17986794, -22.91000096], ...
3035     {'coordinates': [-51.01583481, -29.63593292], ...
3097     {'coordinates': [-43.17891379, -22.96476163], ...
3974     {'coordinates': [-8.63388008, 41.14594453], 't...
3983     {'coordinates': [-46.64960938, -23.55902666], ...
4424     {'coordinates': [-43.17986794, -22.91000096], ...

Next I test the type of each object and get:

130      <class 'dict'>
278      <class 'dict'>
425      <class 'dict'>
440      <class 'dict'>
877      <class 'dict'>
1313     <class 'dict'>
1734     <class 'dict'>
1817     <class 'dict'>
1835     <class 'dict'>
2739     <class 'dict'>
2915     <class 'dict'>
3035     <class 'dict'>
3097     <class 'dict'>
3974     <class 'dict'>
3983     <class 'dict'>
4424     <class 'dict'>

If I try to access my dictionaries: new.apply(lambda x: x['coordinates'])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-71-c0ad459ed1cc> in <module>()
----> 1 dfCombined.Coordinates.apply(coord_getter)

/Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2218         else:
   2219             values = self.asobject
-> 2220             mapped = lib.map_infer(values, f, convert=convert_dtype)
   2221 
   2222         if len(mapped) and isinstance(mapped[0], Series):

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()

<ipython-input-68-748ce2d8529e> in coord_getter(row)
      1 import ast
      2 def coord_getter(row):
----> 3     return (ast.literal_eval(row))['coordinates']

TypeError: 'bool' object is not subscriptable
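
The traceback shows ast.literal_eval(row) returning a bool, which suggests that at least one row holds a string such as 'True' or 'False' rather than a dict literal. A minimal sketch of a more defensive getter follows; the name safe_coord_getter and the sample rows are hypothetical and not taken from the original data.

```python
import ast

import pandas as pd


def safe_coord_getter(row):
    """Return the 'coordinates' value of a dict-like string, or None."""
    try:
        parsed = ast.literal_eval(row)
    except (ValueError, SyntaxError):
        # Not a valid Python literal (e.g. NaN, empty string, free text)
        return None
    # Rows such as 'False' parse fine but are not dicts
    if not isinstance(parsed, dict):
        return None
    return parsed.get('coordinates')


# Hypothetical sample: two dict-like strings and one stray 'False'
sample = pd.Series([
    "{u'type': u'Point', u'coordinates': [-43.301755, -22.990065]}",
    "False",
    "{u'type': u'Point', u'coordinates': [-51.17913026, -30.01201896]}",
])

coords = sample.apply(safe_coord_getter)
```

Rows that do not parse to a dict come back as missing values, so they can be inspected or dropped afterwards instead of aborting the whole apply.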

It's some type of class, because when I run dir I get this for one object:

new.apply(lambda x: dir(x))[130]
130           __class__
130        __contains__
130         __delattr__
130         __delitem__
130             __dir__
130             __doc__
130              __eq__
130          __format__
130              __ge__
130    __getattribute__
130         __getitem__
130              __gt__
130            __hash__
130            __init__
130            __iter__
130              __le__
130             __len__
130              __lt__
130              __ne__
130             __new__
130          __reduce__
130       __reduce_ex__
130            __repr__
130         __setattr__
130         __setitem__
130          __sizeof__
130             __str__
130    __subclasshook__
130               clear
130                copy
130            fromkeys
130                 get
130               items
130                keys
130                 pop
130             popitem
130          setdefault
130              update
130              values
Name: Coordinates, dtype: object

My Problem: I just want to access the dictionary. But, the object is <class 'dict'>. How do I convert this to a regular dict or just access the key:value pairs?

Any ideas??

Answered by andrew

My first instinct is to use json.loads to cast the strings into dicts. But the example you've posted does not follow the JSON standard, since it uses single quotes instead of double quotes. So you have to convert the strings first.

A second option is to just use regex to parse the strings. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust since lat/long coords are fairly standard.
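
The point about JSON quoting can be checked directly. The sketch below assumes one sample string in the same shape as the question's data; the variable names are illustrative only.

```python
import json

raw = "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}"

try:
    json.loads(raw)
    parsed_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parsed_ok = False

# Naive quote normalisation; it would also alter any apostrophes, or any
# 'u' directly followed by a quote, inside the values themselves
normalised = raw.replace("'", '"').replace('u"', '"')
parsed = json.loads(normalised)
```

The raw string is rejected, while the normalised one parses. The replace-based cleanup is only safe when the values contain no quotes of their own.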

import json
import re
import pandas as pd

df = pd.DataFrame(data={'Coordinates':["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
    "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
    'idx': [130, 278]})


##
# Solution 1- use json.loads
##

def string_to_dict(dict_string):
    # Convert to proper json format
    dict_string = dict_string.replace("'", '"').replace('u"', '"')
    return json.loads(dict_string)

# Bracket assignment is needed to create a new column
df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
df['CoordDicts'][0]['coordinates']
#>>> [-43.30175, 123.45]


##
# Solution 2 - use regex
##
def get_lat_lon(dict_string):
    # Get the coordinates string with regex
    rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
    # Cast to floats
    coords = [float(x) for x in rs.split(',')]
    return coords

# Bracket assignment is needed to create a new column
df['Coords'] = df.Coordinates.apply(get_lat_lon)
df['Coords'][0]
#>>> [-43.30175, 123.45]

Answered by PySeeker

Just ran into this problem. My solution:

import ast
import pandas as pd

df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}","{u'type': u'Point', u'coordinates': [-34,34]}","{u'type': u'Point', u'coordinates': [-102,344]}"],columns=["Coordinates"])

df = df["Coordinates"].astype('str')
df = df.apply(lambda x: ast.literal_eval(x))
df = df.apply(pd.Series)
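
After these steps the result has one column per dictionary key. A short sketch of reading the values back, assuming two of the same sample strings (the names raw, parsed and expanded are illustrative and not part of the answer):

```python
import ast

import pandas as pd

raw = pd.Series([
    "{u'type': u'Point', u'coordinates': [-43,144]}",
    "{u'type': u'Point', u'coordinates': [-34,34]}",
], name="Coordinates")

# Parse each string into a real dict, then expand the keys into columns
parsed = raw.apply(ast.literal_eval)
expanded = parsed.apply(pd.Series)

# Each key is now an ordinary column
first_coords = expanded["coordinates"].iloc[0]
point_types = expanded["type"].tolist()
```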

Answered by fpersyn

Assuming you start with a Series of dicts, you can use the .tolist() method to create a list of dicts and use this as input for a DataFrame. This approach will map each distinct key to a column.

You can filter by keys on creation by setting the columns argument in pd.DataFrame(), giving you the neat one-liner below. Hope that helps.

import pandas as pd

# Starting assumption:
data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
        "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
s = pd.Series(data).apply(eval)

# Create a DataFrame with a list of dicts with a selection of columns
pd.DataFrame(s.tolist(), columns=['coordinates'])
Out[1]: 
                    coordinates
0      [-43.301755, -22.990065]
1  [-51.17913026, -30.01201896]
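
When a selected key is missing from some of the dicts, the corresponding cell is filled with NaN. A sketch using the same two sample strings; ast.literal_eval is swapped in for eval here as a stricter parser, which is a substitution and not what the answer itself uses.

```python
import ast

import pandas as pd

data = [
    "{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
    "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}",
]
# ast.literal_eval only accepts Python literals, unlike eval
s = pd.Series(data).apply(ast.literal_eval)

# Select two keys; the second row has no 'elevation', so it becomes NaN
subset = pd.DataFrame(s.tolist(), columns=['coordinates', 'elevation'])
```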

Answered by piRSquared

It looks like you end up with something like this

s = pd.Series([
        dict(type='Point', coordinates=[1, 1]),
        dict(type='Point', coordinates=[1, 2]),
        dict(type='Point', coordinates=[1, 3]),
        dict(type='Point', coordinates=[1, 4]),
        dict(type='Point', coordinates=[1, 5]),
        dict(type='Point', coordinates=[2, 1]),
        dict(type='Point', coordinates=[2, 2]),
        dict(type='Point', coordinates=[2, 3]),        
    ])

s

0    {u'type': u'Point', u'coordinates': [1, 1]}
1    {u'type': u'Point', u'coordinates': [1, 2]}
2    {u'type': u'Point', u'coordinates': [1, 3]}
3    {u'type': u'Point', u'coordinates': [1, 4]}
4    {u'type': u'Point', u'coordinates': [1, 5]}
5    {u'type': u'Point', u'coordinates': [2, 1]}
6    {u'type': u'Point', u'coordinates': [2, 2]}
7    {u'type': u'Point', u'coordinates': [2, 3]}
dtype: object

Solution

df = s.apply(pd.Series)
df

then access coordinates

df.coordinates

0    [1, 1]
1    [1, 2]
2    [1, 3]
3    [1, 4]
4    [1, 5]
5    [2, 1]
6    [2, 2]
7    [2, 3]
Name: coordinates, dtype: object

Or even

df.coordinates.apply(pd.Series)
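
This last step turns each two-element list into two columns. A sketch on a shorter Series than the one above; the column names lon and lat are an assumption based on GeoJSON's longitude-first ordering and do not appear in the original answer.

```python
import pandas as pd

s = pd.Series([
    dict(type='Point', coordinates=[1, 1]),
    dict(type='Point', coordinates=[1, 2]),
    dict(type='Point', coordinates=[2, 3]),
])

df = s.apply(pd.Series)

# apply(pd.Series) labels the new columns 0 and 1
split = df.coordinates.apply(pd.Series)

# Alternative: build the frame from the list of lists and name the columns
named = pd.DataFrame(df.coordinates.tolist(), index=df.index,
                     columns=['lon', 'lat'])
```

Building the frame from .tolist() avoids constructing one Series per row, which may matter on larger data.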
