Original question: http://stackoverflow.com/questions/39169718/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convert string to dict, then access key:values??? How to access data in a <class 'dict'> for Python?
Asked by Linwoodc3
I am having issues accessing data inside a dictionary.
Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.
I am working with a dask.dataframe created from a csv.
Edit Question
How I got to this point
Assume I start out with a Pandas Series:
df.Coordinates
130 {u'type': u'Point', u'coordinates': [-43.30175...
278 {u'type': u'Point', u'coordinates': [-51.17913...
425 {u'type': u'Point', u'coordinates': [-43.17986...
440 {u'type': u'Point', u'coordinates': [-51.16376...
877 {u'type': u'Point', u'coordinates': [-43.17986...
1313 {u'type': u'Point', u'coordinates': [-49.72688...
1734 {u'type': u'Point', u'coordinates': [-43.57405...
1817 {u'type': u'Point', u'coordinates': [-43.77649...
1835 {u'type': u'Point', u'coordinates': [-43.17132...
2739 {u'type': u'Point', u'coordinates': [-43.19583...
2915 {u'type': u'Point', u'coordinates': [-43.17986...
3035 {u'type': u'Point', u'coordinates': [-51.01583...
3097 {u'type': u'Point', u'coordinates': [-43.17891...
3974 {u'type': u'Point', u'coordinates': [-8.633880...
3983 {u'type': u'Point', u'coordinates': [-46.64960...
4424 {u'type': u'Point', u'coordinates': [-43.17986...
The problem is, this is not a true dataframe of dictionaries. Instead, it's a column full of strings that LOOK like dictionaries. Running this shows it:
df.Coordinates.apply(type)
130 <class 'str'>
278 <class 'str'>
425 <class 'str'>
440 <class 'str'>
877 <class 'str'>
1313 <class 'str'>
1734 <class 'str'>
1817 <class 'str'>
1835 <class 'str'>
2739 <class 'str'>
2915 <class 'str'>
3035 <class 'str'>
3097 <class 'str'>
3974 <class 'str'>
3983 <class 'str'>
4424 <class 'str'>
My Goal: Access the coordinates key and value in the dictionary. That's it. But it's a str.
I converted the strings to dictionaries using eval.
new = df.Coordinates.apply(eval)
130 {'coordinates': [-43.301755, -22.990065], 'typ...
278 {'coordinates': [-51.17913026, -30.01201896], ...
425 {'coordinates': [-43.17986794, -22.91000096], ...
440 {'coordinates': [-51.16376782, -29.95488677], ...
877 {'coordinates': [-43.17986794, -22.91000096], ...
1313 {'coordinates': [-49.72688407, -29.33757253], ...
1734 {'coordinates': [-43.574057, -22.928059], 'typ...
1817 {'coordinates': [-43.77649254, -22.86940539], ...
1835 {'coordinates': [-43.17132318, -22.90895217], ...
2739 {'coordinates': [-43.1958313, -22.98755333], '...
2915 {'coordinates': [-43.17986794, -22.91000096], ...
3035 {'coordinates': [-51.01583481, -29.63593292], ...
3097 {'coordinates': [-43.17891379, -22.96476163], ...
3974 {'coordinates': [-8.63388008, 41.14594453], 't...
3983 {'coordinates': [-46.64960938, -23.55902666], ...
4424 {'coordinates': [-43.17986794, -22.91000096], ...
Next I test the type of each object and get:
130 <class 'dict'>
278 <class 'dict'>
425 <class 'dict'>
440 <class 'dict'>
877 <class 'dict'>
1313 <class 'dict'>
1734 <class 'dict'>
1817 <class 'dict'>
1835 <class 'dict'>
2739 <class 'dict'>
2915 <class 'dict'>
3035 <class 'dict'>
3097 <class 'dict'>
3974 <class 'dict'>
3983 <class 'dict'>
4424 <class 'dict'>
If I try to access my dictionaries: new.apply(lambda x: x['coordinates'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-71-c0ad459ed1cc> in <module>()
----> 1 dfCombined.Coordinates.apply(coord_getter)
/Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2218 else:
2219 values = self.asobject
-> 2220 mapped = lib.map_infer(values, f, convert=convert_dtype)
2221
2222 if len(mapped) and isinstance(mapped[0], Series):
pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()
<ipython-input-68-748ce2d8529e> in coord_getter(row)
1 import ast
2 def coord_getter(row):
----> 3 return (ast.literal_eval(row))['coordinates']
TypeError: 'bool' object is not subscriptable
It's some type of class, because when I run dir I get this for one object:
new.apply(lambda x: dir(x))[130]
130 __class__
130 __contains__
130 __delattr__
130 __delitem__
130 __dir__
130 __doc__
130 __eq__
130 __format__
130 __ge__
130 __getattribute__
130 __getitem__
130 __gt__
130 __hash__
130 __init__
130 __iter__
130 __le__
130 __len__
130 __lt__
130 __ne__
130 __new__
130 __reduce__
130 __reduce_ex__
130 __repr__
130 __setattr__
130 __setitem__
130 __sizeof__
130 __str__
130 __subclasshook__
130 clear
130 copy
130 fromkeys
130 get
130 items
130 keys
130 pop
130 popitem
130 setdefault
130 update
130 values
Name: Coordinates, dtype: object
My Problem: I just want to access the dictionary. But the object is <class 'dict'>. How do I convert this to a regular dict or just access the key:value pairs?
Any ideas??
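A note on the traceback above: `'bool' object is not subscriptable` means `ast.literal_eval` returned `True` or `False` for at least one row, so that row's string was not a dict literal. One possible cause (an assumption, since the contents of `dfCombined` are not shown) is that some rows hold strings such as `'False'`. Below is a minimal sketch of a defensive parser; the helper name `safe_coordinates` and the sample data are hypothetical.

```python
import ast
import pandas as pd

def safe_coordinates(value):
    """Return the 'coordinates' entry of a dict-like string, or None."""
    if not isinstance(value, str):
        return None  # e.g. NaN for a missing row
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return None  # the string is not a valid Python literal
    if not isinstance(parsed, dict):
        return None  # e.g. the string 'False' parses to a bool
    return parsed.get('coordinates')

# Made-up sample rows: one good dict string and three problem rows
raw = pd.Series([
    "{u'type': u'Point', u'coordinates': [-43.301755, -22.990065]}",
    "False",
    float('nan'),
    "not a literal",
])
coords = raw.apply(safe_coordinates)
```

Rows that cannot be parsed come back as missing values, so they can be inspected afterwards with `raw[coords.isna()]`.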
Answer by andrew
My first instinct is to use json.loads to cast the strings into dicts. But the example you've posted does not follow the JSON standard, since it uses single quotes instead of double quotes. So you have to convert the strings first.
A second option is to just use regex to parse the strings. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust since lat/long coords are fairly standard.
import json
import re
import pandas as pd

df = pd.DataFrame(data={'Coordinates': ["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
                                        "{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
                        'idx': [130, 278]})

##
# Solution 1 - use json.loads
##
def string_to_dict(dict_string):
    # Convert to proper json format: double quotes, no u'' prefixes
    dict_string = dict_string.replace("'", '"').replace('u"', '"')
    return json.loads(dict_string)

# Use bracket assignment; df.CoordDicts = ... would set an attribute, not add a column
df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
df['CoordDicts'][0]['coordinates']
#>>> [-43.30175, 123.45]

##
# Solution 2 - use regex
##
def get_lat_lon(dict_string):
    # Get the coordinates string with regex
    rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
    # Cast to floats
    coords = [float(x) for x in rs.split(',')]
    return coords

df['Coords'] = df.Coordinates.apply(get_lat_lon)
df['Coords'][0]
#>>> [-43.30175, 123.45]
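To see why the quote conversion in Solution 1 is needed, the short sketch below (using a made-up sample string) shows `json.loads` rejecting the raw string and accepting it once the single quotes and `u` prefixes are rewritten. The blanket `replace` calls would also alter any apostrophe or `u"` sequence inside a value, so this probably only suits simple data like the example.

```python
import json

raw = "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}"

# The raw string is a Python 2 repr, not JSON, so parsing it fails
try:
    json.loads(raw)
    parsed_raw = True
except json.JSONDecodeError:
    parsed_raw = False

# Swap single quotes for double quotes, then drop the u prefixes
fixed = raw.replace("'", '"').replace('u"', '"')
point = json.loads(fixed)
```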
Answer by PySeeker
Just ran into this problem. My solution:
import ast
import pandas as pd
df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}","{u'type': u'Point', u'coordinates': [-34,34]}","{u'type': u'Point', u'coordinates': [-102,344]}"],columns=["Coordinates"])
df = df["Coordinates"].astype('str')
df = df.apply(lambda x: ast.literal_eval(x))
df = df.apply(pd.Series)
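For reference, here is a sketch of the same steps without reassigning `df`, so the intermediate objects stay visible. `ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for strings read from a file. The variable names `parsed`, `expanded`, and `coords` are illustrative and not from the answer.

```python
import ast
import pandas as pd

df = pd.DataFrame(
    ["{u'type': u'Point', u'coordinates': [-43,144]}",
     "{u'type': u'Point', u'coordinates': [-34,34]}"],
    columns=["Coordinates"],
)

# Step 1: turn each string into a real dict
parsed = df["Coordinates"].apply(ast.literal_eval)

# Step 2: expand each dict into columns, one per key
expanded = parsed.apply(pd.Series)

# Step 3: the coordinates are now an ordinary column of lists
coords = expanded["coordinates"]
```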
Answer by fpersyn
Assuming you start with a Series of dicts, you can use the .tolist() method to create a list of dicts and use this as input for a DataFrame. This approach will map each distinct key to a column.
You can filter by keys on creation by setting the columns argument in pd.DataFrame(), giving you the neat one-liner below. Hope that helps.
import pandas as pd
# Starting assumption: a list of dict-like strings
data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
"{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
s = pd.Series(data).apply(eval)
# Create a DataFrame with a list of dicts with a selection of columns
pd.DataFrame(s.tolist(), columns=['coordinates'])
Out[1]:
coordinates
0 [-43.301755, -22.990065]
1 [-51.17913026, -30.01201896]
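As a sketch of the "each distinct key becomes a column" behaviour, the example below builds the frame without the `columns` argument. It swaps `eval` for `ast.literal_eval`, which is a substitution on my part rather than what the answer uses. Keys missing from a row, such as `elevation` in the second one, show up as NaN.

```python
import ast
import pandas as pd

data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
        "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
s = pd.Series(data).apply(ast.literal_eval)

# No columns argument: every key seen in any dict becomes a column
full = pd.DataFrame(s.tolist())
```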
Answer by piRSquared
It looks like you end up with something like this
s = pd.Series([
dict(type='Point', coordinates=[1, 1]),
dict(type='Point', coordinates=[1, 2]),
dict(type='Point', coordinates=[1, 3]),
dict(type='Point', coordinates=[1, 4]),
dict(type='Point', coordinates=[1, 5]),
dict(type='Point', coordinates=[2, 1]),
dict(type='Point', coordinates=[2, 2]),
dict(type='Point', coordinates=[2, 3]),
])
s
0 {u'type': u'Point', u'coordinates': [1, 1]}
1 {u'type': u'Point', u'coordinates': [1, 2]}
2 {u'type': u'Point', u'coordinates': [1, 3]}
3 {u'type': u'Point', u'coordinates': [1, 4]}
4 {u'type': u'Point', u'coordinates': [1, 5]}
5 {u'type': u'Point', u'coordinates': [2, 1]}
6 {u'type': u'Point', u'coordinates': [2, 2]}
7 {u'type': u'Point', u'coordinates': [2, 3]}
dtype: object
Solution
df = s.apply(pd.Series)
df
then access coordinates
df.coordinates
0 [1, 1]
1 [1, 2]
2 [1, 3]
3 [1, 4]
4 [1, 5]
5 [2, 1]
6 [2, 2]
7 [2, 3]
Name: coordinates, dtype: object
Or even
df.coordinates.apply(pd.Series)
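The last call splits each coordinate list into columns numbered 0 and 1. A sketch of the same split with named columns follows. The names `lon` and `lat` are an assumption based on GeoJSON's [longitude, latitude] ordering; the question does not say which value is which.

```python
import pandas as pd

s = pd.Series([
    dict(type='Point', coordinates=[1, 1]),
    dict(type='Point', coordinates=[1, 2]),
    dict(type='Point', coordinates=[2, 3]),
])

# One column per dict key
df = pd.DataFrame(s.tolist(), index=s.index)

# One named column per list position (column names are assumed)
xy = pd.DataFrame(df['coordinates'].tolist(), columns=['lon', 'lat'], index=df.index)
```

Building the frame from `.tolist()` avoids constructing a separate Series for every row, which `apply(pd.Series)` does and which tends to be slow on large data.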