Pandas read_csv cannot correctly load a comma-separated CSV

Question

Asked by user8385498

I am working on the Titanic challenge on Kaggle. My code is this:

But my desired output is:

So the last line of my code is:

df["Age"].fillna(df.Age.median(), inplace=True)

and this error occurs:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'Age'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-9763f0a9951c> in <module>()
----> 1 df["Age"].fillna(df.Age.median(), inplace=True)

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'Age'

I use sep=',', so I really cannot understand why this code does not split the data at each comma. How can I fix this?

I followed one of the answers, but an error still occurs (I do not know why).

My data is:

Answer 1

Accepted answer by cs95

Attention!

The main issue was with downloading the data. If you run into a problem loading and processing the Kaggle Titanic dataset, you may re-download the CSV from here and re-run your program.

You can pass delimiter=',':

df = pd.read_csv("Desktop/data/train.csv", delimiter=',')
print(df.head())

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


print(df.columns)

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
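
A quick way to tell whether a file was parsed as expected is to inspect df.columns before indexing into it. The sketch below is only an illustration: the two small CSV strings are made-up stand-ins for train.csv, and the semicolon-delimited variant is just one hypothetical way a file can end up as a single column (in this thread, the actual cause was a bad download). It shows how a KeyError: 'Age' can arise when 'Age' is not among the parsed column names.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for train.csv (not the real data).
good_csv = "PassengerId,Survived,Age\n1,0,22.0\n2,1,38.0\n"
# Same content with semicolons, to mimic a file that is not comma-delimited.
bad_csv = good_csv.replace(",", ";")

good = pd.read_csv(io.StringIO(good_csv), sep=",")
bad = pd.read_csv(io.StringIO(bad_csv), sep=",")

# With the right delimiter, each header field becomes its own column.
print(list(good.columns))
# With the wrong one, the whole header line becomes a single column name.
print(list(bad.columns))

# Checking membership first avoids the KeyError raised by bad["Age"].
print("Age" in good.columns, "Age" in bad.columns)
```

If 'Age' is missing from df.columns, the problem is in the file or in how it was read, not in the fillna call.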

Next, you can create a mapping of sorts:

mapping = {'male' : 0, 'female' : 1}

Then call pd.Series.replace:

df.Sex = df.Sex.replace(mapping)
print(df.Sex)

0    0
1    1
2    1
3    1
4    0
Name: Sex, dtype: int64
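
A dictionary like this can be applied with either Series.replace or Series.map, and the two behave differently for values that are not in the dictionary. The sketch below uses a made-up three-element Series (the "unknown" entry is hypothetical and does not come from the Titanic data) to show the difference.

```python
import pandas as pd

# Made-up sample; "unknown" is included only to show unmatched values.
sex = pd.Series(["male", "female", "unknown"])
mapping = {"male": 0, "female": 1}

# replace() swaps the matched values and leaves everything else untouched.
replaced = sex.replace(mapping)
# map() looks every value up in the dict; unmatched values become NaN.
mapped = sex.map(mapping)

print(replaced.tolist())
print(mapped.isna().tolist())
```

When every value is covered by the dictionary, as with 'male' and 'female' here, both approaches give the same encoding.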

Answer 2

Answer by StefanK

Your read_csv call looks fine; the replace on the same line seems to be causing the trouble.

Try to first read the csv as is into the variable df. This way your code will be cleaner.

df = pd.read_csv('Desktop/data/train.csv',sep=',')
df['Sex'] = df['Sex'].map( {'female': 1, 'male': 0} )

You can also leave out the sep argument altogether, since a comma is the default delimiter.
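
As a small check of that default, the sketch below reads the same made-up CSV text (a hypothetical stand-in for train.csv) with and without sep=',' and compares the results.

```python
import io

import pandas as pd

# Hypothetical sample text; the real file would be read from disk instead.
csv_text = "PassengerId,Sex,Age\n1,male,22.0\n2,female,38.0\n"

with_sep = pd.read_csv(io.StringIO(csv_text), sep=",")
without_sep = pd.read_csv(io.StringIO(csv_text))

# Both calls should produce identical frames, since sep defaults to ','.
print(with_sep.equals(without_sep))
```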

As an alternative to map, do the cleaning with replace on a separate line after you have read the file into df, and use inplace=True:

df['Sex'].replace({'male': 0, 'female': 1}, inplace=True)
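
Depending on the pandas version and its copy-on-write settings, calling a method with inplace=True on a column selected as df['Sex'] or df["Age"] may emit a warning or leave df unchanged. A more portable pattern is to assign the result back to the column. The sketch below assumes a tiny made-up frame rather than the real Titanic data.

```python
import numpy as np
import pandas as pd

# Made-up three-row frame standing in for the Titanic data.
df = pd.DataFrame(
    {"Sex": ["male", "female", "male"], "Age": [22.0, np.nan, 38.0]}
)

# Assign the result back instead of relying on inplace=True.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

print(df)
```

Here the missing age is filled with the median of the two known ages, and the Sex column becomes integer codes.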

General advice:

The Kaggle website supports sharing and commenting on scripts in its kernels section. If you get stuck somewhere, look there to see how others have approached the analysis:

https://www.kaggle.com/c/titanic/kernels
