Pandas read_csv does not load a comma separated CSV properly

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45893390/

Date: 2020-09-14 04:19:57  Source: igfitidea

Tags: python, pandas, csv, dataframe, kaggle

Asked by user8385498

I am working on Kaggle's Titanic challenge. My code is this: (code, linked in the original post)

But my ideal output is: (ideal output, linked in the original post)

The last line of my code is:

df["Age"].fillna(df.Age.median(), inplace=True)

and it raises this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'Age'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-9763f0a9951c> in <module>()
----> 1 df["Age"].fillna(df.Age.median(), inplace=True)

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Users/XXXi/anaconda/envs/py36/lib/python3.6/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'Age'

I use sep=',', so I really cannot understand why read_csv does not split the data at each comma. How can I fix this?

I followed one of the answers, but an error still occurs and I do not know why: (error, linked in the original post)

My data is: (data, linked in the original post)

Accepted answer by cs95

Attention!

The main issue was how the data had been downloaded. If you run into problems loading and processing the Kaggle Titanic dataset, re-download the CSV (from the link in the original answer) and re-run your program.
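A quick way to check whether a file was parsed as expected is to look at the shape and column names right after loading. The sketch below is only an illustration: `raw_text` is a made-up stand-in for a file that is not really comma-delimited, not the asker's actual data.

```python
import io

import pandas as pd

# Hypothetical stand-in for a file whose fields are not separated by commas.
raw_text = "PassengerId;Survived;Age\n1;0;22\n2;1;38\n"

df = pd.read_csv(io.StringIO(raw_text), sep=",")

# A single column means no line was split, so df["Age"] would raise KeyError.
print(df.shape)
print(list(df.columns))
print("Age" in df.columns)
```

If the check shows a single column whose name contains the whole header line, the file content (or its delimiter) is the thing to fix, not the fillna call.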



You can pass delimiter=',':

df = pd.read_csv("Desktop/data/train.csv", delimiter=',')
print(df.head())

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


print(df.columns)

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

Next, you can create a mapping of sorts:

mapping = {'male' : 0, 'female' : 1}

And you'll call pd.Series.replace:

df.Sex = df.Sex.replace(mapping)
print(df.Sex)

0    0
1    1
2    1
3    1
4    0
Name: Sex, dtype: int64
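Once the columns load correctly, the fillna step from the question should work as well. Here is a minimal sketch on a tiny inline sample (the three rows in `sample_csv` are made up for illustration; the quoted names show that commas inside quotes do not split a field). It assigns the result back to the column instead of using inplace=True.

```python
import io

import pandas as pd

# Made-up sample in the same layout as a few Titanic columns.
sample_csv = """PassengerId,Name,Sex,Age
1,"Braund, Mr. Owen Harris",male,22
2,"Heikkinen, Miss. Laina",female,
3,"Allen, Mr. William Henry",male,35
"""

df = pd.read_csv(io.StringIO(sample_csv))

# Fill the missing age with the median of the known ages (22 and 35).
df["Age"] = df["Age"].fillna(df["Age"].median())
print(df[["PassengerId", "Age"]])
```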

Answer by StefanK

Your read_csv call looks fine; the replace chained onto the same line seems to be causing the trouble.

First read the CSV as is into the variable df, then transform it in a separate step. This keeps your code cleaner.

df = pd.read_csv('Desktop/data/train.csv',sep=',')
df['Sex'] = df['Sex'].map( {'female': 1, 'male': 0} )

You can also leave out the sep argument altogether, since a comma is the default delimiter.
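As a small check of that point, the sketch below reads the same made-up text with and without sep=',' and compares the results (`sample_csv` is illustrative, not the real train.csv).

```python
import io

import pandas as pd

# Made-up sample text; any comma-delimited content would do.
sample_csv = "PassengerId,Sex,Age\n1,male,22\n2,female,38\n"

with_sep = pd.read_csv(io.StringIO(sample_csv), sep=",")
without_sep = pd.read_csv(io.StringIO(sample_csv))

# Both calls parse the text identically.
print(with_sep.equals(without_sep))
```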

Alternatively, do the cleaning with replace on a separate line after you have read the file into df, and use inplace=True:

df['Sex'].replace({'male': 0, 'female': 1}, inplace=True)
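One caveat, stated as an assumption about your environment: in recent pandas versions, calling an inplace method on a column selected with df['Sex'] can warn or leave df unchanged, because the selection may behave as a copy. Assigning the result back avoids that question. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame standing in for the loaded Titanic data.
df = pd.DataFrame({"Sex": ["male", "female", "female"]})

# Assign the mapped column back instead of modifying a selection in place.
# Note: map() yields NaN for any value that is missing from the mapping.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
print(df["Sex"].tolist())
```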

General advice:

Kaggle supports sharing and commenting on scripts in its kernels section. If you get stuck, browse it to see how others approached the analysis:

https://www.kaggle.com/c/titanic/kernels
