pandas ValueError: could not convert string to float, Python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42920168/

Date: 2020-09-14 03:14:35  Source: igfitidea

ValueError: could not convert string to float, Python

python-3.x pandas numpy feature-extraction valueerror

Asked by Anagha

I have a dataframe on which I'm trying to implement feature selection. There are 45 columns of types integer, float, and object.

But I'm unable to fit any feature selection model because it throws a ValueError. Please help me out.

Dataframe:

member_id   loan_amnt   funded_amnt funded_amnt_inv term        batch_enrolled   int_rate   grade
58189336    14350       14350       14350           36 months                    19.19      E
70011223    4800        4800        4800            36 months   BAT1586599       10.99      B

 sub_grade  emp_title  emp_length  home_ownership  annual_inc  verification_status  pymnt_plan  desc  purpose             title               zip_code  addr_state  dti
 E3         clerk      9 years     OWN             28700       Source Verified      n                 debt_consolidation  Debt consolidation  349xx     FL          33.88
 B4         HR         < 1 year    MORTGAGE        65000       Source Verified      n                 home_improvement    Home improvement    209xx     MD          3.64

 last_week_pay  loan_status
 44th week          0
 9th week           1

Code:

 import numpy
 import pandas as pd
 from sklearn.decomposition import PCA
 # load data
 df = pd.read_csv("C:/Users/anagha/Documents/Python  Scripts/train_indessa.csv")
 array = df.values
 # the first 44 columns are the features, the last column is the target
 X = array[:,0:44]
 Y = array[:,44]
 # feature extraction
 pca = PCA(n_components=3)
 fit = pca.fit(X)

Error:

 Traceback (most recent call last):

   File "<ipython-input-8-20f3863fd66e>", line 2, in <module>
     fit = pca.fit(X)

   File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 301, in fit
     self._fit(X)

   File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 333, in _fit
     copy=self.copy)

   File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
     array = np.array(array, dtype=dtype, order=order, copy=copy)

 ValueError: could not convert string to float: '44th week'

Answered by Miriam Farber

You cannot fit PCA on non-numeric data. PCA involves matrix decomposition, and since some of your data is not numeric, you cannot apply PCA to it. To proceed with PCA, you should either ignore the non-numeric columns or transform them into numeric columns.
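A minimal sketch of both options, assuming a small made-up frame that borrows a few column names from the question (the values are illustrative, not the real data). `select_dtypes` keeps only the numeric columns, and `get_dummies` is one possible way to turn a categorical column such as `grade` into numeric indicator columns:

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical miniature of the loan data; values are made up for illustration.
df = pd.DataFrame({
    "loan_amnt": [14350, 4800, 10000, 15000],
    "int_rate": [19.19, 10.99, 7.26, 19.72],
    "annual_inc": [28700.0, 65000.0, 45000.0, 105000.0],
    "grade": ["E", "B", "A", "D"],
    "last_week_pay": ["44th week", "9th week", "9th week", "135th week"],
    "loan_status": [0, 1, 0, 0],
})

# Option 1: ignore the non-numeric columns (and the target) before fitting.
X_numeric = df.drop(columns="loan_status").select_dtypes(include="number")

# Option 2: transform a categorical column into numeric indicator columns.
X_encoded = pd.get_dummies(
    df.drop(columns=["loan_status", "last_week_pay"]),
    columns=["grade"],
    dtype=float,
)

# PCA now receives numbers only, so the string-to-float conversion cannot fail.
pca = PCA(n_components=2)
components = pca.fit_transform(X_numeric)

print(list(X_numeric.columns))
print(components.shape)
```

Which option fits better depends on the column: free-text fields are often dropped, while low-cardinality categories like `grade` are usually worth encoding.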

Answered by Paco Bahena

It is not possible to convert a string like '44th week' to float.

The only part of the string that Python could actually convert is 44. To do so, I would recommend altering the string so that it keeps only the numbers. Afterwards, you can apply sklearn's fit easily. The following code shows how to get your NumPy array ready to convert to float.

import numpy as np
import pandas as pd

data = np.array([['rows','col1','Col2','Col_withtext'],
            ['Row1',1,2,'44th week'],
            ['Row2',3,4,'the 30th week']])


df = pd.DataFrame(data=data[1:,1:],
              index=data[1:,0],
              columns=data[0,1:])

Use pandas replace to remove text

df['Col_withtext'] = df['Col_withtext'].replace(to_replace="[a-zA-Z]", value='',
                                                regex=True)

df.values

##prints out

 array([['1', '2', '44 '],
   ['3', '4', ' 30 ']], dtype=object)
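The values above are still strings with stray spaces. As a possible follow-up (a sketch, not part of the original answer), the digits can be extracted and converted to float in one pass with `str.extract` and `pd.to_numeric`; the small frame here is a hypothetical stand-in for the `Col_withtext` column:

```python
import pandas as pd

# Hypothetical column in the same shape as the example above.
df = pd.DataFrame({"Col_withtext": ["44th week", "the 30th week"]})

# Pull out the first run of digits from each string.
digits = df["Col_withtext"].str.extract(r"(\d+)", expand=False)

# Convert to float; rows without any digits become NaN instead of raising.
df["Col_withtext"] = pd.to_numeric(digits, errors="coerce").astype(float)

print(df["Col_withtext"].tolist())
```

With `errors="coerce"`, unparseable rows turn into NaN, so they still need to be dropped or imputed before calling `fit`.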

Let me know how it goes!
