pandas ValueError: could not convert string to float, Python
Source: http://stackoverflow.com/questions/42920168/
Licensed under CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Anagha
I have a dataframe on which I'm trying to implement feature selection. It has 45 columns of types integer, float, and object.
But I'm unable to fit any feature selection model, since it throws a ValueError. Please help me out.
Dataframe:
member_id loan_amnt funded_amnt funded_amnt_inv term batch_enrolled int_rate grade
58189336 14350 14350 14350 36 months 19.19 E
70011223 4800 4800 4800 36 months BAT1586599 10.99 B
sub_grade emp_title emp_length home_ownership annual_inc verification_status pymnt_plan desc purpose title zip_code addr_state dti
E3 clerk 9 years OWN 28700 Source Verified n debt_consolidation Debt consolidation 349xx FL 33.88
B4 HR < 1 year MORTGAGE 65000 Source Verified n home_improvement Home improvement 209xx MD 3.64
last_week_pay loan_status
44th week 0
9th week 1
Code:
import numpy
import pandas as pd
from sklearn.decomposition import PCA
# load data
df = pd.read_csv("C:/Users/anagha/Documents/Python Scripts/train_indessa.csv")
array = df.values
X = array[:,0:44]
Y = array[:,44]
# feature extraction
pca = PCA(n_components=3)
fit = pca.fit(X)
Error:
Traceback (most recent call last):
File "<ipython-input-8-20f3863fd66e>", line 2, in <module>
fit = pca.fit(X)
File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 301, in fit
self._fit(X)
File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 333, in _fit
copy=self.copy)
File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: '44th week'
Answered by Miriam Farber
You cannot fit PCA on non-numeric data. PCA involves matrix decomposition, and since some of your data is not numeric, you cannot apply PCA to it. To proceed with PCA, you should either ignore the non-numeric columns or transform them into numeric columns.
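The two options can be sketched roughly as follows. This is only an illustration, not the asker's actual pipeline: the small dataframe is a hypothetical miniature built from a few values in the sample rows, and one-hot encoding via get_dummies is just one assumed way of turning a categorical column into numbers.

```python
import pandas as pd

# Hypothetical miniature of the question's dataframe (a few values from the sample rows).
df = pd.DataFrame({
    "loan_amnt": [14350, 4800],
    "int_rate": [19.19, 10.99],
    "grade": ["E", "B"],
    "last_week_pay": ["44th week", "9th week"],
})

# Option 1: ignore the non-numeric columns.
numeric_only = df.select_dtypes(include="number")

# Option 2: transform a categorical column into numeric indicator columns.
encoded = pd.get_dummies(
    df[["loan_amnt", "int_rate", "grade"]], columns=["grade"], dtype=float
)

# Both results now convert cleanly to a float array, which is the kind of
# input a numeric estimator such as PCA expects.
X_numeric = numeric_only.to_numpy(dtype=float)
X_encoded = encoded.to_numpy(dtype=float)
```

Either array could then be handed to pca.fit in place of the original X, which still contained strings such as '44th week'.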
Answered by Paco Bahena
It is not possible to convert a string like '44th week' to float.
The only part of the string that Python could actually convert is 44. To do so, I would recommend altering the string so that it keeps only the digits. Afterwards, you can apply sklearn's fit. The following code shows how to get your np array ready to convert to float.
import numpy as np
import pandas as pd
data = np.array([['rows','col1','Col2','Col_withtext'],
['Row1',1,2,'44th week'],
['Row2',3,4,'the 30th week']])
df = pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:])
Use pandas replace to remove the text:
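# Assign the result back rather than calling replace(..., inplace=True) on the
# selected column, which may not update df in recent pandas versions.
df['Col_withtext'] = df['Col_withtext'].replace(to_replace="[a-zA-Z]", value='',
                                                regex=True)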
df.values
## prints out
array([['1', '2', '44 '],
['3', '4', ' 30 ']], dtype=object)
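Note that the values above are still strings, with leftover spaces. As an alternative sketch (not part of the original answer), the digits could be pulled out directly and converted in one go. The Series below is a hypothetical stand-in for a column like last_week_pay; it assumes each entry contains a single run of digits.

```python
import pandas as pd

# Hypothetical column mirroring the strings discussed above.
weeks = pd.Series(["44th week", "the 30th week", "9th week"], name="last_week_pay")

# Capture the first run of digits in each string; expand=False returns a Series.
digits = weeks.str.extract(r"(\d+)", expand=False)

# Convert to numbers; errors="coerce" turns anything unparseable into NaN.
weeks_numeric = pd.to_numeric(digits, errors="coerce").astype(float)

print(weeks_numeric.tolist())  # [44.0, 30.0, 9.0]
```

Entries without any digits would come out as NaN here, so they would still need to be filled or dropped before fitting.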
Let me know how it goes!