Python TypeError: argument of type 'float' is not iterable
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39238057/
Asked by jaspreet kaur bassan
I am new to Python and TensorFlow. I recently started working through the TensorFlow examples and came across this one: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html
I got the error, TypeError: argument of type 'float' is not iterable, and I believe that the problem is with the following line of code:
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
(income_bracket is the label column of the census dataset. '>50K' is one of the possible label values, and the other is '<=50K'. The dataset is read into df_train. The documentation explains the reason for the line above as follows: "Since the task is a binary classification problem, we'll construct a label column named "label" whose value is 1 if the income is over 50K, and 0 otherwise.")
If anyone could explain to me exactly what is happening and how I should fix it, that would be great. I tried both Python 2.7 and Python 3.4, so I don't think the problem is with the version of the language. Also, if anyone knows of good tutorials for someone who is new to TensorFlow and pandas, please share the links.
Complete program:
import pandas as pd
import urllib
import tempfile
import tensorflow as tf
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
wide_columns = [gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets, tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]
deep_columns = [
    tf.contrib.layers.embedding_column(workclass, dimension=8),
    tf.contrib.layers.embedding_column(education, dimension=8),
    tf.contrib.layers.embedding_column(marital_status, dimension=8),
    tf.contrib.layers.embedding_column(gender, dimension=8),
    tf.contrib.layers.embedding_column(relationship, dimension=8),
    tf.contrib.layers.embedding_column(race, dimension=8),
    tf.contrib.layers.embedding_column(native_country, dimension=8),
    tf.contrib.layers.embedding_column(occupation, dimension=8),
    age, education_num, capital_gain, capital_loss, hours_per_week]
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race", "gender",
           "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation", "relationship", "race", "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
def input_fn(df):
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS}
    feature_cols = dict(continuous_cols.items() + categorical_cols.items())
    label = tf.constant(df[LABEL_COLUMN].values)
    return feature_cols, label

def train_input_fn():
    return input_fn(df_train)

def eval_input_fn():
    return input_fn(df_test)

m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
Thank you
PS: Full stack trace for the error
Traceback (most recent call last):
  File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <module>
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
  File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2023, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "inference.pyx", line 920, in pandas.lib.map_infer (pandas/lib.c:44780)
  File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <lambda>
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
TypeError: argument of type 'float' is not iterable
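The error in the traceback can be reproduced without pandas or TensorFlow. The `in` operator needs a container or iterable on its right-hand side, and a float is neither, so the lambda fails as soon as it receives a float instead of a string. The snippet below is only a minimal sketch: the NaN value is an assumption about what the float in the column might be, not something the traceback confirms.

```python
# Assumption: the offending value is a float such as NaN, standing in for
# a missing income_bracket entry. Any float triggers the same error.
value = float("nan")

try:
    found = ">50K" in value   # same expression as in the lambda
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
```

The exact wording of the message can differ between Python versions, so the sketch only checks the exception type.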
Accepted answer by jaspreet kaur bassan
The program works verbatim with the latest version of pandas, i.e., 0.18.1.
Answer by Microos
If you inspect the test data, you will see that the first line of data has NaN in the income_bracket field.
I further verified that this is the only line containing NaN by doing:
ib = df_test["income_bracket"]
t = type('12')
for idx, i in enumerate(ib):
    if type(i) != t:
        print idx, type(i)
RESULT: 0 <type 'float'>
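The same check can also be written with pandas' own missing-value test instead of comparing types in a loop. This is a sketch only: the small DataFrame is a hypothetical stand-in for df_test, and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_test; the first row imitates a line that
# read_csv parsed with a missing income_bracket.
df = pd.DataFrame({
    "age": ["not a data row", "25", "38"],
    "income_bracket": [np.nan, "<=50K.", ">50K."],
})

# Boolean mask that is True where income_bracket is missing (NaN).
missing = df["income_bracket"].isna()

# Positions of the rows that would break the `'>50K' in x` lambda.
bad_rows = df.index[missing].tolist()
print(bad_rows)
print(df.loc[missing])
```

On the real data, replacing `df` with `df_test` (or `df_train`) would list every row whose label is missing.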
So you can simply skip this row with:
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
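If rows with a missing label can appear anywhere in the file rather than only on the first line, skipping one row may not be enough. A hedged sketch of two alternatives follows: dropping the rows with a missing label, or building the label in a way that tolerates them. The DataFrame here is a hypothetical stand-in for the census data, and treating a missing label as 0 is an assumption that may not suit every use.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train / df_test.
df = pd.DataFrame({"income_bracket": [np.nan, "<=50K", ">50K"]})

# Option 1: drop rows whose label is missing, then build the label as before.
clean = df.dropna(subset=["income_bracket"]).copy()
clean["label"] = clean["income_bracket"].apply(lambda x: ">50K" in x).astype(int)

# Option 2: keep every row; na=False maps a missing value to False (label 0).
df["label"] = (
    df["income_bracket"].str.contains(">50K", regex=False, na=False).astype(int)
)

print(clean["label"].tolist())
print(df["label"].tolist())
```

Option 1 is usually the safer choice for training data, since a row without a label carries no information about the target.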
Answer by no7dw
Maybe there is a number after the in keyword in the for loop. Try skipping it with an isinstance test:
if isinstance(lines, str):
    for x in lines:
        foo()
else:
    pass  # not a string, so skip it
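Applied to the label construction from the question, the isinstance guard might look like the sketch below. The helper name is_over_50k is hypothetical, the sample values are made up, and Python 3 is assumed (on Python 2 the values could also be unicode objects rather than str).

```python
def is_over_50k(value):
    """Return 1 if value is a string containing '>50K', otherwise 0."""
    # The isinstance check runs first, so `in` is never applied to a float.
    return int(isinstance(value, str) and ">50K" in value)

# Made-up sample values, including a float NaN like the one in the traceback.
values = [float("nan"), "<=50K", ">50K"]
labels = [is_over_50k(v) for v in values]
print(labels)
```

With pandas, the same helper could be passed to apply, for example `df_train['income_bracket'].apply(is_over_50k)`. Note that this silently labels non-string values as 0 instead of raising an error.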