How to read index data as string with pandas.read_csv()?
Source: http://stackoverflow.com/questions/35058435/
This content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by ykensuke9
I'm trying to read a csv file as a DataFrame with pandas, and I want to read the index column as strings. However, since the index column contains only digits, pandas treats the data as integers. How can I read it as strings?
Here are my csv file and code:
[sample.csv]
uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30
[code]
import pandas as pd
df = pd.read_csv('sample.csv', index_col="uid", dtype=float)
print(df.index.values)
The result: df.index is integer, not string:
>>> [1 2 3]
But I want to get df.index as string:
>>> ['01', '02', '03']
One additional condition: the remaining columns must be numeric, and there are too many of them to specify by column name.
Answered by EdChum
Pass the dtype param to specify the dtype:
In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index
Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')
So in your case the following should work:
df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)
The one-line equivalent doesn't work, due to a still-outstanding pandas bug where the dtype param is ignored on cols that are to be treated as the index:
df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
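If a single expression is preferred, chaining set_index onto read_csv should give the same result as the two-step version, because the dtype is applied while 'uid' is still an ordinary column. A minimal sketch, using the sample data inline rather than sample.csv:

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# dtype is applied while 'uid' is still a regular column;
# only afterwards is the column promoted to the index.
df = pd.read_csv(io.StringIO(t), dtype={'uid': str}).set_index('uid')
print(df.index.tolist())
```

This avoids inplace=True and does not pass index_col to read_csv at all.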
You can do this dynamically if you assume the first column is the index column:
In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index(index_col_name, inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1 3 non-null float64
f2 3 non-null float64
f3 3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes
In [172]:
df.index
Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')
Here we read only the header and the first data row (nrows=1) to get the column names:
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
We then generate a dict of the column names with the desired dtypes:
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
We get the index name, assuming it's the first entry, then create a dict from the rest of the cols with float as the desired dtype, and add the index col with its type specified as str. You can then pass this dict as the dtype param to read_csv.
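The steps above can be wrapped in a small helper. This is only a sketch: the function name read_with_str_index is hypothetical, it takes the csv content as text for the sake of a self-contained example, and it uses nrows=0 so that only the header is parsed on the first pass.

```python
import io
import pandas as pd

def read_with_str_index(csv_text):
    """Read csv text with the first column as a string index, the rest as float."""
    # First pass: nrows=0 parses only the header, giving the column names.
    cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()
    index_col_name = cols[0]  # assumption: the first column is the index
    # Every remaining column is read as float; the index column as str.
    dtypes = {name: float for name in cols[1:]}
    dtypes[index_col_name] = str
    # Second pass: read the data with the explicit dtypes, then set the index.
    df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
    return df.set_index(index_col_name)

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

df = read_with_str_index(t)
print(df.index.tolist())
print(df.dtypes)
```

For a file on disk, the same two read_csv calls could take the path directly instead of io.StringIO.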
Answered by Serbitar
If the result is not a string, you have to convert it to a string. Try:
result = [str(i) for i in result]
or in this case:
print([str(i) for i in df.index.values])
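Note that converting after parsing cannot bring back the leading zeros, because they are lost once pandas has parsed the column as integers. A small sketch of the effect follows. The padding step assumes every uid has a fixed width of 2, which is an assumption based on the sample data and not something the question states.

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Without a dtype for 'uid', the index is parsed as integers 1, 2, 3.
df = pd.read_csv(io.StringIO(t), index_col="uid")

# Converting afterwards yields '1', '2', '3': the leading zeros are gone.
as_str = [str(i) for i in df.index.values]
print(as_str)

# Assumption: uids are always 2 characters wide, so zero-padding restores them.
padded = df.index.astype(str).str.zfill(2).tolist()
print(padded)
```

If the width of the ids is not fixed, reading the column as str in the first place is the safer route.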