pandas: How to read index data as string with pandas.read_csv()?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/35058435/

Time: 2020-09-14 00:34:55  Source: igfitidea

How to read index data as string with pandas.read_csv()?

python pandas csv indexing

Asked by ykensuke9

I'm trying to read a CSV file into a DataFrame with pandas, and I want the index column to be read as strings. However, because the index column contains only digits, pandas parses it as integers. How can I read it as strings?

Here are my csv file and code:

[sample.csv]    
    uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30

[code]
import pandas as pd
df = pd.read_csv('sample.csv', index_col="uid", dtype=float)
print(df.index.values)

The result: df.index is integer, not string:

>>> [1 2 3]

But I want to get df.index as string:

>>> ['01', '02', '03']

One additional condition: the remaining (non-index) columns have to be numeric, and there are too many of them for me to list by specific column name.

Answered by EdChum

Pass the dtype param to specify the dtype:

In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')

So in your case the following should work:

df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)
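
The question also asks for every non-index column to be numeric. A minimal sketch of one way to get that without naming the columns, assuming float is acceptable for all of them: read uid as str, set it as the index, then cast the remaining columns with DataFrame.astype. The in-memory sample stands in for sample.csv.

```python
import io
import pandas as pd

# In-memory stand-in for sample.csv (same contents as in the question).
t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Read uid as str so the leading zeros survive, then make it the index.
df = pd.read_csv(io.StringIO(t), dtype={'uid': str}).set_index('uid')

# astype(float) converts every remaining column; the index is left untouched.
df = df.astype(float)

print(df.index.tolist())
print(df.dtypes)
```

Casting after the read avoids having to build a per-column dtype dict, at the cost of one extra conversion pass over the data.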

The one-line equivalent doesn't work, due to a still-outstanding pandas bug where the dtype param is ignored on columns that are to be treated as the index:

df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')

You can dynamically do this if we assume the first column is the index column:

In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index(index_col_name, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')

Here we read just the header row to get the column names:

cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()

We then generate a dict of the column names with the desired dtypes:

index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str

We take the index name, assuming it is the first entry, then create a dict from the rest of the columns with float as the desired dtype, and add the index column with its type set to str. You can then pass this dict as the dtype param to read_csv.
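
The steps above can be wrapped in a small function. This is only a sketch: read_csv_with_str_index is a hypothetical helper name, it takes the CSV content as text so the data can be parsed twice, and it assumes the first column is the index and every other column should be float. It uses nrows=0, which reads the header alone, in place of nrows=1.

```python
import io
import pandas as pd

def read_csv_with_str_index(csv_text):
    """Hypothetical helper: first column -> string index, all others -> float."""
    # Read only the header to discover the column names.
    cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()
    index_col_name = cols[0]

    # Every non-index column is read as float; the index column as str.
    dtypes = {name: float for name in cols[1:]}
    dtypes[index_col_name] = str

    df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
    return df.set_index(index_col_name)

sample = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

df = read_csv_with_str_index(sample)
print(df.index.tolist())
print(df.dtypes)
```

For a file on disk, the same two-pass idea applies with the path passed to both read_csv calls instead of a StringIO buffer.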

Answered by Serbitar

If the result is not a string, you have to convert it to a string. Try:

result = [str(i) for i in result]

or in this case:

print([str(i) for i in df.index.values])
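
One caveat with converting after the read: once pandas has parsed the index as integers, the leading zeros are already gone, so str() yields '1' rather than '01'. The sketch below illustrates this with the sample data. The zfill(2) step is an assumption that every uid is exactly two characters wide, which the question does not state.

```python
import io
import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# The index is parsed as integers here, so '01' has already become 1.
df = pd.read_csv(io.StringIO(t), index_col="uid")
as_text = [str(i) for i in df.index.values]
print(as_text)  # ['1', '2', '3']

# Re-padding only works if the original width is known (assumed: 2).
padded = df.index.astype(str).str.zfill(2).tolist()
print(padded)  # ['01', '02', '03']
```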