Python pandas: how to specify data types when reading an Excel file?

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/32591466/

Time: 2020-08-19 11:52:02  Source: igfitidea

python pandas dataframe

Asked by Pythonista anonymous

I am importing an Excel file into a pandas dataframe with the pandas.read_excel() function.

One of the columns is the primary key of the table: it's all numbers, but it's stored as text (the little green triangle in the top left of the Excel cells confirms this).

However, when I import the file into a pandas dataframe, the column gets imported as a float. This means that, for example, '0614' becomes 614.

Is there a way to specify the datatype when importing a column? I understand this is possible when importing CSV files but couldn't find anything in the syntax of read_excel().

The only solution I can think of is to add an arbitrary letter at the beginning of the text in Excel (converting '0614' into 'A0614') to make sure the column is imported as text, and then chop off the 'A' in Python so I can match it to other tables I am importing from SQL.

Accepted answer by tnknepp

You just specify converters. I created an Excel spreadsheet with the following structure:

names   ages
bob     05
tom     4
suzy    3

Where the "ages" column is formatted as strings. To load:

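import pandas as pd

# sheet_name replaces the older sheetname keyword, which newer pandas releases no longer accept
df = pd.read_excel('Book1.xlsx', sheet_name='Sheet1', header=0,
                   converters={'names': str, 'ages': str})
>>> df
       names ages
   0   bob   05
   1   tom   4
   2   suzy  3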

Answer by Nix G-D

The read_excel() function has a converters argument, where you can apply functions to input in certain columns. You can use this to keep them as strings. Documentation:

Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the Excel cell content, and return the transformed content.

Example code:

pandas.read_excel(my_file, converters = {my_str_column: str})
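Since a converter is just a function of one cell value, it can do more than cast to str. The following is a minimal sketch, not part of the original answer: pad_key is a hypothetical helper, the workbook is built in memory purely for illustration, and it assumes the openpyxl engine is installed.

```python
import io

import pandas as pd


def pad_key(value):
    """Hypothetical converter: render a cell as a 4-digit, zero-padded string."""
    return str(int(value)).zfill(4)


# Build a small workbook in memory so the sketch needs no file on disk.
buffer = io.BytesIO()
pd.DataFrame({"names": ["abc", "def"], "key": [5, 4962]}).to_excel(
    buffer, index=False, engine="openpyxl"
)
data = buffer.getvalue()

# The converter is applied to each cell of the "key" column as it is read.
df = pd.read_excel(io.BytesIO(data), converters={"key": pad_key})
print(df["key"].tolist())
```

The keys of the converters dict can be column labels, as here, or integer column positions, as the quoted documentation says.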

Answer by Nickil Maveli

Starting with v0.20.0, the dtype keyword argument of the read_excel() function can be used to specify the data types to apply to the columns, just as with read_csv().

Using the converters and dtype arguments together for the same column name causes dtype to be ignored for that column; converters takes precedence.
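The precedence rule can be checked with a small experiment. This is only a sketch under stated assumptions: the in-memory workbook and its "key" column are made up for illustration, openpyxl is assumed to be installed, and warnings are silenced because pandas may warn that dtype is being ignored for the column.

```python
import io
import warnings

import pandas as pd

# Illustrative workbook whose "key" column is stored as text with leading zeros.
buffer = io.BytesIO()
pd.DataFrame({"key": ["0614", "0007"]}).to_excel(
    buffer, index=False, engine="openpyxl"
)
data = buffer.getvalue()

with warnings.catch_warnings():
    # pandas may warn that only the converter will be used for "key".
    warnings.simplefilter("ignore")
    df = pd.read_excel(
        io.BytesIO(data),
        converters={"key": str},  # expected to win for this column
        dtype={"key": "int64"},   # expected to be ignored for this column
    )

print(df["key"].tolist())
```

If dtype had been applied, the leading zeros would be gone; with the converter in effect, the values stay as the original strings.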



1) To keep pandas from inferring the dtypes and instead pass the column contents through as they appear in the file, set this argument to str or object so the data is not altered. (One such case is leading zeros in numbers, which would otherwise be lost.)

pd.read_excel('file_name.xlsx', dtype=str)            # (or) dtype=object

2) It also supports a dict mapping in which the keys are column names and the values are the data types to apply. This is especially useful when you want to set the dtype for only a subset of the columns.

import numpy as np

# Assuming data types for `a` and `b` columns to be altered
pd.read_excel('file_name.xlsx', dtype={'a': np.float64, 'b': np.int32})
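Both forms of dtype can be tried side by side on a tiny workbook. The sketch below is illustrative only: the column names "key" and "ages" and the in-memory file are assumptions, and it requires pandas 0.20.0 or later with the openpyxl engine installed.

```python
import io

import pandas as pd

# Illustrative workbook: "key" is stored as text, "ages" as numbers.
buffer = io.BytesIO()
pd.DataFrame({"key": ["0614", "0007"], "ages": [5, 4]}).to_excel(
    buffer, index=False, engine="openpyxl"
)
data = buffer.getvalue()

# Form 1: read every column as text.
all_text = pd.read_excel(io.BytesIO(data), dtype=str)

# Form 2: force only "key" to text and let pandas infer the rest.
key_only = pd.read_excel(io.BytesIO(data), dtype={"key": str})

print(all_text["key"].tolist())
print(key_only.dtypes)
```

The dict form is the less invasive choice when only the key column needs protecting, since the other columns keep their inferred numeric types.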

Answer by Tango

If you do not know the number and names of the columns in the dataframe, this method can be handy:

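# First pass: read the sheet only to discover its column names
column_list = list(pd.read_excel(file_name, sheet_name='Sheet1').columns)
# Map every column to str so that no values are reinterpreted
converter = {col: str for col in column_list}
# Second pass: re-read the same sheet with the converters applied
df_actual = pd.read_excel(file_name, sheet_name='Sheet1', converters=converter)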

where column_list is the list of your column names.

Answer by jpp

If your key has a fixed number of digits, you should probably store it as text rather than as numeric data. You can use the converters argument of read_excel for this.

Or, if this does not work, just manipulate your data once it's read into your dataframe:

df['key_zfill'] = df['key'].astype(str).str.zfill(4)

  names   key key_zfill
0   abc     5      0005
1   def  4962      4962
2   ghi   300      0300
3   jkl    14      0014
4   mno    20      0020

Answer by HazimoRa3d

If you don't know the column names and you want to apply the str data type to all columns:

table = pd.read_excel("path_to_filename")
cols = table.columns
conv = dict(zip(cols ,[str] * len(cols)))
table = pd.read_excel("path_to_filename", converters=conv)