Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/52369572/

Time: 2020-09-14 06:02:04  Source: igfitidea

Python - How to get data types for all columns in CSV file?

python pandas dataframe types

Asked by Joe

I am trying to get all data types from a CSV file for each column.
There is no documentation about data types in a file and manually checking will take a long time (it has 150 columns).

Started using this approach:

import pandas as pd

df = pd.read_csv('/tmp/file.csv')

>>> df.dtypes
a   int64
b   int64
c   object
d   float64

Is the above approach good enough, or is there a better way to figure out the data types?
Also, the file has 150 columns. When I type df.dtypes, I can see only 15 or so columns. How can I see them all?

Answered by thesilkworm

Depending on the size of your file, you might be able to save some time by only reading in the first few rows, using the nrows argument of pd.read_csv:

df = pd.read_csv('/tmp/file.csv', nrows=25)

This is only useful if you know for sure that the types can be correctly inferred from the first n rows though, so be careful with this.
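To illustrate the caveat, here is a minimal sketch using a made-up in-memory CSV (the column names and values are hypothetical, not from the question). A column that looks numeric in the first rows contains text further down, so the sample and the full file disagree on its dtype:

```python
import io

import pandas as pd

# Hypothetical data: column "b" is numeric for the first rows, then contains text.
csv_text = "a,b\n1,10\n2,20\n3,30\n4,unknown\n"

sample = pd.read_csv(io.StringIO(csv_text), nrows=3)
full = pd.read_csv(io.StringIO(csv_text))

print(sample.dtypes)  # "b" is inferred as an integer column from the sample
print(full.dtypes)    # "b" holds strings once the whole file is read

# Columns whose inferred dtype differs between the sample and the full file
mismatched = [col for col in full.columns if sample[col].dtype != full[col].dtype]
print(mismatched)
```

A check like this is only practical when the full file can be read at least once, but it shows why a small nrows value should be treated as a quick first look rather than a final answer.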

Once you have the data (or a subset of it) loaded into a DataFrame, you can view the types in a number of different ways, a few of which have been posted already, but I'll share another using a simple loop and items (the original answer used iteritems, which was removed from Series in pandas 2.0):

for name, dtype in df.dtypes.items():
    print(name, dtype)

a int64
b float64
c object

Answered by Anna Semjén

I think this is a good way to do it. It returns a Series object. To see more rows you can use this one: pd.set_option('display.max_rows', 250)
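Calling pd.set_option changes the display limit for the rest of the session. As a rough sketch of two alternatives that leave the global setting alone (the 150-column DataFrame here is a hypothetical stand-in for the file in the question):

```python
import pandas as pd

# Hypothetical stand-in for the 150-column file
df = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(150)})

# Option 1: lift the row limit only inside this block
with pd.option_context("display.max_rows", None):
    print(df.dtypes)

# Option 2: render the whole Series as a string, which is not truncated by default
text = df.dtypes.to_string()
print(text)
```

The second form is also handy if you want to write the full list of types to a file for later reference.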

Answered by Chris A

You could update the max_info_columns display option and use DataFrame.info():

pd.set_option('max_info_columns', 200)
df.info()
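If you would rather not change the option, DataFrame.info() also accepts verbose=True, which requests the full per-column summary for that one call. A minimal sketch, assuming a hypothetical 150-column DataFrame in place of the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the 150-column file
df = pd.DataFrame({f"col{i}": [1.0, 2.0] for i in range(150)})

# verbose=True asks for the full per-column summary for this call only
buffer = io.StringIO()
df.info(verbose=True, buf=buffer)
summary = buffer.getvalue()
print(summary)
```

Passing buf is optional; it is used here only so the summary can be kept as a string instead of going straight to the console.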

Answered by Pedro Henrique

There are some ways to do it. I like to use

df.dtypes

or

for i, v in enumerate(df.columns):
    print(i, v)
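Note that the enumerate loop above prints each column's position and name, not its type. One possible variation that shows all three together, and also groups column names by type, might look like this (the small DataFrame is a hypothetical example, not the asker's data):

```python
import pandas as pd

# Hypothetical small frame standing in for the CSV data
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})

# Print position, column name, and dtype together
for i, (name, dtype) in enumerate(df.dtypes.items()):
    print(i, name, dtype)

# Group column names by dtype to get an overview of a wide table
columns_by_dtype = {}
for name, dtype in df.dtypes.items():
    columns_by_dtype.setdefault(str(dtype), []).append(name)
print(columns_by_dtype)
```

With 150 columns, the grouped view makes it easier to spot which columns fell back to a generic string or object type and may need a closer look.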