Pandas read_csv dtype read all columns but few as string
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/49684951/
Asked by Nikhil VJ
I'm using Pandas to read a bunch of CSVs. I pass a dict of options to the dtype parameter to tell pandas which columns to read as string instead of the default:
dtype_dic = {'service_id': str, 'end_date': str, ...}
feedArray = pd.read_csv(feedfile, dtype=dtype_dic)
In my scenario, all the columns except a few specific ones are to be read as strings. So instead of defining several columns as str in dtype_dic, I'd like to set just my chosen few as int or float. Is there a way to do that?
It's a loop cycling through various CSVs with differing columns, so a direct column conversion after having read the whole csv as string (dtype=str) would not be easy, as I would not immediately know which columns each csv has. (I'd rather spend that effort defining all the columns in the dtype dict!)
Edit: But if there's a way to process the list of column names to be converted to number without erroring out when a column isn't present in that csv, then yes, that would be a valid solution, if there's no other way to do this at the csv reading stage itself.
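For instance, a minimal sketch of such a guard could look like the following; the column and file names here are hypothetical, and any column missing from a particular csv is simply skipped rather than raising:

import pandas as pd

numeric_cols = {'stop_sequence': int, 'shape_dist_traveled': float}  # hypothetical column names
df = pd.read_csv('some_feed.csv', dtype=str)                          # hypothetical file
for col, col_type in numeric_cols.items():
    if col in df.columns:  # only convert columns this csv actually has
        df[col] = df[col].astype(col_type)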
Note: this sounds like a previously asked question, but the answers there went down a very different path (bool-related) which doesn't apply to this question. Please don't mark as duplicate!
Answered by Nathan
EDIT - sorry, I misread your question. Updated my answer.
You can read the entire csv as strings, then convert your desired columns to other types afterwards, like this:
import pandas as pd

df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
# convert just the chosen columns; everything else stays as strings
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)
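Any column not listed in types_dict (C in this example) is simply left as a string, which matches the "everything else is a string" requirement.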
Another approach, if you really want to specify the proper types for all columns when reading the file in and not change them afterwards: read in just the column names (no rows), then use those to fill in which columns should be strings:
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
df = pd.read_csv('file.csv', dtype=types_dict)
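Putting this together with the loop over several feeds from the question, a sketch along these lines (the file list and numeric column names are hypothetical) builds a per-file dtype mapping, so columns absent from a particular csv are never referenced:

import pandas as pd

numeric_types = {'stop_sequence': int, 'shape_dist_traveled': float}  # hypothetical
feed_files = ['stops.csv', 'stop_times.csv']                          # hypothetical

frames = []
for feedfile in feed_files:
    # peek at the header row only, then mark every other column as str
    col_names = pd.read_csv(feedfile, nrows=0).columns
    dtype_dic = {col: numeric_types.get(col, str) for col in col_names}
    frames.append(pd.read_csv(feedfile, dtype=dtype_dic))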