Original source: http://stackoverflow.com/questions/22655438/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Reading a csv-file with pandas.read_csv and an index creates NaN entries
Asked by user2366975
My .csv file is comma-separated, which is the default for read_csv.
This works:
T1 = pd.DataFrame(pd.read_csv(loggerfile, header = 2)) #header contains column "1"
But as soon as I pass anything to the DataFrame constructor besides the read_csv result, all my values are suddenly NaN. Why? How can I solve this?
datetimeIdx = pd.to_datetime( T1["1"] ) #timestamp-column
T2 = pd.DataFrame(pd.read_csv(loggerfile, header = 2), index = datetimeIdx)
Answered by joris
It's not necessary to wrap read_csv in a DataFrame call, as it already returns a DataFrame.
If you want to change the index, you can use set_index or directly set the index:
T1 = pd.read_csv(loggerfile, header = 2)
T1.index = pd.DatetimeIndex(T1["1"])
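The set_index route mentioned above can be sketched as follows. This is a minimal illustration, not the asker's actual data: the in-memory csv_text is a hypothetical stand-in for loggerfile, laid out with two preamble lines so that header=2 picks up the row containing column "1".

```python
import io
import pandas as pd

# Hypothetical stand-in for the logger file: two preamble lines, then the header row.
csv_text = """logger export
device A
1,temp,hum
2014-03-26 10:00:00,20.5,40
2014-03-26 10:01:00,20.7,41
"""

T1 = pd.read_csv(io.StringIO(csv_text), header=2)

# Convert the timestamp column from strings to datetimes, then move it into the index.
T1["1"] = pd.to_datetime(T1["1"])
T2 = T1.set_index("1")  # default drop=True removes "1" from the columns

print(T2)
```

Both approaches relabel the existing rows, so the values are kept; set_index simply returns a new DataFrame instead of modifying T1 in place.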
If you want to keep the column in the dataframe as a datetime (and not a string):
T1 = pd.read_csv(loggerfile, header = 2)
T1["1"] = pd.DatetimeIndex(T1["1"])
T2 = T1.set_index("1", drop=False)
But even better, you can do this directly in read_csv (assuming the column "1" is the first column):
pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)
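For a runnable version of that call, here is a small sketch that reads from an in-memory string. The csv_text contents and the column names temp and hum are made up for illustration; only the read_csv arguments come from the answer.

```python
import io
import pandas as pd

# Hypothetical stand-in for the logger file: two preamble lines, then the header row.
csv_text = """logger export
device A
1,temp,hum
2014-03-26 10:00:00,20.5,40
2014-03-26 10:01:00,20.7,41
"""

# index_col=0 uses the first column as the index;
# parse_dates=True asks pandas to parse that index as dates.
T = pd.read_csv(io.StringIO(csv_text), header=2, index_col=0, parse_dates=True)

print(type(T.index).__name__)
print(T)
```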
The reason it returns a DataFrame of NaNs is that calling DataFrame() with a DataFrame as input performs a reindex operation with the provided index. Since none of the labels in datetimeIdx are in the original index of T1, you get a dataframe with all NaNs.
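The difference between reindexing and relabeling can be reproduced without any file. The sketch below uses a tiny made-up frame (the temp column and the timestamps are illustrative only) and compares the constructor call, an explicit reindex, and a direct index assignment.

```python
import pandas as pd

df = pd.DataFrame({"temp": [20.5, 20.7]})  # default integer index: 0, 1
new_idx = pd.to_datetime(["2014-03-26 10:00:00", "2014-03-26 10:01:00"])

# Passing a DataFrame plus index= looks rows up by label, like reindex.
# None of the datetime labels exist in df's index, so every value is NaN.
all_nan = pd.DataFrame(df, index=new_idx)
same = df.reindex(new_idx)

# Assigning the index relabels the existing rows and keeps the values.
relabeled = df.copy()
relabeled.index = new_idx

print(all_nan)
print(relabeled)
```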

