Python: How to correctly read a csv in Pandas while changing the column names

Question

Asked by gabhijit

An absolutely basic read_csv question.

I have data that looks like the following in a csv file -

Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
25-February-2015,2290.00,2332.00,2278.35,2318.05,2315.100614216488163214,161995,10174,375034724.00,102972,63.56,53.65,28.05
24-February-2015,2276.05,2295.00,2258.00,2278.15,2281.058946240263344242,52251,7726,119187611.00,13292,25.44,37.00,2.10
23-February-2015,2303.95,2311.00,2253.25,2270.70,2281.912259219760108491,75951,7344,173313518.00,24969,32.88,57.75,-33.25
20-February-2015,2324.00,2335.20,2277.00,2284.30,2301.631421152326354478,79717,10233,183479152.00,23045,28.91,58.20,-39.70
19-February-2015,2304.00,2333.90,2292.00,2326.60,2321.485466301625211160,85835,8847,199264705.00,29728,34.63,41.90,22.60
18-February-2015,2284.00,2305.00,2261.10,2295.75,2282.060986778089405300,69884,6639,159479550.00,26665,38.16,43.90,11.75
16-February-2015,2281.00,2305.85,2266.00,2278.50,2284.961866239581019628,85541,10149,195457923.00,22164,25.91,39.85,-2.50
13-February-2015,2311.00,2324.90,2286.95,2296.40,2311.371235111317676864,109731,5570,253629077.00,69039,62.92,37.95,-14.60
12-February-2015,2280.00,2322.85,2275.00,2315.45,2301.372038211769425569,79766,9095,183571242.00,33981,42.60,47.85,35.45
11-February-2015,2275.00,2295.00,2258.25,2287.20,2279.587966250020639664,60563,7467,138058686.00,20058,33.12,36.75,12.20
10-February-2015,2244.90,2297.40,2225.00,2280.30,2269.562228214830293104,141656,13026,321497107.00,55577,39.23,72.40,35.40

--

I am trying to read this data in a pandas dataframe using the following variations of read_csv. I am only interested in two columns.

z = pd.read_csv('file.csv', parse_dates=True, index_col="Date", usecols=["Date", "Open Price", "Close Price"], names=["Date", "O", "C"], header=0)

What I get is

              O    C
Date
2015-02-28  NaN  NaN
2015-02-27  NaN  NaN
2015-02-26  NaN  NaN
2015-02-25  NaN  NaN
2015-02-24  NaN  NaN

Or 
z = pd.read_csv('file.csv', parse_dates=True, index_col="Date", usecols=["Date", "Open", "Close"], names=["Date", "Open Price", "Close Price"], header=0)

The result is -

    Open Price Close Price
Date                             
2015-02-28        NaN         NaN
2015-02-27        NaN         NaN
2015-02-26        NaN         NaN
2015-02-25        NaN         NaN

Am I missing something fundamental, or is there an issue with read_csv in pandas 0.13.1 (my version on Debian Wheezy)?

Answer 1

Accepted answer by Papouche Guinslyzinho

You are right, something is odd with the names argument. It seems to me that you cannot use both at the same time: either you set a name for every column of the CSV file, or you don't set names at all. So it seems that you can't set names when you are not reading all the columns (usecols).

names : array-like List of column names to use. If file contains no header row, then you should explicitly pass header=None
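
As a minimal sketch of the quoted documentation, the snippet below reads a file that has no header row and supplies the column names through names. The data and the names A, B, C are made up for illustration and are not from the question.

```python
import io

import pandas as pd

# Hypothetical headerless data, invented for this illustration.
csv_text = "1,2,3\n4,5,6\n"

# header=None says the file has no header row, so the first line is data;
# names supplies one label per column in the file.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["A", "B", "C"])
print(df)
```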

You might already know this, but you can also rename the columns afterwards.

import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

csv = r"""Date,Open Price,High Price,Low Price,Close Price,WAP,No.of Shares,No. of Trades,Total Turnover (Rs.),Deliverable Quantity,% Deli. Qty to Traded Qty,Spread High-Low,Spread Close-Open
28-February-2015,2270.00,2310.00,2258.00,2294.85,2279.192067772602217319,73422,8043,167342840.00,11556,15.74,52.00,24.85
27-February-2015,2267.25,2280.85,2258.00,2266.35,2269.239841485775122730,50721,4938,115098114.00,12297,24.24,22.85,-0.90
26-February-2015,2314.90,2314.90,2250.00,2259.50,2277.198324862194860047,69845,8403,159050917.00,22046,31.56,64.90,-55.40
25-February-2015,2290.00,2332.00,2278.35,2318.05,2315.100614216488163214,161995,10174,375034724.00,102972,63.56,53.65,28.05
24-February-2015,2276.05,2295.00,2258.00,2278.15,2281.058946240263344242,52251,7726,119187611.00,13292,25.44,37.00,2.10
23-February-2015,2303.95,2311.00,2253.25,2270.70,2281.912259219760108491,75951,7344,173313518.00,24969,32.88,57.75,-33.25
20-February-2015,2324.00,2335.20,2277.00,2284.30,2301.631421152326354478,79717,10233,183479152.00,23045,28.91,58.20,-39.70
19-February-2015,2304.00,2333.90,2292.00,2326.60,2321.485466301625211160,85835,8847,199264705.00,29728,34.63,41.90,22.60
18-February-2015,2284.00,2305.00,2261.10,2295.75,2282.060986778089405300,69884,6639,159479550.00,26665,38.16,43.90,11.75
16-February-2015,2281.00,2305.85,2266.00,2278.50,2284.961866239581019628,85541,10149,195457923.00,22164,25.91,39.85,-2.50
13-February-2015,2311.00,2324.90,2286.95,2296.40,2311.371235111317676864,109731,5570,253629077.00,69039,62.92,37.95,-14.60
12-February-2015,2280.00,2322.85,2275.00,2315.45,2301.372038211769425569,79766,9095,183571242.00,33981,42.60,47.85,35.45
11-February-2015,2275.00,2295.00,2258.25,2287.20,2279.587966250020639664,60563,7467,138058686.00,20058,33.12,36.75,12.20
10-February-2015,2244.90,2297.40,2225.00,2280.30,2269.562228214830293104,141656,13026,321497107.00,55577,39.23,72.40,35.40"""

df = pd.read_csv(StringIO(csv), 
        usecols=["Date", "Open Price", "Close Price"],
        header=0)

df.columns = ['Date', 'O', 'C']

df

output:

                Date        O        C
0   28-February-2015  2270.00  2294.85
1   27-February-2015  2267.25  2266.35
2   26-February-2015  2314.90  2259.50
3   25-February-2015  2290.00  2318.05
4   24-February-2015  2276.05  2278.15
5   23-February-2015  2303.95  2270.70
6   20-February-2015  2324.00  2284.30
7   19-February-2015  2304.00  2326.60
8   18-February-2015  2284.00  2295.75
9   16-February-2015  2281.00  2278.50
10  13-February-2015  2311.00  2296.40
11  12-February-2015  2280.00  2315.45
12  11-February-2015  2275.00  2287.20
13  10-February-2015  2244.90  2280.30
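
The rename-afterwards idea can also cover the date index the question asked for. The following is only a sketch: it uses a shortened two-row copy of the question's data, selects columns by their original header names, and then renames with a dict so the mapping does not depend on column order. The explicit date format string is an assumption based on how the dates look in the sample.

```python
import io

import pandas as pd

# Shortened copy of the question's data (fewer columns and rows).
csv_text = """Date,Open Price,High Price,Low Price,Close Price
28-February-2015,2270.00,2310.00,2258.00,2294.85
27-February-2015,2267.25,2280.85,2258.00,2266.35
"""

# Select columns by the names found in the file's header row.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "Open Price", "Close Price"],
)

# Rename after reading; the dict maps old names to new ones.
df = df.rename(columns={"Open Price": "O", "Close Price": "C"})

# Parse the dates explicitly (format assumed from the sample) and index by them.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%B-%Y")
df = df.set_index("Date")
print(df)
```

Renaming with a dict rather than assigning df.columns directly keeps the code correct even if the set of selected columns changes later.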

Answer 2

Answer by vldbnc

According to the documentation, your usecols list should be a subset of the new names list:

usecols : list-like or callable, default None
Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in `names` or
inferred from the document header row(s).

Example of csv

"OLD1", "OLD2", "OLD3"
1,2,3
4,5,6

Code for renaming OLDX -> NEWX and using only NEW2 + NEW3

import pandas as pd
d = pd.read_csv('test.csv', header=0, names=['NEW1', 'NEW2', 'NEW3'], usecols=['NEW2', 'NEW3'])

Output

   NEW2  NEW3
0     2     3
1     5     6

NOTE: Even though the above works as expected, there is an issue when switching to engine='python':

d = pd.read_csv('test.csv', header=0, engine='python',
                names=['NEW1', 'NEW2', 'NEW3'], usecols=['NEW2', 'NEW3'])

ValueError: Number of passed names did not match number of header fields in the file

The workaround is to set header=None and skiprows=[0,]:

d = pd.read_csv('test.csv', header=None, skiprows=[0,], engine='python', names=['NEW1', 'NEW2', 'NEW3'], usecols=['NEW2', 'NEW3'])

Output

   NEW2  NEW3
0     2     3
1     5     6

Pandas version: 0.23.4
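
Applied to the question's data, the same pattern might look like the sketch below. It uses a shortened copy of the data and the default parser engine; the short names H and L for the unused columns are placeholders chosen here, not part of the question. The key points are that names lists one new name for every column in the file, and that usecols and index_col then refer to those new names.

```python
import io

import pandas as pd

# Shortened copy of the question's data (five columns, two rows).
csv_text = """Date,Open Price,High Price,Low Price,Close Price
28-February-2015,2270.00,2310.00,2258.00,2294.85
27-February-2015,2267.25,2280.85,2258.00,2266.35
"""

# One new name per column in the file; "H" and "L" are placeholder names.
new_names = ["Date", "O", "H", "L", "C"]

df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,                   # the first line is the old header and gets replaced
    names=new_names,
    usecols=["Date", "O", "C"],  # refers to the NEW names
    index_col="Date",
)
print(df)
```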
