Original URL: http://stackoverflow.com/questions/34659105/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Adding Column headers to pandas dataframe.. but NAN's all the data even though headers are same dimension
Asked by noidea
I am trying to add column headers to a CSV file that I have parsed into a dataframe with Pandas.
dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1) # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code']);
print df
The structure of the data is:
01/10/2015 05:47.3 190 190 -648 838 EURNOK -648 0 0 611 -1137 -648 H 2 -1000000 9.465 -648 INTERNAL IOC 287 AS
What Pandas returns is:
TradeDate TradeTime CumPnL DailyCumPnL RealisedPnL UnRealisedPnL \
0 NaN NaN NaN NaN NaN NaN ...
I would appreciate any advice on the issue.
Thanks
Ps. Thanks to Ed for his answer. I have tried your suggestion with
df = dfTrades.columns=['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code'];
But now the problem has morphed to:
ValueError: Length mismatch: Expected axis has 22 elements, new values have 21 elements
I have taken the shape of the matrix and got: dfTrades.shape
(12056, 22)
So sadly I still need some help :(
Accepted answer by EdChum
Assign directly to the columns:
df.columns = ['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID',
'Code']
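As a minimal sketch of the assignment above, assuming a tiny made-up frame rather than the asker's data: the list assigned to `.columns` must have exactly one name per column. Python joins adjacent string literals, so a list with a missing comma between two names silently ends up one element short, which is consistent with the "Expected axis has 22 elements, new values have 21 elements" error in the question's follow-up.

```python
import pandas as pd

# Hypothetical 3-column frame; the default column labels are 0, 1, 2
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# Adjacent string literals are concatenated by Python, so the
# missing comma below produces ['a', 'bc'] (2 names, not 3)
names_missing_comma = ['a', 'b' 'c']
names = ['a', 'b', 'c']

error_message = None
try:
    df.columns = names_missing_comma   # wrong length, so pandas raises
except ValueError as exc:
    error_message = str(exc)

# With one name per column the assignment relabels in place
df.columns = names
```

Checking `len(names) == df.shape[1]` before assigning is a cheap way to catch this kind of slip.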
What you're doing is reindexing: because you pass the df as the data, pandas aligns on the existing column names and index values, and since none of the new column names match the existing ones you get all NaNs.
You can see the same semantic behaviour here:
In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df
Out[240]:
0 1 2
0 1.037216 0.761995 0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986 0.594895 -0.733236
3 0.556196 0.363965 -0.893846
4 0.547791 -0.378287 -1.171706
In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1
Out[242]:
a b c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
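To make the alignment more visible, here is a small sketch (the frame and the label 'a' are made up for illustration) in which one requested label exists and one does not. The existing column is carried over, and only the unknown label is filled with NaN, much like `reindex(columns=...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer column labels 0 and 1
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=[0, 1])

# Passing a DataFrame as data plus columns= selects by label:
# 0 already exists and is kept, 'a' does not exist and becomes NaN
mixed = pd.DataFrame(df, columns=[0, 'a'])

# The same request written as an explicit reindex
reindexed = df.reindex(columns=[0, 'a'])
```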
Alternatively you can pass the np array as the data:
df = pd.DataFrame(dfTrades.values,columns=['TradeDate',
In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1
Out[244]:
a b c
0 1.037216 0.761995 0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986 0.594895 -0.733236
3 0.556196 0.363965 -0.893846
4 0.547791 -0.378287 -1.171706
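One difference between the two routes that may matter for data like the asker's, which mixes numbers and strings: `.values` collapses a mixed-type frame into a single object array and discards the index, so a frame rebuilt from it can lose its numeric dtypes, whereas assigning to `.columns` only changes the labels. A rough sketch with made-up rows:

```python
import pandas as pd

# Hypothetical mixed-type frame: one integer column, one string column
mixed = pd.DataFrame([[1, 'EURNOK'], [2, 'EURSEK']])

# Route 1: rebuild from .values; the data passes through one object array
via_values = pd.DataFrame(mixed.values, columns=['Position', 'CCYCCY'])

# Route 2: relabel a copy; the underlying data and dtypes are untouched
relabelled = mixed.copy()
relabelled.columns = ['Position', 'CCYCCY']

position_dtype_via_values = via_values['Position'].dtype
position_dtype_relabelled = relabelled['Position'].dtype
```

If the `.values` route is used on mixed data, the numeric columns may need converting back afterwards, for example with `pd.to_numeric`.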
Answer by Farshid
You can try it this way: pass names directly to read_csv.
names : array-like, default None. List of column names to use. If the file contains no header row, then you should explicitly pass header=None.
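# names= assigns the column labels while the file is read
Cov = pd.read_csv("path/to/file.txt", sep='\t',
                  names=["Sequence", "Start", "End", "Coverage"])
# Cov is already a DataFrame with these labels; to rebuild one, pass the values rather than [Cov]
Frame = pd.DataFrame(Cov.values, columns=["Sequence", "Start", "End", "Coverage"])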
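Applied to whitespace-delimited data like the asker's, the same idea might look like the sketch below. The two sample rows and the shortened four-name list are made up for illustration, and `sep=r'\s+'` is used because recent pandas versions deprecate `delim_whitespace` in favour of it.

```python
import io
import pandas as pd

# Made-up, shortened sample standing in for the asker's pnl1.txt
sample = io.StringIO(
    "01/10/2015 05:47.3 190 EURNOK\n"
    "01/10/2015 05:48.1 175 EURNOK\n"
)

# One name per column in the file
names = ['TradeDate', 'TradeTime', 'CumPnL', 'CCYCCY']

# header=None: the file has no header row; names= labels the columns
trades = pd.read_csv(sample, sep=r'\s+', header=None, names=names)
```

Naming the columns at read time avoids building a second DataFrame, so the reindexing problem from the question never arises.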
Answer by Jaynab
You need to pass dfTrades.values instead of dfTrades when constructing the pd.DataFrame.
column_names= ['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID',
'Code']
df1 = pd.DataFrame(dfTrades.values, columns = column_names )
df1.head()