Original question: http://stackoverflow.com/questions/34659105/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Adding column headers to a pandas DataFrame, but all the data becomes NaN even though the headers have the same dimensions
Asked by noidea
I am trying to add column headers to a CSV file that I have parsed into a DataFrame with pandas.
dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1) # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code']);
print df
The structure of the data is:
01/10/2015 05:47.3 190 190 -648 838 EURNOK -648 0 0 611 -1137 -648 H 2 -1000000 9.465 -648 INTERNAL IOC 287 AS
What pandas returns is:
TradeDate TradeTime CumPnL DailyCumPnL RealisedPnL UnRealisedPnL \
0 NaN NaN NaN NaN NaN NaN ...
I would appreciate any advice on the issue.
Thanks
P.S. Thanks to Ed for his answer. I have tried your suggestion with:
df = dfTrades.columns=['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID'
'Code'];
But now the problem has morphed to:
ValueError: Length mismatch: Expected axis has 22 elements, new values have 21 elements
I checked the shape of the DataFrame with dfTrades.shape and got:
(12056, 22)
So sadly I still need some help :(
Accepted answer by EdChum
Assign directly to the columns:
df.columns = ['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID',
'Code']
What you're doing is reindexing. Because you pass the DataFrame itself as the data, pandas aligns on the existing column names and index values. None of the new column names match the existing ones, so every value comes back as NaN.
You can see the same semantic behaviour here:
In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df
Out[240]:
0 1 2
0 1.037216 0.761995 0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986 0.594895 -0.733236
3 0.556196 0.363965 -0.893846
4 0.547791 -0.378287 -1.171706
In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1
Out[242]:
a b c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
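To make the alignment more explicit, here is a minimal sketch in which one requested label already exists and one does not. The frame and the names `wanted` and `rebuilt` are made up for illustration. Passing a DataFrame as the data behaves like a reindex: the matching column keeps its values, while the unknown label is filled with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer column labels 0, 1, 2,
# like the labels read_csv assigns when header=None
df = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=[0, 1, 2])

# Request one label that exists (0) and one that does not ('a')
wanted = [0, 'a']
rebuilt = pd.DataFrame(df, columns=wanted)

# Column 0 keeps its data; 'a' has no source column, so it is all NaN
print(rebuilt)

# The same selection written as an explicit reindex
print(df.reindex(columns=wanted))
```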
Alternatively, you can pass the NumPy array as the data:
df = pd.DataFrame(dfTrades.values, columns=['TradeDate', ...])  # remaining column names as listed above
In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1
Out[244]:
a b c
0 1.037216 0.761995 0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986 0.594895 -0.733236
3 0.556196 0.363965 -0.893846
4 0.547791 -0.378287 -1.171706
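A note on the follow-up error in the question: the list as posted there has no comma between 'OrderID' and 'Code'. Python joins adjacent string literals, so those two entries collapse into the single name 'OrderIDCode'. That leaves 21 names for 22 columns, which matches the reported length mismatch. A minimal sketch with a shortened, made-up list and a one-row frame:

```python
import pandas as pd

# Shortened, illustrative list; note the missing comma after 'OrderID'
broken = ['Venue',
          'OrderType',
          'OrderID'
          'Code']
fixed = ['Venue', 'OrderType', 'OrderID', 'Code']

print(len(broken), broken[-1])  # adjacent literals were merged into one string
print(len(fixed))

# Hypothetical one-row frame with four unnamed columns
df = pd.DataFrame([['INTERNAL', 'IOC', 287, 'AS']])

try:
    df.columns = broken          # three names for four columns
except ValueError as exc:
    print(exc)                   # pandas reports a length mismatch

df.columns = fixed               # lengths agree, so the data is kept as-is
print(df)
```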
Answer by Farshid
You can try it this way: pass names directly to read_csv. From the documentation:
names : array-like, default None. List of column names to use. If the file contains no header row, then you should explicitly pass header=None.
Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None,
                  names=["Sequence", "Start", "End", "Coverage"])
# read_csv already returns a DataFrame with these column names,
# so it does not need to be wrapped in pd.DataFrame again
Frame = Cov
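Applied to a whitespace-delimited file like the one in the question, the same idea might look like the sketch below. It reads the first seven fields of the sample row from an in-memory buffer instead of pnl1.txt, and the shortened name list is an assumption for illustration.

```python
import io
import pandas as pd

# First seven fields of the sample row from the question, held in memory
sample = "01/10/2015 05:47.3 190 190 -648 838 EURNOK\n"

names = ['TradeDate', 'TradeTime', 'CumPnL', 'DailyCumPnL',
         'RealisedPnL', 'UnRealisedPnL', 'CCYCCY']

# sep=r'\s+' splits on runs of whitespace; header=None says there is no header row
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None, names=names)
print(df)
```

In the question some columns are dropped after reading, so with this approach the names would have to describe the columns as they appear in the file.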
Answer by Jaynab
You need to pass dfTrades.values instead of dfTrades when constructing the pd.DataFrame.
column_names= ['TradeDate',
'TradeTime',
'CumPnL',
'DailyCumPnL',
'RealisedPnL',
'UnRealisedPnL',
'CCYCCY',
'CCYCCYPnLDaily',
'Position',
'CandleOpen',
'CandleHigh',
'CandleLow',
'CandleClose',
'CandleDir',
'CandleDirSwings',
'TradeAmount',
'Rate',
'PnL/Trade',
'Venue',
'OrderType',
'OrderID',
'Code']
df1 = pd.DataFrame(dfTrades.values, columns = column_names )
df1.head()
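One thing to be aware of with this approach, sketched below with a made-up two-column frame: when a frame mixes numbers and strings, .values is a single object-dtype array, so the numeric columns of the rebuilt frame come back with object dtype. Assigning to .columns keeps the original dtypes.

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a string column
raw = pd.DataFrame({0: [190, 200], 1: ['EURNOK', 'EURSEK']})
column_names = ['CumPnL', 'CCYCCY']

# Rebuilding from .values goes through one object-dtype NumPy array
via_values = pd.DataFrame(raw.values, columns=column_names)
print(via_values.dtypes)

# Renaming the columns leaves the data and its dtypes untouched
renamed = raw.copy()
renamed.columns = column_names
print(renamed.dtypes)
```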

