Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/34659105/

Time: 2020-08-19 15:19:37  Source: igfitidea

Adding column headers to a pandas DataFrame turns all the data into NaN, even though the headers have the same dimension

python csv pandas

Asked by noidea

I am trying to add column headers to a csv file that I have parsed into a dataframe within Pandas.

dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1)     # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
                                      'TradeTime',
                                      'CumPnL',
                                      'DailyCumPnL',
                                      'RealisedPnL',
                                      'UnRealisedPnL',
                                      'CCYCCY',
                                      'CCYCCYPnLDaily',
                                      'Position',
                                      'CandleOpen',
                                      'CandleHigh',
                                      'CandleLow',
                                      'CandleClose',
                                      'CandleDir',
                                      'CandleDirSwings',
                                      'TradeAmount',
                                      'Rate',
                                      'PnL/Trade',
                                      'Venue',
                                      'OrderType',
                                      'OrderID'
                                      'Code']);


print df

The structure of the data is:

01/10/2015 05:47.3  190 190 -648 838 EURNOK -648 0  0 611   -1137   -648 H 2     -1000000   9.465   -648    INTERNAL    IOC 287 AS

What Pandas returns is:

  TradeDate  TradeTime  CumPnL  DailyCumPnL  RealisedPnL  UnRealisedPnL  \
0            NaN        NaN     NaN          NaN          NaN            NaN   ...

I would appreciate any advice on the issue.

Thanks

P.S. Thanks to Ed for his answer. I have tried your suggestion with:

df = dfTrades.columns=['TradeDate',
                   'TradeTime',
                   'CumPnL',
                   'DailyCumPnL',
                   'RealisedPnL',
                   'UnRealisedPnL',
                   'CCYCCY',
                   'CCYCCYPnLDaily',
                   'Position',
                   'CandleOpen',
                   'CandleHigh',
                   'CandleLow',
                   'CandleClose',
                   'CandleDir',
                   'CandleDirSwings',
                   'TradeAmount',
                   'Rate',
                   'PnL/Trade',
                   'Venue',
                   'OrderType',
                   'OrderID'
                   'Code'];

But now the problem has morphed to:

ValueError: Length mismatch: Expected axis has 22 elements, new values have 21 elements

I checked the shape of the frame with dfTrades.shape and got:

(12056, 22)

So sadly I still need some help :(

Accepted answer by EdChum

Assign directly to the columns:

df.columns = ['TradeDate',
              'TradeTime',
              'CumPnL',
              'DailyCumPnL',
              'RealisedPnL',
              'UnRealisedPnL',
              'CCYCCY',
              'CCYCCYPnLDaily',
              'Position',
              'CandleOpen',
              'CandleHigh',
              'CandleLow',
              'CandleClose',
              'CandleDir',
              'CandleDirSwings',
              'TradeAmount',
              'Rate',
              'PnL/Trade',
              'Venue',
              'OrderType',
              'OrderID',   # comma required: without it 'OrderID' 'Code' merge into one string
              'Code']

What you're doing is reindexing. Because you pass the df itself as the data, pandas aligns on the existing column names and index values, and since none of the new column names match the existing ones, you get all NaNs.

You can see the same semantic behaviour here:

In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df

Out[240]:
          0         1         2
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706

In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1

Out[242]:
    a   b   c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
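
The alignment described above can also be written as an explicit reindex, which may make the behaviour easier to see. The following is only a minimal sketch on made-up data (the frame and the labels 'a', 'b', 'c' are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# A small frame with integer column labels 0, 1, 2 (illustrative data only)
df = pd.DataFrame(np.arange(6.0).reshape(2, 3), columns=[0, 1, 2])

# Passing a DataFrame as data plus new labels selects columns by label...
via_constructor = pd.DataFrame(df, columns=list("abc"))
# ...which behaves like an explicit reindex on the column axis
via_reindex = df.reindex(columns=list("abc"))

print(via_constructor.isna().all().all())
print(via_reindex.isna().all().all())

# Labels that already exist are kept; only the unknown ones are filled with NaN
partial = pd.DataFrame(df, columns=[0, "a"])
print(partial)
```

The last frame keeps the values of column 0 and adds an all-NaN column 'a', which suggests the constructor is selecting by label rather than renaming.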

Alternatively you can pass the np array as the data:

df = pd.DataFrame(dfTrades.values, columns=['TradeDate', ...])  # remaining names as in the list above

In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1

Out[244]:
          a         b         c
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706
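
Regarding the "Length mismatch" error in the question's follow-up: the posted list has no comma between 'OrderID' and 'Code', and Python joins adjacent string literals into a single string. That would account for 21 names against 22 columns. This is a reading of the posted code rather than something stated in the original answer, and the four-column frame below is a hypothetical stand-in for the tail of dfTrades:

```python
import pandas as pd

# Adjacent string literals are joined by Python, so a missing comma
# silently merges two names into one
names_missing_comma = ["Venue", "OrderType", "OrderID"
                       "Code"]
names_fixed = ["Venue", "OrderType", "OrderID", "Code"]

print(names_missing_comma)
print(len(names_missing_comma), len(names_fixed))

# Hypothetical four-column frame standing in for the last columns of dfTrades
frame = pd.DataFrame([["INTERNAL", "IOC", 287, "AS"]])

try:
    frame.columns = names_missing_comma   # one name short
except ValueError as err:
    print(err)

# With the comma restored the lengths agree and the assignment succeeds
frame.columns = names_fixed
print(frame)
```

Comparing len(names) with dfTrades.shape[1] before assigning is a cheap way to catch this kind of slip.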

Answer by Farshid

You can try it this way: pass the column names directly to read_csv through its names parameter.

names : array-like, default None
    List of column names to use. If the file contains no header row, then you should explicitly pass header=None.

import pandas as pd

# names= labels the columns while the file is being parsed
Cov = pd.read_csv("path/to/file.txt", sep='\t',
                  names=["Sequence", "Start", "End", "Coverage"])
# Cov already carries the column names, so no second DataFrame is needed
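
Applied to whitespace-delimited data like the question's, the same idea might look like the sketch below. The in-memory sample (two shortened rows) and the choice of four columns are assumptions made for illustration; sep=r"\s+" is used here as the whitespace separator.

```python
import io
import pandas as pd

# Hypothetical two-row, whitespace-delimited sample standing in for a headerless file
raw = io.StringIO(
    "01/10/2015 05:47.3 190 EURNOK\n"
    "01/10/2015 05:48.1 175 EURNOK\n"
)

trades = pd.read_csv(
    raw,
    sep=r"\s+",        # any run of whitespace separates fields
    header=None,       # the file has no header row
    names=["TradeDate", "TradeTime", "CumPnL", "CCYCCY"],
)
print(trades)
```

Naming the columns at parse time avoids building a second DataFrame afterwards, so the alignment problem never arises.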

Answer by Jaynab

You need to pass dfTrades.values instead of dfTrades to pd.DataFrame.

column_names = ['TradeDate',
                'TradeTime',
                'CumPnL',
                'DailyCumPnL',
                'RealisedPnL',
                'UnRealisedPnL',
                'CCYCCY',
                'CCYCCYPnLDaily',
                'Position',
                'CandleOpen',
                'CandleHigh',
                'CandleLow',
                'CandleClose',
                'CandleDir',
                'CandleDirSwings',
                'TradeAmount',
                'Rate',
                'PnL/Trade',
                'Venue',
                'OrderType',
                'OrderID',   # comma required: without it 'OrderID' 'Code' merge into one string
                'Code']


df1 = pd.DataFrame(dfTrades.values, columns = column_names )

df1.head()
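
One trade-off that may be worth checking when going through .values: for a frame with mixed column types the underlying array is of object dtype, so the rebuilt frame can lose the per-column dtypes, whereas assigning to .columns only relabels. A minimal sketch on hypothetical data (the rows and the three names are illustrative):

```python
import pandas as pd

# Hypothetical mixed-type frame with default integer column labels
raw = pd.DataFrame([["EURNOK", 190, 9.465], ["EURNOK", 175, 9.470]])

# Route 1: rebuild from the underlying array (mixed types collapse to object)
rebuilt = pd.DataFrame(raw.values, columns=["CCYCCY", "CumPnL", "Rate"])

# Route 2: relabel a copy, leaving the stored data and dtypes untouched
renamed = raw.copy()
renamed.columns = ["CCYCCY", "CumPnL", "Rate"]

print(rebuilt.dtypes)
print(renamed.dtypes)
```

If the numeric columns are needed for arithmetic afterwards, the relabelling route avoids having to convert them back.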