Python: adding column headers to a Pandas DataFrame... but all the data is NaN even though the headers have the same dimensions

Question

Asked by noidea

I am trying to add column headers to a csv file that I have parsed into a DataFrame within Pandas.

dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1)     # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
                                      'TradeTime',
                                      'CumPnL',
                                      'DailyCumPnL',
                                      'RealisedPnL',
                                      'UnRealisedPnL',
                                      'CCYCCY',
                                      'CCYCCYPnLDaily',
                                      'Position',
                                      'CandleOpen',
                                      'CandleHigh',
                                      'CandleLow',
                                      'CandleClose',
                                      'CandleDir',
                                      'CandleDirSwings',
                                      'TradeAmount',
                                      'Rate',
                                      'PnL/Trade',
                                      'Venue',
                                      'OrderType',
                                      'OrderID'
                                      'Code']);


print df

The structure of the data is:

01/10/2015 05:47.3  190 190 -648 838 EURNOK -648 0  0 611   -1137   -648 H 2     -1000000   9.465   -648    INTERNAL    IOC 287 AS

What Pandas returns is:

  TradeDate  TradeTime  CumPnL  DailyCumPnL  RealisedPnL  UnRealisedPnL  \
0            NaN        NaN     NaN          NaN          NaN            NaN   ...

I would appreciate any advice on the issue.

Thanks

P.S. Thanks to Ed for his answer. I have tried your suggestion with:

df = dfTrades.columns=['TradeDate',
                   'TradeTime',
                   'CumPnL',
                   'DailyCumPnL',
                   'RealisedPnL',
                   'UnRealisedPnL',
                   'CCYCCY',
                   'CCYCCYPnLDaily',
                   'Position',
                   'CandleOpen',
                   'CandleHigh',
                   'CandleLow',
                   'CandleClose',
                   'CandleDir',
                   'CandleDirSwings',
                   'TradeAmount',
                   'Rate',
                   'PnL/Trade',
                   'Venue',
                   'OrderType',
                   'OrderID'
                   'Code'];

But now the problem has morphed to:

 ValueError: Length mismatch: Expected axis has 22 elements, new values have     21 elements

I checked the shape of the DataFrame with dfTrades.shape and got:

(12056, 22)

So sadly I still need some help :(

Answer 1

Accepted answer by EdChum

Assign directly to the columns:

df.columns = ['TradeDate',
              'TradeTime',
              'CumPnL',
              'DailyCumPnL',
              'RealisedPnL',
              'UnRealisedPnL',
              'CCYCCY',
              'CCYCCYPnLDaily',
              'Position',
              'CandleOpen',
              'CandleHigh',
              'CandleLow',
              'CandleClose',
              'CandleDir',
              'CandleDirSwings',
              'TradeAmount',
              'Rate',
              'PnL/Trade',
              'Venue',
              'OrderType',
              'OrderID',  # note the comma: 'OrderID' and 'Code' are two separate names
              'Code']

What you're doing is reindexing. Because you pass the df as the data, the constructor aligns on the existing column names and index values, and since the new column names don't match any of the existing ones, you get all NaNs.

You can see the same semantic behaviour here:

In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df

Out[240]:
          0         1         2
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706

In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1

Out[242]:
    a   b   c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

Alternatively you can pass the np array as the data:

df = pd.DataFrame(dfTrades.values, columns=['TradeDate', ...])  # ... stands for the rest of the column names listed above

In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1

Out[244]:
          a         b         c
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706
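
A note on the follow-up ValueError in the question: the list there has no comma between 'OrderID' and 'Code'. Python concatenates adjacent string literals, so that list holds 21 names rather than 22, which would explain the length mismatch. A minimal sketch with a shortened, purely illustrative list:

```python
# Adjacent string literals are joined into one string at compile time.
names_missing_comma = ['Venue', 'OrderType', 'OrderID' 'Code']
names_fixed = ['Venue', 'OrderType', 'OrderID', 'Code']

print(len(names_missing_comma))  # 3
print(names_missing_comma[-1])   # OrderIDCode
print(len(names_fixed))          # 4
```

Adding the missing comma should bring the list back to 22 names, matching dfTrades.shape.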

Answer 2

Answer by Farshid

You can try it this way: pass names directly to read_csv.

names : array-like, default None. List of column names to use. If the file contains no header row, then you should explicitly pass header=None.

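import pandas as pd

# names= assigns the column headers while the file is parsed
Cov = pd.read_csv("path/to/file.txt", sep='\t',
                  names=["Sequence", "Start", "End", "Coverage"])
# Cov is already a DataFrame with these columns; wrapping it in pd.DataFrame([Cov], ...) is not needed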
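
Applied to the question's data, the same idea might look like the sketch below. It assumes the sample row from the question is representative of the file once the unwanted columns are gone, reads it from an in-memory string rather than pnl1.txt, and uses sep=r"\s+" for the whitespace-delimited fields. The variable names column_names and sample are only illustrative.

```python
import io

import pandas as pd

column_names = [
    'TradeDate', 'TradeTime', 'CumPnL', 'DailyCumPnL', 'RealisedPnL',
    'UnRealisedPnL', 'CCYCCY', 'CCYCCYPnLDaily', 'Position', 'CandleOpen',
    'CandleHigh', 'CandleLow', 'CandleClose', 'CandleDir', 'CandleDirSwings',
    'TradeAmount', 'Rate', 'PnL/Trade', 'Venue', 'OrderType', 'OrderID',
    'Code',
]

# The sample row shown in the question, standing in for the real file.
sample = (
    "01/10/2015 05:47.3  190 190 -648 838 EURNOK -648 0  0 611   -1137   "
    "-648 H 2     -1000000   9.465   -648    INTERNAL    IOC 287 AS\n"
)

# header=None: the data has no header row; names= supplies the labels.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                 names=column_names)

print(df.shape)
print(df.loc[0, 'CCYCCY'])
```

With the real file, io.StringIO(sample) would be replaced by the file path, and the columns to discard would still have to be dropped (or excluded with usecols) before the 22 names line up.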

Answer 3

Answer by Jaynab

You need to pass dfTrades.values instead of dfTrades to pd.DataFrame.

column_names= ['TradeDate',
               'TradeTime',
               'CumPnL',
               'DailyCumPnL',
               'RealisedPnL',
               'UnRealisedPnL',
               'CCYCCY',
               'CCYCCYPnLDaily',
               'Position',
               'CandleOpen',
               'CandleHigh',
               'CandleLow',
               'CandleClose',
               'CandleDir',
               'CandleDirSwings',
               'TradeAmount',
               'Rate',
               'PnL/Trade',
               'Venue',
               'OrderType',
               'OrderID',
               'Code']


df1 = pd.DataFrame(dfTrades.values, columns = column_names )

df1.head()
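
One thing to be aware of with this approach: .values on a frame with mixed column types returns a single object-dtype array, so the rebuilt frame may lose its numeric dtypes, whereas assigning to .columns keeps them. A small sketch with made-up data (the frame mixed and its values are illustrative, not taken from pnl1.txt):

```python
import pandas as pd

# Three columns of different types, labelled 0..2 as read_csv(header=None) would.
mixed = pd.DataFrame({0: ['01/10/2015', '02/10/2015'],
                      1: [190, 205],
                      2: [9.465, 9.470]})
names = ['TradeDate', 'CumPnL', 'Rate']

# Rebuild from the underlying array: every column goes through one object array.
via_values = pd.DataFrame(mixed.values, columns=names)

# Rename in place on a copy: the original per-column dtypes are kept.
renamed = mixed.copy()
renamed.columns = names

print(via_values.dtypes)
print(renamed.dtypes)
```

If the numeric columns come out as object, pd.to_numeric on those columns or via_values.infer_objects() can convert them back.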
