Python: length mismatch error when assigning new column labels in a pandas DataFrame

Question

Asked by Thomas Matthew

The tab file I'm working with is missing the final column name. When I attempt to repair the header by appending the missing value, I get a mismatch error. Here's an example to illustrate the problem:

toy example

There should be a '' as the last element of the first list:

missingcol = [['gene', 'cell_1', '', 'cell_2'],
              ['MYC', 5.0, 'P', 4.0, 'A'],
              ['AKT', 3.0, 'A', 1.0, 'P']]

To fix this, I read the first line, appended a '', loaded missingcol into a pandas dataframe with header=None while skipping the first row, and redefined the column names with the modified header, like so:

import pandas as pd

missingcol[0].append('')  # list.append mutates in place and returns None
fullheader = missingcol[0]

missingcol_dropheader = missingcol[1:]

df = pd.DataFrame(missingcol_dropheader, columns=fullheader)
df

Which gives me the error:

AssertionError: 4 columns passed, passed data had 5 columns

Last I checked, the new fullheader does, in fact, have 5 elements to match the five elements in the data frame. What is causing this continued mismatch and how do I fix it?

real example

I get a similar error when I repeat these same steps using the read_csv method on my actual test case. I ignore the header at line 0 and the three blank lines at lines 1-3, and drop an unwanted first column, but otherwise it's similar:

with open('CCLE_Expression_Entrez_2012-10-18.res', 'r') as f:
    header = f.readline().strip().split('\t')
header.append('') # missing empty colname over last A/P col

rnadf = pd.read_csv('CCLE_Expression_Entrez_2012-10-18.res', delimiter='\t', index_col=0, header=None, skiprows=[0,1,2,3])  
rnadf.columns = header
rnadf.drop([], axis=1, inplace=True)
rnadf.columns = header

ValueError: Length mismatch: Expected axis has 2073 elements, new values have 2074 elements

This is a very similar error to the toy example. What makes this error different from the toy example, and how do I fix it?

Answer 1

Answered by Thomas Matthew

The problem was that the argument index_col=0 turned the first column (the gene names) into the row index, so column numbering began after the gene names.

The dataframe read with index_col=0 had a last column label of 2073. Because its column labels start at 1 (label 0 became the index), that is 2073 columns: one fewer than my repaired header. This generated the following error:

ValueError: Length mismatch: Expected axis has 2073 elements, new values have 2074 elements
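
The off-by-one can be reproduced on a small in-memory table. This is only a sketch under stated assumptions: the three-line `tsv` string is a hypothetical miniature of the toy example from the question, not the real CCLE file, and `io.StringIO` stands in for the file on disk.

```python
import io
import pandas as pd

# Hypothetical miniature of the file layout: the header line lacks the
# final column name, so data rows have one more field than the header.
tsv = (
    "gene\tcell_1\t\tcell_2\n"
    "MYC\t5.0\tP\t4.0\tA\n"
    "AKT\t3.0\tA\t1.0\tP\n"
)

header = tsv.splitlines()[0].split("\t")
header.append("")  # repaired header now has 5 labels

# index_col=0 turns the gene column into the row index,
# so only 4 of the 5 fields remain as columns.
df = pd.read_csv(io.StringIO(tsv), sep="\t", header=None,
                 skiprows=[0], index_col=0)

n_columns = df.shape[1]       # columns left after the index is taken out
first_label = df.columns[0]   # labels start at 1 because 0 became the index

try:
    df.columns = header       # 5 labels for 4 columns
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
    print(mismatch_message)
```

Comparing `len(header)` with `df.shape[1]` before assigning is a quick way to spot this kind of mismatch.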

In contrast, the same read_csv command with index_col=None assigned a separate numerical index, which put the first column (in this case the gene names) back into the dataframe as a regular column instead of using it only as row labels.

The dataframe read with index_col=None had a last column label of 2073, which with zero-based labels means 2074 columns: the same length as my repaired header. Problem solved.
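
A minimal sketch of the fix, using the same hypothetical miniature table rather than the real file: read with index_col=None, assign the repaired header while all fields are still columns, and only then move the gene names into the index. The final `set_index("gene")` step is an assumption about what is wanted afterwards; the answer itself stops at index_col=None.

```python
import io
import pandas as pd

# Hypothetical miniature of the file layout (not the real data).
tsv = (
    "gene\tcell_1\t\tcell_2\n"
    "MYC\t5.0\tP\t4.0\tA\n"
    "AKT\t3.0\tA\t1.0\tP\n"
)

header = tsv.splitlines()[0].split("\t")
header.append("")  # repaired header: 5 labels

# index_col=None keeps every field as a column (labels 0..4),
# so the column count matches the repaired header.
df = pd.read_csv(io.StringIO(tsv), sep="\t", header=None,
                 skiprows=[0], index_col=None)
n_columns = df.shape[1]

df.columns = header          # 5 labels for 5 columns: no error
df = df.set_index("gene")    # optional: gene names as row labels afterwards
print(df)
```

Note that the repaired header contains the empty label '' twice. pandas accepts duplicate column labels on assignment, but selecting `df['']` then returns both columns, so giving the A/P columns distinct names may be more convenient.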
