Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/11279473/

How to correctly nandwrite a nanddump'ed dump with oob?

Tags: linux, embedded

Asked by Alvin Wong

I am struggling to flash a previous ROM dump back onto an embedded device in Linux. My previous dump contains oob data. I wrote it with nandwrite -n -N -o /dev/mtd0 backup.bin, and then took a ROM dump again.

By comparing the old and new ROM dumps, I see something I cannot explain: the last 24 bytes of the oob (the ECC bytes) of any empty block (filled with 0xFF) ought to be 0xFF as well, but those in the new ROM dump are filled with 0x00, causing later write failures.

oob ought to be:

FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

but after nandwrite it is:

FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
FFFFFFFF FFFFFFFF 00000000 00000000
00000000 00000000 00000000 00000000
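
The comparison above can be automated. The following is only a minimal sketch: the 64-byte oob size and the 24-byte ECC tail are taken from the dumps shown here, while the function names all_ff and ecc_polluted are made up for illustration and are not part of mtd-utils.

```c
#include <stddef.h>
#include <stdint.h>

#define OOB_SIZE 64   /* assumed: matches the 64-byte oob shown above */
#define ECC_TAIL 24   /* the last 24 bytes of the oob hold the ECC */

/* Return 1 if every byte in buf equals 0xFF, 0 otherwise. */
static int all_ff(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i)
        if (buf[i] != 0xFF)
            return 0;
    return 1;
}

/*
 * Return 1 if the page data is empty (all 0xFF) but the ECC tail of
 * its oob is not, which is the symptom described above.
 */
static int ecc_polluted(const uint8_t *data, size_t data_len,
                        const uint8_t *oob)
{
    if (!all_ff(data, data_len))
        return 0;   /* the page holds real data, so non-0xFF ECC is expected */
    return !all_ff(oob + OOB_SIZE - ECC_TAIL, ECC_TAIL);
}
```

Running ecc_polluted over each page of the dump, with the data and oob pointers taken from the same page, would list the pages whose ECC bytes were overwritten.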

Does anyone have any idea why?

I added a hack to the nandwrite code to skip writing to NAND if the content to be written is all 0xFF, and it worked. So does the problem occur when trying to write an empty page to the NAND?

ADDED:

Now I am having this problem also when writing a bootloader image. The image isn't page-aligned, so nandwrite padded it with 0xFF. But for pages containing only 0xFF, the ECC bytes are still polluted by 0x00, just like above. It seems that my hack doesn't totally solve my problem. Can anyone help? Perhaps it could be a bug in kernel 2.6.35?

This is my hack:

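/* Skip the write when the buffer is entirely 0xFF (looks already erased). */
int i;
int needwrite = 0;
for (i = 0; i < len; ++i) {
    if (((uint8_t *)data)[i] != 0xff) {
        needwrite = 1;   /* found real data, so the page must be written */
        break;
    }
}
if (!needwrite)
    return 0;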

Accepted answer by Alvin Wong

My hack is adding a check to nandwrite: if the whole page to be written is completely empty (i.e. full of 0xFF), the program skips writing it (since a flash_erase has already been done).

An extra benefit is that the whole nandwrite process got faster because empty pages are skipped. Hooray!
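
To illustrate where the speed-up comes from, here is a rough sketch of a page loop that skips empty pages. It is not the actual nandwrite code: the names page_is_blank and pages_needing_write are hypothetical, and the sketch only counts the pages that would be programmed instead of touching a real MTD device.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the page contains only 0xFF bytes, 0 otherwise. */
static int page_is_blank(const uint8_t *page, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i)
        if (page[i] != 0xFF)
            return 0;
    return 1;
}

/*
 * Walk an image page by page and count the pages that actually need
 * programming. A trailing partial page is ignored in this sketch.
 */
static size_t pages_needing_write(const uint8_t *image, size_t image_len,
                                  size_t page_size)
{
    size_t off;
    size_t count = 0;

    if (page_size == 0)
        return 0;
    for (off = 0; off + page_size <= image_len; off += page_size)
        if (!page_is_blank(image + off, page_size))
            ++count;   /* a real tool would program this page here */
    return count;
}
```

For an image that is mostly erased space, the count is much smaller than the total number of pages, which matches the faster run described above.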

ADDED:

It turned out that my hack didn't actually solve the problem...

ADDED again: (real solution)

The problem is in fact that the PXA310 fills the hardware ECC bytes with 0x00 for a blank page, so if the software writes an empty page, those bytes become 0x00. This is strange, because I should have already disabled ECC in the arguments of nandwrite. Luckily, skipping the writing of empty pages prevents problems when re-writing a ROM dump.
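
The reason 0x00 ECC bytes break later writes is that NAND programming can only change bits from 1 to 0; only an erase sets them back to 1. The sketch below is a simplified software model of that rule, not PXA310 driver code, and the name nand_program is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of NAND programming: each cell can only lose bits
 * (1 -> 0), so the stored value is the AND of the old and new bytes.
 */
static void nand_program(uint8_t *cells, const uint8_t *src, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i)
        cells[i] &= src[i];
}
```

In this model, an erased ECC byte (0xFF) that is first programmed with 0x00 stays 0x00 no matter what ECC value is programmed afterwards, so a later write of real data to that page ends up with an ECC that does not match.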

More information can be found in my blog post.

A patch sent to the linux-mtd list actually mentions this behavior.

Answer by bob

This is because it is a warning to you. It will not work reliably.

Consider a situation where you have a block (1) with an error in position 0. The "controller" of the Nand-flash device stores an error-correcting code to correct this error.

You copy the data from block 1 with the ECC, BUT when you write the data to a new Nand-flash device, you are cloning the data. If that new Nand-flash device has an error in position 1, then the data you write back will be wrong on the following read, because position 1 is bad. But the system will think it is right, because the ECC does not show an error in position 1.

You cannot reliably clone 1 nand-flash to another directly, because the hard/soft error positions are not identical.

The only way to do it reliably is to read the data out and use the system's ECC algorithms to correct any errors, then write the data out to the new device and again let the system's algorithms handle any bit errors.

You may think the devices are the same, but the results are data/program corruption due to mismatches in the bit error maps.

In response to Alvin's comment:

I am quite confident that I am cloning the exact same NAND, i.e. I made a backup of that particular chip and then write it back to THAT particular chip. It's not me who think it's the same, but there is only one single chip from the beginning to the end. It is quite strange but some other people state that it worked on their own device, while mine doesn't, could there be a bug in the driver? – Alvin Wong Aug 5 at 5:16

Sorry, not possible (unless you are really, really, really lucky and get chips with 0 defects).

Each Nand-Flash chip has its own unique set of defective bits. The way a user gets around this is to use a file system that masks out bad blocks once the bad bits exceed the capability of the CRC. When you copy a Nand chip to another device, the CRC map matches the master chip. When you do a 1:1 clone of the device, some of the data bits will flip after the write (bad cells), and since you are cloning, the CRC does not take into account that these bits have flipped (because you are making a verbatim copy).

The fact that it "works" for some people does not mean it is correct, any more than being able to drive a car means the brakes work: I only find out they don't work when I need them. Even worse, many of these so-called 'experts' on the net actually erase the defect map supplied by the manufacturer when they "clone" the device or perform a "chip erase" before saving the defect map.

This is what happens with many of the 'dodgy' Nand-flash USB sticks sold on eBay: they are actually chips with the "defect map" erased, so they look like good devices until you try to save content to them.

Answer by Robert Calhoun

In my embedded world, you'd first use flash_erase to blast everything, followed by nandwrite -p to pad the rest of the page beyond your data with 0xFF.

Usage: nandwrite [OPTION] MTD_DEVICE [INPUTFILE|-]
Writes to the specified MTD device.

  -m, --markbad           Mark blocks bad if write fails
  -N, --noskipbad         Write without bad block skipping
  -o, --oob               Image contains oob data
  -O, --onlyoob           Image contains oob data and only write the oob part
  -r, --raw               Image contains the raw oob data dumped by nanddump
  -s addr, --start=addr   Set start address (default is 0)
  -p, --pad               Pad to page size
  -b, --blockalign=1|2|4  Set multiple of eraseblocks to align to
  -q, --quiet             Don't display progress messages
      --help              Display this help and exit
      --version           Output version information and exit
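
As a rough illustration of what padding to the page size means, the sketch below rounds an image length up to a whole number of pages and fills the tail with 0xFF. It is not taken from the nandwrite source; padded_len and pad_with_ff are hypothetical names, and the caller is assumed to pass a non-zero page size and a buffer large enough for the padded length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round len up to the next multiple of page_size (page_size must be > 0). */
static size_t padded_len(size_t len, size_t page_size)
{
    size_t rem = len % page_size;

    return rem ? len + (page_size - rem) : len;
}

/*
 * Fill the bytes after the image data with 0xFF up to the page boundary.
 * buf must have room for padded_len(len, page_size) bytes.
 * Returns the padded length.
 */
static size_t pad_with_ff(uint8_t *buf, size_t len, size_t page_size)
{
    size_t total = padded_len(len, page_size);

    memset(buf + len, 0xFF, total - len);
    return total;
}
```

With a 2048-byte page, a 3000-byte image would be padded to 4096 bytes, with the last 1096 bytes set to 0xFF.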

Answer by Karl

Sorry, Alvin, but the backup really will not "only work on that particular flash", because you cannot know when a particular bit will go from good to marginal or from marginal to bad. You may read it in one state, attempt to write it in the exact same state and fail, on any given day, with any given backup.

The ONLY way to safely back up the data in a NAND device is WITH ECC TURNED ON. You read from the device with ECC corrections to get good data. You then write the known-good data back to NAND with ECC turned on, so that any bits which have become marginal or bad since you read them can be corrected using the NEW ECC values.
