Replace image in Word doc using OpenXML (.NET)
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/2810138/
Question by fearofawhackplanet
Following on from my last question here
OpenXML looks like it probably does exactly what I want, but the documentation is terrible. An hour of googling hasn't got me any closer to figuring out what I need to do.
I have a Word document. I want to add an image to that document (using Word) in such a way that I can then open the document with OpenXML and replace that image. Should be simple enough, yes?
I'm assuming I should be able to give my image 'placeholder' an id of some sort and then use GetPartById to locate the image and replace it. Would this be the correct method? What is this Id? How do you add it using Word?
Every example I can find which does anything remotely similar starts by building the whole Word document from scratch in ML, which really isn't a lot of use.
EDIT: it occurred to me that it would be easier to just replace the image in the media folder with the new image, but again I can't find any indication of how to do this.
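The media-folder idea from the EDIT can be sketched without the SDK, because a .docx file is a zip archive. The following is a minimal sketch under stated assumptions, not a recipe verified against every document: it assumes .NET 4.5 or later (for System.IO.Compression), and the class name MediaSwap, the method name ReplaceEntry, and the entry name passed in (for example word/media/image1.png) are hypothetical names chosen for illustration. It also assumes the new image has the same format as the old one, because the file extension and content type recorded in the package are left untouched.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class MediaSwap
{
    // Overwrites one entry of a .docx package (a zip archive) with new bytes.
    // entryName is an assumption for illustration, e.g. "word/media/image1.png".
    // The stream must be readable, writable and seekable (FileStream, MemoryStream).
    public static void ReplaceEntry(Stream docx, string entryName, byte[] newBytes)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Update, leaveOpen: true))
        {
            ZipArchiveEntry old = zip.GetEntry(entryName);
            if (old == null)
            {
                throw new FileNotFoundException("No such entry in package: " + entryName);
            }

            // Delete and recreate the entry so no bytes of the old image survive.
            old.Delete();
            ZipArchiveEntry fresh = zip.CreateEntry(entryName);
            using (Stream target = fresh.Open())
            {
                target.Write(newBytes, 0, newBytes.Length);
            }
        }
    }
}
```

Called with a FileStream opened with FileAccess.ReadWrite on the .docx, this swaps the picture everywhere the document references that media file. Finding out which media file belongs to which picture is the part the answers below deal with.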
Answer by Adam Sheehan
Although the documentation for OpenXML isn't great, there is an excellent tool that you can use to see how existing Word documents are built. If you install the OpenXml SDK, it comes with the DocumentReflector.exe tool under the Open XML Format SDK\V2.0\tools directory.
Images in Word documents consist of the image data and an ID that is assigned to it and referenced in the body of the document. It seems like your problem can be broken down into two parts: finding the ID of the image in the document, and then re-writing the image data for it.
To find the ID of the image, you'll need to parse the MainDocumentPart. Images are stored in Runs as a Drawing element:
    <w:p>
      <w:r>
        <w:drawing>
          <wp:inline>
            <wp:extent cx="3200400" cy="704850" /> <!-- describes the size of the image -->
            <wp:docPr id="2" name="Picture 1" descr="filename.JPG" />
            <a:graphic>
              <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                <pic:pic>
                  <pic:nvPicPr>
                    <pic:cNvPr id="0" name="filename.JPG" />
                    <pic:cNvPicPr />
                  </pic:nvPicPr>
                  <pic:blipFill>
                    <a:blip r:embed="rId5" /> <!-- this is the ID you need to find -->
                    <a:stretch>
                      <a:fillRect />
                    </a:stretch>
                  </pic:blipFill>
                  <pic:spPr>
                    <a:xfrm>
                      <a:ext cx="3200400" cy="704850" />
                    </a:xfrm>
                    <a:prstGeom prst="rect" />
                  </pic:spPr>
                </pic:pic>
              </a:graphicData>
            </a:graphic>
          </wp:inline>
        </w:drawing>
      </w:r>
    </w:p>
In the above example, you need to find the ID of the image stored in the blip element. How you go about finding that is dependent on your problem, but if you know the filename of the original image you can look at the docPr element:
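    // Types used: WordprocessingDocument (DocumentFormat.OpenXml.Packaging), Run (DocumentFormat.OpenXml.Wordprocessing),
    // Inline (DocumentFormat.OpenXml.Drawing.Wordprocessing) and Blip (DocumentFormat.OpenXml.Drawing).
    // The Drawing namespace also defines a Run type, so alias or fully qualify it.
    using (WordprocessingDocument document = WordprocessingDocument.Open("docfilename.docx", true)) {
        // go through the document and pull out the inline image elements
        IEnumerable<Inline> imageElements = from run in document.MainDocumentPart.Document.Descendants<Run>()
                                            where run.Descendants<Inline>().FirstOrDefault() != null
                                            select run.Descendants<Inline>().First();
        // select the image that has the correct filename (chooses the first if there are many);
        // in the XML above the filename is held in the descr attribute of docPr
        Inline selectedImage = (from image in imageElements
                                where (image.DocProperties != null &&
                                       image.DocProperties.Description != null &&
                                       image.DocProperties.Description.Value == "image filename")
                                select image).First();
        // get the ID from the inline element
        string imageId = "default value";
        Blip blipElement = selectedImage.Descendants<Blip>().FirstOrDefault();
        if (blipElement != null) {
            imageId = blipElement.Embed.Value;
        }
    }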
Then when you have the image ID, you can use that to rewrite the image data. I think this is how you would do it:
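    // run this while the document from the previous snippet is still open
    ImagePart imagePart = (ImagePart)document.MainDocumentPart.GetPartById(imageId);
    byte[] imageBytes = File.ReadAllBytes("new_image.jpg");
    // FileMode.Create truncates the part, so no bytes of a larger old image are left behind
    using (BinaryWriter writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write))) {
        writer.Write(imageBytes);
    }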
Answer by fearofawhackplanet
I'd like to update this thread and add to Adam's answer above for the benefit of others.
I actually managed to hack some working code together the other day (before Adam posted his answer), but it was pretty difficult. The documentation really is poor and there isn't a lot of info out there.
I didn't know about the Inline and Run elements which Adam uses in his answer, but the trick seems to be in getting to the Descendants<> method; from there you can pretty much parse any element like a normal XML mapping.
    byte[] docBytes = File.ReadAllBytes(_myFilePath);
    using (MemoryStream ms = new MemoryStream())
    {
        ms.Write(docBytes, 0, docBytes.Length);
        using (WordprocessingDocument wpdoc = WordprocessingDocument.Open(ms, true))
        {
            MainDocumentPart mainPart = wpdoc.MainDocumentPart;
            Document doc = mainPart.Document;
            // now you can use doc.Descendants<T>()
        }
    }
Once you've got this it's fairly easy to search for things, although you have to work out what everything is called. For example, the <pic:nvPicPr> element is Picture.NonVisualPictureProperties, etc.
As Adam correctly says, the element you need to find to replace the image is the Blip element. But you need to find the correct blip, the one which corresponds to the image you're trying to replace.
Adam shows a way using the Inline element. I just dived straight in and looked for all the picture elements. I'm not sure which is the better or more robust way (I don't know how consistent the XML structure is between documents, or whether differences would break this code).
    Blip GetBlipForPicture(string picName, Document document)
    {
        return document.Descendants<Picture>()
            .Where(p => picName == p.NonVisualPictureProperties.NonVisualDrawingProperties.Name)
            .Select(p => p.BlipFill.Blip)
            .Single(); // return First or ToList or whatever here, there can be more than one
    }
See Adam's XML example to make sense of the different elements here and see what I'm searching for.
The blip has an ID in the Embed property, e.g. <a:blip r:embed="rId4" cstate="print" />. What this does is map the Blip to an image in the Media folder (you can see all these folders and files if you rename your .docx to a .zip and unzip it). You can find the mapping in _rels\document.xml.rels:
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png" />
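The mapping described above can also be read straight out of the package, which is handy for checking what a given rId points at. This is a minimal sketch that bypasses the SDK and uses only System.IO.Compression and System.Xml.Linq (.NET 4.5 or later). The names RelsReader and ReadImageTargets are hypothetical, and the code assumes the main document part lives at word/document.xml, which is where Word normally puts it but is not guaranteed by the format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class RelsReader
{
    // Namespace of the Relationship elements inside any .rels file.
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    // Relationship type that marks an image, as shown in the example above.
    const string ImageType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image";

    // Returns relationship Id -> Target for every image relationship of the main document.
    public static Dictionary<string, string> ReadImageTargets(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main part is word/document.xml, so its rels file is here.
            ZipArchiveEntry entry = zip.GetEntry("word/_rels/document.xml.rels");
            if (entry == null)
            {
                return new Dictionary<string, string>();
            }

            using (Stream s = entry.Open())
            {
                return XDocument.Load(s).Root
                    .Elements(Rel + "Relationship")
                    .Where(r => (string)r.Attribute("Type") == ImageType)
                    .ToDictionary(r => (string)r.Attribute("Id"), r => (string)r.Attribute("Target"));
            }
        }
    }
}
```

For the relationship shown above, the dictionary would map rId4 to media/image1.png. Internal targets like this one are relative to the folder of the part that owns the relationships, so the file inside the zip would be word/media/image1.png.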
So what you need to do is add a new image, and then point this blip at the id of your newly created image:
    // add new ImagePart
    ImagePart newImg = mainPart.AddImagePart(ImagePartType.Png);
    // Put image data into the ImagePart (from a filestream), closing the file afterwards
    using (FileStream imgStream = File.Open(_myImgPath, FileMode.Open, FileAccess.Read))
    {
        newImg.FeedData(imgStream);
    }
    // Get the blip
    Blip blip = GetBlipForPicture("MyPlaceholder.png", doc);
    // Point blip at new image
    blip.Embed = mainPart.GetIdOfPart(newImg);
I presume this just orphans the old image in the Media folder, which isn't ideal, although maybe it's clever enough to garbage collect it, so to speak. There may be a better way to do it, but I couldn't find it.
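One possible way to avoid the orphaned part is to delete the old ImagePart once no blip refers to it any more. This is a sketch rather than a confirmed best practice: ImageSwap and RepointAndCleanUp are hypothetical names, and the check only looks at DrawingML blips in the main document body, so an old part that is also referenced from somewhere else (a VML image, for example) would need extra care.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

public static class ImageSwap
{
    // Points the blip at newImg, then deletes the previously referenced part
    // if no other blip in the main document still uses its relationship id.
    public static void RepointAndCleanUp(MainDocumentPart mainPart, A.Blip blip, ImagePart newImg)
    {
        string oldId = blip.Embed != null ? blip.Embed.Value : null;
        blip.Embed = mainPart.GetIdOfPart(newImg);

        if (oldId == null)
        {
            return; // the blip had no embedded image before
        }

        bool stillUsed = mainPart.Document
            .Descendants<A.Blip>()
            .Any(b => b.Embed != null && b.Embed.Value == oldId);

        if (!stillUsed)
        {
            // Removes both the relationship and the part it points to.
            mainPart.DeletePart(oldId);
        }
    }
}
```

In the snippet above, the last line (blip.Embed = mainPart.GetIdOfPart(newImg)) would be replaced by a call to ImageSwap.RepointAndCleanUp(mainPart, blip, newImg).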
Anyway, there you have it. This thread is now the most complete documentation on how to swap an image anywhere on the web (I know this, I spent hours searching). So hopefully some people will find it useful.
Answer by Daniel
I had the same fun trying to work out how to do this until I saw this thread. Excellent, helpful answers, guys.
A simple way to select the ImagePart, if you know the name of the image in the package, is to check the Uri:
    ImagePart GetImagePart(WordprocessingDocument document, string imageName)
    {
        return document.MainDocumentPart.ImageParts
            .Where(p => p.Uri.ToString().Contains(imageName)) // or EndsWith
            .First();
    }
You can then do:
    var imagePart = GetImagePart(document, imageName);
    var newImageBytes = GetNewImageBytes(); // however the image is generated or obtained
    // FileMode.Create truncates the part before the new bytes are written
    using (var writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write)))
    {
        writer.Write(newImageBytes);
    }
Answer by BillKrat
The following code will retrieve the images from the specified document (filename) and save them to a D:\TestArea folder using the internal filenames. The answers on this page helped me come up with my solution.
Note: this solution does not help someone replace an image in a Word doc. However, in all of my searching for how to retrieve an image from a Word doc, this was the only/closest link I could find; just in case someone else is in the same boat, I am posting my solution here.
    private void ProcessImages(string filename)
    {
        var xpic = "";
        var xr = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
        // read-only access is enough, since nothing in the document is changed
        using (WordprocessingDocument document = WordprocessingDocument.Open(filename, false))
        {
            var imageParts =
                from paragraph in document.MainDocumentPart.Document.Body
                from graphic in paragraph.Descendants<Graphic>()
                let graphicData = graphic.Descendants<GraphicData>().FirstOrDefault()
                let pic = graphicData.ElementAt(0)
                let nvPicPrt = pic.ElementAt(0).FirstOrDefault()
                let blip = pic.Descendants<Blip>().FirstOrDefault()
                select new
                {
                    Id = blip.GetAttribute("embed", xr).Value,
                    Filename = nvPicPrt.GetAttribute("name", xpic).Value
                };
            foreach (var image in imageParts)
            {
                var outputFilename = string.Format(@"d:\TestArea\{0}", image.Filename);
                Debug.WriteLine(string.Format("Creating file: {0}", outputFilename));
                // Get image from document
                var imageData = document.MainDocumentPart.GetPartById(image.Id);
                // Copy the image data to disk; FileMode.Create truncates any existing file
                using (var stream = imageData.GetStream())
                using (var fileStream = new FileStream(outputFilename, FileMode.Create))
                {
                    stream.CopyTo(fileStream);
                }
            }
        }
    }
Answer by Ludisposed
I love this thread, because there is so much bad documentation on this subject. After many hours of trying to make the above answers work, I came up with my own solution.
How I give the image a tagName:
First I select the image I want to replace in Word and give it a name (for instance "toReplace"). Afterwards I loop through the Drawings, select the image with the correct tagName, and write my own image in its place.
    private void ReplaceImage(string tagName, string imagePath)
    {
        this.wordDoc = WordprocessingDocument.Open(this.stream, true);
        IEnumerable<Drawing> drawings = this.wordDoc.MainDocumentPart.Document.Descendants<Drawing>().ToList();
        foreach (Drawing drawing in drawings)
        {
            DocProperties dpr = drawing.Descendants<DocProperties>().FirstOrDefault();
            if (dpr != null && dpr.Name == tagName)
            {
                foreach (DocumentFormat.OpenXml.Drawing.Blip b in drawing.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().ToList())
                {
                    OpenXmlPart imagePart = wordDoc.MainDocumentPart.GetPartById(b.Embed);
                    // FileMode.Create truncates the part before the new bytes are written
                    using (var writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write)))
                    {
                        writer.Write(File.ReadAllBytes(imagePath));
                    }
                }
            }
        }
    }
Answer by tomRedox
@Ludisposed's excellent answer worked perfectly for me, but it took me a bit of digging to work out how to actually set the image name in Word in the first place. For anyone else who doesn't speak German, this is how to do it:
In MS Word, click on the image, then in the Home ribbon choose Select -> Selection Pane to show the list of images in the right-hand navigation.
You can then click on an image's name/tag in the Selection Pane to change its name.
Once you've done that, you can see how that text was incorporated into the Open XML file by using the Open XML SDK 2.5 Productivity Tool.
Having done that, I extended @Ludisposed's solution slightly into a reusable method, and tweaked the code so that passing in a null byte array would trigger the removal of the image from the document:
    /// <summary>
    /// Replaces the image in a document with the new file bytes, or removes the image if the newImageBytes parameter is null.
    /// Relies on the image having had its name set via the 'Selection Pane' in Word
    /// </summary>
    /// <param name="document">The OpenXML document</param>
    /// <param name="oldImagesPlaceholderText">The placeholder name for the image set via Selection in Word</param>
    /// <param name="newImageBytes">The new file. Pass null to remove the selected image from the document instead</param>
    public void ReplaceInternalImage(WordprocessingDocument document, string oldImagesPlaceholderText, byte[] newImageBytes)
    {
        var imagesToRemove = new List<Drawing>();
        IEnumerable<Drawing> drawings = document.MainDocumentPart.Document.Descendants<Drawing>().ToList();
        foreach (Drawing drawing in drawings)
        {
            DocProperties dpr = drawing.Descendants<DocProperties>().FirstOrDefault();
            if (dpr != null && dpr.Name == oldImagesPlaceholderText)
            {
                if (newImageBytes == null)
                {
                    // queue the drawing once; it is removed after the loop
                    imagesToRemove.Add(drawing);
                    continue;
                }
                foreach (Blip b in drawing.Descendants<Blip>().ToList())
                {
                    OpenXmlPart imagePart = document.MainDocumentPart.GetPartById(b.Embed);
                    // FileMode.Create truncates the part before the new bytes are written
                    using (var writer = new BinaryWriter(imagePart.GetStream(FileMode.Create, FileAccess.Write)))
                    {
                        writer.Write(newImageBytes);
                    }
                }
            }
        }
        // remove the queued drawings only after the search loop has finished
        foreach (var image in imagesToRemove)
        {
            image.Remove();
        }
    }
Answer by barsmaga
To get the images and copy them to a folder, you can use a simpler method:
    // doc is an open WordprocessingDocument; mediaPath is the target folder
    System.Collections.Generic.IEnumerable<ImagePart> imageParts = doc.MainDocumentPart.ImageParts;
    foreach (ImagePart img in imageParts)
    {
        var uri = img.Uri;
        var fileName = uri.ToString().Split('/').Last();
        string imgPath = Path.Combine(mediaPath, fileName);
        // the using blocks close both streams even if copying fails
        using (var fileWordMedia = img.GetStream(FileMode.Open))
        using (var fileHtmlMedia = new FileStream(imgPath, FileMode.Create))
        {
            fileWordMedia.CopyTo(fileHtmlMedia);
        }
    }
Answer by barsmaga
The OpenXML documentation is very thin, and working things out from it takes too much time. I was working on a specific task and want to share the solution; I hope it helps people and saves them time. I had to get the picture at a particular place in the text, specifically when it is a child of a Run.
    static string RunToHTML(Run r)
    {
        string exit = "";
        OpenXmlElementList list = r.ChildElements;
        foreach (OpenXmlElement element in list)
        {
            if (element is DocumentFormat.OpenXml.Wordprocessing.Picture)
            {
                exit += AddPictureToHtml((DocumentFormat.OpenXml.Wordprocessing.Picture)element);
                return exit;
            }
        }
        // no picture found in this run
        return exit;
    }
More specifically, I needed to convert a paragraph of the document to HTML.
    static string AddPictureToHtml(DocumentFormat.OpenXml.Wordprocessing.Picture pic)
    {
        string exit = "";
        DocumentFormat.OpenXml.Vml.Shape shape = pic.Descendants<DocumentFormat.OpenXml.Vml.Shape>().First();
        DocumentFormat.OpenXml.Vml.ImageData imageData = shape.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().First();
        //style image: expects a style of the form "width:100pt;height:50pt"
        string style = shape.Style;
        style = style.Replace("width:", "");
        style = style.Replace("height:", "");
        style = style.Replace("pt", "");
        string[] arr = style.Split(';');
        // parse with the invariant culture (System.Globalization) so "." is always the decimal separator
        float styleW = float.Parse(arr[0], CultureInfo.InvariantCulture);//width picture
        float styleH = float.Parse(arr[1], CultureInfo.InvariantCulture);//height picture
        string relationId = imageData.RelationshipId;
        // doc (the open WordprocessingDocument) and docPath are fields defined elsewhere
        var img = doc.MainDocumentPart.GetPartById(relationId);
        var uri = img.Uri;//path in file
        exit = "<img src=\"" + docPath + uri + "\" width=\"" + styleW.ToString(CultureInfo.InvariantCulture)
             + "\" height=\"" + styleH.ToString(CultureInfo.InvariantCulture) + "\" > ";
        return exit;
    }
The uri is the path to the picture inside the .docx file, for example "test.docx/media/image.bmp". Using this information you can extract the picture:
    static void SavePictures(ImagePart img, string savePath)
    {
        var uri = img.Uri;
        var fileName = uri.ToString().Split('/').Last();
        string imgPath = Path.Combine(savePath, fileName);
        // the using blocks close both streams even if copying fails
        using (var fileWordMedia = img.GetStream(FileMode.Open))
        using (var fileHtmlMedia = new FileStream(imgPath, FileMode.Create))
        {
            fileWordMedia.CopyTo(fileHtmlMedia);
        }
    }
Answer by theprisoner6
Okay, thank you to everyone who helped me out on this. My goal was simpler than replacing an image: mainly to pull out all images in a Word document. I found this code did the work for me on that, INCLUDING the needed extension.
Feel free to use:
    var inlineImages = from paragraph in wordprocessingDocument.MainDocumentPart.Document.Body
                       from graphic in paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Graphic>()
                       let graphicData = graphic.Descendants<DocumentFormat.OpenXml.Drawing.GraphicData>().FirstOrDefault()
                       let pic = graphicData.ElementAt(0).Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault()
                       let imgPID = pic.GetAttribute("embed", "http://schemas.openxmlformats.org/officeDocument/2006/relationships").Value
                       select new
                       {
                           Id = imgPID,
                           Extension = ((ImagePart)wordprocessingDocument.MainDocumentPart.GetPartById(imgPID)).ContentType.Split('/')[1]
                       };


