C#: Best way to search data in XML files?
Original source: http://stackoverflow.com/questions/563998/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Best way to search data in xml files?
Asked by gk.
In our new project we have to provide search functionality to retrieve data from hundreds of XML files. Our current plan is outlined below; I would like your suggestions and improvements on it.
These XML files contain personal information, and the search is based on 10 elements in them, for example last name, first name, and email. Our current plan is to create a master XmlDocument holding all the searchable data plus a key to the actual file, so that when the user searches we first look in the master file and get the results from there. We will also cache the actual XML files from recent searches so that similar searches can be handled quickly later.
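The master-document plan described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the element names (person, lastName, firstName, email), the index/entry layout, the file attribute used as the key, and the MasterIndex class are all hypothetical and do not come from the question. The code sticks to .NET 2.0-era syntax (no var, no LINQ).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class MasterIndex
{
    // Builds a master document that holds only the searchable fields plus a
    // key (here, the source file name) pointing back to the full record.
    // Assumption: each source file has a <person> root with simple child elements.
    public static XmlDocument Build(IDictionary<string, string> xmlByFileName)
    {
        XmlDocument master = new XmlDocument();
        XmlElement root = master.CreateElement("index");
        master.AppendChild(root);

        foreach (KeyValuePair<string, string> file in xmlByFileName)
        {
            XmlDocument source = new XmlDocument();
            source.LoadXml(file.Value);

            XmlElement entry = master.CreateElement("entry");
            entry.SetAttribute("file", file.Key);
            foreach (string field in new string[] { "lastName", "firstName", "email" })
            {
                XmlNode node = source.SelectSingleNode("/person/" + field);
                if (node != null)
                {
                    XmlElement copy = master.CreateElement(field);
                    copy.InnerText = node.InnerText;
                    entry.AppendChild(copy);
                }
            }
            root.AppendChild(entry);
        }
        return master;
    }

    // Returns the file keys of all entries whose lastName matches exactly.
    // The comparison is done in code rather than by concatenating the search
    // term into an XPath string, which avoids quoting problems.
    public static List<string> FindByLastName(XmlDocument master, string lastName)
    {
        List<string> files = new List<string>();
        foreach (XmlNode entry in master.SelectNodes("/index/entry"))
        {
            XmlNode name = entry.SelectSingleNode("lastName");
            if (name != null && name.InnerText == lastName)
            {
                files.Add(((XmlElement)entry).GetAttribute("file"));
            }
        }
        return files;
    }
}
```

A caller would pass the file contents keyed by file name to MasterIndex.Build once, keep the returned document in memory, and call MasterIndex.FindByLastName per search. Note that this scans every entry on each search, which is one reason the answers suggest an indexed store instead.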
Our application is a .NET 2.0 web application.
Accepted answer by Marc Gravell
First: how big are the XML files? XmlDocument doesn't scale to "huge"... but can handle "large" OK.
Second: can you perhaps put the data into a regular database structure (perhaps SQL Server Express Edition), index it, and access it via regular TSQL? That will usually out-perform an xpath search. Equally, if it is structured, SQL Server 2005 and above supports the xml data-type, which shreds data - this allows you to index and query xml data in the database without having the entire DOM in memory (it translates xpath into relational queries).
Answer by Dave Barker
If you can store the data in a SQL Server database, then you could make use of SQL Server's built-in XPath query functionality.
Answer by MrTelly
Hmm, sounds like you're building a database over the top of XML. For performance I'd read those files into the DB of your choice and let it handle indexing and searching for you. If that's not an option, get really good with XPath, or roll your own exhaustive search using XmlReader.
XML is not the answer to every problem; however clean it appears to be, performance will suck.
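As a rough idea of what a hand-rolled exhaustive search with XmlReader might look like, here is a minimal sketch. The StreamingSearch class and its method names are hypothetical, and it assumes the searched element contains only text (ReadElementContentAsString throws if the element has child elements). It streams each file forward-only instead of loading a full DOM.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingSearch
{
    // Returns true if any element named elementName has exactly the given text.
    public static bool ContainsValue(TextReader xml, string elementName, string value)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadElementContentAsString already moves the reader past the
                    // end tag, so we must not call Read() again here; doing so
                    // would skip an element that follows immediately.
                    if (reader.ReadElementContentAsString() == value)
                    {
                        return true;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return false;
    }

    // Scans every file in turn and returns the paths that contain a match.
    public static List<string> FindFiles(IEnumerable<string> paths, string elementName, string value)
    {
        List<string> matches = new List<string>();
        foreach (string path in paths)
        {
            using (StreamReader file = File.OpenText(path))
            {
                if (ContainsValue(file, elementName, value))
                {
                    matches.Add(path);
                }
            }
        }
        return matches;
    }
}
```

For example, StreamingSearch.FindFiles(Directory.GetFiles(folder, "*.xml"), "lastName", "Smith") would scan every file in a folder. The cost grows with the total size of all files on every search, which is the trade-off this answer warns about.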
Answer by Nahom Tijnam
Why don't you store the searchable data in a database table with a key to the actual file? Your search would then run against the database table rather than the XML files. I suppose this would be faster because you can index the table for faster searching.
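To make the shape of that idea concrete without assuming any particular database, here is an in-memory stand-in: rows of searchable fields plus a file key, with a hash-based lookup playing the role of a table index on last name. It is not a database and not what the answer literally proposes; the PersonRow and LastNameIndex types and their fields are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// One "row": a few searchable fields plus the key back to the full XML file.
public sealed class PersonRow
{
    public readonly string LastName;
    public readonly string Email;
    public readonly string FilePath;

    public PersonRow(string lastName, string email, string filePath)
    {
        LastName = lastName;
        Email = email;
        FilePath = filePath;
    }
}

// Stand-in for an index on the last-name column: lookups do not scan every row.
public sealed class LastNameIndex
{
    private readonly Dictionary<string, List<PersonRow>> byLastName =
        new Dictionary<string, List<PersonRow>>(StringComparer.OrdinalIgnoreCase);

    public void Add(PersonRow row)
    {
        List<PersonRow> rows;
        if (!byLastName.TryGetValue(row.LastName, out rows))
        {
            rows = new List<PersonRow>();
            byLastName[row.LastName] = rows;
        }
        rows.Add(row);
    }

    // Returns the file keys for every row with the given last name
    // (case-insensitive), in the order the rows were added.
    public List<string> FindFiles(string lastName)
    {
        List<string> files = new List<string>();
        List<PersonRow> rows;
        if (byLastName.TryGetValue(lastName, out rows))
        {
            foreach (PersonRow row in rows)
            {
                files.Add(row.FilePath);
            }
        }
        return files;
    }
}
```

A real database table would add persistence, indexes on several columns, and partial-match queries, none of which this sketch attempts.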
Answer by Gautam
Index your XML files. Look into http://incubator.apache.org/lucene.net/
I recently used it at my previous job to cache our SQL database for fast searching with very little overhead.
It provides fast searching of content inside xml files (all depending on how you organize your cache).
Very easy and straightforward to use.
Much easier than trying to loop through a bunch of files.