SQL 如何编写一个简单的数据库引擎
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1052189/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to write a simple database engine
提问by a_m0d
I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database engine. I have searched for tutorials on the subject and I couldn't find any, so I am hoping someone else can point me in the right direction. Basically, I would like information on the following:
我有兴趣学习数据库引擎的工作原理(即它的内部结构)。我知道 CS 中教授的大部分基本数据结构(树、哈希表、列表等)以及对编译器理论的很好理解(并且已经实现了一个非常简单的解释器)但我不明白如何去做关于编写数据库引擎。我已经搜索了有关该主题的教程,但找不到任何教程,因此我希望其他人可以为我指明正确的方向。基本上,我想了解以下信息:
- How the data is stored internally (i.e. how tables are represented, etc.)
- How the engine finds data that it needs (e.g. run a SELECT query)
- How data is inserted in a way that is fast and efficient
- 内部如何存储数据(即如何表示表等)
- 引擎如何找到它需要的数据(例如运行 SELECT 查询)
- 如何以快速高效的方式插入数据
And any other topics that may be relevant to this. It doesn't have to be an on-disk database - even an in-memory database is fine (if it is easier) because I just want to learn the principals behind it.
以及可能与此相关的任何其他主题。它不一定是磁盘数据库——即使是内存数据库也可以(如果它更容易的话),因为我只想了解它背后的原理。
Many thanks for your help.
非常感谢您的帮助。
采纳答案by Robert Harvey
If you're good at reading code, studying SQLite will teach you a whole boatload about database design. It's small, so it's easier to wrap your head around. But it's also professionally written.
如果您擅长阅读代码,那么学习 SQLite 将教会您大量有关数据库设计的知识。它很小,所以更容易环绕你的头。但它也写得很专业。
回答by A.Rashad
The answer to this question is a huge one. expect a PHD thesis to have it answered 100% ;) but we can think of the problems one by one:
这个问题的答案是巨大的。期待一篇博士论文能100%回答;) 但我们可以一一思考问题:
How to store the data internally: you should have a data file containing your database objects and a caching mechanism to load the data in focus and some data around it into RAM assume you have a table, with some data, we would create a data format to convert this table into a binary file, by agreeing on the definition of a column delimiter and a row delimiter and make sure such pattern of delimiter is never used in your data itself. i.e. if you have selected <*> for example to separate columns, you should validate the data you are placing in this table not to contain this pattern. you could also use a row header and a column header by specifying size of row and some internal indexing number to speed up your search, and at the start of each column to have the length of this column like "Adam", 1, 11.1, "123 ABC Street POBox 456" you can have it like <&RowHeader, 1><&Col1,CHR, 4>Adam<&Col2, num,1,0>1<&Col3, Num,2,1>111<&Col4, CHR, 24>123 ABC Street POBox 456<&RowTrailer>
How to find items quickly try using hashing and indexing to point at data stored and cached based on different criteria taking same example above, you could sort the value of the first column and store it in a separate object pointing at row id of items sorted alphabetically, and so on
How to speed insert data I know from Oracle is that they insert data in a temporary place both in RAM and on disk and do housekeeping on periodic basis, the database engine is busy all the time optimizing its structure but in the same time we do not want to lose data in case of power failure of something like that. so try to keep data in this temporary place with no sorting, append your original storage, and later on when system is free resort your indexes and clear the temp area when done
如何在内部存储数据:你应该有一个包含你的数据库对象的数据文件和一个缓存机制来加载焦点数据和它周围的一些数据到 RAM 假设你有一个表,有一些数据,我们将创建一个数据格式将此表转换为二进制文件,通过就列分隔符和行分隔符的定义达成一致,并确保此类分隔符模式从未在您的数据本身中使用。例如,如果您选择了 <*> 来分隔列,您应该验证您放置在此表中的数据不包含此模式。您还可以通过指定行的大小和一些内部索引号来使用行标题和列标题来加快搜索速度,并在每列的开头使用该列的长度,如“Adam”、1、11.1、 ”
如何快速查找项目 尝试使用散列和索引指向基于不同标准存储和缓存的数据,采用上述相同的示例,您可以对第一列的值进行排序,并将其存储在一个单独的对象中,指向按字母顺序排序的项目的行 id , 等等
我从 Oracle 了解到的如何加快插入数据的速度是他们将数据插入 RAM 和磁盘中的临时位置并定期进行内务处理,数据库引擎一直忙于优化其结构,但同时我们没有想在断电的情况下丢失数据。所以尝试将数据保存在这个临时位置而不进行排序,附加您的原始存储,然后在系统空闲时使用索引并在完成后清除临时区域
good luck, great project.
祝你好运,伟大的项目。
回答by Juergen
SQLite was mentioned before, but I want to add some thing.
之前提到过 SQLite,但我想添加一些东西。
I personally learned a lot by studying SQlite. The interesting thing is, that I did not go to the source code (though I just had a short look). I learned much by reading the technical material and specially looking at the internal commands it generates. It has an own stack based interpreter inside and you can read the P-Code it generates internally just by using explain. Thus you can see how various constructs are translated to the low-level engine (that is surprisingly simple -- but that is also the secret of its stability and efficiency).
我个人通过学习 SQlite 学到了很多东西。有趣的是,我没有去看源代码(虽然我只是看了一眼)。通过阅读技术资料并特别查看它生成的内部命令,我学到了很多东西。它内部有一个自己的基于堆栈的解释器,您可以通过使用解释来读取它内部生成的 P 代码。因此,您可以看到如何将各种构造转换为低级引擎(这非常简单——但这也是其稳定性和效率的秘诀)。
回答by djna
There are books on the topic a good place to start would be Database Systems: The Complete Bookby Garcia-Molina, Ullman, and Widom
回答by michael aubert
I would suggest focusing on www.sqlite.org
我建议关注 www.sqlite.org
It's recent, small (source code 1MB), open source (so you can figure it out for yourself)...
它是最近的,小(源代码 1MB),开源(所以你可以自己弄清楚)......
Books have been written about how it is implemented:
已经写了关于它是如何实现的书籍:
http://www.sqlite.org/books.html
http://www.sqlite.org/books.html
It runs on a variety of operating systems for both desktop computers and mobile phones so experimenting is easy and learning about it will be useful right now and in the future.
它在台式电脑和手机的各种操作系统上运行,因此实验很容易,学习它现在和将来都会很有用。
It even has a decent community here: https://stackoverflow.com/questions/tagged/sqlite
它甚至在这里有一个不错的社区:https: //stackoverflow.com/questions/tagged/sqlite
回答by a_m0d
Okay, I have found a site which has some information on SQL and implementation - it is a bit hard to link to the page which lists all the tutorials, so I will link them one by one:
好的,我找到了一个网站,里面有一些关于 SQL 和实现的信息——链接到列出所有教程的页面有点困难,所以我将一一链接它们:
回答by nightingale2k1
回答by nightingale2k1
If MySQL interests you, I would also suggest this wiki page, which has got some information about how MySQL works. Also, you might want to take a look at Understanding MySQL Internals.
如果您对 MySQL 感兴趣,我还建议您访问这个wiki 页面,其中提供了一些有关 MySQL 工作原理的信息。此外,您可能想看看了解 MySQL 内部。
You might also consider looking at a non-SQL interface for your Database engine. Please take a look at Apache CouchDB. Its what you would call, a document oriented database system.
您还可以考虑为您的数据库引擎查看非 SQL 接口。请看一下Apache CouchDB。它就是您所说的,一个面向文档的数据库系统。
Good Luck!
祝你好运!
回答by sud03r
I am not sure whether it would fit to your requirements but I had implemented a simple file oriented database with support for simple (SELECT, INSERT , UPDATE
) using perl.
What I did was I stored each table as a file on disk and entries with a well defined pattern and manipulated the data using in built linux tools like awk and sed. for improving efficiency, frequently accessed data were cached.
我不确定它是否符合您的要求,但我已经SELECT, INSERT , UPDATE
使用 perl实现了一个简单的面向文件的数据库,并支持 simple ( )。
我所做的是将每个表作为文件存储在磁盘上,并使用定义明确的模式将条目存储起来,并使用内置的 linux 工具(如 awk 和 sed)操作数据。为了提高效率,经常访问的数据被缓存。