SQL: Database schema for organizing historical stock data

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1523576/

Time: 2020-09-01 03:54:18  Source: igfitidea

Database schema for organizing historical stock data

Tags: sql, sqlite, schema, stocks

Asked by nall

I'm creating a database schema for storing historical stock data. I currently have a schema as shown below.

My requirements are to store "bar data" (date, open, high, low, close, volume) for multiple stock symbols. Each symbol might also have multiple timeframes (e.g. Google Weekly bars and Google Daily bars).

My current schema puts the bulk of the data in the OHLCV table. I'm far from a database expert and am curious if this is too naive. Constructive input is very welcome.

CREATE TABLE Exchange (exchange TEXT UNIQUE NOT NULL);

CREATE TABLE Symbol (symbol TEXT UNIQUE NOT NULL, exchangeID INTEGER NOT NULL);

CREATE TABLE Timeframe (timeframe TEXT NOT NULL, symbolID INTEGER NOT NULL);

CREATE TABLE OHLCV (date TEXT NOT NULL CHECK (date LIKE '____-__-__ __:__:__'),
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER NOT NULL,
    timeframeID INTEGER NOT NULL);

This means my queries currently go something like: Find the timeframeID for a given symbol/timeframe, then do a select on the OHLCV table where the timeframeID matches.
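
The two-step lookup described above can also be written as a single join. Below is a minimal sketch using Python's built-in sqlite3 module. It assumes that each table's implicit rowid is the ID that symbolID and timeframeID refer to, which the schema in the question does not actually state. The helper name load_bars and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Symbol (symbol TEXT UNIQUE NOT NULL, exchangeID INTEGER NOT NULL);
CREATE TABLE Timeframe (timeframe TEXT NOT NULL, symbolID INTEGER NOT NULL);
CREATE TABLE OHLCV (date TEXT NOT NULL CHECK (date LIKE '____-__-__ __:__:__'),
    open REAL NOT NULL, high REAL NOT NULL, low REAL NOT NULL, close REAL NOT NULL,
    volume INTEGER NOT NULL, timeframeID INTEGER NOT NULL);
""")

# Made-up sample rows; each table's implicit rowid is used as its ID (an assumption).
symbol_id = conn.execute(
    "INSERT INTO Symbol (symbol, exchangeID) VALUES ('GOOG', 1)").lastrowid
daily_id = conn.execute(
    "INSERT INTO Timeframe (timeframe, symbolID) VALUES ('daily', ?)",
    (symbol_id,)).lastrowid
weekly_id = conn.execute(
    "INSERT INTO Timeframe (timeframe, symbolID) VALUES ('weekly', ?)",
    (symbol_id,)).lastrowid
conn.executemany(
    "INSERT INTO OHLCV VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2009-10-02 00:00:00", 11.0, 12.0, 10.5, 11.5, 1200, daily_id),
        ("2009-10-01 00:00:00", 10.0, 11.0, 9.5, 10.5, 1000, daily_id),
        ("2009-09-28 00:00:00", 9.0, 12.0, 8.5, 11.5, 5000, weekly_id),
    ],
)


def load_bars(conn, symbol, timeframe):
    """Return all bars for one symbol/timeframe pair, oldest first."""
    return conn.execute(
        """
        SELECT o.date, o.open, o.high, o.low, o.close, o.volume
        FROM OHLCV AS o
        JOIN Timeframe AS t ON t.rowid = o.timeframeID
        JOIN Symbol AS s ON s.rowid = t.symbolID
        WHERE s.symbol = ? AND t.timeframe = ?
        ORDER BY o.date
        """,
        (symbol, timeframe),
    ).fetchall()


daily_bars = load_bars(conn, "GOOG", "daily")
for bar in daily_bars:
    print(bar)
```

Relying on rowid works in SQLite, but explicit INTEGER PRIMARY KEY columns would make the relationships visible in the schema itself.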

Accepted answer by Michiel Buddingh

Well, on the positive side, you have the good sense to ask for input first. That puts you ahead of 90% of people unfamiliar with database design.

  • There are no clear foreign key relationships. I take it timeframeID relates to symbolID?
  • It's unclear how you'd be able to find anything this way. Reading up on the above-mentioned foreign keys should improve your understanding tremendously with little effort.
  • You're storing timeframe data as TEXT. From a performance as well as a usability perspective, that's a no-no.
  • Your current scheme can't accommodate stock splits, which will happen eventually. It's better to add one further layer of indirection between the price data table and the Symbol table.
  • open, high, low and close prices are better stored as decimal or currency types or, preferably, as an INTEGER field with a separate INTEGER field storing the divisor, since the smallest allowed price fraction (cents, eighths of a dollar, etc.) varies per exchange.
  • Since you support multiple exchanges, you should support multiple currencies.

I apologise if all of this doesn't seem too 'constructive', especially since I'm too sleepy right now to suggest a more usable alternative. I hope the above is enough to set you on your way.
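
One possible reading of the points in the list above, offered as a sketch and not as the answerer's own design: explicit keys with REFERENCES clauses, prices stored as integers with a divisor, and a currency per exchange. Everything not present in the question is an assumption made for illustration, including the currency and priceDivisor columns, placing the divisor on Exchange, storing the timeframe as a bar length in seconds, the helper close_price, and the sample values. Stock splits are not handled here.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Exchange (
    exchangeID   INTEGER PRIMARY KEY,
    exchange     TEXT UNIQUE NOT NULL,
    currency     TEXT NOT NULL,      -- hypothetical column, e.g. 'USD'
    priceDivisor INTEGER NOT NULL    -- hypothetical column: 100 means prices are in 1/100 units
);
CREATE TABLE Symbol (
    symbolID   INTEGER PRIMARY KEY,
    symbol     TEXT NOT NULL,
    exchangeID INTEGER NOT NULL REFERENCES Exchange(exchangeID),
    UNIQUE (symbol, exchangeID)
);
CREATE TABLE OHLCV (
    symbolID  INTEGER NOT NULL REFERENCES Symbol(symbolID),
    timeframe INTEGER NOT NULL,      -- bar length in seconds instead of TEXT (assumption)
    date      TEXT NOT NULL,
    open      INTEGER NOT NULL,      -- all prices are integers, to be divided by priceDivisor
    high      INTEGER NOT NULL,
    low       INTEGER NOT NULL,
    close     INTEGER NOT NULL,
    volume    INTEGER NOT NULL,
    PRIMARY KEY (symbolID, timeframe, date)
);
""")

# Illustrative sample values only.
conn.execute("INSERT INTO Exchange VALUES (1, 'NASDAQ', 'USD', 100)")
conn.execute("INSERT INTO Symbol VALUES (1, 'GOOG', 1)")
conn.execute(
    "INSERT INTO OHLCV VALUES (1, 86400, '2009-10-01', 1000, 1100, 950, 1050, 1000)")


def close_price(conn, symbol, date):
    """Return (close as Decimal, currency) for one daily bar, or None if absent."""
    row = conn.execute(
        """
        SELECT o.close, e.priceDivisor, e.currency
        FROM OHLCV AS o
        JOIN Symbol AS s ON s.symbolID = o.symbolID
        JOIN Exchange AS e ON e.exchangeID = s.exchangeID
        WHERE s.symbol = ? AND o.date = ? AND o.timeframe = 86400
        """,
        (symbol, date),
    ).fetchone()
    if row is None:
        return None
    close, divisor, currency = row
    return Decimal(close) / Decimal(divisor), currency


# A bar that points at a non-existent symbol is rejected by the foreign key.
try:
    conn.execute("INSERT INTO OHLCV VALUES (999, 86400, '2009-10-01', 1, 1, 1, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(close_price(conn, "GOOG", "2009-10-01"), orphan_rejected)
```

Keeping prices as integers avoids binary floating-point rounding, at the cost of one division whenever a price is displayed.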

Answer by boe100

We spent a long time trying to find a proper database structure for storing large amounts of data. The solution below is the result of more than 6 years of experience. It is now working flawlessly for our quantitative analysis.

We have been able to store hundreds of gigabytes of intraday and daily data using this scheme in SQL Server:

 Symbol - char(6)
 Date - date
 Time - time
 Open - decimal(18, 4)
 High - decimal(18, 4)
 Low - decimal(18, 4)
 Close - decimal(18, 4)
 Volume - int

All trading instruments are stored in a single table. We also have a clustered index on symbol, date and time columns.
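
The answer describes SQL Server. As a rough SQLite analogue, and not the answerer's actual DDL, a WITHOUT ROWID table with a composite primary key stores its rows in key order, which plays a role similar to a clustered index on symbol, date and time. The table name IntradayBar, the helper bars_for_day, the choice to store prices as integers scaled by 10000 (SQLite has no fixed-point decimal type), and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IntradayBar (
    symbol TEXT NOT NULL,
    date   TEXT NOT NULL,      -- 'YYYY-MM-DD'
    time   TEXT NOT NULL,      -- 'HH:MM:SS'
    open   INTEGER NOT NULL,   -- price * 10000, standing in for decimal(18, 4)
    high   INTEGER NOT NULL,
    low    INTEGER NOT NULL,
    close  INTEGER NOT NULL,
    volume INTEGER NOT NULL,
    PRIMARY KEY (symbol, date, time)
) WITHOUT ROWID;
""")

# Made-up rows, deliberately inserted out of order.
conn.executemany(
    "INSERT INTO IntradayBar VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("AAA", "2020-01-02", "09:31:00", 101000, 102000, 100500, 101500, 300),
        ("BBB", "2020-01-02", "09:30:00", 200000, 201000, 199000, 200500, 150),
        ("AAA", "2020-01-02", "09:30:00", 100000, 101000, 99500, 100500, 500),
        ("AAA", "2020-01-03", "09:30:00", 102000, 103000, 101500, 102500, 400),
    ],
)


def bars_for_day(conn, symbol, date):
    """Return (time, close) pairs for one symbol on one day, in time order."""
    rows = conn.execute(
        "SELECT time, close FROM IntradayBar "
        "WHERE symbol = ? AND date = ? ORDER BY time",
        (symbol, date),
    ).fetchall()
    # Convert the scaled integers back to prices for display.
    return [(time, close / 10000) for time, close in rows]


day_bars = bars_for_day(conn, "AAA", "2020-01-02")
print(day_bars)
```

Because the lookup columns form a prefix of the primary key, a query for one symbol and day reads a contiguous range of the table.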

For daily data, we have a separate table and do not use the Time column. In that table, the Volume data type is bigint instead of int.

The performance? We can get data out of the server in a matter of milliseconds. Remember, the database size is almost 1 terabyte.

We purchased all of our historical market data from the Kibot web site: http://www.kibot.com/

Answer by Mike Woodhouse

I'm not sure what value is added by Timeframe - it seems like an unnecessary complication, but that could be something I'm failing to understand ;-) Can a Timeframe have more than one OHLCV? If not, then I'd suggest they be merged.

I would note also that stock tickers change from time to time for any number of reasons. It's not a frequent event, but it happens. If you're thinking about working with your data as time series, you should be aware of the issue so that you can handle it when it comes, if not before. If you're not tracking stocks (you may be working on a futures app, say) then this advice may be taken with the appropriate amount of salt.
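
One way to prepare for ticker changes, sketched under assumptions: give each instrument a stable ID and keep the tickers in a separate history table with validity dates. The tables Instrument and TickerHistory, the helper instrument_for, and the tickers OLDT and NEWT are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Instrument (
    instrumentID INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE TickerHistory (
    instrumentID INTEGER NOT NULL REFERENCES Instrument(instrumentID),
    ticker       TEXT NOT NULL,
    validFrom    TEXT NOT NULL,  -- inclusive, 'YYYY-MM-DD'
    validTo      TEXT            -- exclusive; NULL while the ticker is still in use
);
-- Hypothetical instrument that changed its ticker once.
INSERT INTO Instrument VALUES (1, 'Example Corp');
INSERT INTO TickerHistory VALUES (1, 'OLDT', '2000-01-03', '2015-06-01');
INSERT INTO TickerHistory VALUES (1, 'NEWT', '2015-06-01', NULL);
""")


def instrument_for(conn, ticker, on_date):
    """Return the stable instrument ID a ticker referred to on a date, or None."""
    row = conn.execute(
        """
        SELECT instrumentID FROM TickerHistory
        WHERE ticker = ?
          AND validFrom <= ?
          AND (validTo IS NULL OR ? < validTo)
        """,
        (ticker, on_date, on_date),
    ).fetchone()
    return row[0] if row else None


print(instrument_for(conn, "OLDT", "2010-01-04"))
print(instrument_for(conn, "NEWT", "2020-01-02"))
```

Price rows would then reference instrumentID, so a time series stays continuous across the rename.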

Again mostly relevant to stocks, splits have been mentioned elsewhere and you may want to consider dividends - a stock's price will typically fall by the dividend amount (or more accurately the present value thereof) on the ex-dividend date, which may be misinterpreted if you don't know a confirmed future cash flow was the reason. Rights issues can be fun, too.
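
To make the split issue concrete, here is a small sketch of back-adjusting closing prices, assuming splits are recorded as (ex-date, ratio) pairs where a ratio of 2.0 means a 2-for-1 split. The function back_adjust and the sample numbers are made up. Dividends and volume adjustment are left out.

```python
from datetime import date


def back_adjust(bars, splits):
    """Divide every close dated before a split's ex-date by that split's ratio.

    bars:   list of (bar_date, close)
    splits: list of (ex_date, ratio), e.g. ratio 2.0 for a 2-for-1 split
    """
    adjusted = []
    for bar_date, close in bars:
        factor = 1.0
        for ex_date, ratio in splits:
            if bar_date < ex_date:
                factor *= ratio  # later splits compound for older bars
        adjusted.append((bar_date, close / factor))
    return adjusted


# Made-up series: the raw close halves on the ex-date of a 2-for-1 split.
bars = [
    (date(2020, 1, 1), 100.0),
    (date(2020, 1, 2), 102.0),
    (date(2020, 1, 3), 51.5),
]
splits = [(date(2020, 1, 3), 2.0)]

adjusted_bars = back_adjust(bars, splits)
for bar_date, close in adjusted_bars:
    print(bar_date, close)
```

Without the adjustment, the raw series would look like a fall of roughly 50% on the ex-date even though holders lost nothing.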

If you're planning on looking at series of data for a particular symbol, I'd suggest looking into what sort of performance you're going to get. At the very least, make sure you have an appropriate index in place.
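
A quick way to check whether an index is in place and actually used is SQLite's EXPLAIN QUERY PLAN. The sketch below runs it on the OHLCV table from the question before and after creating an index. The index name idx_ohlcv_timeframe_date and the choice of columns (timeframeID, date) are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE OHLCV (date TEXT NOT NULL CHECK (date LIKE '____-__-__ __:__:__'),
    open REAL NOT NULL, high REAL NOT NULL, low REAL NOT NULL, close REAL NOT NULL,
    volume INTEGER NOT NULL, timeframeID INTEGER NOT NULL)
""")

# A typical series query: all bars of one timeframe from a start date onwards.
QUERY = (
    "SELECT date, close FROM OHLCV "
    "WHERE timeframeID = ? AND date >= ? ORDER BY date"
)


def query_plan(conn):
    """Return the planner's description of how QUERY would be executed."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (1, "2009-01-01 00:00:00")
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)
conn.execute("CREATE INDEX idx_ohlcv_timeframe_date ON OHLCV (timeframeID, date)")
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan describes a scan of the whole table. Afterwards it should name the new index, which also lets SQLite return rows already sorted by date.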
