Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/103727/

Time: 2020-09-08 06:57:02  Source: igfitidea

How to think in data stores instead of databases?

database google-app-engine google-cloud-platform google-cloud-datastore

Asked by Jim

As an example, Google App Engine uses Google Datastore, not a standard database, to store data. Does anybody have any tips for using Google Datastore instead of databases? It seems I've trained my mind to think 100% in object relationships that map directly to table structures, and now it's hard to see anything differently. I can understand some of the benefits of Google Datastore (e.g. performance and the ability to distribute data), but some good database functionality is sacrificed (e.g. joins).

Does anybody who has worked with Google Datastore or BigTable have any good advice for working with them?

Accepted answer by Nick Johnson

There are two main things to get used to about the App Engine datastore when compared to 'traditional' relational databases:

  • The datastore makes no distinction between inserts and updates. When you call put() on an entity, that entity gets stored to the datastore with its unique key, and anything that has that key gets overwritten. Basically, each entity kind in the datastore acts like an enormous map or sorted list.
  • Querying, as you alluded to, is much more limited. No joins, for a start.

The key thing to realise - and the reason behind both these differences - is that Bigtable basically acts like an enormous ordered dictionary. Thus, a put operation just sets the value for a given key - regardless of any previous value for that key, and fetch operations are limited to fetching single keys or contiguous ranges of keys. More sophisticated queries are made possible with indexes, which are basically just tables of their own, allowing you to implement more complex queries as scans on contiguous ranges.
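
As a rough illustration of the "enormous ordered dictionary" idea, here is a minimal in-memory sketch. It is not the Datastore or Bigtable API: `OrderedStore`, its methods, and the key formats are all made up for this example. It only shows that put overwrites by key, that reads are single-key gets or contiguous range scans, and that an index can be modelled as one more sorted table.

```python
import bisect


class OrderedStore:
    """Toy stand-in for a sorted key/value table (hypothetical, not a real API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert and update are the same operation: the last write wins.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        # Return (key, value) pairs for the contiguous range start <= key < end.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


entities = OrderedStore()
entities.put("user:1", {"name": "Ann", "city": "Oslo"})
entities.put("user:2", {"name": "Bob", "city": "Paris"})
entities.put("user:1", {"name": "Ann", "city": "Rome"})  # overwrites, no error

# An index is just another sorted table keyed by (property value, entity key).
city_index = OrderedStore()
for key, entity in entities.scan("user:", "user;"):  # ';' sorts right after ':'
    city_index.put((entity["city"], key), None)

# "WHERE city = 'Paris'" becomes a scan over one contiguous key range.
paris_keys = [k[1] for k, _ in city_index.scan(("Paris", ""), ("Paris", "\uffff"))]
```

In this toy model a join would mean hopping between unrelated key ranges for every row, which is exactly what a range-scan-only store cannot do cheaply.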

Once you've absorbed that, you have the basic knowledge needed to understand the capabilities and limitations of the datastore. Restrictions that may have seemed arbitrary probably make more sense.

The key thing here is that although these are restrictions over what you can do in a relational database, these same restrictions are what make it practical to scale up to the sort of magnitude that Bigtable is designed to handle. You simply can't execute the sort of query that looks good on paper but is atrociously slow in an SQL database.

In terms of how to change how you represent data, the most important thing is precalculation. Instead of doing joins at query time, precalculate data and store it in the datastore wherever possible. If you want to pick a random record, generate a random number and store it with each record. There's a whole cookbook of these sorts of tips and tricks here. Edit: The cookbook is no longer in existence.
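
The random-record tip can be sketched as follows. This is an in-memory approximation, assuming an index on a stored `rand` value; the names `records`, `put_record` and `pick_random` are invented for illustration and are not datastore calls.

```python
import bisect
import random

records = []  # sorted list of (rand, name) pairs, standing in for an index on `rand`


def put_record(name, rng=random):
    # Precalculate: store a random number alongside the record at write time.
    bisect.insort(records, (rng.random(), name))


def pick_random(rng=random):
    # One range scan: the first record whose stored number is >= a fresh random number.
    if not records:
        return None
    i = bisect.bisect_left(records, (rng.random(), ""))
    # Wrap around if the fresh number is larger than every stored one.
    return records[i % len(records)][1]
```

Note that the pick is only approximately uniform, since records that follow a large gap in the stored numbers are chosen more often.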

Answer by user19087

The way I have been going about the mind switch is to forget about the database altogether.

In the relational db world you always have to worry about data normalization and your table structure. Ditch it all. Just lay out your web pages. Lay them all out. Now look at them. You're already 2/3 there.

If you forget the notion that database size matters and data shouldn't be duplicated, then you're 3/4 there, and you didn't even have to write any code! Let your views dictate your models. You don't have to take your objects and make them two-dimensional anymore, as in the relational world. You can store objects with shape now.
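
To make "let your views dictate your models" a little more concrete, here is a hypothetical sketch of an entity shaped after a single page. The field names and the `render` helper are assumptions for illustration only, using a plain dict rather than any datastore model class.

```python
# One entity per page view: everything the "post" page needs, already assembled.
post_page = {
    "key": "post:42",
    "title": "Thinking in data stores",
    "author_name": "Ann",            # duplicated from the author's profile
    "tags": ["datastore", "nosql"],  # a list property instead of a join table
    "comments": [
        {"by": "Bob", "text": "Nice write-up."},
        {"by": "Cy", "text": "Helped me too."},
    ],
}


def render(page):
    # Rendering needs a single fetch and no joins.
    lines = [f"{page['title']} by {page['author_name']}"]
    lines += [f"  {c['by']}: {c['text']}" for c in page["comments"]]
    return "\n".join(lines)
```

The trade-off is that duplicated values such as `author_name` have to be rewritten wherever they are copied when the source changes.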

Yes, this is a simplified explanation of the ordeal, but it helped me forget about databases and just make an application. I have made 4 App Engine apps so far using this philosophy and there are more to come.

Answer by user19087

I always chuckle when people come out with "it's not relational". I've written cellectr in Django, and here's a snippet of my model below. As you'll see, I have leagues that are managed or coached by users. From a league I can get all the managers, or from a given user I can return the leagues she coaches or manages.

Just because there's no specific foreign key support doesn't mean you can't have a database model with relationships.

My two pence.

from google.appengine.ext import db

# BaseModel is the author's own base class (not shown here).
class League(BaseModel):
    name = db.StringProperty()
    managers = db.ListProperty(db.Key)  # all the users who can view/edit this league
    coaches = db.ListProperty(db.Key)  # all the users who are able to view this league

    def get_managers(self):
        # This returns the models themselves, not just the keys that are stored in managers
        return UserPrefs.get(self.managers)

    def get_coaches(self):
        # This returns the models themselves, not just the keys that are stored in coaches
        return UserPrefs.get(self.coaches)

    def __str__(self):
        return self.name

    # Need to delete all the associated games, teams and players
    def delete(self):
        for player in self.leagues_players:
            player.delete()
        for game in self.leagues_games:
            game.delete()
        for team in self.leagues_teams:
            team.delete()            
        super(League, self).delete()

class UserPrefs(db.Model):
    user = db.UserProperty()
    league_ref = db.ReferenceProperty(reference_class=League,
                            collection_name='users') #league the users are managing

    def __str__(self):
        return self.user.nickname()  # nickname is a method on the User object

    # many-to-many relationship, a user can coach many leagues, a league can be
    # coached by many users
    @property
    def managing(self):
        return League.gql('WHERE managers = :1', self.key())

    @property
    def coaching(self):
        return League.gql('WHERE coaches = :1', self.key())

    # remove all references to me when I'm deleted
    def delete(self):
        for league in self.managing:
            league.managers.remove(self.key())
            league.put()
        for league in self.coaching:
            league.coaches.remove(self.key())
            league.put()
        super(UserPrefs, self).delete()

Answer by sanjay kushwah

I came from the relational database world, then I found this Datastore thing. It took several days to get the hang of it. Well, here are some of my findings.

You must already know that Datastore is built to scale, and that is the thing that separates it from an RDBMS. To scale better with large datasets, App Engine has made some changes (some means a lot of changes).

RDBMS VS DataStore
Structure
In a database, we usually structure our data in tables and rows; in Datastore these become kinds and entities.

Relations
In an RDBMS, most people follow the one-to-one, many-to-one and many-to-many relationships. Datastore has the "no joins" thing, but we can still achieve our normalization using "ReferenceProperty", e.g. One-to-One Relationship Example.

Indexes
Usually in an RDBMS we make indexes like primary key, foreign key, unique key and index key to speed up searches and boost database performance. In Datastore, you have to make at least one index per kind (it will be generated automatically whether you like it or not), because Datastore searches for your entities on the basis of these indexes, and believe me, that is the best part. In an RDBMS you can search using a non-indexed field; it will take some time, but it will work. In Datastore you cannot search using a non-indexed property.

Count
In an RDBMS, it is much easier to count(*), but in Datastore, please don't even think of it in the normal way (yeah, there is a count function), as it has a 1000 limit and it will cost as many small operations as there are entities, which is not good. But we always have good choices: we can use Shard Counters.
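
The shard counter idea can be sketched roughly as below. This is an in-memory illustration, not App Engine code: `NUM_SHARDS`, `shards`, `increment` and `get_count` are assumed names, and a plain dict stands in for the small counter entities.

```python
import random

NUM_SHARDS = 20
shards = {}  # (counter name, shard number) -> partial count


def increment(name, rng=random):
    # Each write touches one randomly chosen shard, so concurrent writers
    # rarely contend on the same entity.
    key = (name, rng.randrange(NUM_SHARDS))
    shards[key] = shards.get(key, 0) + 1


def get_count(name):
    # Reading sums a small, fixed number of shards instead of counting every entity.
    return sum(shards.get((name, i), 0) for i in range(NUM_SHARDS))
```

The total is maintained at write time, which is the same precalculation habit applied to counting.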

Unique Constraints
In an RDBMS, we love this feature, right? But Datastore has its own way: you cannot define a property as unique :(.
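
One common workaround, sketched here under assumptions, is to make the value that must be unique part of the entity key, since keys are the one thing the store keeps unique. The `store` dict and `create_unique` function are hypothetical stand-ins; in a real datastore the check and the write would need to happen together in a transaction.

```python
store = {}  # (kind, key name) -> entity data


def create_unique(kind, unique_value, data):
    # The unique value doubles as the key name, so a duplicate is a key collision.
    key = (kind, unique_value)
    if key in store:
        return False  # already taken; a plain put() would silently overwrite it
    store[key] = data
    return True
```

For example, `create_unique("User", "ann@example.com", {...})` succeeds once and returns False on any later attempt with the same address.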


Query
GAE Datastore provides a better feature much LIKE SQL (oh no! Datastore does not have a LIKE keyword), which is GQL.

Data Insert/Update/Delete/Select
This is where we are all interested. In an RDBMS we require one query each for insert, update, delete and select. Just like an RDBMS, Datastore has put, delete and get (don't get too excited), because Datastore bills put or get in terms of write, read and small operations (Read Costs for Datastore Calls), and that's where data modeling comes into action. You have to minimize these operations and keep your app running. For reducing read operations you can use Memcache.
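
A read-through cache in front of the datastore might look roughly like this. The sketch uses plain dicts in place of both Memcache and the datastore, and `get_entity`, `put_entity` and the `reads` counter are names assumed for illustration.

```python
datastore = {"user:1": {"name": "Ann"}}  # stand-in for the real datastore
cache = {}                               # stand-in for Memcache
reads = {"datastore": 0}                 # counts billable datastore reads


def get_entity(key):
    # Serve from the cache when possible; fall back to one datastore read.
    if key in cache:
        return cache[key]
    reads["datastore"] += 1
    entity = datastore.get(key)
    if entity is not None:
        cache[key] = entity
    return entity


def put_entity(key, entity):
    # Write to the datastore and drop the stale cached copy.
    datastore[key] = entity
    cache.pop(key, None)
```

Repeated calls to `get_entity("user:1")` cost one datastore read in this model, until `put_entity` invalidates the cached copy.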

Answer by Jon Stevens

Take a look at the Objectify documentation. The first comment at the bottom of the page says:

"Nice, although you wrote this to describe Objectify, it is also one of the most concise explanation of appengine datastore itself I've ever read. Thank you."

https://github.com/objectify/objectify/wiki/Concepts

Answer by Mark Cidade

If you're used to thinking about ORM-mapped entities then that's basically how an entity-based datastore like Google's App Engine works. For something like joins, you can look at reference properties. You don't really need to be concerned about whether it uses BigTable for the backend or something else since the backend is abstracted by the GQL and Datastore API interfaces.

Answer by ringadingding

The way I look at Datastore is that a kind identifies a table, per se, and an entity is an individual row within that table. If Google were to take out kinds, then it would be just one big table with no structure, and you could dump whatever you want into an entity. In other words, if entities were not tied to a kind, you could give an entity pretty much any structure and store everything in one location (kind of a big file with no structure to it, where each line has a structure of its own).

Now back to the original comment: Google Datastore and Bigtable are two different things, so do not confuse Google Datastore with "datastore" in the general data-storage sense. Bigtable is more expensive than BigQuery (the primary reason we didn't go with it). BigQuery does have proper joins and an RDBMS-like SQL language, and it's cheaper, so why not use BigQuery? That being said, BigQuery does have some limitations; depending on the size of your data, you might or might not encounter them.

Also, in terms of thinking in terms of Datastore, I think the proper statement would have been "thinking in terms of NoSQL databases". There are too many of them available out there these days, but when it comes to Google products, except Google Cloud SQL (which is MySQL), everything else is NoSQL.

Answer by devinmoore

Being rooted in the database world, a data store to me would be a giant table (hence the name "bigtable"). BigTable is a bad example though because it does a lot of other things that a typical database might not do, and yet it is still a database. Chances are unless you know you need to build something like Google's "bigtable", you will probably be fine with a standard database. They need that because they are handling insane amounts of data and systems together, and no commercially available system can really do the job the exact way they can demonstrate that they need the job to be done.

(bigtable reference: http://en.wikipedia.org/wiki/BigTable)
