Small Tables in Python?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/1471924/
Asked by akoumjian
Let's say I don't have more than one or two dozen objects with different properties, such as the following:
UID, Name, Value, Color, Type, Location
I want to be able to call up all objects with Location = "Boston", or Type = "Primary". Classic database query type stuff.
Most table solutions (pytables, *sql) are really overkill for such a small set of data. Should I simply iterate over all the objects and create a separate dictionary for each data column (adding values to dictionaries as I add new objects)?
This would create dicts like this:
{'Boston': [234, 654, 234], 'Chicago': [324, 765, 342]}, where those 3-digit entries represent things like UIDs.
As you can see, querying this would be a bit of a pain.
Any thoughts of an alternative?
Answered by Steven Kryskalla
For small relational problems I love using Python's builtin sets.
For the example of location = 'Boston' OR type = 'Primary', if you had this data:
users = {
    1: dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    2: dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    3: dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
    #...
}
You can do the WHERE ... OR ... query like this:
set1 = set(u for u in users if users[u]['Location'] == 'Boston')
set2 = set(u for u in users if users[u]['Type'] == 'Primary')
result = set1.union(set2)
Or with just one expression:
result = set(u for u in users if users[u]['Location'] == 'Boston'
             or users[u]['Type'] == 'Primary')
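The same set approach extends to other WHERE clauses. As a minimal sketch (the variable names here are illustrative, not from the original answer), AND maps to set intersection and AND NOT maps to set difference:

```python
# Same shape as the sample data above; the values are illustrative.
users = {
    1: dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    2: dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    3: dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
}

in_boston = {u for u in users if users[u]["Location"] == "Boston"}
secondary = {u for u in users if users[u]["Type"] == "Secondary"}

# WHERE Location = 'Boston' AND Type = 'Secondary'
both = in_boston & secondary

# WHERE Type = 'Secondary' AND NOT Location = 'Boston'
secondary_elsewhere = secondary - in_boston
```

Building each condition as its own set keeps the pieces reusable, at the cost of one pass over the data per condition, which should not matter for a couple dozen objects.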
You can also use the functions in itertools (or, on Python 3, the lazy built-in filter, which replaced itertools.ifilter) to create fairly efficient queries of the data. For example, if you want to do something similar to a GROUP BY city:
cities = ('Boston', 'New York', 'Chicago')
# Each value is a lazy iterator over the matching user IDs; wrap it in list() to reuse it.
cities_users = dict(map(lambda city: (city, filter(lambda u: users[u]['Location'] == city, users)), cities))
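If the list of cities is not known in advance, itertools.groupby is one possible alternative. This is only a sketch: the helper location_of and the fourth user are assumptions added for illustration, and groupby requires the input to be sorted by the grouping key first.

```python
from itertools import groupby

# Illustrative data; user 4 is added here so that one group has two members.
users = {
    1: dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    2: dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    3: dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
    4: dict(Name="Ms. Baz", Location="Boston", Type="Primary"),
}

def location_of(uid):
    """Hypothetical key function: the grouping column for one user ID."""
    return users[uid]["Location"]

# groupby only merges adjacent items, so sort by the same key first.
cities_users = {
    city: list(uids)
    for city, uids in groupby(sorted(users, key=location_of), key=location_of)
}
```

Unlike the lazy version, each group is materialized into a list here, so it can be read more than once.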
You could also build indexes manually (build a dict mapping Location to User ID) to speed things up. If this becomes too slow or unwieldy, then I would probably switch to sqlite, which has been included in the Python standard library since 2.5.
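A manual index of that kind might look like the following sketch. The names by_location and add_user are hypothetical, and the approach assumes every insert goes through one function so the index never drifts out of sync with the data.

```python
from collections import defaultdict

users = {
    1: dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    2: dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    3: dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
}

# Hypothetical index: Location -> set of user IDs, built once from existing data.
by_location = defaultdict(set)
for uid, user in users.items():
    by_location[user["Location"]].add(uid)

def add_user(uid, user):
    """Add a user and keep the Location index in sync."""
    users[uid] = user
    by_location[user["Location"]].add(uid)

add_user(4, dict(Name="Ms. Baz", Location="Boston", Type="Primary"))
boston_ids = by_location["Boston"]
```

Lookups by Location become a single dict access, but updates and deletions now have to touch both structures, which is the point where sqlite starts to look attractive.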
Answered by Alex Martelli
I do not think sqlite would be "overkill" -- it has come with standard Python since 2.5, so there is no need to install anything, and it can create and handle databases either in memory or in local disk files. Really, how could it be simpler...? If you want everything in memory, including the initial values, and want to use dicts to express those initial values, for example...:
import sqlite3
db = sqlite3.connect(':memory:')
db.execute('Create table Users (Name, Location, Type)')
db.executemany('Insert into Users values(:Name, :Location, :Type)', [
    dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
])
db.commit()
db.row_factory = sqlite3.Row
and now your in-memory tiny "db" is ready to go. It's no harder to make a DB in a disk file and/or read the initial values from a text file, a CSV, and so forth, of course.
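As a rough illustration of reading the initial values from a CSV, the sketch below feeds csv.DictReader rows straight into executemany. The CSV text is inlined through io.StringIO as a stand-in for a real file, and the column layout is assumed to match the table above.

```python
import csv
import io
import sqlite3

# Stand-in for a real file; with a file on disk you would use open(path, newline="").
csv_text = """Name,Location,Type
Mr. Foo,Boston,Secondary
Mr. Bar,New York,Primary
Mr. Quux,Chicago,Secondary
"""

db = sqlite3.connect(":memory:")  # pass a filename instead to keep the DB on disk
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE Users (Name, Location, Type)")

# DictReader yields one dict per row, keyed by the header line,
# which matches the named placeholders in the INSERT statement.
reader = csv.DictReader(io.StringIO(csv_text))
db.executemany("INSERT INTO Users VALUES (:Name, :Location, :Type)", reader)
db.commit()

names = [row["Name"] for row in db.execute("SELECT Name FROM Users ORDER BY Name")]
```

Swapping ":memory:" for a file path is the only change needed to persist the table between runs.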
Querying is especially flexible, easy and sweet, e.g., you can mix string insertion and parameter substitution at will...:
def where(w, *a):
    c = db.cursor()
    c.execute('Select * From Users where %s' % w, *a)
    return c.fetchall()
print([r["Name"] for r in where('Type="Secondary"')])
emits ['Mr. Foo', 'Mr. Quux'], just like the more elegant but equivalent
print([r["Name"] for r in where('Type=?', ["Secondary"])])
and your desired query's just:
print([r["Name"] for r in where('Location="Boston" or Type="Primary"')])
etc. Seriously -- what's not to like?
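When the search values come from variables rather than literals, the desired query can be written entirely with parameter substitution. The following is a self-contained sketch under that assumption; find_by_location_or_type is a hypothetical helper name, not part of the original answer.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE Users (Name, Location, Type)")
db.executemany(
    "INSERT INTO Users VALUES (:Name, :Location, :Type)",
    [
        dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
        dict(Name="Mr. Bar", Location="New York", Type="Primary"),
        dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
    ],
)
db.commit()

def find_by_location_or_type(location, user_type):
    """Hypothetical helper: names of users matching either condition."""
    cur = db.execute(
        "SELECT Name FROM Users WHERE Location = ? OR Type = ? ORDER BY Name",
        (location, user_type),
    )
    return [row["Name"] for row in cur]

matches = find_by_location_or_type("Boston", "Primary")
```

Keeping the values out of the SQL string means they never need quoting or escaping by hand.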
Answered by Triptych
If it's really a small amount of data, I'd not bother with an index and probably just write a helper function:
users = [
    dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
]
def search(dictlist, **kwargs):
    def match(d):
        # Every keyword must be present in the dict and equal to the given value.
        for k, v in kwargs.items():
            try:
                if d[k] != v:
                    return False
            except KeyError:
                return False
        return True
    return [d for d in dictlist if match(d)]
This allows nice-looking queries like this:
result = search(users, Type="Secondary")
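Because every keyword must match, passing several keywords behaves like WHERE ... AND .... The sketch below is a compact variant of the same helper, written with all() and a sentinel for missing keys; it is an illustration of the idea rather than the answer's exact code.

```python
_MISSING = object()  # sentinel so that a missing key never equals a real value

def search(dictlist, **criteria):
    """Return the dicts whose values equal every keyword criterion."""
    return [
        d for d in dictlist
        if all(d.get(k, _MISSING) == v for k, v in criteria.items())
    ]

users = [
    dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
    dict(Name="Mr. Bar", Location="New York", Type="Primary"),
    dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
]

secondary = search(users, Type="Secondary")
boston_secondary = search(users, Type="Secondary", Location="Boston")
no_such_field = search(users, Color="Red")  # unknown column: nothing matches
```

An OR query is not expressible this way in one call; combining the results of two calls is one option for that case.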