Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must likewise follow the CC BY-SA license and attribute the original authors (not me). Original: http://stackoverflow.com/questions/18621513/

Date: 2020-08-19 | Source: igfitidea

Python insert numpy array into sqlite3 database

Tags: python, numpy, sqlite

Asked by Joe Flip

I'm trying to store a numpy array of about 1000 floats in a sqlite3 database but I keep getting the error "InterfaceError: Error binding parameter 1 - probably unsupported type".

I was under the impression a BLOB data type could be anything but it definitely doesn't work with a numpy array. Here's what I tried:

import sqlite3 as sql
import numpy as np
con = sql.connect('test.bd',isolation_level=None)
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None,np.arange(0,500,0.5)))
con.commit()

Is there another module I can use to get the numpy array into the table? Or can I convert the numpy array into another form in Python (like a list or string I can split) that sqlite will accept? Performance isn't a priority. I just want it to work!

Thanks!

Accepted answer by unutbu

You could register a new array data type with sqlite3:

import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)


# Converts np.ndarray to BLOB when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts BLOB back to np.ndarray when selecting
sqlite3.register_converter("array", convert_array)

x = np.arange(12).reshape(2,6)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (arr array)")

With this setup, you can simply insert the NumPy array with no change in syntax:

cur.execute("insert into test (arr) values (?)", (x, ))

And retrieve the array directly from sqlite as a NumPy array:

cur.execute("select arr from test")
data = cur.fetchone()[0]

print(data)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]
print(type(data))
# <class 'numpy.ndarray'>
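As a quick sanity check of this adapter/converter pair (a minimal sketch, not part of the original answer): the converter only fires when the connection is opened with detect_types=sqlite3.PARSE_DECLTYPES and the column is declared with the registered type name; without it, the same query hands back the raw serialized bytes.

```python
import io
import sqlite3
import numpy as np

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

def convert_array(blob):
    return np.load(io.BytesIO(blob))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

x = np.arange(12).reshape(2, 6)

# With PARSE_DECLTYPES, the declared column type "array" triggers the converter.
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")
con.execute("insert into test (arr) values (?)", (x,))
roundtrip = con.execute("select arr from test").fetchone()[0]
assert isinstance(roundtrip, np.ndarray)
assert np.array_equal(roundtrip, x)

# Without detect_types, the converter never runs and the raw bytes come back.
con2 = sqlite3.connect(":memory:")
con2.execute("create table test (arr array)")
con2.execute("insert into test (arr) values (?)", (x,))
raw = con2.execute("select arr from test").fetchone()[0]
assert isinstance(raw, bytes)
```

Note that adapters are registered per Python type, so the insert side works on both connections; only the BLOB-to-ndarray conversion depends on detect_types.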

Answered by reptilicus

This works for me:

import sqlite3 as sql
import numpy as np
import json

con = sql.connect('test.db', isolation_level=None)
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS foobar")
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)",
            (None, json.dumps(np.arange(0, 500, 0.5).tolist())))
con.commit()
cur.execute("SELECT * FROM foobar")
data = cur.fetchall()
print(data)
my_list = json.loads(data[0][1])
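One caveat worth noting with the JSON route (a minimal sketch, not part of the original answer): json.loads returns a plain Python list, so the dtype is re-inferred when converting back to an array rather than preserved.

```python
import json
import numpy as np

original = np.arange(0, 500, 0.5)           # the float64 array from the question
as_text = json.dumps(original.tolist())     # string stored in the BLOB column
restored = np.asarray(json.loads(as_text))  # back to an ndarray

assert np.array_equal(restored, original)
# The dtype is inferred on the way back (floats here, so float64 again), but
# e.g. an int32 array would come back as the platform's default integer dtype.
```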

Answered by SoulNibbler

Happy Leap Second has it close, but I kept getting an automatic casting to string. Also, if you check out this other post (a fun debate on using buffer or Binary to push non-text data into sqlite), you'll see that the documented approach is to avoid buffer altogether and use this chunk of code:

import io
import sqlite3
import numpy as np

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

I haven't heavily tested this in Python 3, but it seems to work in Python 2.7.

Answered by asterio gonzalez

I think the matlab format is a really convenient way to store and retrieve numpy arrays. It is really fast, and the disk and memory footprint are about the same.

Load / Save / Disk Comparison

(image from mverleg benchmarks)

But if for any reason you need to store numpy arrays in SQLite, I suggest adding some compression capability.

The extra lines relative to unutbu's code are pretty simple:

import zlib
import bz2

compressor = zlib  # zlib or bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib uses similar disk size to Matlab v5 .mat files;
    # bz2 compresses ~4x better than zlib, but storing is ~20x slower.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(compressor.compress(out.getvalue()))

def convert_array(blob):
    return np.load(io.BytesIO(compressor.decompress(blob)))

The results of testing with the MNIST database were:

$ ./test_MNIST.py
[69900]:  99% remain: 0 secs   
Storing 70000 images in 379.9 secs
Retrieve 6990 images in 9.5 secs
$ ls -lh example.db 
-rw-r--r-- 1 agp agp 69M sep 22 07:27 example.db
$ ls -lh mnist-original.mat 
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

using zlib, and

$ ./test_MNIST.py
[69900]:  99% remain: 12 secs   
Storing 70000 images in 8536.2 secs
Retrieve 6990 images in 37.4 secs
$ ls -lh example.db 
-rw-r--r-- 1 agp agp 19M sep 22 03:33 example.db
$ ls -lh mnist-original.mat 
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

using bz2

Comparing the Matlab V5 format with bz2 on SQLite, the bz2 compression ratio is around 2.8, but the access time is quite long compared to the Matlab format (almost instantaneous vs. more than 30 secs). Maybe it is worthwhile only for really huge databases where the learning process is much more time-consuming than access, or where the database footprint needs to be as small as possible.

Finally, note that the bz2/zlib ratio is around 3.7, and zlib/matlab requires 30% more space.

The full code, if you want to play with it yourself:

import io
import os
import sys
import time
import zlib
import bz2
import sqlite3
import numpy as np
# fetch_mldata was removed in newer scikit-learn; use fetch_openml('mnist_784') there
from sklearn.datasets import fetch_mldata

compressor = zlib  # zlib or bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib uses similar disk size to Matlab v5 .mat files;
    # bz2 compresses ~4x better than zlib, but storing is ~20x slower.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(compressor.compress(out.getvalue()))

def convert_array(blob):
    return np.load(io.BytesIO(compressor.decompress(blob)))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

dbname = 'example.db'

def test_save_sqlite_arrays():
    "Load MNIST database (70000 samples) and store in a compressed SQLite db"
    os.path.exists(dbname) and os.unlink(dbname)
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    cur.execute("create table test (idx integer primary key, X array, y integer);")

    mnist = fetch_mldata('MNIST original')

    X, y = mnist.data, mnist.target
    m = X.shape[0]
    t0 = time.time()
    for i, x in enumerate(X):
        # insert the image x (the original post had a typo inserting y here)
        cur.execute("insert into test (idx, X, y) values (?,?,?)",
                    (i, x, int(y[i])))
        if not i % 100 and i > 0:
            elapsed = time.time() - t0
            remain = float(m - i) / i * elapsed
            print("\r[%5d]: %3d%% remain: %d secs" % (i, 100 * i / m, remain), end='')
            sys.stdout.flush()

    con.commit()
    con.close()
    elapsed = time.time() - t0
    print()
    print("Storing %d images in %0.1f secs" % (m, elapsed))

def test_load_sqlite_arrays():
    "Query MNIST SQLite database and load some samples"
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()

    # select all images labeled as '2'
    t0 = time.time()
    cur.execute('select idx, X, y from test where y = 2')
    data = cur.fetchall()
    elapsed = time.time() - t0
    print("Retrieve %d images in %0.1f secs" % (len(data), elapsed))


if __name__ == '__main__':
    test_save_sqlite_arrays()
    test_load_sqlite_arrays()

Answered by gavin

The other methods specified didn't work for me. There is now a numpy.tobytes method, and numpy.fromstring (which works on byte strings) is deprecated; the recommended replacement is numpy.frombuffer.

import sqlite3
import numpy as np

# note: the adapted type is np.ndarray (np.array is a function, not a type);
# adapt_array and convert_array are defined below
sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

Coming to the meat and potatoes,

def adapt_array(arr):
    return arr.tobytes()

def convert_array(text):
    # caveat: frombuffer defaults to float64 and returns a flat 1-D array,
    # so the original dtype and shape must be tracked separately
    return np.frombuffer(text)

I've tested it in my application and it works well for me on Python 3.7.3 and numpy 1.16.2.

numpy.fromstring gives the same output, along with DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead.
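A limitation of the tobytes/frombuffer pair worth noting: the raw buffer carries no dtype or shape, so np.frombuffer defaults to float64 and always returns a 1-D array. One way around this (a sketch with illustrative column names, not from the original answer) is to store the dtype and shape alongside the blob:

```python
import sqlite3
import numpy as np

con = sqlite3.connect(":memory:")
cur = con.cursor()
# hypothetical schema: dtype and shape stored next to the raw buffer
cur.execute("create table arrays (data blob, dtype text, shape text)")

x = np.arange(12, dtype=np.int16).reshape(3, 4)
cur.execute("insert into arrays values (?,?,?)",
            (x.tobytes(), str(x.dtype), ",".join(map(str, x.shape))))

data, dtype, shape = cur.execute("select * from arrays").fetchone()
y = np.frombuffer(data, dtype=dtype).reshape(
    tuple(int(s) for s in shape.split(",")))

assert np.array_equal(x, y) and y.dtype == np.int16
```

This keeps the insert/select code trivial at the cost of two extra columns; np.save/np.load (as in the accepted answer) records the same metadata inside the blob instead.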