Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11607387/

Date: 2020-09-13 15:46:38  Source: igfitidea

Is there a C/C++ API for python pandas?

Tags: python, c, api, pandas

Asked by THM

I'm extracting large amounts of data from a legacy backend system using C/C++ and moving it to Python using distutils. After obtaining the data in Python, I put it into a pandas DataFrame object for data analysis. Now I want to go faster and would like to avoid the second step.

Is there a C/C++ API for pandas that lets me create a DataFrame in C/C++, add my C/C++ data, and pass it to Python? I'm thinking of something similar to the numpy C API.

I have already thought of creating numpy array objects in C as a workaround, but I'm heavily using time series data and would love to have the TimeSeries and date_range objects as well.

Accepted answer by ecatmur

All the pandas classes (TimeSeries, DataFrame, DatetimeIndex etc.) have pure-Python definitions so there isn't a C API. You might be best off passing numpy ndarrays from C to your Python code and letting your Python code construct pandas objects from them.

If necessary, you could use PyObject_CallFunction etc. to call the pandas constructors, but you'd have to take care of accessing the names from module imports and checking for errors.
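The Python side of this approach can be sketched as follows. This is only an illustration: the two arrays are hypothetical stand-ins for what a C extension might return (timestamps as epoch nanoseconds, values as doubles), and the column name "price" is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for arrays returned by the C extension:
# timestamps as int64 epoch nanoseconds, values as float64.
timestamps_ns = np.array(
    [1_600_000_000, 1_600_000_060, 1_600_000_120], dtype=np.int64
) * 1_000_000_000
prices = np.array([101.5, 101.7, 101.2], dtype=np.float64)

# Reinterpret the integer buffer as datetime64[ns] and wrap it in an index.
index = pd.DatetimeIndex(timestamps_ns.view("datetime64[ns]"), name="time")

# Build the time-indexed frame from the value array.
frame = pd.DataFrame({"price": prices}, index=index)
print(frame)
```

With this split, the C code only needs to produce plain ndarrays, and the time series objects are created in a few lines of Python.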

Answered by Tomáš Gavenčiak

I am dealing with a similar problem: loading data, through a C API, from a format that Pandas does not support. I found two ways to address this; hopefully someone will find them useful.

  • Pandas data frames are pure Python classes, so they are not easy to construct from C/C++, but the underlying data storage of the individual columns (see the source of class Series) is a numpy 1D array. Numpy has a nice C API, so you can construct the numpy arrays in C and then pass them to your Python code.

  • A second solution is to write your own input module for Pandas. This is not as much work as it sounds and might be very efficient. The Pandas low-level IO modules are written in Cython (a special language somewhere between Python and C, compiled to C); see e.g. parser.pyx for an example. While that particular parser is quite involved, yours would basically just call your legacy C code.
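The overall shape of such an input module might look like the sketch below. It is written in plain Python rather than Cython, and read_legacy_columns is a hypothetical placeholder for the compiled reader that would call the legacy C code; the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd


def read_legacy_columns():
    """Hypothetical stand-in for the compiled reader.

    Returns one 1D numpy array per column, as the C side would.
    """
    return {
        "sensor_id": np.array([1, 2, 3], dtype=np.int32),
        "reading": np.array([0.5, 0.75, 1.25], dtype=np.float64),
    }


def read_legacy(reader=read_legacy_columns):
    """Build a DataFrame from the per-column arrays produced by `reader`."""
    columns = reader()
    # All columns must line up row by row before they can form a frame.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return pd.DataFrame(columns)


frame = read_legacy()
print(frame.dtypes)
```

Keeping the reader behind a single function means the Python wrapper stays the same whether the arrays come from the numpy C API or from a Cython module.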

Answered by hmoein

There is now a C++ library that is equivalent to the Pandas package in terms of interface and functionality. See this article on LinkedIn: https://www.linkedin.com/pulse/pythons-pandas-c-update-hossein-moein/ The open source code is at https://github.com/hosseinmoein/DataFrame
