What is the Python equivalent of a Matlab cell array?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/40609838/
What is the equivalent to a Matlab cell array?
Asked by ikonikon
I am new to Python and trying to create something equivalent to Matlab's "cell array". Let's say I have 100 customers indexed 'C001', 'C002', etc., and I have different data for each customer:
- Size of premises in square meters [real number]
- categorical data showing whether they are 'commercial', 'residential' or 'other'
- hourly time series of their electricity consumption in 2014 i.e. datetime-indexed array of 8760 real values
What is the best way to build such a dataset in Python 2.7 that combines single values, categorical data and time-indexed arrays? I am trying to use pandas for this but have had no success so far.
Thank you very much in advance
Answered by TheBlackCat
The equivalent of a MATLAB cell array is a numpy object array. However, object arrays are rarely used because they are rarely what you want in practice. In most cases where someone would use a cell array in MATLAB, a list or nested list would suffice:
>>> a = [obj1, obj2, obj3, obj4]
>>> b = [[obj1, obj2], [obj3, obj4]]
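As a minimal sketch of the two options mentioned above, the following compares a plain list with a numpy object array. The sample values and variable names are made up for illustration and are not from the original answer.

```python
import numpy as np

# Hypothetical mixed-type items standing in for the contents of a cell array.
items = [42.5, 'commercial', [1.0, 2.0, 3.0]]

# A plain list holds arbitrary objects and is usually all that is needed.
cell_like_list = list(items)

# A numpy object array is the closer structural match to a cell array.
# Pre-allocate with dtype=object and assign element by element, so numpy
# does not try to interpret the nested list as an extra dimension.
cell_like_array = np.empty(len(items), dtype=object)
for i, item in enumerate(items):
    cell_like_array[i] = item

print(cell_like_list[1])
print(cell_like_array[2])
print(cell_like_array.shape)
```

The object array only pays off when numpy-style indexing or reshaping of the container itself is needed; otherwise the list is simpler.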
However, that is not what you want to do in your case. Your question is a classic example of the XY problem: you are asking how to implement a particular solution to your problem, rather than asking how to solve the problem itself. Python can do a lot of things MATLAB can't, so trying to make Python behave like MATLAB will often result in sub-optimal solutions.
In this case, what you want is a pandas DataFrame. It is nothing at all like a MATLAB cell array, but fits your data set much better. You can use a MultiIndex to store the parameters, and columns to store the time series data. This allows you to index by name, size, category, date, etc. You can calculate, for example, the mean energy usage for each category of property in the third quarter for properties over 500 square meters in just one line of code.
So here is an example of how you could structure such data:
>>> import numpy as np
>>> import pandas as pd
>>>
>>> names = ['C001', 'C002', 'C003', 'C004']
>>> sizes = np.abs(np.random.random(4))*1000
>>> category = ['Commercial', 'Residential', 'Residential', 'Other']
>>> ts = np.random.random([100, 4])
>>> timestamps = pd.date_range('1/1/2011', periods=100, freq='W')
>>>
>>> cols = pd.MultiIndex.from_arrays([names, sizes, category])
>>>
>>> df = pd.DataFrame(ts, index=timestamps, columns=cols)
>>> df.columns.names = ['Name', 'Size', 'Category']
>>> df.index.name = 'Time'
>>>
>>> print(df)
Name C001 C002 C003 C004
Size 36.719201 732.278278 795.755755 551.383120
Category Commercial Residential Residential Other
Time
2011-01-02 0.108720 0.018492 0.057233 0.694548
2011-01-09 0.959845 0.968857 0.422210 0.975767
2011-01-16 0.709676 0.119963 0.004481 0.830328
2011-01-23 0.084271 0.535408 0.209943 0.668001
2011-01-30 0.626125 0.052301 0.212636 0.995429
2011-02-06 0.376399 0.199327 0.482884 0.632472
2011-02-13 0.302807 0.353679 0.599427 0.993996
2011-02-20 0.185445 0.005769 0.755981 0.923540
2011-02-27 0.109611 0.994292 0.873782 0.542741
2011-03-06 0.561404 0.778414 0.595238 0.082001
2011-03-13 0.056986 0.869344 0.459753 0.450071
2011-03-20 0.261320 0.675317 0.603043 0.371950
2011-03-27 0.890803 0.061619 0.831677 0.801890
2011-04-03 0.498199 0.846559 0.370336 0.225477
2011-04-10 0.248914 0.693038 0.145255 0.233058
2011-04-17 0.621441 0.683213 0.048944 0.650139
2011-04-24 0.459869 0.055751 0.912097 0.457605
2011-05-01 0.814447 0.780415 0.184241 0.429139
2011-05-08 0.586905 0.209121 0.428080 0.246584
2011-05-15 0.754021 0.909181 0.846984 0.948835
2011-05-22 0.513610 0.203925 0.338072 0.596325
2011-05-29 0.497080 0.557908 0.916812 0.680242
2011-06-05 0.646791 0.641024 0.399427 0.308346
2011-06-12 0.573922 0.539285 0.098703 0.461480
2011-06-19 0.062978 0.939339 0.713087 0.380326
2011-06-26 0.422484 0.109185 0.459734 0.800468
2011-07-03 0.962368 0.632361 0.388565 0.503425
2011-07-10 0.802551 0.261161 0.590494 0.526307
2011-07-17 0.261447 0.686405 0.636970 0.622476
2011-07-24 0.634331 0.630028 0.069925 0.504036
... ... ... ... ...
2012-05-06 0.185331 0.375717 0.658463 0.697377
2012-05-13 0.273510 0.665318 0.756944 0.083542
2012-05-20 0.895984 0.850881 0.680869 0.987420
2012-05-27 0.450593 0.262195 0.458893 0.199141
2012-06-03 0.696102 0.332312 0.419764 0.338074
2012-06-10 0.113108 0.167605 0.812625 0.329429
2012-06-17 0.527418 0.087454 0.868973 0.744649
2012-06-24 0.977674 0.831538 0.410719 0.598423
2012-07-01 0.577802 0.141307 0.310356 0.276271
2012-07-08 0.772117 0.288240 0.820701 0.548857
2012-07-15 0.699628 0.467952 0.429433 0.304482
2012-07-22 0.782641 0.337854 0.561191 0.572241
2012-07-29 0.010225 0.962770 0.793041 0.166877
2012-08-05 0.895516 0.628526 0.782264 0.908301
2012-08-12 0.787210 0.698185 0.255306 0.741693
2012-08-19 0.042833 0.556469 0.165885 0.408108
2012-08-26 0.942076 0.377714 0.927170 0.119004
2012-09-02 0.567978 0.007891 0.777752 0.869950
2012-09-09 0.120134 0.417996 0.328654 0.484447
2012-09-16 0.833769 0.946456 0.594471 0.569707
2012-09-23 0.515544 0.090017 0.344200 0.498175
2012-09-30 0.419152 0.315412 0.683195 0.498630
2012-10-07 0.879582 0.958591 0.531812 0.051948
2012-10-14 0.488241 0.683242 0.096560 0.197295
2012-10-21 0.425213 0.279539 0.476436 0.492512
2012-10-28 0.238334 0.836782 0.901589 0.132700
2012-11-04 0.030562 0.797666 0.238895 0.550427
2012-11-11 0.875454 0.973046 0.457116 0.154175
2012-11-18 0.557967 0.895320 0.478239 0.448102
2012-11-25 0.075152 0.047344 0.650615 0.293129
[100 rows x 4 columns]
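The calculation mentioned earlier (mean usage per category in the third quarter, for properties over 500 square meters) could look roughly like the sketch below. It rebuilds a frame with the same column layout, but the data is synthetic: the fixed sizes, the daily frequency, and the 2014 date range are assumptions chosen so the result is easy to check, not values from the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumed values, for illustration only).
rng = np.random.RandomState(0)
names = ['C001', 'C002', 'C003', 'C004']
sizes = [36.7, 732.3, 795.8, 551.4]
category = ['Commercial', 'Residential', 'Residential', 'Other']
timestamps = pd.date_range('2014-01-01', periods=365, freq='D')
ts = rng.random_sample((len(timestamps), len(names)))

cols = pd.MultiIndex.from_arrays([names, sizes, category],
                                 names=['Name', 'Size', 'Category'])
df = pd.DataFrame(ts, index=timestamps, columns=cols)
df.index.name = 'Time'

# Keep only the rows that fall in the third quarter (July to September).
q3 = df.loc[df.index.quarter == 3]

# Keep only the columns whose 'Size' level exceeds 500 square meters.
large = q3.loc[:, q3.columns.get_level_values('Size') > 500]

# Average each column over time, then average the columns within each category.
mean_by_category = large.mean().groupby(level='Category').mean()
print(mean_by_category)
```

The three steps are split out here for readability; they can also be chained into a single expression.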