Python Pandas 用户警告：排序，因为非串联轴未对齐

Question

提问by MishD

I'm doing some code practice and applying merging of data frames while doing this getting user warning

我正在做一些代码练习并应用数据帧的合并，同时这样做会收到用户警告

/usr/lib64/python2.7/site-packages/pandas/core/frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=True'. To retain the current behavior and silence the warning, pass sort=False

/usr/lib64/python2.7/site-packages/pandas/core/frame.py:6201：FutureWarning：排序，因为非串联轴未对齐。未来版本的 Pandas 将更改为默认不排序。要接受未来的行为，请传递 'sort=True'。要保留当前行为并使警告静音，请传递 sort=False

On these lines of code: Can you please help to get the solution of this warning.

在这些代码行上：您能否帮助获得此警告的解决方案。

placement_video = [self.read_sql_vdx_summary, self.read_sql_video_km]
placement_video_summary = reduce(lambda left, right: pd.merge(left, right, on='PLACEMENT', sort=False), placement_video)


placement_by_video = placement_video_summary.loc[:, ["PLACEMENT", "PLACEMENT_NAME", "COST_TYPE", "PRODUCT",
                                                     "VIDEONAME", "VIEW0", "VIEW25", "VIEW50", "VIEW75",
                                                     "VIEW100",
                                                     "ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0",
                                                     "DPE25",
                                                     "DPE50", "DPE75", "DPE100"]]

# print (placement_by_video)

placement_by_video["Placement# Name"] = placement_by_video[["PLACEMENT",
                                                            "PLACEMENT_NAME"]].apply(lambda x: ".".join(x),
                                                                                     axis=1)

placement_by_video_new = placement_by_video.loc[:,
                         ["PLACEMENT", "Placement# Name", "COST_TYPE", "PRODUCT", "VIDEONAME",
                          "VIEW0", "VIEW25", "VIEW50", "VIEW75", "VIEW100",
                          "ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0", "DPE25",
                          "DPE50", "DPE75", "DPE100"]]

placement_by_km_video = [placement_by_video_new, self.read_sql_km_for_video]
placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT'], sort=False),
                                       placement_by_km_video)

#print (list(placement_by_km_video_summary))
#print(placement_by_km_video_summary)
#exit()
# print(placement_by_video_new)
"""Conditions for 25%view"""
mask17 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile'])
mask18 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPCV"])
mask19 = placement_by_km_video_summary["PRODUCT"].isin(["InStream"])
mask20 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+", "CPCV"])
mask_video_video_completions = placement_by_km_video_summary["COST_TYPE"].isin(["CPCV"])
mask21 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE+"])
mask22 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM"])
mask23 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile', 'InStream'])
mask24 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+"])

choice25video_eng = placement_by_km_video_summary["ENG25"]
choice25video_vwr = placement_by_km_video_summary["VIEW25"]
choice25video_deep = placement_by_km_video_summary["DPE25"]

placement_by_km_video_summary["25_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                  [choice25video_eng, choice25video_vwr, choice25video_deep])


"""Conditions for 50%view"""
choice50video_eng = placement_by_km_video_summary["ENG50"]
choice50video_vwr = placement_by_km_video_summary["VIEW50"]
choice50video_deep = placement_by_km_video_summary["DPE50"]

placement_by_km_video_summary["50_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                  [choice50video_eng,
                                                   choice50video_vwr, choice50video_deep])

"""Conditions for 75%view"""

choice75video_eng = placement_by_km_video_summary["ENG75"]
choice75video_vwr = placement_by_km_video_summary["VIEW75"]
choice75video_deep = placement_by_km_video_summary["DPE75"]

placement_by_km_video_summary["75_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                  [choice75video_eng,
                                                   choice75video_vwr,
                                                   choice75video_deep])

"""Conditions for 100%view"""

choice100video_eng = placement_by_km_video_summary["ENG100"]
choice100video_vwr = placement_by_km_video_summary["VIEW100"]
choice100video_deep = placement_by_km_video_summary["DPE100"]
choicecompletions = placement_by_km_video_summary['COMPLETIONS']

placement_by_km_video_summary["100_pc_video"] = np.select([mask17 & mask22, mask19 & mask24, mask17 & mask21, mask23 & mask_video_video_completions],
                                                          [choice100video_eng, choice100video_vwr, choice100video_deep, choicecompletions])



"""conditions for 0%view"""

choice0video_eng = placement_by_km_video_summary["ENG0"]
choice0video_vwr = placement_by_km_video_summary["VIEW0"]
choice0video_deep = placement_by_km_video_summary["DPE0"]

placement_by_km_video_summary["Views"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                   [choice0video_eng,
                                                    choice0video_vwr,
                                                    choice0video_deep])


#print (placement_by_km_video_summary)
#exit()

#final Table

placement_by_video_summary = placement_by_km_video_summary.loc[:,
                             ["PLACEMENT", "Placement# Name", "PRODUCT", "VIDEONAME", "COST_TYPE",
                              "Views", "25_pc_video", "50_pc_video", "75_pc_video","100_pc_video",
                              "ENGAGEMENTS","IMPRESSIONS", "DPEENGAMENTS"]]

#placement_by_km_video = [placement_by_video_summary, self.read_sql_km_for_video]
#placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT']),
                                       #placement_by_km_video)


#print(placement_by_video_summary)
#exit()
# dup_col =["IMPRESSIONS","ENGAGEMENTS","DPEENGAMENTS"]

# placement_by_video_summary.loc[placement_by_video_summary.duplicated(dup_col),dup_col] = np.nan

# print ("Dhar",placement_by_video_summary)

'''adding views based on conditions'''
#filter maximum value from videos

placement_by_video_summary_new = placement_by_km_video_summary.loc[
    placement_by_km_video_summary.reset_index().groupby(['PLACEMENT', 'PRODUCT'])['Views'].idxmax()]
#print (placement_by_video_summary_new)
#exit()
# print (placement_by_video_summary_new)
# mask22 = (placement_by_video_summary_new.PRODUCT.str.upper ()=='DISPLAY') & (placement_by_video_summary_new.COST_TYPE=='CPE')

placement_by_video_summary_new.loc[mask17 & mask18, 'Views'] = placement_by_video_summary_new['ENGAGEMENTS']
placement_by_video_summary_new.loc[mask19 & mask20, 'Views'] = placement_by_video_summary_new['IMPRESSIONS']
placement_by_video_summary_new.loc[mask17 & mask21, 'Views'] = placement_by_video_summary_new['DPEENGAMENTS']

#print (placement_by_video_summary_new)
#exit()
placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index).append(
    placement_by_video_summary_new).sort_index()

placement_by_video_summary["Video Completion Rate"] = placement_by_video_summary["100_pc_video"] / \
                                                      placement_by_video_summary["Views"]

placement_by_video_final = placement_by_video_summary.loc[:,
                           ["Placement# Name", "PRODUCT", "VIDEONAME", "Views",
                            "25_pc_video", "50_pc_video", "75_pc_video", "100_pc_video",
                            "Video Completion Rate"]]

Answer 1

回答by jezrael

tl;dr:

tl;博士：

concatand appendcurrently sort the non-concatenation index (e.g. columns if you're adding rows) if the columns don't match. In pandas 0.23 this started generating a warning; pass the parameter sort=Trueto silence it. In the future the default will change to notsort, so it's best to specify either sort=Trueor Falsenow, or better yet ensure that your non-concatenation indices match.

concatappend如果列不匹配，则当前对非串联索引（例如，如果要添加行则为列）进行排序。在 pandas 0.23 中，这开始产生警告；传递参数sort=True使其静音。将来，默认值将更改为不排序，因此最好指定其中之一sort=True或False现在，或者更好地确保您的非串联索引匹配。

The warning is new in pandas 0.23.0:

警告是pandas 0.23.0 中的新警告：

In a future version of pandas pandas.concat()and DataFrame.append()will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned, link.

在大熊猫的未来版本pandas.concat()和DataFrame.append()将不再这类非串列轴线时尚未对齐。当前行为与之前的（排序）相同，但现在在未指定 sort 且非串联轴未对齐时发出警告 link。

More information from linked very old github issue, comment by smcinerney :

来自链接的非常古老的github 问题的更多信息，smcinerney 评论：

When concat'ing DataFrames, the column names get alphanumerically sorted if there are any differences between them. If they're identical across DataFrames, they don't get sorted.
This sort is undocumented and unwanted. Certainly the default behavior should be no-sort.

连接 DataFrame 时，如果列名之间有任何差异，它们将按字母数字顺序排序。如果它们在 DataFrames 中相同，则它们不会被排序。
这种类型是无证和不需要的。当然，默认行为应该是 no-sort。

After some time the parameter sortwas implemented in pandas.concatand DataFrame.append:

一段时间后，参数sort在pandas.concat和中实现DataFrame.append：

sort: boolean, default None
Sort non-concatenation axis if it is not already aligned when join is 'outer'. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.
Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.
This has no effect when join='inner', which already preserves the order of the non-concatenation axis.

排序：布尔值，默认无
如果在 join 为“外部”时尚未对齐，则对非串联轴进行排序。当前默认的排序已被弃用，并将在未来版本的 Pandas 中更改为不排序。
显式传递 sort=True 以消除警告和排序。显式传递 sort=False 以消除警告而不是排序。
这在 join='inner' 时不起作用，它已经保留了非串联轴的顺序。

So if both DataFrames have the same columns in the same order, there is no warning and no sorting:

因此，如果两个 DataFrame 具有相同顺序的相同列，则不会出现警告和排序：

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['a', 'b'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

print (pd.concat([df1, df2]))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['b', 'a'])

print (pd.concat([df1, df2]))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5

But if the DataFrames have different columns, or the same columns in a different order, pandas returns a warning if no parameter sortis explicitly set (sort=Noneis the default value):

但是如果 DataFrames 有不同的列，或者相同的列以不同的顺序，如果没有sort明确设置参数（sort=None是默认值），pandas 会返回警告：

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

print (pd.concat([df1, df2]))

FutureWarning: Sorting because non-concatenation axis is not aligned.

FutureWarning：排序，因为非串联轴未对齐。

   a  b
0  1  0
1  2  8
0  4  7
1  5  3

print (pd.concat([df1, df2], sort=True))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3

print (pd.concat([df1, df2], sort=False))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5

If the DataFrames have different columns, but the first columns are aligned - they will be correctly assigned to each other (columns aand bfrom df1with aand bfrom df2in the example below) because they exist in both. For other columns that exist in one but not both DataFrames, missing values are created.

如果 DataFrames 有不同的列，但第一列是对齐的 - 它们将被正确分配给彼此（列a和bfrom df1witha和bfromdf2在下面的示例中），因为它们同时存在。对于存在于一个而非两个 DataFrame 中的其他列，将创建缺失值。

Lastly, if you pass sort=True, columns are sorted alphanumerically. If sort=Falseand the second DafaFrame has columns that are not in the first, they are appended to the end with no sorting:

最后，如果你通过sort=True，列按字母数字排序。如果sort=False第二个 DafaFrame 有不在第一个中的列，则将它们附加到末尾而不进行排序：

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], 'e':[5, 0]}, 
                    columns=['b', 'a','e'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], 'c':[2, 8], 'd':[7, 0]}, 
                    columns=['c','b','a','d'])

print (pd.concat([df1, df2]))

FutureWarning: Sorting because non-concatenation axis is not aligned.

FutureWarning：排序，因为非串联轴未对齐。

   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN

print (pd.concat([df1, df2], sort=True))
   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN

print (pd.concat([df1, df2], sort=False))

   b  a    e    c    d
0  0  1  5.0  NaN  NaN
1  8  2  0.0  NaN  NaN
0  7  4  NaN  2.0  7.0
1  3  5  NaN  8.0  0.0

In your code:

在您的代码中：

placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index)
                                                       .append(placement_by_video_summary_new, sort=True)
                                                       .sort_index()

Answer 2

回答by RLC

jezrael's answer is good, but did not answer a question I had: Will getting the "sort" flag wrong mess up my data in any way? The answer is apparently "no", you are fine either way.

jezrael 的回答很好，但没有回答我的问题：“排序”标志错误是否会以任何方式弄乱我的数据？答案显然是“不”，无论哪种方式你都很好。

from pandas import DataFrame, concat

a = DataFrame([{'a':1,      'c':2,'d':3      }])
b = DataFrame([{'a':4,'b':5,      'd':6,'e':7}])

>>> concat([a,b],sort=False)
   a    c  d    b    e
0  1  2.0  3  NaN  NaN
0  4  NaN  6  5.0  7.0

>>> concat([a,b],sort=True)
   a    b    c  d    e
0  1  NaN  2.0  3  NaN
0  4  5.0  NaN  6  7.0

Python Pandas 用户警告：排序，因为非串联轴未对齐

提问by MishD

回答by jezrael

回答by RLC

相关推荐

最近更新

标签

Python Pandas 用户警告：排序，因为非串联轴未对齐

提问by MishD

回答by jezrael

回答by RLC

相关推荐

Python Tensorflow 分配内存：38535168 的分配超过系统内存的 10%

Python ValueError：无法将大小为 50176 的数组重塑为形状 (1,224,224,3)

Python 如何找到真实数据的概率分布和参数？（蟒蛇 3）

Python 将 Pandas 数据帧转换为 PyTorch 张量？

相关推荐

最近更新

标签