Python Pandas 用户警告:排序,因为非串联轴未对齐
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/50501787/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python Pandas User Warning: Sorting because non-concatenation axis is not aligned
提问by MishD
I'm doing some code practice and applying merging of data frames while doing this getting user warning
我正在做一些代码练习并应用数据帧的合并,同时这样做会收到用户警告
/usr/lib64/python2.7/site-packages/pandas/core/frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=True'. To retain the current behavior and silence the warning, pass sort=False
/usr/lib64/python2.7/site-packages/pandas/core/frame.py:6201:FutureWarning:排序,因为非串联轴未对齐。未来版本的 Pandas 将更改为默认不排序。要接受未来的行为,请传递 'sort=True'。要保留当前行为并使警告静音,请传递 sort=False
On these lines of code: Can you please help to get the solution of this warning.
在这些代码行上:您能否帮助获得此警告的解决方案。
placement_video = [self.read_sql_vdx_summary, self.read_sql_video_km]
placement_video_summary = reduce(lambda left, right: pd.merge(left, right, on='PLACEMENT', sort=False), placement_video)
placement_by_video = placement_video_summary.loc[:, ["PLACEMENT", "PLACEMENT_NAME", "COST_TYPE", "PRODUCT",
"VIDEONAME", "VIEW0", "VIEW25", "VIEW50", "VIEW75",
"VIEW100",
"ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0",
"DPE25",
"DPE50", "DPE75", "DPE100"]]
# print (placement_by_video)
placement_by_video["Placement# Name"] = placement_by_video[["PLACEMENT",
"PLACEMENT_NAME"]].apply(lambda x: ".".join(x),
axis=1)
placement_by_video_new = placement_by_video.loc[:,
["PLACEMENT", "Placement# Name", "COST_TYPE", "PRODUCT", "VIDEONAME",
"VIEW0", "VIEW25", "VIEW50", "VIEW75", "VIEW100",
"ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0", "DPE25",
"DPE50", "DPE75", "DPE100"]]
placement_by_km_video = [placement_by_video_new, self.read_sql_km_for_video]
placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT'], sort=False),
placement_by_km_video)
#print (list(placement_by_km_video_summary))
#print(placement_by_km_video_summary)
#exit()
# print(placement_by_video_new)
"""Conditions for 25%view"""
mask17 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile'])
mask18 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPCV"])
mask19 = placement_by_km_video_summary["PRODUCT"].isin(["InStream"])
mask20 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+", "CPCV"])
mask_video_video_completions = placement_by_km_video_summary["COST_TYPE"].isin(["CPCV"])
mask21 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE+"])
mask22 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM"])
mask23 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile', 'InStream'])
mask24 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+"])
choice25video_eng = placement_by_km_video_summary["ENG25"]
choice25video_vwr = placement_by_km_video_summary["VIEW25"]
choice25video_deep = placement_by_km_video_summary["DPE25"]
placement_by_km_video_summary["25_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice25video_eng, choice25video_vwr, choice25video_deep])
"""Conditions for 50%view"""
choice50video_eng = placement_by_km_video_summary["ENG50"]
choice50video_vwr = placement_by_km_video_summary["VIEW50"]
choice50video_deep = placement_by_km_video_summary["DPE50"]
placement_by_km_video_summary["50_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice50video_eng,
choice50video_vwr, choice50video_deep])
"""Conditions for 75%view"""
choice75video_eng = placement_by_km_video_summary["ENG75"]
choice75video_vwr = placement_by_km_video_summary["VIEW75"]
choice75video_deep = placement_by_km_video_summary["DPE75"]
placement_by_km_video_summary["75_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice75video_eng,
choice75video_vwr,
choice75video_deep])
"""Conditions for 100%view"""
choice100video_eng = placement_by_km_video_summary["ENG100"]
choice100video_vwr = placement_by_km_video_summary["VIEW100"]
choice100video_deep = placement_by_km_video_summary["DPE100"]
choicecompletions = placement_by_km_video_summary['COMPLETIONS']
placement_by_km_video_summary["100_pc_video"] = np.select([mask17 & mask22, mask19 & mask24, mask17 & mask21, mask23 & mask_video_video_completions],
[choice100video_eng, choice100video_vwr, choice100video_deep, choicecompletions])
"""conditions for 0%view"""
choice0video_eng = placement_by_km_video_summary["ENG0"]
choice0video_vwr = placement_by_km_video_summary["VIEW0"]
choice0video_deep = placement_by_km_video_summary["DPE0"]
placement_by_km_video_summary["Views"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
[choice0video_eng,
choice0video_vwr,
choice0video_deep])
#print (placement_by_km_video_summary)
#exit()
#final Table
placement_by_video_summary = placement_by_km_video_summary.loc[:,
["PLACEMENT", "Placement# Name", "PRODUCT", "VIDEONAME", "COST_TYPE",
"Views", "25_pc_video", "50_pc_video", "75_pc_video","100_pc_video",
"ENGAGEMENTS","IMPRESSIONS", "DPEENGAMENTS"]]
#placement_by_km_video = [placement_by_video_summary, self.read_sql_km_for_video]
#placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT']),
#placement_by_km_video)
#print(placement_by_video_summary)
#exit()
# dup_col =["IMPRESSIONS","ENGAGEMENTS","DPEENGAMENTS"]
# placement_by_video_summary.loc[placement_by_video_summary.duplicated(dup_col),dup_col] = np.nan
# print ("Dhar",placement_by_video_summary)
'''adding views based on conditions'''
#filter maximum value from videos
placement_by_video_summary_new = placement_by_km_video_summary.loc[
placement_by_km_video_summary.reset_index().groupby(['PLACEMENT', 'PRODUCT'])['Views'].idxmax()]
#print (placement_by_video_summary_new)
#exit()
# print (placement_by_video_summary_new)
# mask22 = (placement_by_video_summary_new.PRODUCT.str.upper ()=='DISPLAY') & (placement_by_video_summary_new.COST_TYPE=='CPE')
placement_by_video_summary_new.loc[mask17 & mask18, 'Views'] = placement_by_video_summary_new['ENGAGEMENTS']
placement_by_video_summary_new.loc[mask19 & mask20, 'Views'] = placement_by_video_summary_new['IMPRESSIONS']
placement_by_video_summary_new.loc[mask17 & mask21, 'Views'] = placement_by_video_summary_new['DPEENGAMENTS']
#print (placement_by_video_summary_new)
#exit()
placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index).append(
placement_by_video_summary_new).sort_index()
placement_by_video_summary["Video Completion Rate"] = placement_by_video_summary["100_pc_video"] / \
placement_by_video_summary["Views"]
placement_by_video_final = placement_by_video_summary.loc[:,
["Placement# Name", "PRODUCT", "VIDEONAME", "Views",
"25_pc_video", "50_pc_video", "75_pc_video", "100_pc_video",
"Video Completion Rate"]]
回答by jezrael
tl;dr:
tl;博士:
concat
and append
currently sort the non-concatenation index (e.g. columns if you're adding rows) if the columns don't match. In pandas 0.23 this started generating a warning; pass the parameter sort=True
to silence it. In the future the default will change to notsort, so it's best to specify either sort=True
or False
now, or better yet ensure that your non-concatenation indices match.
concat
append
如果列不匹配,则当前对非串联索引(例如,如果要添加行则为列)进行排序。在 pandas 0.23 中,这开始产生警告;传递参数sort=True
使其静音。将来,默认值将更改为不排序,因此最好指定其中之一sort=True
或False
现在,或者更好地确保您的非串联索引匹配。
The warning is new in pandas 0.23.0:
警告是pandas 0.23.0 中的新警告:
In a future version of pandas pandas.concat()
and DataFrame.append()
will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned,
link.
在大熊猫的未来版本pandas.concat()
和DataFrame.append()
将不再这类非串列轴线时尚未对齐。当前行为与之前的(排序)相同,但现在在未指定 sort 且非串联轴未对齐时发出警告
link。
More information from linked very old github issue, comment by smcinerney :
来自链接的非常古老的github 问题的更多信息,smcinerney 评论:
When concat'ing DataFrames, the column names get alphanumerically sorted if there are any differences between them. If they're identical across DataFrames, they don't get sorted.
This sort is undocumented and unwanted. Certainly the default behavior should be no-sort.
连接 DataFrame 时,如果列名之间有任何差异,它们将按字母数字顺序排序。如果它们在 DataFrames 中相同,则它们不会被排序。
这种类型是无证和不需要的。当然,默认行为应该是 no-sort。
After some time the parameter sort
was implemented in pandas.concat
and DataFrame.append
:
一段时间后,参数sort
在pandas.concat
和中实现DataFrame.append
:
sort: boolean, default None
Sort non-concatenation axis if it is not already aligned when join is 'outer'. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.
Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.
This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
排序:布尔值,默认无
如果在 join 为“外部”时尚未对齐,则对非串联轴进行排序。当前默认的排序已被弃用,并将在未来版本的 Pandas 中更改为不排序。
显式传递 sort=True 以消除警告和排序。显式传递 sort=False 以消除警告而不是排序。
这在 join='inner' 时不起作用,它已经保留了非串联轴的顺序。
So if both DataFrames have the same columns in the same order, there is no warning and no sorting:
因此,如果两个 DataFrame 具有相同顺序的相同列,则不会出现警告和排序:
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['a', 'b'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])
print (pd.concat([df1, df2]))
a b
0 1 0
1 2 8
0 4 7
1 5 3
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['b', 'a'])
print (pd.concat([df1, df2]))
b a
0 0 1
1 8 2
0 7 4
1 3 5
But if the DataFrames have different columns, or the same columns in a different order, pandas returns a warning if no parameter sort
is explicitly set (sort=None
is the default value):
但是如果 DataFrames 有不同的列,或者相同的列以不同的顺序,如果没有sort
明确设置参数(sort=None
是默认值),pandas 会返回警告:
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])
print (pd.concat([df1, df2]))
FutureWarning: Sorting because non-concatenation axis is not aligned.
FutureWarning:排序,因为非串联轴未对齐。
a b
0 1 0
1 2 8
0 4 7
1 5 3
print (pd.concat([df1, df2], sort=True))
a b
0 1 0
1 2 8
0 4 7
1 5 3
print (pd.concat([df1, df2], sort=False))
b a
0 0 1
1 8 2
0 7 4
1 3 5
If the DataFrames have different columns, but the first columns are aligned - they will be correctly assigned to each other (columns a
and b
from df1
with a
and b
from df2
in the example below) because they exist in both. For other columns that exist in one but not both DataFrames, missing values are created.
如果 DataFrames 有不同的列,但第一列是对齐的 - 它们将被正确分配给彼此(列a
和b
from df1
witha
和b
fromdf2
在下面的示例中),因为它们同时存在。对于存在于一个而非两个 DataFrame 中的其他列,将创建缺失值。
Lastly, if you pass sort=True
, columns are sorted alphanumerically. If sort=False
and the second DafaFrame has columns that are not in the first, they are appended to the end with no sorting:
最后,如果你通过sort=True
,列按字母数字排序。如果sort=False
第二个 DafaFrame 有不在第一个中的列,则将它们附加到末尾而不进行排序:
df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], 'e':[5, 0]},
columns=['b', 'a','e'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], 'c':[2, 8], 'd':[7, 0]},
columns=['c','b','a','d'])
print (pd.concat([df1, df2]))
FutureWarning: Sorting because non-concatenation axis is not aligned.
FutureWarning:排序,因为非串联轴未对齐。
a b c d e
0 1 0 NaN NaN 5.0
1 2 8 NaN NaN 0.0
0 4 7 2.0 7.0 NaN
1 5 3 8.0 0.0 NaN
print (pd.concat([df1, df2], sort=True))
a b c d e
0 1 0 NaN NaN 5.0
1 2 8 NaN NaN 0.0
0 4 7 2.0 7.0 NaN
1 5 3 8.0 0.0 NaN
print (pd.concat([df1, df2], sort=False))
b a e c d
0 0 1 5.0 NaN NaN
1 8 2 0.0 NaN NaN
0 7 4 NaN 2.0 7.0
1 3 5 NaN 8.0 0.0
In your code:
在您的代码中:
placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index)
.append(placement_by_video_summary_new, sort=True)
.sort_index()
回答by RLC
jezrael's answer is good, but did not answer a question I had: Will getting the "sort" flag wrong mess up my data in any way? The answer is apparently "no", you are fine either way.
jezrael 的回答很好,但没有回答我的问题:“排序”标志错误是否会以任何方式弄乱我的数据?答案显然是“不”,无论哪种方式你都很好。
from pandas import DataFrame, concat
a = DataFrame([{'a':1, 'c':2,'d':3 }])
b = DataFrame([{'a':4,'b':5, 'd':6,'e':7}])
>>> concat([a,b],sort=False)
a c d b e
0 1 2.0 3 NaN NaN
0 4 NaN 6 5.0 7.0
>>> concat([a,b],sort=True)
a b c d e
0 1 NaN 2.0 3 NaN
0 4 5.0 NaN 6 7.0