Python pandas 数据框警告,建议使用 .loc 代替?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/29263506/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python pandas data frame warning, suggest to use .loc instead?
提问by KubiK888
Hi I would like to manipulate the data by removing missing information and make all letters lower case. But for the lowercase conversion, I get this warning:
嗨,我想通过删除丢失的信息并使所有字母小写来操作数据。但是对于小写转换,我收到此警告:
E:\Program Files Extra\Python27\lib\site-packages\pandas\core\frame.py:1808: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
"DataFrame index.", UserWarning)
C:\Users\KubiK\Desktop\FamSeach_NameHandling.py:18: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copyframe3["name"] = frame3["name"].str.lower()
请参阅文档中的警告:http: //pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copyframe3["name"] = frame3["name"].str.lower()
C:\Users\KubiK\Desktop\FamSeach_NameHandling.py:19: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copyframe3["ethnicity"] = frame3["ethnicity"].str.lower()
请参阅文档中的警告:http: //pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copyframe3["ethnicity"] = frame3["ethnicity"].str.lower()
import pandas as pd
from pandas import DataFrame
# Get csv file into data frame
data = pd.read_csv("C:\Users\KubiK\Desktop\OddNames_sampleData.csv")
frame = DataFrame(data)
frame.columns = ["name", "ethnicity"]
name = frame.name
ethnicity = frame.ethnicity
# Remove missing ethnicity data cases
index_missEthnic = frame.ethnicity.isnull()
index_missName = frame.name.isnull()
frame2 = frame[index_missEthnic != True]
frame3 = frame2[index_missName != True]
# Make all letters into lowercase
frame3["name"] = frame3["name"].str.lower()
frame3["ethnicity"] = frame3["ethnicity"].str.lower()
# Test outputs
print frame3
This warning doesn't seem to be fatal (at least for my small sample data), but how should I deal with this?
这个警告似乎不是致命的(至少对于我的小样本数据),但是我应该如何处理?
Sample data
样本数据
Name Ethnicity
Thos C. Martin Russian
Charlotte Wing English
Frederick A T Byrne Canadian
J George Christe French
Mary R O'brien English
Marie A Savoie-dit Dugas English
J-b'te Letourneau Scotish
Jane Mc-earthar French
Amabil?? Bonneau English
Emma Lef??c French
C., Akeefe African
D, James Matheson English
Marie An: Thomas English
Susan Rrumb;u English
English
Kaio Chan
采纳答案by Alexander
When you set frame2/3, trying using .loc as follows:
当您设置 frame2/3 时,尝试使用 .loc 如下:
frame2 = frame.loc[~index_missEthnic, :]
frame3 = frame2.loc[~index_missName, :]
I think this would fix the error you're seeing:
我认为这可以解决您看到的错误:
frame3.loc[:, "name"] = frame3.loc[:, "name"].str.lower()
frame3.loc[:, "ethnicity"] = frame3.loc[:, "ethnicity"].str.lower()
You can also try the following, although it doesn't answer your question:
您也可以尝试以下操作,尽管它不能回答您的问题:
frame3.loc[:, "name"] = [t.lower() if isinstance(t, str) else t for t in frame3.name]
frame3.loc[:, "ethnicity"] = [t.lower() if isinstance(t, str) else t for t in frame3. ethnicity]
This converts any string in the column into lowercase, otherwise it leaves the value untouched.
这会将列中的任何字符串转换为小写,否则不会更改该值。
回答by Primer
Not sure why do you need so many booleans...
Also note that .isnull()does not catch empty strings.
And filtering empty string before applying .lower()doesn't seems neccessary either.
But it there is a need... This works for me:
不知道为什么你需要这么多布尔值......还要注意,.isnull()不会捕获空字符串。在应用之前过滤空字符串.lower()似乎也没有必要。但它有需要......这对我有用:
frame = pd.DataFrame({'name':['Abc Def', 'EFG GH', ''], 'ethnicity':['Ethnicity1','', 'Ethnicity2']})
print frame
ethnicity name
0 Ethnicity1 Abc Def
1 EFG GH
2 Ethnicity2
name_null = frame.name.str.len() == 0
frame.loc[~name_null, 'name'] = frame.loc[~name_null, 'name'].str.lower()
print frame
ethnicity name
0 Ethnicity1 abc def
1 efg gh
2 Ethnicity2

