Original question: http://stackoverflow.com/questions/42483959/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Copy some rows from an existing pandas DataFrame to a new one
Asked by kakoli
The copy has to be done for rows whose 'CITY' column starts with 'BH'. The index of the copied DataFrame should be the same as in the original. For example:
    STATE CITY
315    KA  BLR
423    WB  CCU
554    KA  BHU
557    TN  BHY
# state_df is the new DataFrame, df is the existing one
state_df = pd.DataFrame(columns=['STATE', 'CITY'])
for index, row in df.iterrows():
    city = row['CITY']
    if city.startswith('BH'):
        # copy the row from df into state_df under its original index label
        state_df.loc[index] = row
Being new to pandas and Python, I need help finding the most efficient way to do this.
Accepted answer by jezrael
Solution with str.startswith and boolean indexing:
print(df['CITY'].str.startswith('BH'))
315    False
423    False
554     True
557     True
state_df = df[df['CITY'].str.startswith('BH')]
print(state_df)
    STATE CITY
554    KA  BHU
557    TN  BHY
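For reference, here is a minimal self-contained sketch of the same approach, with the sample frame rebuilt from the data in the question. The `na=False` argument is an addition that is not part of the original answer; it is assumed here so that missing CITY values count as non-matches instead of putting NaN into the mask.

```python
import pandas as pd

# Sample data rebuilt from the question; the index labels are kept as-is.
df = pd.DataFrame(
    {"STATE": ["KA", "WB", "KA", "TN"], "CITY": ["BLR", "CCU", "BHU", "BHY"]},
    index=[315, 423, 554, 557],
)

# Boolean mask: True where CITY starts with 'BH'.
# na=False (an assumption, not in the original answer) maps missing
# CITY values to False so the mask stays purely boolean.
mask = df["CITY"].str.startswith("BH", na=False)

# Boolean indexing keeps the original index labels of the selected rows.
state_df = df[mask]
print(state_df)
```

The selected rows keep the labels 554 and 557, which is what the question asks for.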
If you need to copy only some columns, use loc:
state_df = df.loc[df['CITY'].str.startswith('BH'), ['STATE']]
print(state_df)
    STATE
554    KA
557    TN
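One detail worth checking when selecting columns with `loc` is whether the column selector is a list or a single label. The sketch below, using the question's sample data, illustrates the difference; the variable names `as_frame` and `as_series` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"STATE": ["KA", "WB", "KA", "TN"], "CITY": ["BLR", "CCU", "BHU", "BHY"]},
    index=[315, 423, 554, 557],
)
mask = df["CITY"].str.startswith("BH")

# A list of column labels returns a DataFrame, even for a single column.
as_frame = df.loc[mask, ["STATE"]]

# A single column label returns a Series instead.
as_series = df.loc[mask, "STATE"]

print(type(as_frame).__name__, type(as_series).__name__)
```

Both results keep the original index labels; only the container type differs.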
Timings:
# len(df) = 400k
df = pd.concat([df]*100000).reset_index(drop=True)
In [111]: %timeit (df.CITY.str.startswith('BH'))
10 loops, best of 3: 151 ms per loop
In [112]: %timeit (df.CITY.str.contains('^BH'))
1 loop, best of 3: 254 ms per loop
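The timings above come from an IPython session and will differ by machine and pandas version. A rough way to repeat the comparison outside IPython is sketched below with the standard `timeit` module. The frame size (40k rows) and the number of calls are arbitrary choices made to keep the run short, not values from the answer.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated to get a larger frame.
# 10_000 repeats (40k rows) is an arbitrary size chosen to keep the run short.
df = pd.DataFrame(
    {"STATE": ["KA", "WB", "KA", "TN"], "CITY": ["BLR", "CCU", "BHU", "BHY"]},
    index=[315, 423, 554, 557],
)
big = pd.concat([df] * 10_000).reset_index(drop=True)


def with_startswith():
    # Plain prefix test on each string.
    return big["CITY"].str.startswith("BH")


def with_contains():
    # Regular expression anchored at the start of the string.
    return big["CITY"].str.contains(r"^BH")


# Both masks should select exactly the same rows.
same_mask = with_startswith().equals(with_contains())

# Total seconds for 5 calls of each variant.
t_startswith = timeit.timeit(with_startswith, number=5)
t_contains = timeit.timeit(with_contains, number=5)

print(f"masks identical: {same_mask}")
print(f"startswith: {t_startswith:.4f} s, contains: {t_contains:.4f} s")
```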
Answer by MaxU
Try this:
In [4]: new = df[df['CITY'].str.contains(r'^BH')].copy()
In [5]: new
Out[5]:
    STATE CITY
554    KA  BHU
557    TN  BHY
What if I need to copy only some columns of the row rather than the entire row?
cols_to_copy = ['STATE']
new = df.loc[df.CITY.str.contains(r'^BH'), cols_to_copy].copy()
In [7]: new
Out[7]:
    STATE
554    KA
557    TN
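The explicit `.copy()` in this answer makes the new frame independent of the original. A small sketch of what that means in practice, using the question's sample data (the replacement value 'XX' is purely illustrative):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"STATE": ["KA", "WB", "KA", "TN"], "CITY": ["BLR", "CCU", "BHU", "BHY"]},
    index=[315, 423, 554, 557],
)

cols_to_copy = ["STATE"]
new = df.loc[df["CITY"].str.contains(r"^BH"), cols_to_copy].copy()

# Modify the copy; 'XX' is an illustrative placeholder value.
new.loc[554, "STATE"] = "XX"

# The original frame still holds its own value for that row.
print(new)
print(df.loc[554, "STATE"])
```

Working on an explicit copy also avoids ambiguity about whether a later assignment is meant to affect the original frame.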
Answer by kakoli
I removed the for loop and finally wrote this:
state_df = df.loc[df['CTYNAME'].str.startswith('Washington'), cols_to_copy]
The for loop may be slower, but I still need to check that.
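One way to check this is to time both variants on the same data. The sketch below is only an illustration: it uses the sample frame from the question rather than the 'CTYNAME' data, and the frame size and call counts are arbitrary assumptions. No particular result is claimed here; the numbers depend on the machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated 1_000 times (4k rows) so the
# row-by-row loop finishes quickly; the size is an arbitrary choice.
df = pd.DataFrame(
    {"STATE": ["KA", "WB", "KA", "TN"], "CITY": ["BLR", "CCU", "BHU", "BHY"]},
    index=[315, 423, 554, 557],
)
big = pd.concat([df] * 1_000).reset_index(drop=True)
cols_to_copy = ["STATE"]


def with_loop():
    # Collect the index labels of matching rows one row at a time.
    labels = [idx for idx, row in big.iterrows() if row["CITY"].startswith("BH")]
    return big.loc[labels, cols_to_copy]


def vectorized():
    # Build the mask for the whole column at once.
    return big.loc[big["CITY"].str.startswith("BH"), cols_to_copy]


loop_df = with_loop()
vec_df = vectorized()

# Both variants should pick the same rows.
same_rows = list(loop_df.index) == list(vec_df.index)

# Total seconds for 3 calls of each variant.
t_loop = timeit.timeit(with_loop, number=3)
t_vec = timeit.timeit(vectorized, number=3)

print(f"same rows selected: {same_rows}")
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```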