Original question: http://stackoverflow.com/questions/45035929/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Creating new pandas dataframe from certain columns of existing dataframe
Asked by Sjoseph
I have loaded a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame from selected columns of my original DataFrame. My attempt:
import pandas

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
# Raises KeyError: 'A','D' is treated as one tuple key, not two column labels.
new_dataset = dataset['A','D']
I would like to create a new dataframe with the columns A and D from the original dataframe.
Accepted answer by jezrael
This is called subsetting: pass a list of column names inside []:
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]
which is the same as:
new_dataset = dataset.loc[:, ['A','D']]
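The two forms above can be compared in a minimal self-contained sketch. It assumes a small in-memory DataFrame with made-up values in place of file.csv; the names subset_brackets and subset_loc are only illustrative.

```python
import pandas

# Hypothetical stand-in for the data loaded from file.csv.
dataset = pandas.DataFrame(
    {"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]}
)

# A list of column labels inside [] selects those columns as a new DataFrame.
subset_brackets = dataset[["A", "D"]]

# The same selection with .loc: all rows, columns A and D.
subset_loc = dataset.loc[:, ["A", "D"]]

print(list(subset_brackets.columns))
print(subset_brackets.equals(subset_loc))
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list of labels.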
If you only need the filtered columns, pass the usecols parameter to read_csv:
new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
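To try the usecols variant without a file on disk, here is a small sketch that reads from an in-memory string. The CSV content is invented for illustration and is assumed to have no header row, as in the question.

```python
import io

import pandas

# Hypothetical CSV content standing in for file.csv (no header row).
csv_text = "1,2,3,4\n5,6,7,8\n"

names = ["A", "B", "C", "D"]

# Only columns A and D are parsed; B and C are skipped while reading.
new_dataset = pandas.read_csv(
    io.StringIO(csv_text), names=names, usecols=["A", "D"]
)

print(list(new_dataset.columns))
```

This avoids loading the unwanted columns at all, which can matter for wide files.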
EDIT:
If you use only:
new_dataset = dataset[['A','D']]
and then modify the data, you will get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.
As EdChum pointed out, add copy to remove the warning:
new_dataset = dataset[['A','D']].copy()
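A minimal sketch of the effect of copy, again assuming a small made-up DataFrame instead of file.csv. Whether the warning appears without copy depends on the pandas version, so this example only checks that the original data stays unchanged.

```python
import pandas

# Hypothetical stand-in for the data loaded from file.csv.
dataset = pandas.DataFrame({"A": [1, 2], "B": [3, 4], "D": [7, 8]})

# .copy() makes new_dataset an independent DataFrame.
new_dataset = dataset[["A", "D"]].copy()

# Modify the copy only.
new_dataset["A"] = new_dataset["A"] * 10

print(new_dataset["A"].tolist())  # [10, 20]
print(dataset["A"].tolist())      # [1, 2]
```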