Python: disabling the index of a pandas DataFrame
Original question: http://stackoverflow.com/questions/18290123/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
disable index pandas data frame
Asked by GeauxEric
How can I drop or disable the indices in a pandas Data Frame?
I am learning pandas from the book "Python for Data Analysis", and I already know I can use DataFrame.drop to drop one column or one row. But I did not find anything about disabling all the indices in place.
Accepted answer by Viktor Kerkez
df.values gives you the raw NumPy ndarray without the indexes.
>>> df
   x   y
0  4  GE
1  1  RE
2  1  AE
3  4  CD
>>> df.values
array([[4, 'GE'],
       [1, 'RE'],
       [1, 'AE'],
       [4, 'CD']], dtype=object)
You cannot have a DataFrame without an index; the index is the whole point of the DataFrame :)
But just to be clear, this operation is not in place:
>>> df.values is df.values
False
DataFrame keeps the data in two-dimensional arrays grouped by type, so when you ask for the whole data frame as one array, it has to find the lowest common denominator (LCD) of all the dtypes and construct a 2D array of that type.
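A minimal sketch of the dtype point above, using small made-up frames (the variable names are illustrative, not from the answer): when the columns have different dtypes, df.values has to build a new array in a common dtype, which is object for integers mixed with strings and float64 for integers mixed with floats.

```python
import numpy as np
import pandas as pd

# Integers and strings share no numeric dtype, so the result falls back to object.
df = pd.DataFrame({"x": [4, 1, 1, 4], "y": ["GE", "RE", "AE", "CD"]})
mixed = df.values

# Integers and floats can both be represented as float64.
numeric = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
upcast = numeric.values

print(mixed.dtype)   # dtype of the int + string frame
print(upcast.dtype)  # dtype of the int + float frame
```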
To instantiate a new data frame with the values from the old one, just pass the old DataFrame to the new one's constructor. No data will be copied; the same data structures will be reused:
>>> df1 = pd.DataFrame([[1, 2], [3, 4]])
>>> df2 = pd.DataFrame(df1)
>>> df2.iloc[0,0] = 42
>>> df1
    0  1
0  42  2
1   3  4
But you can explicitly specify the copy parameter:
>>> df1 = pd.DataFrame([[1, 2], [3, 4]])
>>> df2 = pd.DataFrame(df1, copy=True)
>>> df2.iloc[0,0] = 42
>>> df1
   0  1
0  1  2
1  3  4
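If the goal is only to hide the index when displaying a frame, rather than to remove it, one option is DataFrame.to_string with index=False. This is a small sketch on a made-up frame, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"x": [4, 1, 1, 4], "y": ["GE", "RE", "AE", "CD"]})

# The index still exists on df; it is only left out of the rendered text.
text = df.to_string(index=False)
print(text)
```

The frame itself is unchanged, so label-based lookups such as df.loc[0] keep working afterwards.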
Answer by Sudipta Basak
I have a function that may help some people. I combine CSV files that have a header row in the following way in Python:
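import pandas as pd

def combine_csvs(filedict, combined_file):
    files = filedict['files']
    df = pd.read_csv(files[0])
    # append each remaining file below the rows read so far
    for file in files[1:]:
        df = pd.concat([df, pd.read_csv(file)])
    # index=False keeps the index out of the output file
    df.to_csv(combined_file, index=False)
    return df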
It can take as many files as you need. Call it as:
combine_csvs(dict(files=["file1.csv","file2.csv", "file3.csv"]), 'output.csv')
Or, if you also want the combined DataFrame back in Python:
df = combine_csvs(dict(files=["file1.csv","file2.csv"]), 'output.csv')
The combine_csvs function does not save the indices. If you need the indices, use index=True instead.
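To see what the index argument changes, here is a small sketch on a made-up frame. It relies on to_csv returning the CSV text when no path is given, so nothing is written to disk:

```python
import pandas as pd

df = pd.DataFrame({"x": [4, 1], "y": ["GE", "RE"]})

# With no path given, to_csv returns the CSV text instead of writing a file.
without_index = df.to_csv(index=False)
with_index = df.to_csv(index=True)

print(without_index.splitlines()[0])  # header row without an index column
print(with_index.splitlines()[0])     # header row with a leading, unnamed index column
```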
Answer by naught101
d.index = range(len(d))
does a simple in-place index reset, i.e. it removes all of the existing indices and adds a basic integer one, which is the most basic index type a pandas DataFrame can have.
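For comparison, pandas also offers DataFrame.reset_index(drop=True), which returns a new frame with a fresh integer index instead of modifying the existing one. A short sketch with a made-up frame (the names d and fresh are illustrative):

```python
import pandas as pd

d = pd.DataFrame({"x": [4, 1, 1]}, index=["a", "b", "c"])

# reset_index(drop=True) discards the old labels instead of keeping them as a column.
fresh = d.reset_index(drop=True)

# Direct assignment, as in the answer, changes d itself.
d.index = range(len(d))

print(list(fresh.index), list(d.index))
```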
Answer by Matt
I was having a similar issue trying to take a DataFrame from an index-less CSV and write it back to another file.
I came up with the following:
import os

import pandas as pd

def csv_to_df(csv_filepath):
    # read_table allows you to set index_col to False; the old from_csv did not
    dataframe_conversion = pd.read_table(csv_filepath, sep='\t', header=0, index_col=False)
    return dataframe_conversion

def df_to_excel(df):
    # Build the output path from a directory and a file name
    file_name = 'foo.xlsx'
    file_path = os.path.join('some/directory/', file_name)
    # index=False keeps the index out of the sheet, so all the data starts on row
    # index 1 and the column labels (called headers by pandas) are all on row index 0.
    # The context manager saves the file on exit; writer.save() is gone in recent pandas.
    with pd.ExcelWriter(file_path) as writer:
        df.to_excel(writer, sheet_name='Attributions Detail', index=False, header=True)
Answer by Jason Sprong
Additionally, if you are using the df.to_excel function with a pd.ExcelWriter, which is where the data is written to an Excel worksheet, you can specify index=False in the parameters there.
Create the Excel writer:
writer = pd.ExcelWriter(type_box + '-rules_output-' + date_string + '.xlsx',engine='xlsxwriter')
We have a list called lines:
# create a DataFrame called 'df'
df = pd.DataFrame([sub.split(",") for sub in lines], columns=["Rule", "Device", "Status"])
# convert df to an Excel worksheet, leaving out the index
df.to_excel(writer, sheet_name='all_status', index=False)
# close the writer to save the file (older pandas versions used writer.save())
writer.close()