How to read a text file in Python using Pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/42640571/
Asked by Bhanu Chander
I'm new to Pandas and I've been trying to draw a scatter plot in Python 2.7. My dataset is in a .txt file and looks something like this (comma separated):
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
import pandas as pd
import matplotlib.pyplot as mplt
# Taking Dataset using Pandas
input_data = pd.read_csv('data.txt');
#input_data.head(5)
How can I plot the above data as a scatter plot when the dataset has no headers?
I've seen in tutorials and examples that a scatter plot can be drawn when the dataset has column headings. I tried adding x and y as headers for the two columns in the .txt file and ran the code below.
input_data = pd.read_csv('data.txt');
#input_data.head(5)
x_value = input_data[['x']]
y_value = input_data[['y']]
mplt.scatter(x_value, y_value)
But I still get the error shown below:
Traceback (most recent call last):
File "E:\IIT Madras\Research\Experiments\Machine Learning\Linear Regression\Linear_Regression.py", line 16, in <module>
y_value = input_data[['y']]
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
return self._getitem_array(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1835, in _getitem_array
indexer = self.ix._convert_to_indexer(key, axis=1)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1112, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['y'] not in index"
Is there a better way to deal with this (with and without header names)?
EDIT:
The following worked for me after going through Ishan's reply:
input_data = pd.read_csv('data.txt', header=None)
x_value = input_data[[0]]
y_value = input_data[[1]]
mplt.scatter(x_value, y_value)
mplt.show()
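To illustrate why header=None matters here, the following is a minimal sketch, assuming Python 3 and using io.StringIO as a stand-in for data.txt (the rows are copied from the question). With the defaults, read_csv uses the first line as column names, so one data row is lost and the columns end up labelled with its values.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for data.txt.
raw = """6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
"""

# Default behaviour: the first line is consumed as the header row.
with_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is data and the columns are labelled 0, 1.
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(list(with_default.columns))
print(list(without_header.columns))
print(len(with_default), len(without_header))
```

This is also why the integer labels 0 and 1 are used to select the columns in the edit above.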
Answered by Ishan
Try importing the data without column headers and then naming the columns yourself:
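import pandas as pd
import matplotlib.pyplot as plt

# header=None: the file has no header row, so the first line is read as data
df = pd.read_csv(r'/home/ishan/Desktop/file', header=None)
df.columns = ['x', 'y']  # name the two columns
plt.scatter(df['x'], df['y'])
plt.show()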
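As a possible variation covering both cases the question asks about, here is a minimal sketch, assuming Python 3 and in-memory sample text in place of a real file. It passes the column names directly through the names parameter of read_csv (in which case the file is treated as having no header row), and separately reads a file whose first line is already x,y. The variable names raw_no_header and raw_with_header are illustrative only.

```python
import io
import pandas as pd

# Headerless data: a few rows copied from the question.
raw_no_header = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

# Same data with a header line added to the file (hypothetical content).
raw_with_header = "x,y\n6.1101,17.592\n5.5277,9.1302\n"

# Case 1: no header in the file. Supplying names= labels the columns
# and tells pandas that the first line is data, not a header.
df_named = pd.read_csv(io.StringIO(raw_no_header), names=["x", "y"])

# Case 2: the file already has a header line, so the defaults are enough.
df_header = pd.read_csv(io.StringIO(raw_with_header))

print(df_named)
print(df_header)
```

In either case the columns can then be passed to plt.scatter(df['x'], df['y']) as in the answer above.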