Python: How to open a .data file extension

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license: cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/31797013/

Date: 2020-08-19 10:33:22  Source: igfitidea

How to open a .data file extension

python, macos, file-extension

Asked by Jason Donnald

I am working on some side work where the data provided is in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I am on Mac OS X.

NOTE: The data I am working with is for one of the KDD Cup challenges.

Answer by user2539336

It vastly depends on what is in it. It could be a binary file or it could be a text file.
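
A minimal sketch of telling the two apart from Python, assuming that a NUL byte in the first few kilobytes means "binary". This is only a heuristic: UTF-16 text contains NUL bytes and would be misreported as binary. The function name `looks_binary` and the 8192-byte sample size are illustrative choices, not part of the original answer.

```python
def looks_binary(path, sample_size=8192):
    """Heuristic: report the file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # ASCII and UTF-8 text practically never contain b"\x00"; many binary formats do.
    return b"\x00" in sample
```

For example, `looks_binary("file.data")` returns True or False; an empty file counts as text.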

If it is a text file, you can open it the same way you open any other file: f = open(filename, "r")
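
For a first look at a large text file, a sketch like the following reads only the first few lines instead of the whole file. It assumes the file is UTF-8 (undecodable bytes are replaced rather than raising an error); `preview_text` is a hypothetical helper name.

```python
from itertools import islice


def preview_text(path, n=5, encoding="utf-8"):
    """Return the first n lines of a text file without reading the rest of it."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        # islice stops after n lines, so a huge file is never read to the end.
        return [line.rstrip("\n") for line in islice(f, n)]
```

Calling `preview_text("file.data")` returns a list with at most five lines.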

If it is a binary file, just add a "b" to the mode: open(filename, "rb"). There is an example here:

Reading binary file in Python and looping over each byte
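
A sketch of the binary route that assumes nothing about the file's layout: read it in chunks so large files do not have to fit in memory, and iterate over the bytes as integers. `iter_bytes` and `hexdump_head` are hypothetical helper names; `bytes.hex` with a separator argument needs Python 3.8 or later.

```python
def iter_bytes(path, chunk_size=65536):
    """Yield the file's bytes one at a time (as ints 0-255), reading in fixed-size chunks."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) stops when f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield from chunk  # iterating a bytes object yields ints


def hexdump_head(path, n=16):
    """Return the first n bytes as space-separated hex, handy for spotting magic numbers."""
    with open(path, "rb") as f:
        return f.read(n).hex(" ")
```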

Depending on the type of data in there, you might want to try passing it through a CSV reader (Python's csv module) or an XML parsing library (lxml, for example).
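
If the file looks like delimited text but you are not sure which delimiter it uses, the standard library's `csv.Sniffer` can guess from a sample. The candidate delimiters and sample size below are assumptions, and `sniff_dialect` is a hypothetical name; `sniff` raises `csv.Error` when it cannot decide.

```python
import csv


def sniff_dialect(path, sample_size=65536):
    """Guess the CSV dialect (delimiter etc.) of a text file from its first characters."""
    # newline="" is what the csv module expects for files it will parse.
    with open(path, "r", newline="") as f:
        sample = f.read(sample_size)
    # Limiting candidates keeps the sniffer from picking an arbitrary letter or digit.
    return csv.Sniffer().sniff(sample, delimiters=",\t;|")
```

The returned dialect can be passed straight to `csv.reader(f, dialect)`, or you can just inspect `sniff_dialect("file.data").delimiter`.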

Given the further information above, and after looking at the challenge page, the format is:

Data format: The datasets use a format similar to the text export format of relational databases:

- One header line with the variable names
- One line per instance
- Separator: a tab between the values
- There are missing values (consecutive tabs)
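
Given that description, a minimal sketch with the standard `csv` module (tab delimiter, first line as the header) could look like this. It maps empty fields to `None` so missing values are explicit, and it loads all rows into memory, which is only reasonable for files that fit in RAM. `read_tsv` is a hypothetical name, and `quoting=csv.QUOTE_NONE` assumes quote characters in this export are ordinary text.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file whose first line holds the variable names.

    Returns a list of dicts, one per line; empty fields (consecutive tabs)
    become None so missing values are explicit.
    """
    with open(path, "r", newline="") as f:
        # QUOTE_NONE: treat any quote characters as ordinary text (an assumption).
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [
            {name: (value if value != "" else None) for name, value in row.items()}
            for row in reader
        ]
```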

Therefore see this answer:

parsing a tab-separated file in Python

I would advise processing one line at a time rather than loading the whole file into memory, but if you have the RAM, why not...
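
Following the one-line-at-a-time advice, this sketch streams the file through `csv.reader` and keeps only running counts, so memory use stays flat regardless of file size. Counting missing values per column is just an illustrative task; `count_missing` is a hypothetical name, and rows shorter than the header are padded on the assumption that trailing values are missing.

```python
import csv
from collections import Counter


def count_missing(path):
    """Stream a tab-separated file and count empty values per column.

    Only the current row is held in memory, so file size is not limited by RAM.
    Returns (number_of_data_rows, Counter mapping column name -> missing count).
    """
    missing = Counter()
    rows = 0
    with open(path, "r", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])  # first line: variable names
        for row in reader:
            rows += 1
            # Pad short rows, assuming trailing values were simply left out.
            row = row + [""] * (len(header) - len(row))
            for name, value in zip(header, row):
                if value == "":
                    missing[name] += 1
    return rows, missing
```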

I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.

Answer by nbari

To get a quick overview of what the file may contain, you can inspect it from a terminal using strings or cat, for example:

$ strings file.data

or

$ cat -v file.data
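
A rough Python counterpart to `strings`, for when you want the same overview inside a script: find runs of at least `min_len` printable ASCII characters. This is an approximation, not a reimplementation of the `strings` tool, and it reads the whole file into memory. `extract_strings` is a hypothetical name.

```python
import re


def extract_strings(path, min_len=4):
    """Return runs of at least min_len printable ASCII characters found in a file."""
    # Printable ASCII is 0x20 (space) through 0x7e (~); this mimics `strings` loosely.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    with open(path, "rb") as f:
        data = f.read()  # whole file in memory; fine for a quick look at modest files
    return [match.decode("ascii") for match in pattern.findall(data)]
```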

If you forget to pass the -v option to cat and the file is binary, it can mess up your terminal, in which case you need to reset it:

$ reset
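
If you preview files from Python instead, you can avoid the same problem by escaping every byte other than printable ASCII, tab and newline before printing. The sketch below is similar in spirit to `cat -v` but uses `\xNN` escapes rather than `cat -v`'s `^` and `M-` notation; `safe_preview` and the 200-byte default are illustrative assumptions.

```python
def safe_preview(path, n=200):
    """Return the first n bytes of a file as text that is safe to print to a terminal."""
    with open(path, "rb") as f:
        data = f.read(n)
    out = []
    for byte in data:  # iterating bytes yields ints 0-255
        if byte in (0x09, 0x0A) or 0x20 <= byte <= 0x7E:
            out.append(chr(byte))  # tab, newline and printable ASCII pass through
        else:
            out.append(f"\\x{byte:02x}")  # ESC and other control or high bytes are escaped
    return "".join(out)
```

Then `print(safe_preview("file.data"))` shows the start of the file without risking terminal escape sequences.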