Python: convert a .data file to .csv
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/30762762/
convert .data file to .csv
Question by Little
I have found the following data set, named ecoli.data, available at:
https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/
I would like to open it in R to perform a classification task, but I would prefer to convert this document into a CSV file first. When I open it in Word, I notice that it is not tab-delimited: the values in each row are separated by runs of spaces (something like three). So the bottom-line question is: how do I convert this file into CSV using Excel, or maybe Python?
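Why the missing tabs matter can be seen in a few lines of Python. This is a minimal sketch; the sample record is the first row as shown in the R `glimpse` output further down, and the exact spacing between its fields is an assumption.

```python
# One record whose fields are separated by several spaces (spacing assumed).
line = "AAT_ECOLI   0.49  0.29  0.48  0.50  0.56  0.24  0.35  cp\n"

# Splitting on a tab finds no tabs, so the whole line stays a single field.
by_tab = line.rstrip("\n").split("\t")

# Splitting on one space leaves empty strings wherever spaces repeat.
by_space = line.split(" ")

# str.split() with no argument treats any run of whitespace as one separator
# and drops leading/trailing whitespace, which is what this file needs.
by_whitespace = line.split()

print(len(by_tab))     # 1
print("" in by_space)  # True
print(by_whitespace)   # ['AAT_ECOLI', '0.49', '0.29', '0.48', '0.50', '0.56', '0.24', '0.35', 'cp']
```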
Answer by cars10m
Rename the file to ecoli.txt,
then open it in Excel. This way you will be using Microsoft Excel's "Text Import Wizard", which lets you choose options such as "Fixed width". Just click "Next" a few times and then "Finish", and you will have the data in the Excel grid. Now save it again as CSV.
Answer by Sait
Using Python 2.7:
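import csv

# Read the whitespace-delimited source (file name as in the original answer)
with open('ecoli.data.txt') as input_file:
    lines = input_file.readlines()

# Split each line on runs of whitespace into a list of fields
newLines = []
for line in lines:
    newLine = line.strip().split()
    newLines.append(newLine)

# Python 2.7's csv module expects the output file opened in binary mode ('wb')
with open('output.csv', 'wb') as test_file:
    file_writer = csv.writer(test_file)
    file_writer.writerows(newLines)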
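The answer above targets Python 2.7, where the `csv` module writes to a file opened in binary mode (`'wb'`). Under Python 3 it expects a text-mode file opened with `newline=''` instead. A minimal Python 3 sketch of the same approach follows; the function names `rows_from_whitespace` and `data_to_csv` are illustrative, not part of the original answer.

```python
import csv


def rows_from_whitespace(lines):
    """Yield the fields of each non-blank line, splitting on runs of whitespace."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines instead of writing empty CSV rows
            yield fields


def data_to_csv(src_path, dst_path):
    """Convert a whitespace-delimited text file to CSV; return the row count."""
    # Python 3 idiom: text mode with newline='' so csv controls line endings.
    with open(src_path) as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        count = 0
        for fields in rows_from_whitespace(src):
            writer.writerow(fields)
            count += 1
    return count
```

Usage: `data_to_csv("ecoli.data", "ecoli.csv")` returns the number of rows written. Skipping blank lines avoids the empty rows the 2.7 version emits for any blank line in the input.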
Answer by hrbrmstr
Here are two ways to do that in R that actually work:
library(readr)
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.data"
With base R:
df <- read.table(url)
dplyr::glimpse(df)
## Observations: 336
## Variables:
## $ V1 (fctr) AAT_ECOLI, ACEA_ECOLI, ACEK_ECOLI, ACKA_ECOLI, ADI_ECOLI, ...
## $ V2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ V3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ V4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ V5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ V6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ V7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ V8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ V9 (fctr) cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp...
write.csv(df, "ecoli.csv", row.names=FALSE)
With readr functions:
df <- read_table(url, col_names=FALSE)
dplyr::glimpse(df)
## Observations: 336
## Variables:
## $ X1 (chr) "AAT_ECOLI", "ACEA_ECOLI", "ACEK_ECOLI", "ACKA_ECOLI", "ADI...
## $ X2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ X3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ X4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ X5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ X6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ X7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ X8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ X9 (chr) "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp",...
write_csv(df, "ecoli.csv")
Answer by Anurag Sharma
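Use pandas, pointing it at the ecoli.data file itself rather than the directory URL, and telling it the file has no header row (sep=r'\s+' is the equivalent of the older delim_whitespace=True):
import pandas
pandas.read_table('https://archive.ics.uci.edu/ml/machine-learning-databases/ecoli/ecoli.data', sep=r'\s+', header=None).to_csv('ecoli.csv', index=False)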
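The pandas call can be tried offline against a couple of records. This is a minimal sketch, assuming pandas is installed; the two sample lines are the first records shown in the R `glimpse` output above (their exact spacing is assumed), and the column labels follow the attribute list in the dataset's ecoli.names file, which is worth double-checking before relying on them.

```python
import io

import pandas as pd

# Column labels per the dataset's ecoli.names attribute list (assumption).
COLUMNS = ["name", "mcg", "gvh", "lip", "chg", "aac", "alm1", "alm2", "class"]

# Two records in the file's layout; the spacing between fields is assumed.
sample = io.StringIO(
    "AAT_ECOLI   0.49  0.29  0.48  0.50  0.56  0.24  0.35  cp\n"
    "ACEA_ECOLI  0.07  0.40  0.48  0.50  0.54  0.35  0.44  cp\n"
)

# sep=r"\s+" treats any run of whitespace as one delimiter; header=None
# because the file has no header row, and names supplies the labels.
df = pd.read_csv(sample, sep=r"\s+", header=None, names=COLUMNS)

# With a path (e.g. "ecoli.csv") to_csv writes a file; without one it
# returns the CSV text as a string.
csv_text = df.to_csv(index=False)
```

`read_table` and `read_csv` differ only in their default separator, so passing `sep` explicitly makes them interchangeable here.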
Answer by Rathinavel Subramanian
It's very simple: click the actual dataset name (e.g. xyz.data) and rename it to XYZ.csv; this will convert it into CSV format.
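Renaming changes only the file name, not the characters inside the file. A quick way to check what a CSV reader then sees is to parse one record with Python's `csv` module; this is a minimal sketch, and the spacing in the sample line is an assumption.

```python
import csv

# A record from ecoli.data after the file has only been renamed to .csv
# (spacing between fields assumed).
line = "AAT_ECOLI   0.49  0.29  0.48  0.50  0.56  0.24  0.35  cp"

# csv.reader splits on commas by default, so the whole line is one field.
fields = next(csv.reader([line]))
print(len(fields))  # 1
```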
Answer by T-rex
An alternative way to solve your problem is to read your .data file directly in R using the read.table command:
ecoli <- read.table("ecoli.data", header=FALSE)
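For readers taking the Python route, the typing that `read.table` applies here (a text identifier, seven numeric columns and a text class label, as in the `glimpse` output earlier) can be sketched as a small parser. The 9-field layout is taken from that output; the names `parse_record` and `load_records` are hypothetical.

```python
from typing import List, Tuple

# (sequence name, seven numeric features, class label)
Record = Tuple[str, List[float], str]


def parse_record(line: str) -> Record:
    """Split one whitespace-delimited ecoli.data line into typed fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    name, *features, label = fields
    return name, [float(x) for x in features], label


def load_records(path: str) -> List[Record]:
    """Parse every non-blank line of a file such as ecoli.data."""
    with open(path) as f:
        return [parse_record(line) for line in f if line.strip()]
```

Raising on a wrong field count surfaces a malformed line immediately, where a plain split would silently shift columns.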