Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/33524199/

How to change tab-delimited to comma-delimited in pandas

Tags: python, pandas

Asked by Same

I don't know if this is possible. I am trying to append 12 files into a single file. One of the files is tab-delimited and the rest are comma-delimited. I loaded all 12 files into DataFrames and appended them to an empty DataFrame one by one in a loop.

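import glob
import pandas as pd

list_of_files = glob.glob('./*.txt')
frames = []
for filename in list_of_files:
    # read_csv assumes a comma separator by default, so the tab-delimited file is misparsed here
    frames.append(pd.read_csv(filename))
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
df = pd.concat(frames, ignore_index=True)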

But the combined file is not in the format I wanted, and I think the problem is the tab-delimited file: when I ran the code without it, the format of the appended file was fine. So I was wondering whether it is possible to convert the tab-delimited format into comma-delimited using pandas.

Thank you for your help and suggestions.

Answered by AustinC

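You need to tell pandas that the file is tab-delimited when you import it. You can pass a delimiter to the read_csv method, but in your case, since the delimiter changes from file to file, you want to pass None. This makes pandas auto-detect the delimiter. Detection is only supported by the Python parsing engine, so pandas falls back to it (specifying engine='python' explicitly avoids the fallback warning).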

Change your read_csv line to:

file = pd.read_csv(filename, sep=None, engine='python')

Answered by asiviero

For the file that is tab-separated, you should use:

file = pd.read_csv(filename, sep="\t")

pandas read_csv has quite a lot of parameters; check them out in the docs.

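
If the goal is literally to turn tab-delimited data into comma-delimited data, one possible sketch is to read it with sep="\t" and write it back out with to_csv, which uses a comma by default. The sample data below is made up for illustration, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical tab-delimited content standing in for the one odd file
tab_text = "name\tscore\ncarol\t3\ndave\t4\n"

# Parse using an explicit tab separator
df = pd.read_csv(io.StringIO(tab_text), sep="\t")

# to_csv writes comma-separated output by default; with no path it returns a string
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write to disk instead, pass a file path as the first argument to to_csv.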