pandas read_csv() for multiple delimiters

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/48063620/

Time: 2020-09-14 05:00:27  Source: igfitidea

Asked by user77005

I have a file which has data as follows:

1000000 183:0.6673;2:0.3535;359:0.304;363:0.1835
1000001 92:1.0
1000002 112:1.0
1000003 154435:0.746;30:0.3902;220:0.2803;238:0.2781;232:0.2717
1000004 118:1.0
1000005 157:0.484;25:0.4383;198:0.3033
1000006 277:0.7815;1980:0.4825;146:0.175
1000007 4069:0.6678;2557:0.6104;137:0.4261
1000009 2:1.0

I want to read the file into a pandas DataFrame, separated by the multiple delimiters \t, :, ;

I tried:

df_user_key_word_org = pd.read_csv(filepath+"user_key_word.txt", sep='\t|:|;', header=None, engine='python')

It gives me the following error.

pandas.errors.ParserError: Error could be due to quotes being ignored when a multi-char delimiter is used.

Why am I getting this error?

So I thought I would try a regex string, but I am not sure how to write a split regex. r'\t|:|;' doesn't work.

What is the best way to read a file with multiple delimiters into a pandas DataFrame?

Accepted answer by Tai

From this question, Handling Variable Number of Columns with Pandas - Python, one workaround for an error like pandas.errors.ParserError: Expected 29 fields in line 11, saw 45. is to let read_csv know in advance how many columns there are.
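To see where the field-count mismatch comes from, the sample rows from the question can be split by hand with the same kind of regex. This is only a rough sketch using the standard library: `sample`, `splitter` and `field_counts` are names made up here, and the rows are the nine lines shown in the question, not the full file.

```python
import re

# The nine sample rows from the question (ID, then key:value pairs joined by ';').
sample = """\
1000000 183:0.6673;2:0.3535;359:0.304;363:0.1835
1000001 92:1.0
1000002 112:1.0
1000003 154435:0.746;30:0.3902;220:0.2803;238:0.2781;232:0.2717
1000004 118:1.0
1000005 157:0.484;25:0.4383;198:0.3033
1000006 277:0.7815;1980:0.4825;146:0.175
1000007 4069:0.6678;2557:0.6104;137:0.4261
1000009 2:1.0
"""

# Split on runs of whitespace, ';' or ':' and count the pieces on each line.
splitter = re.compile(r"\s+|;|:")
field_counts = [
    len(splitter.split(line.strip()))
    for line in sample.splitlines()
    if line.strip()
]

print(field_counts)       # [9, 3, 3, 11, 3, 7, 7, 7, 3]
print(max(field_counts))  # 11
```

The rows are ragged: the first line has 9 fields while the fourth has 11. As far as I can tell, when no column names are given the python engine takes the expected number of fields from the first line, so a later line with more fields is what triggers the "Expected ... fields" ParserError.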

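import pandas as pd

# Column names; there must be at least as many as the largest number of fields on any line.
my_cols = [str(i) for i in range(45)]
df_user_key_word_org = pd.read_csv(filepath + "user_key_word.txt",  # filepath as in the question
                                   sep=r"\s+|;|:",  # raw-string regex: whitespace, ';' or ':'
                                   names=my_cols,
                                   header=None,
                                   engine="python")  # regex separators require the python engine
# I tested with s = StringIO(text_from_OP) on my computer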
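For anyone who wants to try this without the original file, here is a self-contained sketch of the same idea. It assumes the nine sample rows from the question stand in for the real data, reads them from a `StringIO`, and computes the column count instead of hard-coding 45. The names `sample`, `pattern` and `n_cols` are introduced here for illustration only.

```python
import re
from io import StringIO

import pandas as pd

# The nine sample rows from the question, used in place of user_key_word.txt.
sample = """\
1000000 183:0.6673;2:0.3535;359:0.304;363:0.1835
1000001 92:1.0
1000002 112:1.0
1000003 154435:0.746;30:0.3902;220:0.2803;238:0.2781;232:0.2717
1000004 118:1.0
1000005 157:0.484;25:0.4383;198:0.3033
1000006 277:0.7815;1980:0.4825;146:0.175
1000007 4069:0.6678;2557:0.6104;137:0.4261
1000009 2:1.0
"""

pattern = r"\s+|;|:"  # whitespace (covers tabs), ';' or ':'

# First pass: find the widest row so that enough column names can be supplied.
n_cols = max(
    len(re.split(pattern, line.strip()))
    for line in sample.splitlines()
    if line.strip()
)

# Second pass: parse with explicit column names; shorter rows are padded with NaN.
df = pd.read_csv(
    StringIO(sample),
    sep=pattern,
    names=[str(i) for i in range(n_cols)],
    header=None,
    engine="python",
)

print(n_cols)    # 11
print(df.shape)  # (9, 11)
```

Scanning the data twice costs an extra pass over the file, but it avoids guessing an upper bound such as 45. For a very large file a generous fixed bound may be the more practical choice.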

Hope this works.
