Reading a CSV file with Pandas: complex separator

Question

Asked by Mohammad Saifullah

I have a CSV file that I want to read using Python pandas. The header and lines look like the following:

 A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N

Clearly, the separator is ^, but sometimes there are odd spaces around it. How can I read this file cleanly?

I am using the following command to read the csv file:

import pandas as pd
df = pd.read_csv('input.csv', sep='^')
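A minimal sketch of what this call produces, using an in-memory copy of the sample header as a stand-in for input.csv. The split itself is right, because a one-character sep is matched literally, but the spaces that precede each caret stay glued to the column names:

```python
import io

import pandas as pd

# In-memory stand-in for input.csv: the header line from the question
sample = "A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N\n"

# A one-character sep is matched literally, so this splits on every caret...
df = pd.read_csv(io.StringIO(sample), sep="^")

# ...but the spaces before each caret remain part of the column names
print(list(df.columns))
```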

Answer 1

Answered by EdChum

Use the regex \s*\^, which matches zero or more whitespace characters followed by ^. Specify engine='python' here: the default C engine does not support regex separators and would fall back to the Python engine with a warning:

import io
import pandas as pd

t = """A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N"""
df = pd.read_csv(io.StringIO(t), sep=r'\s*\^', engine='python')  # raw string keeps \s and \^ intact
df.columns

Output:

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], dtype='object')
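The pattern \s*\^ only consumes whitespace before a caret. A minimal sketch, assuming (hypothetically, since the question's sample does not show it) that data rows may also pad values after a caret: widening the pattern to \s*\^\s* trims both sides. The input text below is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical input: a short header plus one made-up data row whose values
# are padded on both sides of the caret
text = "A   ^B^C\nx   ^  y^ z\n"

# Pattern from the answer: consumes whitespace only *before* each caret
before_only = pd.read_csv(io.StringIO(text), sep=r"\s*\^", engine="python")

# Widened pattern: consumes whitespace on both sides of each caret
both_sides = pd.read_csv(io.StringIO(text), sep=r"\s*\^\s*", engine="python")

print(before_only.iloc[0].tolist())  # padding after the caret survives
print(both_sides.iloc[0].tolist())
```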

Answer 2

Answered by Malik Brahimi

Can't you supply a regex as the separator?

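df = pd.read_csv('input.csv', sep=r'[\^\s]+', engine='python')  # pass the pattern as a string; read_csv's sep expects str, not a compiled re.Pattern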
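A minimal sketch using re.split directly, which is roughly what pandas' Python engine does with a multi-character sep, to show a caveat of this pattern. The data row is hypothetical: because [\^\s]+ treats a bare space as a separator too, a value that contains a space gets cut in two, whereas \s*\^ keeps it whole. The same caveat applies to any pattern in which a lone space counts as a separator:

```python
import re

header = "A           ^B^C^D^E  ^F"
row = "New York^NY^10001"  # hypothetical row: the first value contains a space

# Any run of carets and/or whitespace counts as one separator
greedy = r"[\^\s]+"
print(re.split(greedy, header))  # → ['A', 'B', 'C', 'D', 'E', 'F']
print(re.split(greedy, row))     # → ['New', 'York', 'NY', '10001']

# Whitespace only counts when it sits right before a caret
caret_only = r"\s*\^"
print(re.split(caret_only, row))  # → ['New York', 'NY', '10001']
```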

Answer 3

Answered by Zachary Cross

Your separator can be a regular expression, so try something like this:

df = pd.read_csv('input.csv', sep="[ ^]+", engine="python")

The regular expression treats any run of one or more spaces or carets (^) as a single separator.
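A minimal sketch of this pattern on an in-memory copy of the question's header plus one hypothetical numeric data row (the numbers are made up). engine="python" is passed explicitly so pandas does not warn about falling back from the C engine:

```python
import io

import pandas as pd

# The question's header plus a hypothetical numeric data row
text = (
    "A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N\n"
    "1           ^2^3^4^5  ^6          ^7           ^8^9^10^11^12^13^14\n"
)

# Any run of spaces and/or carets is one separator
df = pd.read_csv(io.StringIO(text), sep="[ ^]+", engine="python")

print(df.columns.tolist())
print(df.iloc[0].tolist())
```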

Answer 4

Answered by Alexander

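Read the file as you have done, then strip the extra whitespace from every string value and from the column names:

df = pd.read_csv('input.csv', sep='^')
# apply() passes each column as a Series, so check the individual values with isinstance
df = df.apply(lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v))
df.columns = df.columns.str.strip()  # the header names carry the same padding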
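A minimal sketch of the strip-after-reading approach on a small hypothetical input (the column names and values are made up), checking both the header and the values:

```python
import io

import pandas as pd

# Hypothetical caret-separated input with padded header names and values
text = "A   ^B  ^C\nfoo ^bar^1\n"

df = pd.read_csv(io.StringIO(text), sep="^")

# Strip each string value; non-strings (here the integer in C) pass through
df = df.apply(lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v))

# The header names are padded too
df.columns = df.columns.str.strip()

print(df)
```

Checking each value with isinstance, rather than the column dtype, keeps the test independent of which dtype pandas picks for text columns.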

Answer 5

Answered by user2030378

If the only whitespace in your file is the extra whitespace between columns (i.e. no column contains text with spaces), an easy fix is to remove all the blanks from the file before reading it. An example command:

<input.csv tr -d '[:blank:]' > new_input.txt
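A Python-only sketch of the same preprocessing, for setups without tr. The helper name read_caret_csv_without_blanks is hypothetical; it assumes the file is UTF-8, fits in memory, and that no field legitimately contains spaces or tabs (the characters [:blank:] covers in the POSIX locale):

```python
import io

import pandas as pd


def read_caret_csv_without_blanks(path):
    """Hypothetical helper: drop every space and tab, then parse with sep='^'.

    Only safe if no field legitimately contains spaces or tabs.
    """
    with open(path, encoding="utf-8") as f:
        # Like `tr -d '[:blank:]'`: remove spaces and tabs, keep newlines
        cleaned = f.read().replace(" ", "").replace("\t", "")
    return pd.read_csv(io.StringIO(cleaned), sep="^")
```

Cleaning in memory avoids writing an intermediate file such as new_input.txt, at the cost of holding the whole file in RAM.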
