pandas: converting a text file to a pandas DataFrame

Question

Asked by Joey

I have a .TX0 file (some sort of CSV text file) and have converted it to a .txt file using Python's .readlines(), open(filename, 'w'), etc. I now have this newly saved .txt file, but when I try to convert it to a DataFrame it gives me only one column. The result is below:

Empty DataFrame
Columns: [ '"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n',  '"Reprocess Number:", vma2:  261519, Unnamed: 7, \n',  '"Sample Name:",  , Data Acquisition Time:, 18/08/2015 17:23:23\n',  '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n',  '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n',  '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n',  '"Cycle:", 1, Result File :, \\vma2\TotalChrom170_he_tcd001.rst \n',  '"Sequence File :", \\vma\C1_C2_binary.seq \n',  '"===================================================================================================================================="\n',  '""\n',  '""\n'.1,  '"condensate analysis (HP4890 Optic - FID)"\n',  '"Peak", Component, Time, Area, Height, BL\n',  '"#", Name, [min], [uV*sec], [uV], \n'.1,  '------, ------, ------.1, ------.2, ------.3, ------\n',  '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n',  '2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1,  '3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2,  '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2,  '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3,  '"Missing Component Report"\n',  '"Component", Expected Retention (Calibration File)\n',  '------.1, ------\n'.1,  '"All components were found"\n',  '"Report stored in ASCII file :", C:\Shared Folders\TotalChrom\11170_he_tcd001.TX0 \n']]
Index: []

For easier reading:

Empty DataFrame
Columns: [ '"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n', '"Reprocess Number:", vma2: 261519, Unnamed: 7, \n', '"Sample Name:", , Data Acquisition Time:, 18/08/2015 17:23:23\n', '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n', '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n', '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n', '"Cycle:", 1, Result File :, \\vma2\TotalChrom\data\Joey\Binary_Mixtures\Std1\11170_he_tcd001.rst \n', '"Sequence File :", \\vma2\TotalChrom\sequences\Joey\C1_C2_binary.seq \n', '"===================================================================================================================================="\n', '""\n', '""\n'.1, '"condensate analysis (HP4890 Optic - FID)"\n', '"Peak", Component, Time, Area, Height, BL\n', '"#", Name, [min], [uV*sec], [uV], \n'.1, '------, ------, ------.1, ------.2, ------.3, ------\n', '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n', '2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1, '3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2, '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2, '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3, '"Missing Component Report"\n', '"Component", Expected Retention (Calibration File)\n', '------.1, ------\n'.1, '"All components were found"\n', '"Report stored in ASCII file :", C:\Shared Folders\TotalChrom\data\Joey\Binary_Mixtures\Std1\11170_he_tcd001.TX0 \n']] Index: []

As you can see, the text is comma-separated. Is there any way to load this text into a DataFrame with one column per comma-delimited field?

Thanks.

J

Answer 1

Answered by pdm

You can try the function below; it will help you load all the data from your local CSV file:

pd.read_csv()

More details can be found in the pandas.read_csv tutorial.

Answer 2

Answered by Rishi Bansal

You can try the code below to convert a text file into a DataFrame.

data = pd.read_csv('file.txt', sep=',')

Hope it's self-explanatory.
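For a report like the one in the question, a plain read_csv call may still struggle, because the file mixes metadata lines, separator rows, and the peak table. The following is a minimal sketch, not a definitive solution. It assumes the peak table starts at the line beginning with "Peak" and that real data rows begin with an integer peak number. The SAMPLE text is a shortened reconstruction from the output shown in the question, and the name read_peak_table is hypothetical.

```python
import io

import pandas as pd

# Shortened reconstruction of the report text (an assumption, not the real file)
SAMPLE = '''"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04
"Sample Amount:", 1.000000, Dilution Factor:, 1.000000
""
"condensate analysis (HP4890 Optic - FID)"
"Peak", Component, Time, Area, Height, BL
"#", Name, [min], [uV*sec], [uV],
------, ------, ------, ------, ------, ------
1, , 0.810, 706.42, 304.38, *BB
2, CH4, 0.900, 1113518.24, 495918.41, *BB
3, C2H6, 1.373, 901670.23, 295381.12, *BB
"", , , ------, ------,
"", , , 2015894.89, 791603.91,
"Missing Component Report"
'''


def read_peak_table(text):
    """Extract only the peak rows from the report text and parse them."""
    lines = text.splitlines()
    # Find the header row of the peak table
    start = next(i for i, line in enumerate(lines) if line.startswith('"Peak"'))
    # Keep rows whose first field is an integer peak number; this drops the
    # units row, the dashed separators, the totals row, and later sections
    rows = [
        line for line in lines[start + 1:]
        if line.split(",")[0].strip().isdigit()
    ]
    names = ["Peak", "Component", "Time", "Area", "Height", "BL"]
    # skipinitialspace removes the blank that follows each comma
    return pd.read_csv(io.StringIO("\n".join(rows)), names=names,
                       skipinitialspace=True)


peaks = read_peak_table(SAMPLE)
print(peaks)
```

With a real file, the text could be read with open(...).read() and passed to the same function; the metadata lines above the table would need separate handling if they are required.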

Answer 3

Answered by beginner

Here is a general answer to this question:

import re
import pandas as pd

# First, open the file and read it line by line:
with open('file.txt', 'r') as f:
    lines = f.readlines()

# Remove the trailing \n (and surrounding whitespace) from each line
lines = [line.strip() for line in lines]

# Create a DataFrame (here, assuming you want two columns)
df_result = pd.DataFrame(columns=('first_col', 'second_col'))
i = 0
first_col = ""
second_col = ""
for line in lines:
    # Use "if" and "replace" to apply whatever conditions your text data needs
    if 'X' in line:
        first_col = line.replace('X', "")
    else:
        # Define how each column's value is extracted; here the second column
        # is the line with everything from " (" onward removed
        second_col = re.sub(r' \(.*', "", line)
        # Append the next row of data
        df_result.loc[i] = [first_col, second_col]
        i += 1
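
The loop above grows the DataFrame one row at a time through df_result.loc[i]. As a hedged variant of the same idea, the rows can be collected in a plain list and the DataFrame built once at the end, which tends to be faster for long files. The 'X' marker and the two column names are carried over from the answer; the function name lines_to_frame and the sample lines are made up for illustration.

```python
import re

import pandas as pd


def lines_to_frame(lines):
    """Build a two-column DataFrame using the same rules as the loop above."""
    rows = []
    first_col = ""
    for line in lines:
        line = line.strip()
        if "X" in line:
            # Lines containing the marker set the value of the first column
            first_col = line.replace("X", "")
        else:
            # Other lines become rows; drop everything from " (" onward
            second_col = re.sub(r" \(.*", "", line)
            rows.append((first_col, second_col))
    # Construct the DataFrame once, from all collected rows
    return pd.DataFrame(rows, columns=["first_col", "second_col"])


# Hypothetical input lines, only to show the shape of the result
sample_lines = ["XgroupA\n", "alpha (note)\n", "beta\n", "XgroupB\n", "gamma (more)\n"]
print(lines_to_frame(sample_lines))
```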
