Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/32120949/

Date: 2020-09-13 23:47:45  Source: igfitidea

Converting text files to pandas dataframe

python, pandas, readlines

Asked by Joey

I have a .TX0 file (some sort of CSV text file) and have converted it to a .txt file using Python's .readlines(), open(filename, 'w'), etc. I now have this newly saved .txt file, but when I try to convert it to a dataframe it gives me only one column. The .txt file is below:

Empty DataFrame
Columns: [ '"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n',  '"Reprocess Number:", vma2:  261519, Unnamed: 7, \n',  '"Sample Name:",  , Data Acquisition Time:, 18/08/2015 17:23:23\n',  '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n',  '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n',  '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n',  '"Cycle:", 1, Result File :, \\vma2\TotalChrom170_he_tcd001.rst \n',  '"Sequence File :", \\vma\C1_C2_binary.seq \n',  '"===================================================================================================================================="\n',  '""\n',  '""\n'.1,  '"condensate analysis (HP4890 Optic - FID)"\n',  '"Peak", Component, Time, Area, Height, BL\n',  '"#", Name, [min], [uV*sec], [uV], \n'.1,  '------, ------, ------.1, ------.2, ------.3, ------\n',  '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n',  '2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1,  '3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2,  '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2,  '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3,  '"Missing Component Report"\n',  '"Component", Expected Retention (Calibration File)\n',  '------.1, ------\n'.1,  '"All components were found"\n',  '"Report stored in ASCII file :", C:\Shared Folders\TotalChrom\11170_he_tcd001.TX0 \n']]
Index: []

For easier reading:

Empty DataFrame

Columns: [ '"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n', '"Reprocess Number:", vma2: 261519, Unnamed: 7, \n', '"Sample Name:", , Data Acquisition Time:, 18/08/2015 17:23:23\n', '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n', '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n', '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n', '"Cycle:", 1, Result File :, \\vma2\TotalChrom\data\Joey\Binary_Mixtures\Std1\11170_he_tcd001.rst \n', '"Sequence File :", \\vma2\TotalChrom\sequences\Joey\C1_C2_binary.seq \n', '"===================================================================================================================================="\n', '""\n', '""\n'.1, '"condensate analysis (HP4890 Optic - FID)"\n', '"Peak", Component, Time, Area, Height, BL\n', '"#", Name, [min], [uV*sec], [uV], \n'.1, '------, ------, ------.1, ------.2, ------.3, ------\n', '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n', '2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1, '3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2, '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2, '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3, '"Missing Component Report"\n', '"Component", Expected Retention (Calibration File)\n', '------.1, ------\n'.1, '"All components were found"\n', '"Report stored in ASCII file :", C:\Shared Folders\TotalChrom\data\Joey\Binary_Mixtures\Std1\11170_he_tcd001.TX0 \n']] Index: []

As you can see, this is comma separated. Is there any way to load this text into a comma-delimited dataframe?

Thanks.

J

Answered by pdm

You can try the function below; it will help you load all the data from your local CSV file:

pd.read_csv()

More details can be found in the pandas.read_csv tutorial.
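
As a rough sketch of how read_csv might be pointed at a report like the one in the question: the sample text below is a hypothetical, shortened reconstruction (two metadata lines followed by the peak table), and the value of skiprows is an assumption that depends on how many metadata lines your real file has.

```python
import io
import pandas as pd

# Hypothetical excerpt modelled on the report in the question:
# two metadata lines, then the peak table with its header line.
sample = '''"Software Version:",6.3.2.0646,Date:,19/08/2015 09:26:04
"Sample Amount:",1.000000,Dilution Factor:,1.000000
"Peak",Component,Time,Area,Height,BL
1,,0.810,706.42,304.38,*BB
2,CH4,0.900,1113518.24,495918.41,*BB
3,C2H6,1.373,901670.23,295381.12,*BB
'''

# skiprows=2 skips the metadata lines, so the "Peak" line is used as the header.
# For a real file, pass the path instead of io.StringIO(sample).
peaks = pd.read_csv(io.StringIO(sample), skiprows=2)
print(peaks)
```

With a real file you would probably also need to stop before the trailing summary lines, for example with the nrows parameter.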

Answered by Rishi Bansal

You can try the code below to convert a text file into a dataframe.

import pandas as pd

data = pd.read_csv('file.txt', sep=',')

Hope it's self-explanatory.
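
One thing to watch for with a report like the one in the question is that its lines do not all have the same number of fields. A possible way around that, sketched here on a hypothetical four-line excerpt, is to tell read_csv there is no header and give it enough column names up front; the column count of 6 is an assumption taken from the widest line in this sample.

```python
import io
import pandas as pd

# Hypothetical excerpt: the lines have 4, 5, 6 and 6 fields respectively.
sample = '''"Sample Amount:", 1.000000, Dilution Factor:, 1.000000
"Rack/Vial:", 0, 0, Operator:, joey.walker
"Peak", Component, Time, Area, Height, BL
2, CH4, 0.900, 1113518.24, 495918.41, *BB
'''

data = pd.read_csv(
    io.StringIO(sample),      # use the file path for a real file
    sep=',',
    header=None,              # do not treat the first line as column names
    names=list(range(6)),     # enough columns for the widest line
    skipinitialspace=True,    # drop the space that follows each comma
)
print(data)
```

Shorter lines are padded with NaN, so every line of the file ends up as one row and can be filtered afterwards.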

Answered by beginner

Here is a general answer to this question:

import re
import pandas as pd

# First, open the file and read it line by line.
with open('file.txt', 'r') as f:
    lines = f.readlines()

# Remove the trailing '\n' (and surrounding whitespace) from each line.
lines = [line.strip() for line in lines]

# Build a dataframe (here we assume you want the data in 2 columns).
rows = []
first_col = ""
for line in lines:
    # Use "if" and "replace" to apply whatever conditions your text data needs.
    if 'X' in line:
        first_col = line.replace('X', "")
    else:
        # You have to define what goes into each column; for example, the
        # second column is the line with everything from ' (' onwards removed.
        second_col = re.sub(r' \(.*', "", line)
        # Each such line becomes the next row of the result.
        rows.append([first_col, second_col])

df_result = pd.DataFrame(rows, columns=['first_col', 'second_col'])
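
To make the idea above more concrete, here is a small sketch of the same loop wrapped in a function and run on made-up input. The function name lines_to_frame and the sample lines are hypothetical; the 'X' marker and the regular expression are the placeholders from the answer, not something specific to the .TX0 format.

```python
import re
import pandas as pd

def lines_to_frame(lines):
    """Turn marker lines ('X...') plus detail lines into a 2-column dataframe."""
    rows = []
    first_col = ""
    for line in (raw.strip() for raw in lines):
        if 'X' in line:
            # A marker line sets the first-column value for the rows that follow.
            first_col = line.replace('X', "")
        else:
            # A detail line becomes one row; drop everything from ' (' onwards.
            rows.append([first_col, re.sub(r' \(.*', "", line)])
    return pd.DataFrame(rows, columns=['first_col', 'second_col'])

# Made-up input in the shape that readlines() would return.
sample_lines = ["XgroupA\n", "alpha (1)\n", "beta (2)\n", "XgroupB\n", "gamma (3)\n"]
df_result = lines_to_frame(sample_lines)
print(df_result)
```

Collecting the rows in a list and building the dataframe once at the end is usually faster than growing a dataframe row by row.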