Python: importing a CSV file as a pandas DataFrame

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14365542/

Time: 2020-08-18 11:12:46  Source: igfitidea

Import CSV file as a pandas DataFrame

Tags: python, pandas, csv, dataframe

Asked by mazlor

What's the Python way to read a CSV file into a pandas DataFrame (which I can then use for statistical operations, which can have differently-typed columns, etc.)?

My CSV file "value.txt" has the following content:

Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
2012-06-14,1624.40,1.253,1.556
2012-06-15,1626.15,1.258,1.552
2012-06-16,1626.15,1.263,1.558
2012-06-17,1626.15,1.264,1.572

In R we would read this file in using:

price <- read.csv("value.txt")  

and that would return an R data.frame:

> price <- read.csv("value.txt")
> price
     Date   price factor_1 factor_2
1  2012-06-11 1600.20    1.255    1.548
2  2012-06-12 1610.02    1.258    1.554
3  2012-06-13 1618.07    1.249    1.552
4  2012-06-14 1624.40    1.253    1.556
5  2012-06-15 1626.15    1.258    1.552
6  2012-06-16 1626.15    1.263    1.558
7  2012-06-17 1626.15    1.264    1.572

Is there a Pythonic way to get the same functionality?

Accepted answer by root

pandas to the rescue:

import pandas as pd
print(pd.read_csv('value.txt'))

        Date    price  factor_1  factor_2
0  2012-06-11  1600.20     1.255     1.548
1  2012-06-12  1610.02     1.258     1.554
2  2012-06-13  1618.07     1.249     1.552
3  2012-06-14  1624.40     1.253     1.556
4  2012-06-15  1626.15     1.258     1.552
5  2012-06-16  1626.15     1.263     1.558
6  2012-06-17  1626.15     1.264     1.572

This returns a pandas DataFrame that is similar to R's data.frame.
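
To connect this back to the question's mention of statistical operations and differently-typed columns, here is a minimal sketch. It assumes the sample data from the question, read from an in-memory string via io.StringIO rather than from value.txt so that it runs on its own; the parse_dates argument is an addition not used in the answer above.

```python
import io

import pandas as pd

# The question's sample file, inlined so the example is self-contained.
CSV_TEXT = '''Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
2012-06-14,1624.40,1.253,1.556
2012-06-15,1626.15,1.258,1.552
2012-06-16,1626.15,1.263,1.558
2012-06-17,1626.15,1.264,1.572
'''

# parse_dates asks pandas to convert the Date column to datetimes;
# the numeric columns are inferred as floats automatically.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"])

print(df.dtypes)            # one dtype per column
print(df["price"].mean())   # a simple statistical operation
print(df.describe())        # summary statistics for the numeric columns
```

With a real file, passing the path ('value.txt') in place of the StringIO object should behave the same way.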

Answer by KurzedMetal

You can use the csv module found in the Python standard library to manipulate CSV files.

Example:

import csv
# In Python 3, open CSV files in text mode with newline=''
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)  # each row is a list of strings
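
If access by column name is wanted, the same module also offers csv.DictReader. The following is only a sketch under the assumption that the first line of the file is a header row; the data is a shortened, inlined copy of the question's file so the example runs without any file on disk.

```python
import csv
import io

# Two rows of the question's data, inlined for a self-contained example.
SAMPLE = (
    'Date,"price","factor_1","factor_2"\n'
    "2012-06-11,1600.20,1.255,1.548\n"
    "2012-06-12,1610.02,1.258,1.554\n"
)

# DictReader uses the header row as keys, so each row becomes a dict.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

for row in rows:
    # Values are still strings; convert them before doing arithmetic.
    print(row["Date"], float(row["price"]))
```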

Answer by Lee-Man

Not quite as clean, but:

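import csv

with open("value.txt", "r", newline="") as f:
    csv_reader = csv.reader(f)
    num = '  '  # blank label for the header row
    for row in csv_reader:
        print(num, '\t'.join(row))
        if num == '  ':
            num = 0  # start numbering once the header has been printed
        num = num + 1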

Not as compact, but it does the job:

   Date price   factor_1    factor_2
1 2012-06-11    1600.20 1.255   1.548
2 2012-06-12    1610.02 1.258   1.554
3 2012-06-13    1618.07 1.249   1.552
4 2012-06-14    1624.40 1.253   1.556
5 2012-06-15    1626.15 1.258   1.552
6 2012-06-16    1626.15 1.263   1.558
7 2012-06-17    1626.15 1.264   1.572

Answer by sidi

Here's an alternative to the pandas library using Python's built-in csv module.

import csv
from pprint import pprint
# In Python 3, open CSV files in text mode with newline=''
with open('foo.csv', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)  # the first row holds the column names
    column = {h: [] for h in headers}
    for row in reader:
        for h, v in zip(headers, row):
            column[h].append(v)
    pprint(column)    # Pretty printer

This will print:

{'Date': ['2012-06-11',
          '2012-06-12',
          '2012-06-13',
          '2012-06-14',
          '2012-06-15',
          '2012-06-16',
          '2012-06-17'],
 'factor_1': ['1.255', '1.258', '1.249', '1.253', '1.258', '1.263', '1.264'],
 'factor_2': ['1.548', '1.554', '1.552', '1.556', '1.552', '1.558', '1.572'],
 'price': ['1600.20',
           '1610.02',
           '1618.07',
           '1624.40',
           '1626.15',
           '1626.15',
           '1626.15']}
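
As the output shows, the csv module returns every field as a string. One possible way to get numbers for statistics is sketched below; the to_number helper is hypothetical (not part of the csv module), and the data is a shortened, inlined copy of the question's file.

```python
import csv
import io

SAMPLE = (
    'Date,"price","factor_1","factor_2"\n'
    "2012-06-11,1600.20,1.255,1.548\n"
    "2012-06-12,1610.02,1.258,1.554\n"
)


def to_number(text):
    """Hypothetical helper: return a float if possible, else the original string."""
    try:
        return float(text)
    except ValueError:
        return text


reader = csv.reader(io.StringIO(SAMPLE))
headers = next(reader)
columns = {h: [] for h in headers}
for row in reader:
    for h, v in zip(headers, row):
        columns[h].append(to_number(v))

# Dates stay as strings; the numeric columns can now be aggregated.
mean_price = sum(columns["price"]) / len(columns["price"])
print(mean_price)
```

This kind of per-column conversion is what pd.read_csv does automatically, which is one reason the pandas route is usually shorter.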

Answer by cs95

To read a CSV file as a pandas DataFrame, you'll need to use pd.read_csv.

But this isn't where the story ends; data exists in many different formats and is stored in different ways, so you will often need to pass additional parameters to read_csv to ensure your data is read in properly.

Here's a table listing common scenarios encountered with CSV files, along with the appropriate argument to use for each. You will usually need some combination of the arguments below to read in your data.

┌──────────────────────────────────────────────────────────┬─────────────────────────────┬────────────────────────────────────────────────────────┐
│  Scenario                                                │  Argument                   │  Example                                               │
├──────────────────────────────────────────────────────────┼─────────────────────────────┼────────────────────────────────────────────────────────┤
│  Read CSV with different separator [1]                   │  sep/delimiter              │  read_csv(..., sep=';')                                │
│  Read CSV with tab/whitespace separator                  │  delim_whitespace           │  read_csv(..., delim_whitespace=True)                  │
│  Fix UnicodeDecodeError while reading [2]                │  encoding                   │  read_csv(..., encoding='latin-1')                     │
│  Read CSV without headers [3]                            │  header and names           │  read_csv(..., header=None, names=['x', 'y', 'z'])     │
│  Specify which column to set as the index [4]            │  index_col                  │  read_csv(..., index_col=[0])                          │
│  Read subset of columns                                  │  usecols                    │  read_csv(..., usecols=['x', 'y'])                     │
│  Numeric data is in European format (e.g., 1.234,56)     │  thousands and decimal      │  read_csv(..., thousands='.', decimal=',')             │
└──────────────────────────────────────────────────────────┴─────────────────────────────┴────────────────────────────────────────────────────────┘

Footnotes

  1. By default, read_csv uses a C parser engine for performance. The C parser can only handle single-character separators. If your CSV has a multi-character separator, you will need to modify your code to use the 'python' engine. You can also pass regular expressions:

    df = pd.read_csv(..., sep=r'\s*\|\s*', engine='python')
    
  2. UnicodeDecodeError occurs when the data was stored in one encoding but read in a different, incompatible one. The most common encodings are 'utf-8' and 'latin-1'; your data is likely to fit one of these.

  3. header=None specifies that the first row in the CSV is a data row rather than a header row, and names=[...] allows you to specify a list of column names to assign to the DataFrame when it is created.

  4. "Unnamed: 0" occurs when a DataFrame with an un-named index is saved to CSV and then re-read after. Instead of having to fix the issue while reading, you can also fix the issue when writing by using

    df.to_csv(..., index=False)
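
Footnote 4 can be illustrated with a small round trip. This is a sketch using an in-memory buffer in place of a file; the DataFrame contents and variable names are made up for the illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Writing with the default index=True stores the index as an unnamed first column.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
reread = pd.read_csv(buf)
print(list(reread.columns))

# Fix while reading: treat the first column as the index.
buf.seek(0)
fixed_on_read = pd.read_csv(buf, index_col=0)

# Fix while writing: do not store the index at all.
buf2 = io.StringIO()
df.to_csv(buf2, index=False)
buf2.seek(0)
fixed_on_write = pd.read_csv(buf2)
```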
    

There are other arguments I've not mentioned here, but these are the ones you'll encounter most frequently.
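
Several of the arguments from the table can be combined in one call. The sketch below assumes a hypothetical headerless, semicolon-separated input with European-style numbers; the data and column names are invented for the example and read from an in-memory string.

```python
import io

import pandas as pd

# Hypothetical input: no header row, ';' separator, '.' for thousands, ',' for decimals.
RAW = (
    "2012-06-11;1.600,20;1,255\n"
    "2012-06-12;1.610,02;1,258\n"
)

df = pd.read_csv(
    io.StringIO(RAW),
    sep=";",                              # non-default separator
    header=None,                          # first row is data, not column names
    names=["Date", "price", "factor_1"],  # supply the column names ourselves
    thousands=".",                        # European thousands separator
    decimal=",",                          # European decimal mark
    index_col=0,                          # use the first column (Date) as the index
)

print(df)
print(df.dtypes)
```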

Answer by Kamal

%cd C:\Users\asus\Desktop\python
import pandas as pd
df = pd.read_csv('value.txt')
df.head()
    Date    price   factor_1    factor_2
0   2012-06-11  1600.20 1.255   1.548
1   2012-06-12  1610.02 1.258   1.554
2   2012-06-13  1618.07 1.249   1.552
3   2012-06-14  1624.40 1.253   1.556
4   2012-06-15  1626.15 1.258   1.552

Answer by Rishabh

import pandas as pd
df = pd.read_csv('/PathToFile.txt', sep = ',')

This will import your .txt or .csv file into a DataFrame.

Answer by Dulangi_Kanchana

Try this:

import pandas as pd
data=pd.read_csv('C:/Users/Downloads/winequality-red.csv')

Replace the file path with the location of your data set. For more, see https://medium.com/@kanchanardj/jargon-in-python-used-in-data-science-to-laymans-language-part-one-12ddfd31592f