How to filter a CSV file without Pandas? (Best Substitute for Pandas in Pythonista)

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/40703099/

Date: 2020-09-14 02:28:02  Source: igfitidea

Tags: python-3.x, pandas, data-analysis, pythonista

Asked by zeh

I am trying to do some data analysis in Pythonista 3 (an iOS app for Python). However, because pandas depends on C libraries, it does not compile on the iOS device.

Is there any substitute for Pandas? Would numpy be an option for data of type string?

The data set I have at the moment is the history of messages between my friends and me.

The whole history is in one CSV file. Each row has the columns 'day_of_the_week', 'date', 'time_of_message', 'author_of_message', and 'message_body'.

The goal of the analysis is to produce a report of our chat for the past year.

I want to be able to count the number of messages each friend sent. I want to be able to plot a histogram of the hours in which the messages were sent by each friend. Then, I want to do some word counting, both per person and for the group as a whole.

In Pandas I know how to do that. For example:

import pandas as pd

df = pd.read_csv("messages.csv")
number_of_messages_friend1 = len(df[df.author_of_message == 'friend1'])

How can I filter a csv file without Pandas?

Answered by JonB

Since Pythonista does have numpy, you will want to look at recarrays, which are numpy's approach to this type of problem. The following worked out of the box in Pythonista for me:

import numpy as np
df=np.recfromcsv('messages.csv')
len(df[df.author_of_message==b'friend1'])

Depending on your data format, you may find that recfromcsv "just works", since it tries to guess data types, or you might need to customize things a bit. See genfromtxt for a number of options, such as explicitly specifying data types or using converters to turn string dates into datetime objects. recfromcsv is just a convenience wrapper around genfromtxt.

https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#

Once the data is in a recarray, many of the simple indexing operations work the same as in pandas. Note that you may need to compare strings against b-prefixed strings (bytes objects), as shown above, unless you convert the fields to Unicode strings.
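
As a rough sketch of the same idea, the snippet below calls np.genfromtxt directly (recfromcsv being a wrapper around it) and filters with a boolean mask. The sample rows are made up for illustration, a header row is assumed, and the encoding argument assumes a reasonably recent NumPy (1.14 or later, which may not match the version bundled with Pythonista). As far as I know, genfromtxt does not interpret quoted fields, so message bodies that contain commas would probably need the csv module instead.

```python
import io
import numpy as np

# Hypothetical sample data standing in for messages.csv (column names from the question).
sample = io.StringIO(
    "day_of_the_week,date,time_of_message,author_of_message,message_body\n"
    "Mon,2016-11-14,09:15,friend1,hello there\n"
    "Mon,2016-11-14,09:17,friend2,hi\n"
    "Tue,2016-11-15,21:40,friend1,see you tomorrow\n"
)

# names=True takes the field names from the header row; dtype=None lets NumPy
# guess each column's type; encoding="utf-8" gives str fields instead of bytes.
data = np.genfromtxt(sample, delimiter=",", names=True, dtype=None, encoding="utf-8")

# Boolean-mask filtering, much like the pandas expression in the question.
friend1 = data[data["author_of_message"] == "friend1"]
print(len(friend1))

# Number of messages per author.
authors, counts = np.unique(data["author_of_message"], return_counts=True)
per_author = dict(zip(authors.tolist(), counts.tolist()))
print(per_author)
```

With str fields, the comparison uses a plain string rather than a b-prefixed one.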

Answered by Roland Smith

Use the csv module from the standard library to read the messages. You could store them in a list of collections.namedtuple instances for easy access.

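import csv

messages = []
# newline='' is what the csv module documentation recommends when opening CSV files.
with open('messages.csv', newline='') as csvfile:
    # If the file starts with a header row, drop fieldnames so the keys are read from that row.
    reader = csv.DictReader(csvfile, fieldnames=('day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'))
    for row in reader:
        messages.append(row)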

That gives you all the messages as a list of dictionaries.
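
Filtering such a list of dictionaries could look roughly like the sketch below. The sample rows are invented for illustration and a header row is assumed, so DictReader is called without fieldnames; the names friend1_messages and messages_per_author are just placeholders.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for messages.csv (header row assumed).
sample = io.StringIO(
    "day_of_the_week,date,time_of_message,author_of_message,message_body\n"
    "Mon,2016-11-14,09:15,friend1,hello there\n"
    "Mon,2016-11-14,09:17,friend2,hi\n"
    'Tue,2016-11-15,21:40,friend1,"see you tomorrow, ok?"\n'
)

# Without fieldnames, DictReader takes the keys from the header row.
messages = list(csv.DictReader(sample))

# Rough equivalent of len(df[df.author_of_message == 'friend1']) in pandas.
friend1_messages = [m for m in messages if m["author_of_message"] == "friend1"]
print(len(friend1_messages))

# Message count for every author in a single pass.
messages_per_author = Counter(m["author_of_message"] for m in messages)
print(messages_per_author.most_common())
```

The csv module also copes with quoted fields, so a message body containing a comma stays in one column.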

Alternatively, you could use a normal csv reader combined with a collections.namedtuple to make a list of named tuples, which are slightly easier to access.

import csv
from collections import namedtuple

Msg = namedtuple('Msg', ('day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'))

messages = []
with open('messages.csv', newline='') as csvfile:
    msgreader = csv.reader(csvfile)
    # If the file has a header row, skip it first with next(msgreader).
    for row in msgreader:
        messages.append(Msg(*row))
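
The other goals from the question (messages per hour and word counts) might be approached along the lines of the sketch below. It assumes that time_of_message looks like "HH:MM" and that the sample has no header row; both are guesses about the data, and the sample rows and the names hours_by_author, words_by_author and group_words are made up for illustration.

```python
import csv
import io
import re
from collections import Counter, defaultdict, namedtuple

Msg = namedtuple('Msg', ('day_of_the_week', 'date', 'time_of_message',
                         'author_of_message', 'message_body'))

# Hypothetical sample data standing in for messages.csv (no header row here).
sample = io.StringIO(
    "Mon,2016-11-14,09:15,friend1,hello there\n"
    "Mon,2016-11-14,09:17,friend2,hi there\n"
    "Tue,2016-11-15,21:40,friend1,see you tomorrow\n"
    "Tue,2016-11-15,09:05,friend1,hello again\n"
)
messages = [Msg(*row) for row in csv.reader(sample)]

hours_by_author = defaultdict(Counter)   # author -> {hour: number of messages}
words_by_author = defaultdict(Counter)   # author -> {word: occurrences}
for msg in messages:
    # Assumes the time is formatted as "HH:MM"; adjust the parsing to the real format.
    hour = int(msg.time_of_message.split(":")[0])
    hours_by_author[msg.author_of_message][hour] += 1
    words = re.findall(r"\w+", msg.message_body.lower())
    words_by_author[msg.author_of_message].update(words)

# Word counts for the whole group: add up the per-author counters.
group_words = sum(words_by_author.values(), Counter())

# Crude text histogram of the hours in which friend1 sent messages.
for hour, n in sorted(hours_by_author["friend1"].items()):
    print(f"{hour:02d}h {'#' * n}")
```

The per-hour counters could be handed to a plotting library if one is available; the text histogram only avoids depending on one.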

Answered by elwarren

Pythonista now has competition on iOS. The Pyto app provides Python 3.8 with pandas. https://apps.apple.com/us/app/pyto-python-3-8
