How to filter a CSV file without Pandas? (Best substitute for Pandas in Pythonista)

Question

Asked by zeh

I am trying to do some data analysis in Pythonista 3 (an iOS app for Python). However, because pandas depends on C libraries, it does not compile on the iOS device.

Is there any substitute for Pandas? Would numpy be an option for data of type string?

The data set I have at the moment is the history of messages between my friends and me.

The whole history is in one CSV file. Each row has the columns 'day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'.

The goal of the analysis is to produce a report of our chat for the past year.

I want to be able to count the number of messages each friend sent. I want to be able to plot a histogram of the hours in which the messages were sent by each friend. Then, I want to do some word counting, both individually and for the group as a whole.

In Pandas I know how to do that. For example:

import pandas as pd

df = pd.read_csv("messages.csv")
number_of_messages_friend1 = len(df[df.author_of_message == 'friend1'])

How can I filter a csv file without Pandas?

Answer 1

Answered by JonB

Since Pythonista does have numpy, you will want to look at recarrays, which are numpy's approach to this type of problem. The following worked out of the box in Pythonista for me:

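import numpy as np

# recfromcsv guesses the column types and returns a record array
df = np.recfromcsv('messages.csv')
# string fields are read as bytes here, hence the b prefix
len(df[df.author_of_message == b'friend1'])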

Depending on your data format, you may find that recfromcsv "just works", since it tries to guess data types, or you might need to customize things a bit. See genfromtxt for a number of options, such as explicitly specifying data types or using converters to turn string dates into datetime objects. recfromcsv is just a convenience wrapper around genfromtxt.

https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#

Once the data is in a recarray, many of the simple indexing operations work the same as in pandas. Note that you may need to do string comparisons using b-prefixed strings (bytes objects), as shown above, unless you convert the fields to unicode strings.
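
If np.recfromcsv is not available in your NumPy version (as far as I know it was dropped from the main namespace in NumPy 2.0), the same idea can be sketched with np.genfromtxt directly. This is only a sketch: it assumes the file starts with a header row holding the column names from the question, and load_messages and count_by_author are hypothetical helper names. Also note that genfromtxt splits on every comma, so message bodies that contain commas would need the csv module instead.

```python
import numpy as np


def load_messages(path):
    # names=True takes the field names from the header row (assumed to exist);
    # dtype=None lets NumPy guess each column's type;
    # encoding='utf-8' yields str fields rather than bytes.
    return np.genfromtxt(path, delimiter=',', names=True,
                         dtype=None, encoding='utf-8')


def count_by_author(table, author):
    # Boolean mask over the author column, then count the True entries.
    return int(np.sum(table['author_of_message'] == author))
```

With str fields, the filter becomes table[table['author_of_message'] == 'friend1'], without the b prefix.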

Answer 2

Answered by Roland Smith

Use the csv module from the standard library to read the messages. You could store them in a list of collections.namedtuple instances for easy access.

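import csv

# Column names as described in the question. Because fieldnames is passed,
# the first line of the file is treated as data, not as a header.
fields = ('day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body')

messages = []
# newline='' is what the csv module documentation recommends for file objects
with open('messages.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=fields)
    for row in reader:
        messages.append(row)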

That gives you all the messages as a list of dictionaries.
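
To connect this to the filtering in the question, here is a minimal sketch of how such a list of dictionaries could be filtered and counted. The helper names read_messages, messages_by and count_per_author are made up for illustration, and the file is assumed to be UTF-8 encoded with no header row.

```python
import csv
from collections import Counter

FIELDS = ('day_of_the_week', 'date', 'time_of_message',
          'author_of_message', 'message_body')


def read_messages(path):
    # Assumes a UTF-8 file without a header row.
    with open(path, newline='', encoding='utf-8') as csvfile:
        return list(csv.DictReader(csvfile, fieldnames=FIELDS))


def messages_by(messages, author):
    # Rough equivalent of df[df.author_of_message == author] in pandas.
    return [m for m in messages if m['author_of_message'] == author]


def count_per_author(messages):
    # Maps each author to the number of messages they sent.
    return Counter(m['author_of_message'] for m in messages)
```

len(messages_by(messages, 'friend1')) then plays the role of the pandas one-liner from the question.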

Alternatively, you could use a normal csv reader combined with a collections.namedtuple to make a list of named tuples, which are slightly easier to access.

import csv
from collections import namedtuple

Msg = namedtuple('Msg', ('day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'))

messages = []
with open('messages.csv', newline='') as csvfile:
    msgreader = csv.reader(csvfile)
    for row in msgreader:
        messages.append(Msg(*row))
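
For the other goals in the question (messages per hour for each friend, and word counts), something along these lines might work on the list of named tuples. It is a sketch under assumptions: time_of_message is taken to look like 'HH:MM' in 24-hour time, words are split with a simple regular expression, and hours_per_author and word_counts are hypothetical helper names.

```python
import re
from collections import Counter, namedtuple

Msg = namedtuple('Msg', ('day_of_the_week', 'date', 'time_of_message',
                         'author_of_message', 'message_body'))


def hours_per_author(messages):
    # Maps author -> Counter of hour -> number of messages in that hour.
    hours = {}
    for m in messages:
        hour = int(m.time_of_message.split(':')[0])  # assumes 'HH:MM'
        hours.setdefault(m.author_of_message, Counter())[hour] += 1
    return hours


def word_counts(messages, author=None):
    # Counts lower-cased words; author=None counts the whole group.
    counts = Counter()
    for m in messages:
        if author is None or m.author_of_message == author:
            counts.update(re.findall(r'\w+', m.message_body.lower()))
    return counts
```

Each per-author Counter can be handed to whatever plotting tool is available to draw the histogram, and word_counts(messages).most_common(20) lists the 20 most frequent words for the group.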

Answer 3

Answered by elwarren

Pythonista now has competition on iOS. The Pyto app provides Python 3.8 with pandas. https://apps.apple.com/us/app/pyto-python-3-8
