pandas: Hive UDF with Python

Notice: this page reproduces a popular StackOverflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/24293843/

Date: 2020-09-13 22:10:25  Source: igfitidea

Hive UDF with Python

Tags: python, hadoop, pandas, hive

Asked by user3476463

I'm new to Python, pandas, and Hive and would appreciate some tips.

I have the Python code below, which I would like to turn into a UDF in Hive. Instead of taking a CSV as the input, doing the transformations, and then exporting another CSV, I would like to take a Hive table as the input and export the results as a new Hive table containing the transformed data.

Python Code:

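import pandas as pd

# Load the input data.
df = pd.read_csv('Input.csv')
# Index by the two key fields so they survive the one-hot encoding.
df = df.set_index(['Field1', 'Field2'])
# One-hot encode Field3, then restore the key fields as columns.
dummies = pd.get_dummies(df['Field3']).reset_index()
# Drop repeated (Field1, Field2, Field3) combinations.
df2 = dummies.drop_duplicates()
# Collapse to one row per (Field1, Field2) with a 0/1 flag per Field3 value.
df3 = df2.groupby(['Field1', 'Field2']).sum()
df3.to_csv('Output.csv')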

Answered by visakh

You can use Hive's TRANSFORM clause to run a UDF written in Python. The detailed steps are outlined here and here.
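As a rough illustration of the TRANSFORM approach, here is a minimal sketch of how the question's pandas logic might be wrapped in a streaming script. It relies on the fact that TRANSFORM passes rows to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The script name transform.py, the CATEGORIES list, and the function names are assumptions made for this example, not part of the original question or answer. The category list is fixed up front because Hive needs a known output schema, whereas get_dummies derives its columns from whatever data it sees.

```python
import sys

import pandas as pd

# Hypothetical: the distinct Field3 values, fixed in advance so the output
# columns line up with the AS (...) column list in the Hive query.
CATEGORIES = ["a", "b", "c"]


def transform(df, categories=CATEGORIES):
    """Return one row per (Field1, Field2) with a 0/1 flag per category."""
    dummies = pd.get_dummies(df.set_index(["Field1", "Field2"])["Field3"])
    # Force a stable column layout; values outside CATEGORIES are dropped.
    dummies = dummies.reindex(columns=categories, fill_value=0).astype(int)
    deduped = dummies.reset_index().drop_duplicates()
    return deduped.groupby(["Field1", "Field2"]).sum().reset_index()


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read tab-separated rows from stdin, write transformed rows to stdout."""
    try:
        df = pd.read_csv(
            stdin,
            sep="\t",
            header=None,
            names=["Field1", "Field2", "Field3"],
            dtype=str,
            keep_default_na=False,
        )
    except pd.errors.EmptyDataError:
        return  # no input rows, so emit nothing
    if df.empty:
        return
    transform(df).to_csv(stdout, sep="\t", header=False, index=False)


if __name__ == "__main__":
    main()
```

On the Hive side, the script would typically be registered with ADD FILE transform.py; and invoked with something like SELECT TRANSFORM (Field1, Field2, Field3) USING 'python transform.py' AS (Field1, Field2, a, b, c) FROM input_table, where input_table and the output column names are placeholders. Because each script instance only sees the rows routed to it, a query of this kind would likely also need the rows clustered by Field1 and Field2 (for example with CLUSTER BY in a subquery) so that every key is handled by a single instance.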
