Hive UDF with Python (pandas)
Original URL: http://stackoverflow.com/questions/24293843/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Hive UDF with Python
Asked by user3476463
I'm new to Python, pandas, and Hive and would definitely appreciate some tips.
I have the Python code below, which I would like to turn into a UDF in Hive. However, instead of reading a CSV, applying the transformations, and exporting another CSV, I would like to take a Hive table as the input and write the results to a new Hive table containing the transformed data.
Python Code:
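import pandas as pd

# Load the input data
df = pd.read_csv('Input.csv')
# Index by the key columns so they survive the one-hot encoding
df = df.set_index(['Field1', 'Field2'])
# One-hot encode Field3, then bring the key columns back as regular columns
dummies = pd.get_dummies(df['Field3']).reset_index()
# Drop repeated (Field1, Field2, Field3) combinations
df2 = dummies.drop_duplicates()
# Collapse to one row per (Field1, Field2), summing the indicator columns
df3 = df2.groupby(['Field1', 'Field2']).sum()
df3.to_csv('Output.csv')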
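
One possible way to adapt the code above, offered only as a rough sketch: Hive can stream rows through an external script with its TRANSFORM clause, passing each row as a tab-separated line on stdin and reading tab-separated lines back from stdout. The function name `transform` and the `CATEGORIES` list are hypothetical. Listing the Field3 values up front is an assumption I'm making so that the output columns stay the same no matter which rows a given script instance receives. Any Field3 value not in that list would be dropped by the `reindex` step.

```python
import io
import sys

import pandas as pd

# Hypothetical: the full set of Field3 values, fixed in advance so that every
# run emits the same columns in the same order.
CATEGORIES = ["A", "B", "C"]


def transform(in_stream, out_stream):
    """Read tab-separated (Field1, Field2, Field3) rows and write one row per
    (Field1, Field2) with an indicator column for each category."""
    df = pd.read_csv(
        in_stream,
        sep="\t",
        header=None,
        names=["Field1", "Field2", "Field3"],
        dtype=str,
    )
    # Same steps as the CSV version: de-duplicate, one-hot encode, sum.
    df = df.drop_duplicates()
    dummies = pd.get_dummies(df["Field3"], dtype=int)
    dummies = dummies.reindex(columns=CATEGORIES, fill_value=0)
    result = (
        pd.concat([df[["Field1", "Field2"]], dummies], axis=1)
        .groupby(["Field1", "Field2"])
        .sum()
        .reset_index()
    )
    result.to_csv(out_stream, sep="\t", header=False, index=False)


# When run by Hive, the script would instead call:
#     transform(sys.stdin, sys.stdout)
# Small in-memory demonstration:
sample = "a\tx\tA\na\tx\tA\na\tx\tB\nb\ty\tC\n"
out = io.StringIO()
transform(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

On the Hive side, I believe this would be wired up by registering the script with `ADD FILE` and selecting with `TRANSFORM(Field1, Field2, Field3) USING 'python <script>' AS (...)` inside an `INSERT` or `CREATE TABLE ... AS` statement. Since Hive may split the input across several script instances, rows sharing the same (Field1, Field2) would probably need to be routed together (for example with `DISTRIBUTE BY`) for the sums to come out right.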

