Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/44169189/

Time: 2020-09-14 03:41:38  Source: igfitidea

Convert pandas columns to comma-separated lists to be used in SQL statements

csv pandas dataframe pandasql

Asked by runningbirds

I have a dataframe and I am trying to turn a column into a comma-separated list. The end goal is to pass this comma-separated list as the list of filter values in a SQL query.

How do I go about doing this?

import pandas as pd

mydata = [{'id': 'Hyman', 'b': 87, 'c': 1000},
          {'id': 'jill', 'b': 55, 'c': 2000},
          {'id': 'july', 'b': 5555, 'c': 22000}]
df = pd.DataFrame(mydata)
df

Expected result: note the quotes around the ids, since they are strings, and the lack of quotes around the items in column 'b', since that is a numeric field and SQL expects numbers unquoted. I would then eventually send a query like

select * from mytable where ids in (my_ids) or values in (my_values):

my_ids = 'Hyman', 'jill','july'

my_values = 87,55,5555

Answered by Atihska

Though this is an old post, I encountered a similar issue and solved it in one line using values and tolist():

df['col_name'].values.tolist()

So in your case, it will be

my_ids = df['id'].values.tolist()    # ['Hyman', 'jill', 'july']
my_values = df['b'].values.tolist()  # [87, 55, 5555]
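
The lists above are Python lists, not yet the comma-separated strings the question asks for. A minimal sketch of the remaining step, assuming the question's dataframe and table name (the names ids_sql, values_sql, and query are made up for illustration), could look like this:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame([
    {'id': 'Hyman', 'b': 87, 'c': 1000},
    {'id': 'jill', 'b': 55, 'c': 2000},
    {'id': 'july', 'b': 5555, 'c': 22000},
])

my_ids = df['id'].tolist()
my_values = df['b'].tolist()

# Strings are wrapped in single quotes; numbers are left bare.
ids_sql = ', '.join("'{}'".format(s) for s in my_ids)
values_sql = ', '.join(str(v) for v in my_values)

# Hypothetical query text, following the shape shown in the question.
query = "select * from mytable where ids in ({}) or values in ({})".format(
    ids_sql, values_sql)
print(query)
```

Plain string formatting like this does not escape quotes inside the values, so it is only reasonable for trusted data. For anything that comes from user input, the parameter binding offered by the database driver is generally the safer route.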

Answered by Scott Boston

Let's use apply with the argument reduce=False, then check the dtype of each series and pass the properly formatted values to join:

df.apply(lambda x: ', '.join(x.astype(str)) if x.dtype=='int64' else ', '.join("\'"+x.astype(str)+"\'"), reduce=False)

Output:

b               87, 55, 5555
c          1000, 2000, 22000
id    'Hyman', 'jill', 'july'
dtype: object
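
The reduce argument of DataFrame.apply was deprecated in pandas 0.23 and removed in 1.0, so the line above fails on current versions. A rough equivalent that avoids it, offered as a sketch rather than the answerer's own code (the helper name to_sql_list is invented here, and is_numeric_dtype is swapped in for the int64 check so that floats are also left unquoted), might be:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same sample data as in the question.
df = pd.DataFrame([
    {'id': 'Hyman', 'b': 87, 'c': 1000},
    {'id': 'jill', 'b': 55, 'c': 2000},
    {'id': 'july', 'b': 5555, 'c': 22000},
])

def to_sql_list(col):
    """Join a column into a comma-separated string, quoting non-numeric values."""
    if is_numeric_dtype(col):
        return ', '.join(col.astype(str))
    return ', '.join("'" + col.astype(str) + "'")

# Build one string per column without relying on apply's removed reduce flag.
result = pd.Series({name: to_sql_list(df[name]) for name in df.columns})
print(result)
```

Building the Series from a dict comprehension sidesteps any question of how apply combines the per-column results, at the cost of being slightly more verbose.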