在 MySQL 表中查找所有只有空值的列
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/12091272/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Find all those columns which have only null values, in a MySQL table
提问by Sherlock
The situation is as follows:
情况如下:
I have a substantial number of tables, with each a substantial number of columns. I need to deal with this old and to-be-deprecated database for a new system, and I'm looking for a way to eliminate all columns that have - apparently - never been in use.
我有大量的表格,每个表格都有大量的列。我需要为新系统处理这个旧的和即将弃用的数据库,我正在寻找一种方法来消除所有显然从未使用过的列。
I wanna do this by filtering out all columns that have a value on any given row, leaving me with a set of columns where the value is NULL in all rows. Of course I could manually sort every column descending, but that'd take too long as I'm dealing with loads of tables and columns. I estimate it to be 400 tables with up to 50 (!) columns per table.
我想通过过滤掉在任何给定行上具有值的所有列来做到这一点,留下一组列,其中所有行的值为 NULL。当然,我可以手动对每一列进行降序排序,但是因为我要处理大量表格和列,所以这会花费太长时间。我估计它是 400 个表,每个表最多有 50 (!) 个列。
Is there any way I can get this information from the information_schema?
有什么办法可以从 information_schema 中获取这些信息?
EDIT:
编辑:
Here's an example:
下面是一个例子:
column_a column_b column_c column_d
NULL NULL NULL 1
NULL 1 NULL 1
NULL 1 NULL NULL
NULL NULL NULL NULL
The output should be 'column_a' and 'column_c', for being the only columns without any filled in values.
输出应该是“column_a”和“column_c”,因为它们是唯一没有任何填充值的列。
采纳答案by eggyal
You can avoid using a procedure by dynamically creating (from the INFORMATION_SCHEMA.COLUMNS
table) a string that contains the SQL you wish to execute, then preparing a statementfrom that string and executing it.
您可以通过动态创建(从INFORMATION_SCHEMA.COLUMNS
表中)包含您希望执行的 SQL 的字符串,然后从该字符串准备语句并执行它来避免使用过程。
The SQL we wish to build will look like:
我们希望构建的 SQL 将如下所示:
SELECT * FROM (
SELECT 'tableA' AS `table`,
IF(COUNT(`column_a`), NULL, 'column_a') AS `column`
FROM tableA
UNION ALL
SELECT 'tableB' AS `table`,
IF(COUNT(`column_b`), NULL, 'column_b') AS `column`
FROM tableB
UNION ALL
-- etc.
) t WHERE `column` IS NOT NULL
This can be done using the following:
这可以使用以下方法完成:
SET group_concat_max_len = 4294967295; -- to overcome default 1KB limitation
SELECT CONCAT(
'SELECT * FROM ('
, GROUP_CONCAT(
'SELECT ', QUOTE(TABLE_NAME), ' AS `table`,'
, 'IF('
, 'COUNT(`', REPLACE(COLUMN_NAME, '`', '``'), '`),'
, 'NULL,'
, QUOTE(COLUMN_NAME)
, ') AS `column` '
, 'FROM `', REPLACE(TABLE_NAME, '`', '``'), '`'
SEPARATOR ' UNION ALL '
)
, ') t WHERE `column` IS NOT NULL'
)
INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE();
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
See it on sqlfiddle.
回答by DhruvPathak
I am not an expert in SQL procedures, hence giving general idea using SQL queries and a PHP/python script.
我不是 SQL 程序方面的专家,因此给出了使用 SQL 查询和 PHP/python 脚本的总体思路。
use
SHOW TABLES
or some other query onINFORMATION_SCHEMA
database to get all tables in your databaseMY_DATABASE
do a query to generate a statement to get all column names in a particular table, this will be used in next query.
SHOW TABLES
对INFORMATION_SCHEMA
数据库使用或其他一些查询来获取数据库中的所有表MY_DATABASE
执行查询以生成语句以获取特定表中的所有列名,这将在下一个查询中使用。
SELECT Group_concat(Concat( "MAX(", column_name, ")" )) FROM information_schema.columns WHERE table_schema = 'MY_DATABSE' AND table_name = 'MY_TABLE' ORDER BY table_name,ordinal_position
SELECT Group_concat(Concat( "MAX(", column_name, ")" )) FROM information_schema.columns WHERE table_schema = 'MY_DATABSE' AND table_name = 'MY_TABLE' ORDER BY table_name,ordinal_position
You will get an output like
MAX(column_a),MAX(column_b),MAX(column_c),MAX(column_d)
Use this output to generate final query :
你会得到类似的输出
MAX(column_a),MAX(column_b),MAX(column_c),MAX(column_d)
使用此输出生成最终查询:
SELECT Max(column_a), Max(column_b), Max(column_c), Max(column_d) FROM MY_DATABASE.MY_TABLE
从 MY_DATABASE.MY_TABLE 中选择 Max(column_a)、Max(column_b)、Max(column_c)、Max(column_d)
The output would be :
输出将是:
MAX(column_a) MAX(column_b) MAX(column_c) MAX(column_d)
NULL 1 NULL 1
- All the columns with Max value as
NULL
are the ones which have all valuesNULL
- 具有最大值的所有列
NULL
都是具有所有值的列NULL
回答by georgepsarakis
You can take advantage of the behavior of COUNTaggregate function regarding NULLs. By passing the field as argument, the COUNTfunction returns the number of non-NULL values while COUNT(*)returns the total number of rows. Thus you can calculate the ratio of NULL to "acceptable" values.
您可以利用COUNT聚合函数关于 NULL的行为。通过将该字段作为参数传递,COUNT函数返回非 NULL 值的数量,而COUNT(*)返回总行数。因此,您可以计算 NULL 与“可接受”值的比率。
I will give an example with the following table structure:
我将给出一个具有以下表结构的示例:
CREATE TABLE `t1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`col_1` int(10) unsigned DEFAULT NULL,
`col_2` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ;
-- let's fill the table with random values
INSERT INTO t1(col_1,col_2) VALUES(1,2);
INSERT INTO t1(col_1,col_2)
SELECT
IF(RAND() > 0.5, NULL ,FLOOR(RAND()*1000),
IF(RAND() > 0.5, NULL ,FLOOR(RAND()*1000) FROM t1;
-- run the last INSERT-SELECT statement a few times
SELECT COUNT(col_1)/COUNT(*) AS col_1_ratio,
COUNT(col_2)/COUNT(*) AS col_2_ratio FROM t1;
You can write a function that automatically constructs a query from the INFORMATION_SCHEMA database by passing the table name as input variable. Here's how to obtain the structure data directly from INFORMATION_SCHEMA tables:
您可以编写一个函数,通过将表名作为输入变量传递,从 INFORMATION_SCHEMA 数据库自动构造查询。下面是如何直接从 INFORMATION_SCHEMA 表中获取结构数据:
SET @query:=CONCAT("SELECT @column_list:=GROUP_CONCAT(col) FROM (
SELECT CONCAT('COUNT(',c.COLUMN_NAME,')/COUNT(*)') AS col
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE NOT COLUMN_KEY IN('PRI') AND TABLE_SCHEMA=DATABASE()
AND TABLE_NAME='t1' ORDER BY ORDINAL_POSITION ) q");
PREPARE COLUMN_SELECT FROM @query;
EXECUTE COLUMN_SELECT;
SET @null_counters_sql := CONCAT('SELECT ',@column_list, ' FROM t1');
PREPARE NULL_COUNTERS FROM @null_counters_sql;
EXECUTE NULL_COUNTERS;
回答by Sami
I have created 4 tables. Three for demo and one nullcolumns
is the compulsory part of solution. Among three tables, only salary
and dept
have columns with all values null (you may have a look at their script).
我创建了 4 个表。三个用于演示,一个nullcolumns
是解决方案的必修部分。在三个表中,只有salary
和dept
具有所有值为空的列(您可以查看他们的脚本)。
The compulsory table and the procedure are given at the end
最后给出必填表和程序
You can copy paste and run (the compulsory part or all) as sql (just you have to change the delimiter to //) in your desired database on your localhost and then --- call get();
and see the results
您可以在本地主机上的所需数据库中复制粘贴并运行(强制部分或全部)作为 sql(只需将分隔符更改为 //),然后 ---call get();
并查看结果
CREATE TABLE IF NOT EXISTS `dept` (
`did` int(11) NOT NULL,
`dname` varchar(50) DEFAULT NULL,
PRIMARY KEY (`did`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `dept` (`did`, `dname`) VALUES
(1, NULL),
(2, NULL),
(3, NULL),
(4, NULL),
(5, NULL);
CREATE TABLE IF NOT EXISTS `emp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ename` varchar(50) NOT NULL,
`did` int(11) NOT NULL,
PRIMARY KEY (`ename`),
KEY `deptid` (`did`),
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `emp` (`id`, `ename`, `did`) VALUES
(1, 'e1', 4),
(2, 'e2', 4),
(3, 'e3', 2),
(4, 'e4', 4),
(5, 'e5', 3);
CREATE TABLE IF NOT EXISTS `salary` (
`EmpCode` varchar(50) NOT NULL,
`Amount` int(11) DEFAULT NULL,
`Date` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `salary` (`EmpCode`, `Amount`, `Date`) VALUES
('1', 344, NULL),
('2', NULL, NULL);
------------------------------------------------------------------------
------------------------------------------------------------------------
CREATE TABLE IF NOT EXISTS `nullcolumns` (
`Table_Name` varchar(100) NOT NULL,
`Column_Name` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--Only one procedure Now
CREATE PROCEDURE get(dn varchar(100))
BEGIN
declare c1 int; declare b1 int default 0; declare tn varchar(30);
declare c2 int; declare b2 int; declare cn varchar(30);
select count(*) into c1 from information_schema.tables where table_schema=dn;
delete from nullcolumns;
while b1<c1 do
select table_name into tn from information_schema.tables where
table_schema=dn limit b1,1;
select count(*) into c2 from information_schema.columns where
table_schema=dn and table_name=tn;
set b2=0;
while b2<c2 do
select column_name into cn from information_schema.columns where
table_schema=dn and table_name=tn limit b2,1;
set @nor := 0;
set @query := concat("select count(*) into @nor from ", dn,".",tn);
prepare s1 from @query;
execute s1;deallocate prepare s1;
if @nor>0 then set @res := 0;
set @query := concat("select ((select max(",cn,") from ", dn,".",tn,")
is NULL) into @res");
prepare s1 from @query;
execute s1;deallocate prepare s1;
if @res=1 then
insert into nullcolumns values(tn,cn);
end if; end if;
set b2=b2+1;
end while;
set b1=b1+1;
end while;
select * from nullcolumns;
END;
You can easily execute stored procedure easily as sql in your phpmyadin 'as it is' just change the Delimiters (at the bottom of SQL quesry box) to // Then
您可以轻松地像 phpmyadin 中的 sql 一样轻松地执行存储过程,“原样”只需将分隔符(在 SQL 查询框的底部)更改为 // 然后
call get();
And Enjoy:)
并享受:)
You can see Now the table nullcolumns
showing all columns having 100/100 null values along with the table Names
您现在可以看到该表nullcolumns
显示了所有具有 100/100 空值的列以及表名称
In procedure code if @nor>0
restricts that no empty table should be included in results you can remove that restriction.
在过程代码if @nor>0
限制结果中不应包含空表,您可以删除该限制。
回答by davek
I think you can do this with GROUP_CONCAT and GROUP BY:
我认为你可以用 GROUP_CONCAT 和 GROUP BY 做到这一点:
select length(replace(GROUP_CONCAT(my_col), ',', ''))
from my_table
group by my_col
(untested)
(未经测试)
EDIT: the docs don't seem to state that GROUP_CONCAT needs a corresponding GROUP BY, so try this:
编辑:文档似乎没有说明 GROUP_CONCAT 需要相应的 GROUP BY,所以试试这个:
select
length(replace(GROUP_CONCAT(col_a), ',', '')) as len_a
, length(replace(GROUP_CONCAT(col_b), ',', '')) as len_b
, length(replace(GROUP_CONCAT(col_c), ',', '')) as Len_c
from my_table
回答by raghu gs
select column_name
from user_tab_columns
where table_name='Table_name' and num_nulls>=1;
Just by simple query you will get those two columns.
只需通过简单的查询,您就会得到这两列。