Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/29239565/
Hive, how do I retrieve all of a database's table columns
Asked by Erij Tounsi
I want to write the equivalent of this SQL query in Hive:
select * from information_schema.columns where table_schema='database_name'
How can I access Hive's metastore and retrieve all the columns of all the tables stored in a specific database? I know I can do it table by table via describe [table_name], but is there any way to get all the columns of all the tables in a database in a single request?
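(For reference, the per-table approach mentioned above is a single HiveQL statement; the database and table names below are placeholders:

DESCRIBE database_name.table_name;

There is no built-in cross-table form of this statement, which is what the question is about.)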
Answered by Angelo Di Donato
If you want to be able to run such queries against Hive metadata, you can set up the Hive metastore backed by MySQL; the metadata Hive uses is then stored in a dedicated MySQL database/account.
You will have to create a MySQL user for Hive, e.g. CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword'.
You will then find tables such as COLUMNS_V2 containing the information you are looking for.
An example query to retrieve all columns in all tables could be:

SELECT COLUMN_NAME, TBL_NAME FROM COLUMNS_V2 c JOIN TBLS a ON c.CD_ID=a.TBL_ID
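If you also want to restrict the result to one specific database, as the question asks, the metastore has a DBS table that can be joined in. The query below is only a sketch, assuming the standard Hive metastore MySQL schema (DBS, TBLS, SDS, COLUMNS_V2); exact table and column names can vary between Hive versions, and 'database_name' is a placeholder:

-- All columns of all tables in one database, read from the metastore's MySQL schema.
-- Join path: DBS -> TBLS -> SDS (storage descriptor) -> COLUMNS_V2.
SELECT d.NAME AS db_name, t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
WHERE d.NAME = 'database_name'
-- INTEGER_IDX is the column position within the table, so this keeps declaration order.
ORDER BY t.TBL_NAME, c.INTEGER_IDX;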
Alternatively, you can access this information via REST calls to WebHCat; see the wiki for more info.
Answered by Ram Ghadiyaram
How can I access hive's metastore and retrieve all the columns of all the tables stored in a specific database?
This is one way to connect with HiveMetaStoreClient; you can use the method getTableColumnsInformation to get the columns.
Along with columns, this class can extract all the other information, such as partitions. Please see the example client and sample methods below.
import org.apache.hadoop.hive.conf.HiveConf;

// Test program: connects to the metastore over Thrift and prints metadata.
public class Test {
    public static void main(String[] args) {
        HiveConf hiveConf = new HiveConf();
        hiveConf.setIntVar(HiveConf.ConfVars.METASTORETHRIFTCONNECTIONRETRIES, 3);
        hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://host:port");
        HiveMetaStoreConnector hiveMetaStoreConnector = new HiveMetaStoreConnector(hiveConf);
        if (hiveMetaStoreConnector != null) {
            // Both methods take a database name and iterate over all of its tables.
            System.out.println(hiveMetaStoreConnector.getAllPartitionInfo("databaseName"));
            System.out.println(hiveMetaStoreConnector.getAllTableStatistic("databaseName"));
        }
    }
}
// Define a class like this:
import com.google.common.base.Joiner;
import com.google.common.collect.Lists;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.thrift.TException;
import org.joda.time.DateTime;

import java.util.Arrays;
import java.util.List;
public class HiveMetaStoreConnector {
    private HiveConf hiveConf;
    HiveMetaStoreClient hiveMetaStoreClient;

    // Connect using an explicit metastore host and Thrift port.
    public HiveMetaStoreConnector(String msAddr, String msPort) {
        try {
            hiveConf = new HiveConf();
            hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, msAddr + ":" + msPort);
            hiveMetaStoreClient = new HiveMetaStoreClient(hiveConf);
        } catch (MetaException e) {
            e.printStackTrace();
            System.err.println("Constructor error");
            System.err.println(e.toString());
            System.exit(-100);
        }
    }

    // Connect using an already configured HiveConf (see the test program above).
    public HiveMetaStoreConnector(HiveConf hiveConf) {
        try {
            this.hiveConf = hiveConf;
            hiveMetaStoreClient = new HiveMetaStoreClient(hiveConf);
        } catch (MetaException e) {
            e.printStackTrace();
            System.err.println("Constructor error");
            System.err.println(e.toString());
            System.exit(-100);
        }
    }
    // For every table in the given database, collect its partition information.
    public String getAllPartitionInfo(String dbName) {
        List<String> res = Lists.newArrayList();
        try {
            List<String> tableList = hiveMetaStoreClient.getAllTables(dbName);
            for (String tableName : tableList) {
                res.addAll(getTablePartitionInformation(dbName, tableName));
            }
        } catch (MetaException e) {
            e.printStackTrace();
            System.out.println("getAllPartitionInfo error");
            System.out.println(e.toString());
            System.exit(-100);
        }
        return Joiner.on("\n").join(res);
    }
    // Describe every partition of one table: table name, up to four partition
    // values (padded with "null"), and the partition creation time.
    public List<String> getTablePartitionInformation(String dbName, String tableName) {
        List<String> partitionsInfo = Lists.newArrayList();
        try {
            List<Partition> partitions = hiveMetaStoreClient.listPartitions(dbName, tableName, (short) 10000);
            for (Partition partition : partitions) {
                StringBuffer sb = new StringBuffer();
                sb.append(tableName);
                sb.append("\t");
                List<String> partitionValues = partition.getValues();
                // Pad to four partition columns so every output row has the same width.
                if (partitionValues.size() < 4) {
                    int size = partitionValues.size();
                    for (int j = 0; j < 4 - size; j++) {
                        partitionValues.add("null");
                    }
                }
                sb.append(Joiner.on("\t").join(partitionValues));
                sb.append("\t");
                // getCreateTime() is in seconds; convert to milliseconds for Joda-Time.
                DateTime createDate = new DateTime((long) partition.getCreateTime() * 1000);
                sb.append(createDate.toString("yyyy-MM-dd HH:mm:ss"));
                partitionsInfo.add(sb.toString());
            }
        } catch (TException e) {
            e.printStackTrace();
            return Arrays.asList("error for request on " + tableName);
        }
        return partitionsInfo;
    }
    // For every table in the given database, collect its column information.
    public String getAllTableStatistic(String dbName) {
        List<String> res = Lists.newArrayList();
        try {
            List<String> tableList = hiveMetaStoreClient.getAllTables(dbName);
            for (String tableName : tableList) {
                res.addAll(getTableColumnsInformation(dbName, tableName));
            }
        } catch (MetaException e) {
            e.printStackTrace();
            System.out.println("getAllTableStatistic error");
            System.out.println(e.toString());
            System.exit(-100);
        }
        return Joiner.on("\n").join(res);
    }
    // List the columns of one table: table name, column index, name, type, and comment.
    public List<String> getTableColumnsInformation(String dbName, String tableName) {
        try {
            List<FieldSchema> fields = hiveMetaStoreClient.getFields(dbName, tableName);
            List<String> infs = Lists.newArrayList();
            int cnt = 0;
            for (FieldSchema fs : fields) {
                StringBuffer sb = new StringBuffer();
                sb.append(tableName);
                sb.append("\t");
                sb.append(cnt);
                sb.append("\t");
                cnt++;
                sb.append(fs.getName());
                sb.append("\t");
                sb.append(fs.getType());
                sb.append("\t");
                sb.append(fs.getComment());
                infs.add(sb.toString());
            }
            return infs;
        } catch (TException e) {
            e.printStackTrace();
            System.out.println("getTableColumnsInformation error");
            System.out.println(e.toString());
            System.exit(-100);
            return null;
        }
    }
}