Hive on Tez cannot execute custom hook program
Created 08-25-2023 09:11 PM
I developed a custom hook on CDH (6.2.x) that implements the HiveDriverRunHook and ExecuteWithHookContext interfaces. It runs smoothly there, but after migrating the hook to CDP (7.1.x) it throws an error and will not execute. I suspect this is caused by the Tez engine, but I cannot solve it and need help. Thanks very much.
Created 09-27-2023 07:12 AM
In CDP, when the HiveProtoLoggingHook is configured, query information is automatically captured and stored as protobuf files in the 'query_data' folder under the directory set by 'hive.hook.proto.base-directory'. In Hive, you can use the ProtobufMessageSerDe to read them.
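As a quick sanity check before building the table, you can inspect the relevant settings from Beeline. This is only a sketch; the property names below are the standard Hive 3 ones, but verify the effective values in your own cluster's configuration:

```sql
-- Show where the hook writes its protobuf event files
SET hive.hook.proto.base-directory;

-- Confirm the HiveProtoLoggingHook is registered as a hook
SET hive.exec.pre.hooks;
```

If the hook class (org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook) does not appear in the hook settings, it needs to be enabled server-side before any query data is captured.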
To read this captured data, you can create a table as shown below.
CREATE EXTERNAL TABLE `query_data`(
  `eventtype` string COMMENT 'from deserializer',
  `hivequeryid` string COMMENT 'from deserializer',
  `timestamp` bigint COMMENT 'from deserializer',
  `executionmode` string COMMENT 'from deserializer',
  `requestuser` string COMMENT 'from deserializer',
  `queue` string COMMENT 'from deserializer',
  `user` string COMMENT 'from deserializer',
  `operationid` string COMMENT 'from deserializer',
  `tableswritten` array<string> COMMENT 'from deserializer',
  `tablesread` array<string> COMMENT 'from deserializer',
  `otherinfo` map<string,string> COMMENT 'from deserializer')
PARTITIONED BY (
  `date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageSerDe'
WITH SERDEPROPERTIES (
  'proto.class'='org.apache.hadoop.hive.ql.hooks.proto.HiveHookEvents$HiveHookEventProto',
  'proto.maptypes'='org.apache.hadoop.hive.ql.hooks.proto.MapFieldEntry')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
LOCATION
  '<query_datalocation>'
TBLPROPERTIES (
  'bucketing_version'='2',
  'proto.class'='org.apache.hadoop.hive.ql.hooks.proto.HiveHookEvents$HiveHookEventProto')
After creating the table, execute 'MSCK REPAIR TABLE query_data SYNC PARTITIONS' to synchronize the partitions; you can then retrieve and analyze the data using Beeline.
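A minimal sketch of that step from Beeline, using the column names from the DDL above. The partition date and the 'QUERY_SUBMITTED' event-type value are illustrative assumptions; check the actual values present in your own proto files:

```sql
-- Pick up partitions written by the hook after the table was created
MSCK REPAIR TABLE query_data SYNC PARTITIONS;

-- Example: most recent submitted queries for one day (date value is illustrative)
SELECT requestuser, hivequeryid, executionmode, `timestamp`
FROM query_data
WHERE `date` = '2023-09-27'
  AND eventtype = 'QUERY_SUBMITTED'
ORDER BY `timestamp` DESC
LIMIT 10;
```

The 'otherinfo' map carries additional per-query details and can be drilled into with ordinary map accessors once you have confirmed which keys your files contain.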
Created 08-28-2023 01:18 AM
@yucai, Welcome to our community! To help you get the best possible answer, I have tagged in our Hive experts @asish @tjangid who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur, Community Manager
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 08-28-2023 01:24 AM
Custom hooks are not supported by Cloudera.
Created 08-28-2023 08:32 PM
But I had implemented custom hooks on CDH. Does that mean custom hooks are no longer supported in CDP? Is there any other way for me to obtain information about Hive without changing the source code, for example the SQL text or the execution results?
Created 08-28-2023 11:36 PM
Hi @yucai, I have noticed your question and I think the Chinese PS team can help you. By the way, which company are you with? China UnionPay Data Services? If you can leave your contact information, we can have a quick call.
Created 08-29-2023 12:30 AM
I sent you a private message with my phone number. Thank you.
Created 10-03-2023 02:07 PM
@yucai Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres, Community Moderator
