Member since: 02-13-2019
Posts: 12
Kudos Received: 2
Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 3534 | 02-19-2019 05:17 AM
02-04-2021 03:54 AM
1 Kudo
Also make sure that the HDFS DataNode is running on that server.
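If it helps, here is a minimal check (assuming shell access to that server and an HDFS client on the PATH) to confirm the DataNode is up and registered:

# Is a DataNode JVM running on this host?
jps | grep DataNode

# Does the NameNode see it as live? (run from any node with HDFS configured)
hdfs dfsadmin -report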
12-06-2020 02:46 PM
I'm using Cloudera QuickStart VM 5.13 and I installed its Kafka version.

List Kafka topics:
/usr/bin/kafka-topics --list --zookeeper quickstart.cloudera:2181

Create a Kafka topic:
/usr/bin/kafka-topics --create --zookeeper quickstart.cloudera:2181 --replication-factor 1 --partitions 3 --topic myFirstTopic

Start a producer:
/usr/bin/kafka-console-producer --broker-list quickstart.cloudera:9092 --topic myFirstTopic

Start a consumer:
/usr/bin/kafka-console-consumer --bootstrap-server quickstart.cloudera:9092 --topic myFirstTopic --from-beginning

Notes for your issue:
- The replication factor must be less than or equal to the number of brokers. I changed offsets.topic.replication.factor in the Kafka configuration from Cloudera Manager and set it to 1 (because I have 1 broker).
- You can delete the brokers from Zookeeper as shown below, then restart Kafka to recreate them.

Zookeeper CLI:

Access the Zookeeper CLI:
/usr/bin/zookeeper-client

List all znodes:
ls /
Output: [cluster, controller, brokers, zookeeper, admin, isr_change_notification, log_dir_event_notification, ngdata, controller_epoch, solr, consumers, latest_producer_id_block, config, hbase]

List Kafka brokers:
ls /brokers
Output: [ids, topics, seqid]

List Kafka topics in Zookeeper:
ls /brokers/topics
Output: [myFirstTopic, __consumer_offsets]

Delete a path in Zookeeper:
rmr /brokers
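Before picking a replication factor, a quick sanity check (a minimal sketch; the single id 0 is just what a one-broker QuickStart VM would show, yours may differ) is to count the brokers currently registered in Zookeeper:

List registered broker ids:
ls /brokers/ids
Output: [0]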
10-22-2020 04:59 AM
I did as @ssubhas said, setting the attributes to false:

spark.sql("SET hive.enforce.bucketing=false")
spark.sql("SET hive.enforce.sorting=false")
spark.sql("SET spark.hadoop.hive.exec.dynamic.partition=true")
spark.sql("SET spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict")
newPartitionsDF.write.mode(SaveMode.Append).format("hive").insertInto(this.destinationDBdotTableName)

Spark can create the bucketed table in Hive with no issues, and it inserted the data, but it completely ignored the fact that the table is bucketed: when I open a partition, I see only one file. Hive normally requires hive.enforce.bucketing = true when inserting, but if you set it to true, Spark fails with the following error in the logs:

org.apache.spark.sql.AnalysisException: Output Hive table `hive_test_db`.`test_bucketing` is bucketed but Spark currently does NOT populate bucketed output which is compatible with Hive.;

This means that Spark doesn't support insertion into bucketed Hive tables. The first answer in this Stackoverflow question explains that what @ssubhas suggested is a workaround that doesn't guarantee bucketing.
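As a possible direction (not from this thread, and only useful if the table will be read back through Spark rather than Hive): Spark can write its own bucketed layout with bucketBy/saveAsTable. A minimal sketch, with a hypothetical bucket column and target table name:

// Spark-managed bucketing: Spark can read this efficiently, but the
// resulting file layout is NOT compatible with Hive's bucketing scheme.
import org.apache.spark.sql.SaveMode

newPartitionsDF.write
  .mode(SaveMode.Append)
  .bucketBy(3, "id")          // hypothetical bucket count and column
  .sortBy("id")
  .saveAsTable("hive_test_db.test_bucketing_spark")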
06-28-2020 07:21 AM
I tried your query, but if the table has no comment, it produces a duplicate record for that table, so I modified it a bit:

mysql -u hive -p
<ENTER YOUR HIVE PASSWORD>
use metastore;

SELECT * FROM (
    SELECT DBS.NAME AS OWNER,
           TBLS.TBL_NAME AS OBJECT_NAME,
           TBL_COMMENTS.TBL_COMMENT AS OBJECT_DESCRIPTION,
           TBLS.TBL_ID AS OBJECT_ID,
           TBLS.TBL_TYPE AS OBJECT_TYPE,
           "VALID" AS OBJECT_STATUS,
           COLUMNS_V2.COLUMN_NAME,
           COLUMNS_V2.COMMENT AS COLUMN_DESCRIPTION,
           COLUMNS_V2.TYPE_NAME AS DATA_TYPE
    FROM DBS
    JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
    JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
    JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
    JOIN (
        SELECT DISTINCT TBL_ID, TBL_COMMENT
        FROM (
            SELECT TBLS.TBL_ID TBL_ID, TABLE_PARAMS.PARAM_KEY, TABLE_PARAMS.PARAM_VALUE, TABLE_PARAMS.PARAM_VALUE AS TBL_COMMENT
            FROM TBLS
            JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID
            WHERE TABLE_PARAMS.PARAM_KEY = "comment"
            UNION ALL
            SELECT TBLS.TBL_ID TBL_ID, TABLE_PARAMS.PARAM_KEY, TABLE_PARAMS.PARAM_VALUE, "" AS TBL_COMMENT
            FROM TBLS
            JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID
            WHERE TABLE_PARAMS.PARAM_KEY <> "comment"
              AND TBLS.TBL_ID NOT IN (SELECT TBL_ID FROM TABLE_PARAMS WHERE TABLE_PARAMS.PARAM_KEY = "comment")
        ) TBL_COMMENTS_INTERNAL
    ) TBL_COMMENTS ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID
) as view
WHERE OWNER = "database_name_goes_here"
  AND OBJECT_NAME = "table_name_goes_here";
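For context on why the extra join is needed (a minimal sketch against the same metastore schema): Hive stores a table's comment as a row in TABLE_PARAMS with PARAM_KEY = "comment", so a table without a comment has no matching row at all, which is what tripped up the original query:

-- One row per table that actually has a comment; commentless tables are absent
SELECT TBL_ID, PARAM_VALUE AS TBL_COMMENT
FROM TABLE_PARAMS
WHERE PARAM_KEY = "comment";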
02-19-2019 05:17 AM
So far it's working fine. I also found the problem with writing to the file. The garbage data between the interceptor and the message can contain literally anything, and it contained \n, which is the LF in Linux systems. This was causing the Kafka problem as well: Kafka sees the \n and assumes the message is 2 messages, not 1. That's why, when I changed the delimiter to \r\n, it treated the message as 1 message. That's a good conclusion, I guess: if you want to write the message to a file or apply a regex to it, just replace \n and \r with an empty string so you don't have to deal with those annoying control characters. Thanks to whoever wanted to help me.
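For that cleanup step, a shell one-liner is enough (a minimal sketch; messages.txt is a hypothetical input file):

# Strip every CR and LF from the payload before further processing
tr -d '\r\n' < messages.txt > messages_clean.txt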