Member since: 02-01-2019
Posts: 650
Kudos Received: 143
Solutions: 117
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1654 | 04-01-2019 09:53 AM
 | 923 | 04-01-2019 09:34 AM
 | 3663 | 01-28-2019 03:50 PM
 | 872 | 11-08-2018 09:26 AM
 | 2453 | 11-08-2018 08:55 AM
04-04-2019
03:56 PM
@geniusbaibai Starting with HDP 3.0, Spark cannot directly access Hive; you'd need to use HWC (Hive Warehouse Connector): https://community.hortonworks.com/articles/223626/integrating-apache-hive-with-apache-spark-hive-war.html
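For illustration, a minimal pyspark sketch of reading through HWC (assuming the HWC jar and Python module are set up and the HiveServer2 JDBC URL is configured per the linked article; the database and table names are placeholders):
from pyspark_llap import HiveWarehouseSession
# build an HWC session on top of the existing 'spark' session (pyspark shell)
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show()
# queries run through HiveServer2 Interactive rather than against the metastore directly
hive.executeQuery("SELECT * FROM some_db.some_table LIMIT 10").show()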
04-01-2019
09:53 AM
1 Kudo
@Michael Bronson, permission issue 🙂 Either run this command as the hdfs user or change the ownership of /benchmarks/TestDFSIO to root (e.g. hdfs dfs -chown -R root /benchmarks/TestDFSIO). The error: java.io.IOException: Permission denied: user=root, access=WRITE, inode="/benchmarks/TestDFSIO/io_control/in_file_test_io_0":hdfs:hdfs:drwxr-xr-x
04-01-2019
09:50 AM
1 Kudo
@ram sriram the error says "java.lang.OutOfMemoryError: Java heap space". Do evaluate your driver/executor memory settings and increase them accordingly.
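As a hedged sketch, executor memory can be raised when building the session; the values below are placeholders, and driver memory usually has to be set at launch time (e.g. spark-submit --driver-memory 4g) before the driver JVM starts:
from pyspark.sql import SparkSession
# example values only; tune to your workload and cluster capacity
spark = (SparkSession.builder
         .appName("heap-tuning-example")
         .config("spark.executor.memory", "4g")
         .config("spark.executor.instances", "4")
         .getOrCreate())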
04-01-2019
09:49 AM
@Jordy Andreas Looks like you don't have the Flume Twitter jar in Flume's classpath. You can get the jar from https://github.com/cloudera/cdh-twitter-example and place it in Flume's classpath.
04-01-2019
09:34 AM
@Sampath Kumar, please refer to this article: https://community.hortonworks.com/articles/217295/ambari-270-how-to-reset-ambari-admin-password-from.html
02-07-2019
09:10 AM
Create Kafka topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper `hostname`:2181 --replication-factor 1 --partitions 1 --topic kafka_hive_topic
Create the Hive table (update the Kafka broker hostname below).
CREATE EXTERNAL TABLE kafka_hive_table
(`Country Name` string, `Language` string, `_id` struct<`$oid`:string>)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "kafka_hive_topic", "kafka.bootstrap.servers"="c2114-node2.labs.com:6667");
Download the sample JSON data.
wget -O countries.json https://github.com/ozlerhakan/mongodb-json-files/blob/master/datasets/countries.json?raw=true
Produce data into the Kafka topic.
cat countries.json | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list c2114-node2.labs.com:6667 --topic kafka_hive_topic
Describe the table (to see the additional Kafka-specific columns).
describe kafka_hive_table;
+---------------+----------------------+--------------------+
| col_name | data_type | comment |
+---------------+----------------------+--------------------+
| country name | string | from deserializer |
| language | string | from deserializer |
| _id | struct<$oid:string> | from deserializer |
| __key | binary | from deserializer |
| __partition | int | from deserializer |
| __offset | bigint | from deserializer |
| __timestamp | bigint | from deserializer |
+---------------+----------------------+--------------------+
Run some sample queries.
SELECT count(*) from kafka_hive_table;
+--------+
| _c0 |
+--------+
| 21640 |
+--------+
SELECT `__partition`, max(`__offset`), CURRENT_TIMESTAMP FROM kafka_hive_table GROUP BY `__partition`, CURRENT_TIMESTAMP;
+--------------+--------+--------------------------+
| __partition | _c1 | _c2 |
+--------------+--------+--------------------------+
| 0 | 21639 | 2019-02-07 08:49:50.918 |
+--------------+--------+--------------------------+
select * from kafka_hive_table limit 10;
+--------------------------------+----------------------------+--------------------------------------+-------------------------+-------------------------------+----------------------------+-------------------------------+
| kafka_hive_table.country name | kafka_hive_table.language | kafka_hive_table._id | kafka_hive_table.__key | kafka_hive_table.__partition | kafka_hive_table.__offset | kafka_hive_table.__timestamp |
+--------------------------------+----------------------------+--------------------------------------+-------------------------+-------------------------------+----------------------------+-------------------------------+
| Afrika | af | {"$oid":"55a0f1d420a4d760b5fbdbd6"} | NULL | 0 | 0 | 1549529251002 |
| Oseanië | af | {"$oid":"55a0f1d420a4d760b5fbdbd7"} | NULL | 0 | 1 | 1549529251010 |
| Suid-Amerika | af | {"$oid":"55a0f1d420a4d760b5fbdbd8"} | NULL | 0 | 2 | 1549529251010 |
| Wêreld | af | {"$oid":"55a0f1d420a4d760b5fbdbd9"} | NULL | 0 | 3 | 1549529251011 |
| አፍሪካ | am | {"$oid":"55a0f1d420a4d760b5fbdbda"} | NULL | 0 | 4 | 1549529251011 |
| ኦሽኒያ | am | {"$oid":"55a0f1d420a4d760b5fbdbdb"} | NULL | 0 | 5 | 1549529251011 |
| ዓለም | am | {"$oid":"55a0f1d420a4d760b5fbdbdc"} | NULL | 0 | 6 | 1549529251011 |
| ደቡባዊ አሜሪካ | am | {"$oid":"55a0f1d420a4d760b5fbdbdd"} | NULL | 0 | 7 | 1549529251011 |
| أمريكا الجنوبية | ar | {"$oid":"55a0f1d420a4d760b5fbdbde"} | NULL | 0 | 8 | 1549529251011 |
| أمريكا الشمالية | ar | {"$oid":"55a0f1d420a4d760b5fbdbdf"} | NULL | 0 | 9 | 1549529251011 |
+--------------------------------+----------------------------+--------------------------------------+-------------------------+-------------------------------+----------------------------+-------------------------------+
01-28-2019
04:33 PM
Seems to be the same script that I mentioned above, isn't it?
01-28-2019
03:50 PM
1 Kudo
@Marcel-Jan Krijgsman Do run the /usr/hdp/current/atlas-server/hook-bin/import-hive.sh utility, which imports the existing Hive tables into Atlas.
01-22-2019
06:29 PM
Good one @Jagatheesh Ramakrishnan
01-11-2019
09:23 AM
@rajendra you can use the below SQL statement against the metastore database:
mysql -u root -e "use hive; SELECT NAME, TBL_NAME FROM DBS as a, TBLS as b where a.DB_ID=b.DB_ID;" > tables.txt
tables.txt will then contain the list of all tables.
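Alternatively, a sketch using the Spark catalog API from a pyspark shell (assuming Hive support is enabled for the session) avoids querying the metastore database directly:
# print every table in every database known to the catalog
for db in spark.catalog.listDatabases():
    for tbl in spark.catalog.listTables(db.name):
        print(db.name, tbl.name)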
01-10-2019
11:30 AM
@natus, In HDP 2.x you only need to set ("spark.sql.hive.llap", "true") and create a Spark session. Refer to the examples below:
https://github.com/hortonworks-spark/spark-llap/blob/branch-2.3/examples/src/main/python/spark_llap_dsl.py
https://github.com/hortonworks-spark/spark-llap/blob/branch-2.3/examples/src/main/python/spark_llap_sql.py
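For reference, a minimal sketch along the lines of those examples (the app name and query are placeholders):
from pyspark.sql import SparkSession
# enable LLAP so that spark.sql() reads are routed through the LLAP daemons
spark = (SparkSession.builder
         .appName("llap-example")
         .config("spark.sql.hive.llap", "true")
         .enableHiveSupport()
         .getOrCreate())
spark.sql("SHOW DATABASES").show()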
01-10-2019
08:44 AM
@Hamza Khribi Structured Streaming is supported in HDP 3.0+, with the exception of continuous processing, an experimental streaming execution mode that is not currently supported. Prior to HDP 3.0 it was a Technical Preview. Ref: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/spark-overview/content/analyzing_data_with_apache_spark.html
01-10-2019
08:33 AM
@natus Please refer to this article: https://community.hortonworks.com/articles/72454/apache-spark-fine-grain-security-with-llap-test-dr.html HiveWarehouseBuilder is only available from HDP 3.x onwards. Hope this helps.
01-10-2019
08:10 AM
@A C, the separator for --driver-class-path should be ":" instead of "," (e.g. --driver-class-path /path/one.jar:/path/two.jar). Do update the command and rerun the pyspark shell. Let me know if this helps.
12-30-2018
12:28 PM
@Bmwer Bmwer An Oozie Spark action uses the same resources as the spark-submit command; additionally, Oozie runs a launcher job which internally submits the job. You may want to compare both runs, see where exactly the job is taking time, and try to mitigate that.
12-30-2018
06:38 AM
@hema moger, Do accept this answer and close this thread if it helped in addressing your query.
12-30-2018
06:37 AM
@Nilesh Do accept this answer and close this thread if it helped in addressing your query.
12-28-2018
02:54 PM
@Anjali Shevadkar There isn't Spark + Ranger support as of now; if you want to apply Ranger policies you'd need to use Spark LLAP. Ref:
https://hortonworks.com/blog/sparksql-ranger-llap-via-spark-thrift-server-bi-scenarios-provide-row-column-level-security-masking/
https://community.hortonworks.com/articles/72454/apache-spark-fine-grain-security-with-llap-test-dr.html
Hope this helps.
12-28-2018
02:52 PM
1 Kudo
@Teja sai tarun Your client must be able to connect to the ZooKeeper servers (irrespective of which network); only then can you connect and query through PQS.
12-15-2018
04:25 PM
@Michael Mester My HDP-3.1.0.0-78 cluster shows the right Kafka version, 2.0.0. Looks like something went wrong with your installation; do verify the repos which are configured.
12-15-2018
03:55 PM
@Nilesh, please refer to https://issues.apache.org/jira/browse/HDFS-107. TL;DR: the option to format datanode dirs was not added because it is risky and would cause data loss if misused. Hope this helps.
11-16-2018
06:53 PM
1 Kudo
@Igor Grinkin This doc has all the required ports listed: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_reference/content/reference_chap2.html
11-15-2018
04:16 AM
@Mayank Bhatt do add the "--protocol https --port 8443" options if Ambari is running on HTTPS.
11-13-2018
06:06 AM
@hema moger Here is sample pyspark code to convert CSV to JSON (assumes the pyspark shell, where 'spark' already exists):
# read the CSV file, using its header row as column names
df = spark.read.format("csv").option("header", "true").load("file:///tmp/sample.csv")
# collapse to one partition and write each row out as a JSON line
df.repartition(1).toJSON(use_unicode=True).saveAsTextFile("file:///tmp/sample_out")
Hope this helps.
11-09-2018
12:44 PM
@Clément Dumont I just tried it; looks like you have one jar from Apache which is not recognising the security.protocol property. Below is what I used with the consumer.py you provided. You can download the dependent jars from http://repo.hortonworks.com/content/repositories/releases/org/apache/spark
/usr/hdp/2.6.4.0-91/spark2/bin/spark-submit \
  --files spark_jaas.conf,kafka.service.keytab \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=spark_jaas.conf" \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=spark_jaas.conf" \
  --jars spark-streaming_2.11-2.2.0.2.6.4.0-91.jar,spark-streaming-kafka-0-8-assembly_2.11-2.2.0.2.6.4.0-91.jar,spark-streaming-kafka-0-10_2.11-2.2.0.2.6.4.0-91.jar \
  consumer.py
Hope this helps.
11-08-2018
09:47 AM
@Clément Dumont You also need to pass the keytab in the '--files' option so that the JAAS conf can use that keytab and connect to Kafka.
11-08-2018
09:26 AM
1 Kudo
@vasu arikatla AFAIK distcp still uses MapReduce in HDP 3.0.
11-08-2018
08:55 AM
@Mujeeb This error is due to a bug: "/usr/hdp/current/hive-server2/conf_llap//hive-env.sh: line 43: [: !=: unary operator expected" Please refer to: https://community.hortonworks.com/content/supportkb/225891/errorusrhdp3000-1634hiveconfhive-envsh-line-50-una.html (the line numbers in the article differ from those in your error; adjust accordingly). Hope this helps.
11-08-2018
08:50 AM
@Harjit Singh Ranger cannot perform these tasks. Apache Falcon is the component that can do this work (however, Falcon is deprecated as of HDP 2.6.5 and DLM provides these features now). Ref:
https://falcon.apache.org/FalconDocumentation.html#Retention
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-movement-and-integration/content/ch_config_features_properties.html
Hope this helps.
11-08-2018
08:41 AM
@Shantanu Sharma If you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.