Member since: 02-01-2019
Posts: 650
Kudos Received: 143
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3869 | 04-01-2019 09:53 AM |
| | 2065 | 04-01-2019 09:34 AM |
| | 10088 | 01-28-2019 03:50 PM |
04-01-2019
09:53 AM
1 Kudo
@Michael Bronson, this is a permission issue 🙂 Either run this command as the hdfs user or change the ownership of /benchmarks/TestDFSIO to root.
java.io.IOException: Permission denied: user=root, access=WRITE, inode="/benchmarks/TestDFSIO/io_control/in_file_test_io_0":hdfs:hdfs:drwxr-xr-x
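The second option could be sketched roughly as below. This is only a dry-run sketch: the `run`, `as_hdfs` and `chown_bench_to_root` helpers and the `DRY_RUN` switch are names made up for illustration, and it assumes `sudo -u hdfs` is permitted on the node. By default it only prints the command, so nothing changes on the cluster.

```shell
# Directory named in the error message above.
BENCH_DIR="/benchmarks/TestDFSIO"

# Hypothetical helper: print the command by default, execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Option 1: run any command as the hdfs user, which owns the directory.
as_hdfs() { run sudo -u hdfs "$@"; }

# Option 2: change the owner of the benchmark directory to root.
chown_bench_to_root() { as_hdfs hdfs dfs -chown -R root "$BENCH_DIR"; }

chown_bench_to_root
```

For the first option, the same `as_hdfs` wrapper can be put in front of whatever TestDFSIO command was being run as root.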
04-01-2019
09:34 AM
@Sampath Kumar, please refer to this article: https://community.hortonworks.com/articles/217295/ambari-270-how-to-reset-ambari-admin-password-from.html
02-07-2019
09:10 AM
Create Kafka topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper `hostname`:2181 --replication-factor 1 --partitions 1 --topic kafka_hive_topic
Create Hive table. (update the Kafka broker hostname below)
CREATE EXTERNAL TABLE kafka_hive_table
(`Country Name` string , `Language` string, `_id` struct<`$oid`:string>)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES
("kafka.topic" = "kafka_hive_topic", "kafka.bootstrap.servers"="c2114-node2.labs.com:6667");
Download the sample json data.
wget -O countries.json "https://github.com/ozlerhakan/mongodb-json-files/blob/master/datasets/countries.json?raw=true"
Produce data into Kafka topic.
cat countries.json | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list c2114-node2.labs.com:6667 --topic kafka_hive_topic
Describe table (to see additional Kafka specific columns)
describe kafka_hive_table;
+---------------+----------------------+--------------------+
| col_name | data_type | comment |
+---------------+----------------------+--------------------+
| country name | string | from deserializer |
| language | string | from deserializer |
| _id | struct<$oid:string> | from deserializer |
| __key | binary | from deserializer |
| __partition | int | from deserializer |
| __offset | bigint | from deserializer |
| __timestamp | bigint | from deserializer |
+---------------+----------------------+--------------------+
Run some sample queries.
SELECT count(*) from kafka_hive_table;
+--------+
| _c0 |
+--------+
| 21640 |
+--------+
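A quick sanity check on this count, as a sketch: kafka-console-producer.sh sends one message per input line, so if the topic was empty before producing, the number of lines in the downloaded file should match count(*). The `count_lines` helper is a made-up name, and `wc -l` will be one short if the file does not end with a newline.

```shell
# Hypothetical helper: print the number of newline-terminated lines in a file.
count_lines() {
  wc -l < "$1" | tr -d ' '
}

# Usage, with the file name from the download step:
# count_lines countries.json
```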
SELECT `__partition`, max(`__offset`), CURRENT_TIMESTAMP FROM kafka_hive_table GROUP BY `__partition`, CURRENT_TIMESTAMP;
+--------------+--------+--------------------------+
| __partition | _c1 | _c2 |
+--------------+--------+--------------------------+
| 0 | 21639 | 2019-02-07 08:49:50.918 |
+--------------+--------+--------------------------+
select * from kafka_hive_table limit 10;
+--------------------------------+----------------------------+--------------------------------------+-------------------------+-------------------------------+----------------------------+-------------------------------+
| kafka_hive_table.country name | kafka_hive_table.language | kafka_hive_table._id | kafka_hive_table.__key | kafka_hive_table.__partition | kafka_hive_table.__offset | kafka_hive_table.__timestamp |
+--------------------------------+----------------------------+--------------------------------------+-------------------------+-------------------------------+----------------------------+-------------------------------+
| Afrika | af | {"$oid":"55a0f1d420a4d760b5fbdbd6"} | NULL | 0 | 0 | 1549529251002 |
| Oseanië | af | {"$oid":"55a0f1d420a4d760b5fbdbd7"} | NULL | 0 | 1 | 1549529251010 |
| Suid-Amerika | af | {"$oid":"55a0f1d420a4d760b5fbdbd8"} | NULL | 0 | 2 | 1549529251010 |
| Wêreld | af | {"$oid":"55a0f1d420a4d760b5fbdbd9"} | NULL | 0 | 3 | 1549529251011 |
| አፍሪካ | am | {"$oid":"55a0f1d420a4d760b5fbdbda"} | NULL | 0 | 4 | 1549529251011 |
| ኦሽኒያ | am | {"$oid":"55a0f1d420a4d760b5fbdbdb"} | NULL | 0 | 5 | 1549529251011 |
| ዓለም | am | {"$oid":"55a0f1d420a4d760b5fbdbdc"} | NULL | 0 | 6 | 1549529251011 |
| ደቡባዊ አሜሪካ | am | {"$oid":"55a0f1d420a4d760b5fbdbdd"} | NULL | 0 | 7 | 1549529251011 |
| أمريكا الجنوبية | ar | {"$oid":"55a0f1d420a4d760b5fbdbde"} | NULL | 0 | 8 | 1549529251011 |
| أمريكا الشمالية | ar | {"$oid":"55a0f1d420a4d760b5fbdbdf"} | NULL | 0 | 9 | 1549529251011 |
+--------------------------------+----------------------------+--------------------------------------+-------------------------+-------------------------------+----------------------------+-------------------------------+
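The `__timestamp` column holds the Kafka record timestamp in milliseconds since the epoch. A small sketch to read it as a UTC date from the shell is below. The `ms_to_utc` helper is a made-up name, and it assumes GNU date (the `-d @SECONDS` form); BSD/macOS date would need `-r SECONDS` instead.

```shell
# Hypothetical helper: convert epoch milliseconds to a UTC date string.
# Assumes GNU date; the milliseconds part is dropped by the integer division.
ms_to_utc() {
  secs=$(( $1 / 1000 ))
  date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S'
}

# __timestamp of offset 0 in the output above.
ms_to_utc 1549529251002
```

For the first row this prints 2019-02-07 08:47:31.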
01-28-2019
04:33 PM
This seems to be the same script that I mentioned above, isn't it?
01-28-2019
03:50 PM
1 Kudo
@Marcel-Jan Krijgsman Run the /usr/hdp/current/atlas-server/hook-bin/import-hive.sh utility, which imports the existing Hive tables into Atlas.
01-22-2019
06:29 PM
Good one @Jagatheesh Ramakrishnan
09-12-2018
09:58 AM
Great one @Jonathan Sneep
08-30-2018
02:58 PM
@David Hoyle
The code structure has changed since this article was written.
1) Check out trunk.
2) brew install protobuf250 (protobuf is needed to build Hadoop)
3) Build using:
mvn clean package -Phdds -Pdist -Dtar -DskipShade -DskipTests -Dmaven.javadoc.skip=true
Edit: updated the proto version.
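Before starting the Maven build it may help to confirm which protoc is on the PATH. A minimal sketch, assuming the protobuf250 formula provides protoc 2.5.0 (whose `protoc --version` prints `libprotoc 2.5.0`); the `is_protoc_250` helper is a made-up name.

```shell
# Hypothetical helper: succeed only if the given "protoc --version" output reports 2.5.0.
is_protoc_250() {
  case "$1" in
    "libprotoc 2.5.0") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check before running mvn.
# is_protoc_250 "$(protoc --version)" || echo "protoc 2.5.0 not found on PATH" >&2
```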
03-17-2018
05:58 AM
2 Kudos
Following are the steps to connect to Phoenix tables using Spark2.
1) Create a symlink of hbase-site.xml in the spark2 conf directory.
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml
2) Launch spark-shell with the Phoenix Spark jars on the extra classpath.
spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"
3) Create a Phoenix connection and query the tables.
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3
scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]
scala> df.show()
+-----+----------+----+
| ID| COL1|COL2|
+-----+----------+----+
|test1|test_row_1| 10|
|test2|test_row_2| 20|
+-----+----------+----+
Note: Spark2 and Phoenix integration was introduced in HDP 2.6.2.
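The spark-shell command in step 2 repeats the same two jars for the executor and the driver. One way to keep them in sync, as a sketch, is to build the classpath once in a variable. `PHOENIX_DIR`, `PHOENIX_CP` and `spark_shell_cmd` are made-up names, the jar version is the one from this post and will differ on other HDP builds, and the function only prints the command rather than launching anything.

```shell
# Jar locations copied from step 2; adjust the version to your HDP build.
PHOENIX_DIR="/usr/hdp/current/phoenix-client"
PHOENIX_CP="$PHOENIX_DIR/phoenix-4.7.0.2.6.3.0-235-spark2.jar:$PHOENIX_DIR/phoenix-client.jar"

# Hypothetical helper: print the spark-shell command with the same classpath
# for both executor and driver.
spark_shell_cmd() {
  printf 'spark-shell --conf "spark.executor.extraClassPath=%s" --conf "spark.driver.extraClassPath=%s"\n' \
    "$PHOENIX_CP" "$PHOENIX_CP"
}

spark_shell_cmd
```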
12-21-2016
07:54 PM
Very well explained @Rajkumar Singh !!