Member since: 08-08-2018
Posts: 49
Kudos Received: 2
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13912 | 08-11-2018 12:05 AM
 | 10712 | 08-10-2018 07:29 PM
10-08-2018 04:32 PM
@wxu What should I be using to write my Hive scripts in Ambari today?
09-08-2018 08:16 PM
Hello, one normally disables Tez for Hive using: SET hive.execution.engine=mr;
But when I use this option in the Hive shell I get:
0: jdbc:hive2://my_server:2181,> SET hive.execution.engine = mr;
Error: Error while processing statement: hive execution engine mr is not supported. (state=42000,code=1)
What's going on? Tez is not working for me and I want to try MR instead. I'm using HDP 3.0.
09-05-2018 04:23 AM
I have a 4-node cluster and this did not work for me. Same error: /bucket_00003 could only be written to 0 of the 1 minReplication nodes. There are 4 datanode(s) running and no node(s) are excluded in this operation.
08-22-2018 05:15 PM
Can someone please let me know why my Hive query is taking such a long time?

The query SELECT DISTINCT res_flag AS n FROM my_table; took ~75 minutes to complete. The query SELECT COUNT(*) FROM my_table WHERE res_flag = 1; took 73 minutes to complete.

The table is stored on HDFS (replicated on 4 nodes) as a CSV and is only 100 MB in size. It has 6 columns, all of type VARCHAR or TINYINT. The column I'm querying contains NAs, and it is an external table. The query runs as a Tez job on YARN using 26 cores and ~120 GB of memory. I am not using LLAP. Any idea what's going on? I'm on HDP 3.0.

EDIT: I imported the CSV into HDFS using the following command: hdfs dfs -Ddfs.replication=4 -put '/mounted/path/to/file.csv' /dir/file.csv
I used these commands to create the table in Hive: CREATE EXTERNAL TABLE my_table (
svcpt_id VARCHAR(50),
start_date VARCHAR(8),
end_date VARCHAR(8),
prem_id VARCHAR(20),
res_flag TINYINT,
num_prem_id TINYINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://ncienspk01/APS';

LOAD DATA INPATH '/dir/file.csv' INTO TABLE my_table;
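A minimal sanity check, assuming the loaded file ended up under the table's /APS LOCATION from the DDL above, is to confirm the on-disk size and replication directly in HDFS:

# Confirm the total size of the files backing the external table (LOCATION taken from the DDL above)
hdfs dfs -du -h /APS

# Confirm block count, replication factor, and locations for the loaded data
hdfs fsck /APS -files -blocks -locations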
Labels:
- Apache Hadoop
- Apache Hive
08-16-2018 08:32 PM
@Sandeep Nemuri Thanks for the pointers. I finally got it working. For others running Spark-Phoenix in Zeppelin, you need to:

1. On the Spark client node, create a symbolic link to 'hbase-site.xml' in the Spark conf directory: ln -s /usr/hdp/current/hbase-master/conf/hbase-site.xml /usr/hdp/current/spark2-client/conf/hbase-site.xml
2. Add the following to both spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf: /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
3. Add the following jars in Zeppelin under the Spark2 interpreter's dependencies:
/usr/hdp/current/phoenix-server/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar
/usr/hdp/current/hbase-master/lib/hbase-common-2.0.0.3.0.0.0-1634.jar
/usr/hdp/current/hbase-client/lib/hbase-client-2.0.0.3.0.0.0-1634.jar
/usr/hdp/current/hbase-client/lib/htrace-core-3.2.0-incubating.jar
/usr/hdp/current/phoenix-client/phoenix-client.jar
/usr/hdp/current/phoenix-client/phoenix-server.jar
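Once the steps above are in place, a quick way to verify the wiring outside Zeppelin is to read an existing Phoenix table from spark-shell through the phoenix-spark data source. This is only a hypothetical sketch, not part of the original post: the table name PHOENIX_TABLE and ZooKeeper quorum zk-host:2181 are placeholders.

# Hypothetical check: read a Phoenix table via the phoenix-spark data source.
# Table name and ZooKeeper quorum are placeholders.
/usr/hdp/current/spark2-client/bin/spark-shell \
  --jars /usr/hdp/current/phoenix-server/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar,/usr/hdp/current/phoenix-client/phoenix-client.jar <<'EOF'
val df = spark.read.format("org.apache.phoenix.spark").
  option("table", "PHOENIX_TABLE").
  option("zkUrl", "zk-host:2181").
  load()
df.printSchema()
df.show(5)
EOF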
08-16-2018 08:24 PM
you're the best 🙂
08-16-2018 05:03 PM
@Sandeep Nemuri I edited my above question, do you mind taking a look at it? I'm seeing a CsvBulkLoadTool and a JsonBulkLoadTool. How will I bulk load my sqoop-loaded data?
08-16-2018 04:42 PM
@Sandeep Nemuri I'm not sure if I follow what you are talking about. The page you pointed to shows a bulk load from CSV>Phoenix or HDFS JSON>Phoenix. Can you provide a link or command on how one would go from Sqoop's HDFS output to Phoenix directly?
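For context, the CSV-to-Phoenix route that page describes is the MapReduce CsvBulkLoadTool bundled with the Phoenix client. A hypothetical invocation might look like the following; the Phoenix table name, HDFS input path, and ZooKeeper quorum are placeholders, and Sqoop's delimited output would need to match the delimiter the tool expects.

# Hypothetical CSV bulk load into Phoenix; table, input path, and ZK quorum are placeholders.
HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf \
hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table MY_TABLE \
  --input /user/me/sqoop_output \
  --zookeeper zk-host:2181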
08-16-2018 03:41 PM
How about this one? com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/lmax/disruptor/EventFactory
08-16-2018 02:55 PM
@Sandeep Nemuri Thanks so much for the advice! My table occupies 1.5 terabytes in MS SQL Server, and this will be a one-time migration. I feel that exporting to a CSV would take days of processing time. Your last approach sounds best, but I've heard that it is not possible based on this post and this post (my table does have a float column). That being said, what is my best option? I'm thinking I may try to split my table into n CSV files and load them sequentially into Phoenix. Would that be the best option for data of this size?
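A rough sketch of the "split into n extracts and load them one at a time" idea described above, assuming Sqoop writes comma-delimited text into an HDFS staging directory and each chunk is then pushed into Phoenix with the CsvBulkLoadTool shown earlier. Every connection detail, table name, split column, and path below is a placeholder.

# Pull one slice of the SQL Server table into HDFS as comma-delimited text
# (placeholders throughout; repeat with a different --where range per chunk).
sqoop import \
  --connect "jdbc:sqlserver://sql-host:1433;databaseName=mydb" \
  --username myuser -P \
  --table my_big_table \
  --where "id >= 0 AND id < 100000000" \
  --split-by id \
  --num-mappers 16 \
  --fields-terminated-by ',' \
  --target-dir /staging/my_big_table/chunk_000

# Then bulk load that chunk into the target Phoenix table.
HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf \
hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table MY_BIG_TABLE \
  --input /staging/my_big_table/chunk_000 \
  --zookeeper zk-host:2181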