Member since: 08-08-2018
Posts: 49
Kudos Received: 2
Solutions: 2

My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 14000 | 08-11-2018 12:05 AM
 | 10800 | 08-10-2018 07:29 PM
10-08-2018 04:32 PM
@wxu What should I be using to write my Hive scripts in Ambari today?
09-08-2018 08:16 PM
Hello,
One normally disables Tez in Hive using: SET hive.execution.engine=mr;
But when I run this in the Hive shell I get:
0: jdbc:hive2://my_server:2181,> SET hive.execution.engine = mr;
Error: Error while processing statement: hive execution engine mr is not supported. (state=42000,code=1)
What's going on? Tez is not working for me and I want to try MR instead. I'm using HDP 3.0.
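For context, a quick way to confirm what the session is actually set to (this just reads the property back, nothing cluster-specific) is to issue the property name without a value in the same Beeline session:
SET hive.execution.engine;
Beeline prints the current value rather than changing anything.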
09-05-2018 04:23 AM
I have a 4-node cluster and this did not work for me. Same error: /bucket_00003 could only be written to 0 of the 1 minReplication nodes. There are 4 datanode(s) running and no node(s) are excluded in this operation.
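In case it helps anyone comparing notes, the first thing I'd check (a standard HDFS admin command, which may require HDFS superuser rights) is whether all four datanodes really report as live:
hdfs dfsadmin -report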
08-22-2018 05:15 PM
Can anyone please let me know why my Hive query is taking such a long time?
The query SELECT DISTINCT res_flag as n FROM my_table took ~75 minutes to complete. The query SELECT COUNT(*) FROM my_table WHERE res_flag = 1; took 73 minutes to complete.
The table is stored on HDFS (replicated on 4 nodes) as a CSV and is only 100 MB in size. It has 6 columns, the types are VARCHAR or TINYINT. The column I'm querying has NAs. It is an external table. My Hive query is running as a Tez job on YARN using 26 cores and ~120 GB of memory. I am not using LLAP. Any idea what's going on? I'm on HDP 3.0.
EDIT: I imported the CSV into HDFS using the following command:
hdfs dfs -Ddfs.replication=4 -put '/mounted/path/to/file.csv' /dir/file.csv
I used these commands to create the table in Hive:
CREATE EXTERNAL TABLE my_table (
  svcpt_id VARCHAR(50),
  start_date VARCHAR(8),
  end_date VARCHAR(8),
  prem_id VARCHAR(20),
  res_flag TINYINT,
  num_prem_id TINYINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://ncienspk01/APS';

LOAD DATA INPATH '/dir/file.csv' INTO TABLE my_table;
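Also, in case it's useful to anyone reading along, two quick checks against the setup above (standard Hive/HDFS commands; the fsck call assumes the default filesystem is the same ncienspk01 nameservice used in the LOCATION clause):
DESCRIBE FORMATTED my_table;
hdfs fsck /APS -files -blocks
The first confirms the table is an external text-format table at that location; the second shows how many blocks and replicas the file actually ended up with.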
Labels:
- Apache Hadoop
- Apache Hive
08-16-2018 08:32 PM
@Sandeep Nemuri Thanks for the pointers. I finally got it working. For others running Spark-Phoenix in Zeppelin, you need to do the following (a sketch of the actual read is shown after this list):
1. On the Spark client node, create a symbolic link to 'hbase-site.xml' in the Spark conf directory: ln -s /usr/hdp/current/hbase-master/conf/hbase-site.xml /usr/hdp/current/spark2-client/conf/hbase-site.xml
2. Add the following to both spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf: /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
3. Add the following jars in Zeppelin under the Spark2 interpreter's dependencies:
/usr/hdp/current/phoenix-server/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar
/usr/hdp/current/hbase-master/lib/hbase-common-2.0.0.3.0.0.0-1634.jar
/usr/hdp/current/hbase-client/lib/hbase-client-2.0.0.3.0.0.0-1634.jar
/usr/hdp/current/hbase-client/lib/htrace-core-3.2.0-incubating.jar
/usr/hdp/current/phoenix-client/phoenix-client.jar
/usr/hdp/current/phoenix-client/phoenix-server.jar
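To round this out, here is roughly what the read itself looks like in a Zeppelin %spark2 paragraph once the classpath above is in place. The table name and ZooKeeper quorum below are placeholders, not values from my cluster; this uses the phoenix-spark DataFrame source:
// Zeppelin's %spark2 interpreter already provides the `spark` SparkSession
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "MY_PHOENIX_TABLE")   // placeholder Phoenix table name
  .option("zkUrl", "my_server:2181")     // placeholder ZooKeeper quorum
  .load()
df.show(10)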
08-16-2018 08:24 PM
You're the best 🙂
08-16-2018 05:03 PM
@Sandeep Nemuri I edited my question above, do you mind taking a look at it? I'm seeing a CsvBulkLoadTool and a JsonBulkLoadTool. How will I bulk load my sqoop-loaded data?
08-16-2018 04:42 PM
@Sandeep Nemuri I'm not sure I follow what you're talking about. The page you pointed to shows a bulk load from CSV to Phoenix or from HDFS JSON to Phoenix. Can you provide a link or a command showing how one would go from Sqoop's HDFS output to Phoenix directly?
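In case anyone else ends up on this thread with the same question, the general shape of a CsvBulkLoadTool run that I've pieced together from the Phoenix docs is below. The table name, input path, and ZooKeeper quorum are placeholders, and it assumes the Sqoop job wrote comma-delimited text files to HDFS:
hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table MY_TABLE \
  --input /path/to/sqoop/output \
  --zookeeper my_server:2181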
08-16-2018 03:41 PM
How about this one? com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: com/lmax/disruptor/EventFactory
08-16-2018 02:55 PM
@Sandeep Nemuri Thanks so much for the advice! My table occupies 1.5 terabytes in MS SQL Server. This will be a one-time migration. I feel that exporting to a CSV would take days of processing time. Your last approach sounds the best, but I've heard that it is not possible based on this post and this post (my table does have a float column). That being said, what is my best option? I'm thinking that I may try to split my table into n CSV files and load them sequentially into Phoenix. Would that be the best option for data at this size?
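For reference, the kind of Sqoop command I have in mind for the split-into-n-files idea is sketched below. The connection string, credentials, table, and target directory are all placeholders; --num-mappers controls how many part files come out, so n can be tuned there:
sqoop import \
  --connect 'jdbc:sqlserver://my_sql_host:1433;databaseName=MY_DB' \
  --username my_user -P \
  --table MY_TABLE \
  --target-dir /staging/my_table \
  --fields-terminated-by ',' \
  --num-mappers 8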