Member since: 12-30-2015
Posts: 68
Kudos Received: 16
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1161 | 06-14-2016 08:56 AM
 | 1045 | 04-25-2016 11:59 PM
 | 1266 | 03-25-2016 06:50 PM
02-13-2017
11:28 PM
@Jasper Thanks for your comments. Could you please also let me know whether this is the usual way Kafka consumers are run in Hadoop? If not, how are the consumers/producers usually scheduled in a Hadoop cluster?
02-09-2017
02:05 AM
Question on scheduling a Kafka consumer client in a Hadoop cluster: I have written a Kafka consumer client that reads messages from a topic and writes them to a local file. I want to schedule this consumer so that it runs continuously and reads from the topic as and when messages are published. Can someone please explain the standard way of doing this in a Hadoop cluster? I have the following approach in mind, but I am not sure whether it is the usual one; please let me know your thoughts or suggestions. (The sample client writes to a file in the local filesystem, but that is just for testing. When I schedule it, I plan to write to an HDFS file and process it later; after some time I plan to write to HBase directly from the Kafka consumer.)
I am thinking of creating an Oozie workflow that calls the consumer client through a Java action and submitting that workflow as many times as the number of consumers I want. I will also change the consumer to write to an HDFS file instead of a local file (the HDFS filename will be suffixed with the partition number so that two consumers do not try to write to the same file). If I follow this approach, the Kafka clients run on YARN, right? Do I have to do anything specific for consumer rebalancing, or will that work as usual? I am just subscribing the consumer to topics, not assigning it specific partitions. Also, do I have to code the Java client any differently to run it through Oozie? In my case the entire Java client will be launched in a single mapper, correct?
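For illustration, here is a minimal sketch of the kind of consumer client described above, assuming the Kafka new-consumer Java API (0.9+); the broker address, group id, topic name, and HDFS output path are hypothetical placeholders, and args[0] stands in for the per-instance suffix mentioned in the post.

import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicToHdfsConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:6667");   // hypothetical broker
        props.put("group.id", "hdfs-writer-group");                   // one group shared by all instances
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Subscribe to the topic (rather than assigning partitions) so the group
        // rebalances automatically as more instances are started or stopped.
        consumer.subscribe(Collections.singletonList("my_topic"));

        // One HDFS output file per consumer instance, suffixed as described in the post.
        FileSystem fs = FileSystem.get(new Configuration());
        Path out = new Path("/user/test/kafka/my_topic_" + args[0]);
        try (FSDataOutputStream stream = fs.create(out, true)) {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    stream.writeBytes(record.value() + "\n");
                }
                stream.hflush();        // make the written data visible to HDFS readers
                consumer.commitSync();  // commit offsets only after the data is flushed
            }
        }
    }
}

Each Oozie Java-action launch would run one such instance; because the instances subscribe as a consumer group, Kafka redistributes partitions among them automatically.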
Labels:
01-17-2017
08:13 PM
I could not paste both explain plans in the previous comment. Here is the explain plan with hive.explain.user set to false:
hive> set hive.explain.user=false;
hive> explain select a.* from big_part a, small_np b where a.jdate = b.jdate;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: A515595_20170117140547_4494cba3-581e-441c-8fb6-8175b74d89c2:3
Edges:
Map 1 <- Map 2 (BROADCAST_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
filterExpr: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Filter Operator
predicate: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 jdate (type: date)
1 jdate (type: date)
outputColumnNames: _col0, _col1, _col6
input vertices:
1 Map 2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
HybridGraceHashJoin: true
Filter Operator
predicate: (_col1 = _col6) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: date)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: b
filterExpr: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: jdate (type: date)
sort order: +
Map-reduce partition columns: jdate (type: date)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.428 seconds, Fetched: 68 row(s)
01-17-2017
08:10 PM
Thanks for your comments! Here are the explain plan and the create table statements. The Hive version is 0.14. Also, regarding the third answer: if both tables are partitioned, is there any way to ensure that the bigger of the two tables undergoes partition pruning rather than the small one, or is that the default behavior? And what does hive.explain.user=false do? I have attached the explain plan with this setting both enabled and disabled.
hive> show create table big_part;
OK
CREATE TABLE `big_part`(
`id` int)
PARTITIONED BY (
`jdate` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://littleredns/apps/hive/warehouse/big_part'
TBLPROPERTIES (
'transient_lastDdlTime'='1484615054')
Time taken: 1.749 seconds, Fetched: 14 row(s)
hive> show create table small_np;
OK
CREATE TABLE `small_np`(
`id2` int,
`jdate` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://littleredns/apps/hive/warehouse/small_np'
TBLPROPERTIES (
'transient_lastDdlTime'='1484615162')
Time taken: 0.16 seconds, Fetched: 13 row(s)
hive> set hive.optimize.ppd=true;
hive> set hive.tez.dynamic.partition.pruning=true;
hive> explain select a.* from big_part a, small_np b where a.jdate = b.jdate;
OK
Plan not optimized by CBO.
Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Map 1 vectorized
File Output Operator [FS_21]
compressed:false
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Select Operator [OP_20]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator [FIL_19]
predicate:(_col1 = _col6) (type: boolean)
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Map Join Operator [MAPJOIN_18]
| condition map:[{"":"Inner Join 0 to 1"}]
| HybridGraceHashJoin:true
| keys:{"Map 2":"jdate (type: date)","Map 1":"jdate (type: date)"}
| outputColumnNames:["_col0","_col1","_col6"]
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
|<-Map 2 [BROADCAST_EDGE]
| Reduce Output Operator [RS_4]
| key expressions:jdate (type: date)
| Map-reduce partition columns:jdate (type: date)
| sort order:+
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
| Filter Operator [FIL_14]
| predicate:jdate is not null (type: boolean)
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
| TableScan [TS_1]
| alias:b
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
|<-Filter Operator [FIL_17]
predicate:jdate is not null (type: boolean)
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
TableScan [TS_0]
alias:a
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Time taken: 1.459 seconds, Fetched: 45 row(s)
01-17-2017
03:40 AM
Hi, could someone please help me understand the questions below on Hive partition pruning and explain plans?
1. How do I check in the explain plan whether partition pruning occurs? I expected to see a "Dynamic Partitioning Event Operator" in the explain plan, but in my sample query below I do not see any such operator, even though I enabled hive.tez.dynamic.partition.pruning. Since the tables do not have much data the query goes for a map join; does that have anything to do with partition pruning not happening?
explain select a.* from big_part a, small_np b where a.jdate = b.jdate;
big_part is partitioned on jdate while small_np is a non-partitioned table. Even adding an explicit filter on jdate, such as jdate = "2017-01-01", does not show this operator in the explain plan. The tables are plain text format. I tried enabling and disabling hive.optimize.ppd, but that only added or removed a Filter Operator higher up in the explain plan, nothing else. Does the hive.optimize.ppd parameter have any effect on partition pruning?
2. Is it correct to expect dynamic partition pruning on the big_part table in the above query?
3. If both tables in the join are partitioned, can we expect dynamic partition pruning on both tables?
4. Does dynamic partition pruning also occur for outer joins (full and left outer, assuming the inner table's conditions are given in the ON clause and the outer table's conditions are given in the WHERE clause)?
5. What exactly does hive.optimize.ppd do in the case of text files? Does it just push the filter predicates down to the table read where possible?
Thank you!
Labels: Apache Hive
09-28-2016
07:00 PM
Thanks for the suggestion. I have not tried these parameters. What are they for? Are they the ones that set the mapper memory size in Pig?
09-27-2016
10:37 PM
I am running my Pig scripts and Hive queries in Tez mode. For all of these Pig scripts/Hive queries the mapper memory requested was more than the memory used, so I lowered mapreduce.map.memory.mb and also changed mapreduce.map.java.opts. Even after changing these values the mapper memory requested is still more than the map memory used, and nothing seemed to change in the performance metrics (this was from analyzing the job in Dr. Elephant). On top of that, the Pig script now aborts with the error below after changing these settings.
"java.lang.IllegalArgumentException: tez.runtime.io.sort.mb 1638 should be larger than 0 and should be less than the available task memory (MB):786"
I never set 786 MB anywhere in my settings; where did this value come from? Also, how do I configure the map and reduce memory in Tez execution mode? (I see documentation for Hive saying to set hive.tez.container.size, but nothing for Pig.) Is it possible to configure the map and reduce memory differently in Tez mode? The Hive-on-Tez documentation only mentions the map memory setting, nothing for reducer memory. And since Tez creates a DAG of tasks, they are not like MapReduce, right? Are map and reduce just seen as individual tasks in the DAG, or can these DAG tasks still be classified into mapper/reducer actions? Thanks!
Labels: Apache Hive, Apache Pig, Apache Tez
06-14-2016
08:56 AM
When I first added the hive-site.xml I missed a few properties; I have now added all the properties mentioned by @allen huang in this link: https://community.hortonworks.com/questions/25121/oozie-execute-sqoop-falls.html#answer-25291. So even though Sqoop is called through an Oozie shell action, I had to add a hive-site.xml with the properties Allen mentioned. Thank you Allen :). My script is working fine now.
06-14-2016
06:56 AM
Hi, I checked the logs but found no information about why the script aborted. This is all that is shown in the log:
INFO hive.HiveImport: Loading uploaded data into Hive
WARN conf.HiveConf: HiveConf of name hive.metastore.pre-event.listeners does not exist
WARN conf.HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
Logging initialized using configuration in jar:file:/grid/8/hadoop/yarn/local/filecache/5470/hive-common-1.2.1.2.3.4.0-3485.jar!/hive-log4j.properties
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
06-10-2016
12:43 PM
Hi, I am running an Oozie shell action that runs a Sqoop command to import data into Hive. When I run the Sqoop command directly it works fine, but when I run it through the Oozie shell action it aborts with:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Based on this link, https://community.hortonworks.com/questions/25121/oozie-execute-sqoop-falls.html#answer-25290, I added hive-site.xml using the <file> tag in the Oozie shell action, and based on another link I also added export HIVE_CONF_DIR=`pwd` before running the Sqoop command, but neither worked. When I add the full hive-site.xml I get the same error as above; when I add just the important properties mentioned in this link, http://ingest.tips/2014/11/27/how-to-oozie-sqoop-hive/, I get this error instead:
FAILED: IllegalStateException Unxpected Exception thrown: Unable to fetch table XYZ. java.net.SocketException: Connection reset
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
In both cases the Sqoop command successfully creates the file in the target directory but fails while loading that data into Hive. The Hadoop cluster is Kerberos enabled; I do a kinit before submitting the workflow and again inside the Oozie shell action. Can someone please throw some light on how to fix this? Below is the Sqoop command used.
sqoop import \
--connect "jdbc:teradata://${server}/database=${db},logmech=ldap" \
--driver "com.teradata.jdbc.TeraDriver" \
--table "XYZ" \
--split-by "col1" \
--hive-import \
--delete-target-dir \
--target-dir "/user/test/" \
--hive-table "default.XYZ" \
--username "terauser" \
--password tdpwd \
--where "${CONDITION}" \
--m 2 \
--fetch-size 1000 \
--hive-drop-import-delims \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' \
--null-string '\\N' \
--null-non-string '\\N'
Labels: Apache Hive, Apache Oozie, Apache Sqoop
06-10-2016
07:53 AM
Thanks, I was able to set up the SSH and it is working. But I have a question: what is this "oozie" ID? I log in to Linux using my own ID, and I submit the workflow either with my ID or after doing a kinit to another ID. In the UI logs the workflow is shown as submitted by my ID or by the ID for which the ticket was obtained via kinit. What I don't understand is where the "oozie" user ID fits in. I even had to go to the home directory of the "oozie" user, get its public key, and add it to the authorized_keys file on the destination server. Can you please explain the purpose of this ID, and how to find it? Is it available in oozie-site.xml? Based on this article I also searched for the oozie ID, but what if it were different on another cluster? How do I find it? Thanks!
06-07-2016
06:07 PM
Hi, I need to execute a shell script on a remote server from the Hadoop cluster, so I am planning to use the Oozie SSH action for this. I have two basic questions regarding Oozie actions.
1. For passwordless SSH I need to share public keys between the two servers. In the case of the Oozie SSH action, where does the workflow initiate the SSH connection from? Does it execute from any of the data nodes? If so, how do I set up SSH or exchange the public keys?
2. Does the Oozie shell action execute on any of the available data nodes, or is there a specific way the execution server is chosen? Thanks!
Labels: Apache Hadoop, Apache Oozie
05-13-2016
05:59 PM
1 Kudo
I created a relation like:
A = LOAD 'soemtable' USING org.apache.hive.hcatalog.pig.HCatLoader();
A_Filtered = FILTER A BY Col1 == 1;
Then I changed the underlying table structure in Hive, changing the column type of Col1 from int to string. To get the proper data type reflected in relation A, I ran the same LOAD command again:
A = LOAD 'soemtable' USING org.apache.hive.hcatalog.pig.HCatLoader();
I understand that A_Filtered becomes invalid once A is recreated, so I got the error below. But now I am unable to recreate the same A_Filtered relation, or any relation at all, with the new definition; I keep getting the error that relation A_Filtered has incompatible types in the Equal operator. How do I fix this? Can I delete the relation so that this does not occur? It clears once I exit the grunt shell and log in again, but I wanted to know whether it can be fixed without exiting the shell.
2016-05-13 12:32:02,354 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-05-13 12:32:02,403 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.pre-event.listeners does not exist
2016-05-13 12:32:02,403 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.semantic.analyzer.factory.impl does not exist
2016-05-13 12:32:02,425 [main] INFO hive.metastore - Connected to metastore.
2016-05-13 12:32:02,519 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-05-13 12:32:02,553 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-05-13 12:32:02,603 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.pre-event.listeners does not exist
2016-05-13 12:32:02,604 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.semantic.analyzer.factory.impl does not exist
2016-05-13 12:32:02,666 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039:
<line 2, column 56> In alias A_Filtered, incompatible types in Equal Operator left hand side:chararray right hand side:int
Labels: Apache HCatalog, Apache Hive, Apache Pig
05-13-2016
03:58 AM
One more question: I am loading the HBase table through Pig, generating the sequence number with RANK before loading. Even though I pre-split the table, the UI showed that only three regions of the HBase table were being loaded for the first few minutes. Why would this happen? When the table is pre-split, shouldn't all of its regions receive data from the previous operator simultaneously and load in parallel? Why were only three of the 10 regions loading? Could it be because the data is streamed to the storer as and when the previous operator generates the row numbers? Any suggestions would be helpful. Thanks!
05-13-2016
03:51 AM
Actually, the main purpose of the table is to be a lookup table. The problem is that lookups against this table are based on the sequence number, so I chose the sequence number as the row key, and I think that is what caused the hot spots. May I know how Phoenix ensures a good distribution? I thought Phoenix was just a SQL layer on top of HBase for querying HBase tables.
05-13-2016
03:47 AM
Thanks for the info.
05-11-2016
09:58 PM
Hi Ben, thanks for taking the time to explain each of these questions. For question 4, I actually meant to type "vertex" but wrote "node". What I meant to ask was: by setting the number of reducers, do we affect all the vertices that run only reducers? Based on your explanation, I think it affects all vertices running reducers and does not affect any vertex running mappers or a combination of mappers and combiners. Right?
By mapper and reducer, my understanding was that any class extending Mapper is a mapper and any class extending Reducer is a reducer. Out of curiosity, when I look at the Pig source code there are many operators like PORank, POFRJoin, etc., and these are also the ones shown in the explain plan as the tasks of each vertex. So essentially, on a Tez DAG, Pig Latin gets converted to these operators, right? Are these operators run as part of mappers and reducers? In other words, irrespective of whether the underlying task is a true Mapper or Reducer class or one of the Tez Pig operators, is it correct to assume that the parallelism of root vertices, which read data from a file or table, is controlled by the file splits or table partitions, while the leaf vertices and the vertices in between are all like reducers whose parallelism is controlled by the reducer properties (number of reducers or bytes per reducer)? And if I write a UDF, is it possible to identify whether it runs inside a mapper class or a reducer class?
05-10-2016
06:18 PM
Hi Artem, thanks for your reply. I will try hashing my row key. But even then, I think I will need to pre-split the HBase table so that there are many regions from the very beginning of the load (I was just reading about creating pre-splits). Even so, might HBase end up creating all the regions on the same server? Is there any option to ensure that all the regions of the same table are distributed across multiple servers?
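For illustration, a minimal sketch of pre-splitting a table through the Java Admin API (HBase 1.x style); the table name, column family, and split points are hypothetical and assume a row key that has been hashed or prefixed so the key space is covered evenly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("lookup_table"));
            desc.addFamily(new HColumnDescriptor("cf"));
            // Nine split points give ten regions up front; with a hashed/prefixed
            // row key each region should receive a share of the incoming load.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("1"), Bytes.toBytes("2"), Bytes.toBytes("3"),
                Bytes.toBytes("4"), Bytes.toBytes("5"), Bytes.toBytes("6"),
                Bytes.toBytes("7"), Bytes.toBytes("8"), Bytes.toBytes("9")
            };
            admin.createTable(desc, splits);
        }
    }
}

Which region server each new region lands on is decided by HBase's balancer rather than by the table definition, so after creation the regions are normally spread across the available servers.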
05-10-2016
02:36 AM
Hi Josh, I checked the table splits as you suggested and see that the table has 18 regions. The problem is that all 18 regions are on the same node (region server). How do I spread the regions across multiple region servers? Also, is there a command to check the table splits and HBase regions from the CLI? Is there any parameter I can use to improve the load performance from Pig? Currently I am only using hbase.scanner.caching to reduce round trips. Thanks for your help!
05-09-2016
03:00 AM
1 Kudo
I am trying to load an HBase table from an HDFS file using Pig. The file is just 3 GB with 30,350,496 records, yet it takes a long time to load the table. Pig is running on Tez. Can you please suggest any ways to improve the performance, and how to identify where the bottleneck is? I am not able to get much from the Pig explain plan. Is there a way to tell whether a single HBase region server is overloaded or whether the load is distributed properly? How do I identify the HBase region server splits?
Labels: Apache HBase, Apache Pig, Apache Tez
05-09-2016
02:54 AM
1 Kudo
I have a couple of basic questions on Tez DAG tasks, MapReduce jobs, and Pig running on Tez.
1. My understanding is that each vertex of a Tez DAG can run either mappers or reducers but not both. Is this right?
2. Each vertex of a Tez DAG can have its own parallelism, right? Say there are three vertices, one map vertex and two reducer vertices; can each reducer vertex run with a different parallelism, and how is this controlled?
3. When I look at the Pig explain plan on Tez, I see the vertices and the operations on them, but I do not see the parallelism of each vertex; I only see it when I dump the relation. How do I see the parallelism of each vertex in the Pig explain plan?
4. If I use the PARALLEL clause to control the number of reducers in Pig on Tez, does it control the parallelism only of the vertices running reducers, and does it affect all vertices running reducers? Is there a way to control the parallelism of each vertex separately?
5. If there are 4 splits of a file, there would ideally be 4 mappers, right? In Tez, would there be 4 vertices each running one mapper, or one vertex running 4 mappers?
6. How do I control the number of mappers (that is, the parallelism of the vertex running the mappers)?
7. While the Pig command is running I can see the total number of tasks, but how do I find the number of tasks in each vertex?
Labels: Apache Hadoop, Apache Pig, Apache Tez
04-26-2016
05:47 AM
Hi, thanks for your suggestion. I have just started using this, so can you please help me understand a few more things? I found the WebHCat server using Ambari, and the templeton.libjars value in webhcat-site.xml is as follows:
<name>templeton.libjars</name>
<value>/usr/hdp/${hdp.version}/zookeeper,/usr/hdp/${hdp.version}/hive/lib/hive-common.jar/zookeeper.jar</value>
I think this has wrong values or a typo in it, as you suggested, but I do not have access to edit this file.
1. Is there any other way to use WebHCat without editing the webhcat-site.xml file, for example by passing the value as a POST parameter in curl?
2. My cluster has an edge node. Why is webhcat-site.xml present only on the WebHCat server and not on the edge node, which only has webhcat-default.xml? Shouldn't all the *-site.xml files be present on the edge node as well?
3. How do I access HiveServer2 via Knox? Is it possible to use HiveServer2 to insert values into a Hive table from outside the cluster through Knox?
04-26-2016
12:08 AM
Hi, I am trying to execute a Hive query through the WebHCat service of Knox (Kerberos enabled) using the command below:
curl -i -u testuser:abcdef -d execute="select+*+from+test_table;" \
-d statusdir="pokes.output" \
'https://knox.testserver.com:8443/gateway/sample/templeton/v1/hive?user.name=testuser'
But this does not work; it fails with the following message:
HTTP/1.1 500 Server Error
Set-Cookie: JSESSIONID=18unyi37n3omieug23ruoetn0;Path=/gateway/sample;Secure;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Server: Jetty(7.6.0.v20120127)
Content-Type: application/json
Content-Length: 93
{"error":"File /user/hadoop/2.3.4.0-3485/hive/lib/hive-common.jar/zookeeper.jar does not exist."} in the first place.. instead of hive-common.jar I have hive-common-1.2.1.2.3.4.0-3485.jar and this does not include zookeeper.jar in it. Any idea how to solve this?
Labels: Apache Hive, Apache Knox
04-25-2016
11:59 PM
Hi, thanks for your reply. I tried to query the same table today and it worked; I am not sure why it was an issue earlier. I queried using the same ID again and it worked. Earlier the issue was only from Tableau; I was able to query the Hive external table and the HBase table directly from the command line without issues. Now everything works fine.
04-22-2016
07:29 PM
1 Kudo
I am using Tableau v8.2. I can connect to Hive from Tableau and query Hive managed tables, but when I try to query a Hive external table whose actual data is in HBase, I get a permission denied error when selecting data. When I drag and drop the external table into the query window I can see the external table metadata without issues; it is just the read that is not working, even though I have all the permissions. Mine is a Kerberized cluster; does that have something to do with it? Any idea how to access this table? Note: I created an ODBC driver connection and a TDC file for Tableau connectivity and connected to Hive using "Other ODBC sources" from Tableau; I was unable to connect using the Hortonworks Hive driver.
Labels: Apache HBase, Apache Hive
03-25-2016
06:50 PM
Hi, I just tried again the next day without changing anything: the same variables for Hadoop home, Hive home, HCat home, and the HCat lib jars. Today it worked with both TDCH 1.3 and 1.4. Not sure why it did not work the first time. Thanks for your time and help!
03-23-2016
10:45 PM
2 Kudos
Hi, I am using TDCH 1.3 to export data from Hive to Teradata. I set HADOOP_CLASSPATH, Hive home, HCat home, Sqoop home, and HCAT_LIB_JARS and passed these to the TDCH command:
export HCAT_LIB_JARS=${HCAT_HOME}/share/hcatalog/hive-hcatalog-core.jar,$HIVE_HOME/lib/hive-cli.jar,$HIVE_HOME/lib/hive-exec.jar,$HIVE_HOME/lib/hive-metastore.jar,$HIVE_HOME/lib/libfb303-0.9.2.jar,$HIVE_HOME/lib/libthrift-0.9.2.jar,$HIVE_HOME/lib/jdo-api-3.0.1.jar
I get the error below. I added all the jars, and I also found the HCatInputFormat class (the one reported as missing) in the hive-hcatalog jar ${HCAT_HOME}/share/hcatalog/hive-hcatalog-core.jar, but its path is slightly different: org/apache/hive/hcatalog/mapreduce/HCatInputFormat.class, whereas the class reported missing is under org/apache/hcatalog/mapreduce. So the class is reported as missing because it is on a different path. How do I fix this?
16/03/23 17:33:41 INFO tool.ConnectorExportTool: java.lang.NoClassDefFoundError: org/apache/hcatalog/mapreduce/HCatInputFormat
at com.teradata.connector.hcat.processor.HCatInputProcessor.inputPreProcessor(HCatInputProcessor.java:41)
Labels: Apache HCatalog, Apache Hive
03-09-2016
09:03 PM
1 Kudo
I just logged into Ambari and found HiveServer2 under the Summary tab. There are five HiveServer2 processes listed there, each running on a different host. Why would there be so many HiveServer2 components? Isn't HiveServer2 a server process for handling client requests from the CLI, so shouldn't there be only one HiveServer2? Thanks!
03-09-2016
08:33 PM
1 Kudo
Sorry, I just started learning Hadoop and I am not sure where you want me to look for this summary. I checked in Hue but could not see this detail. Could you please elaborate a little?
03-08-2016
10:54 PM
1 Kudo
I found the port information under the property hive.server2.thrift.port, but the only properties where server names are listed are hive.zookeeper.quorum and hive.metastore.uris. Can you please let me know which property I should be looking at to find the host that runs the HiveServer2 service?
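For context, once a HiveServer2 host is known, it is combined with hive.server2.thrift.port (10000 by default) in the JDBC URL, which is one way to confirm which host is serving requests. A minimal sketch, assuming the standard Hive JDBC driver is on the classpath; the hostname, port, and user are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveServer2Check {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (needed on older driver versions).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // host = the machine running the HiveServer2 component, port = hive.server2.thrift.port
        String url = "jdbc:hive2://hs2-host.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "testuser", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}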