Member since: 05-09-2016
Posts: 280
Kudos Received: 58
Solutions: 31

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 2051 | 03-28-2018 02:12 PM |
|  | 2157 | 01-09-2018 09:05 PM |
|  | 794 | 12-13-2016 05:07 AM |
|  | 2546 | 12-12-2016 02:57 AM |
|  | 1833 | 12-08-2016 07:08 PM |
05-11-2018
04:59 PM
Hi guys, I am using Spark 1.6.3 and trying to create a PySpark dataframe from a Hive ORC partitioned table. I tried sqlContext.read.format('orc').load('tablename'), but it looks like load only accepts a filename in HDFS. The filenames are dynamic and we do not track them at runtime. What would be the best way to handle this? Is it supported in Spark 2.0? Thank you so much.
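Since the table is registered in the Hive metastore, a hedged sketch of the usual alternative (assuming Spark 1.6 built with Hive support; in the pyspark shell on HDP the existing sc and sqlContext can be reused, and the database, table, and partition column names below are placeholders, not names from this post) is to address the table by name through a HiveContext rather than by ORC file path:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="read-orc-hive-table")
sqlContext = HiveContext(sc)  # a HiveContext resolves table names through the Hive metastore

# Read the partitioned ORC table by name, so the dynamic file names
# under the warehouse directory never need to be known.
df = sqlContext.table("mydb.tablename")

# Or push a partition filter through SQL:
df_part = sqlContext.sql("SELECT * FROM mydb.tablename WHERE part_col = 'some_value'")
df_part.show()
```

In Spark 2.x, spark.table("mydb.tablename") on a SparkSession created with Hive support works the same way.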
Labels: Apache Hive
03-28-2018
02:12 PM
Hi @Scott Shaw, thanks for that answer, I will take a look. I realized this can be achieved with Atlas: the metadata changes are picked up by the HiveHook and sent to the ATLAS_HOOK Kafka topic. I am comparing two options for consuming the JSON messages from that topic: 1) connect with NiFi to filter the JSON and use the PutEmail processor for notification, or 2) write a custom Java Kafka consumer that does the same thing. Please let me know what you think.
03-27-2018
05:37 PM
Thank you so much, @Constantin Stanca. This was very much needed. Could you please let me know which NiFi version is compatible with HDP 2.6? I already have an HDP cluster installed on Google Cloud and cannot install a separate HDF cluster. Is there a standalone NiFi jar that can run inside an HDP cluster?
03-26-2018
07:12 PM
1 Kudo
Hi everyone,
I am using HDP 2.6 and I want to track Hive table metadata changes in real time. I have the HiveHook enabled and I can see Kafka JSON messages in the ATLAS_HOOK and ATLAS_ENTITIES topics.
Also, Atlas is able to consume these entity updates. I am looking for the best way to get entity update information in real time. 1) Is there a way to create a notification server (like SMTP) to which Atlas will send these updates? 2) Or do I have to create a custom Kafka consumer that reads the JSON directly from the ATLAS_HOOK or ATLAS_ENTITIES topics?
P.S. I do not want to read everything from the Kafka topic. There are thousands of tables, but I want metadata changes for specific tables only.
Please let me know how to set up the offsets for particular databases/tables only. Thanks
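Kafka offsets are tracked per topic partition, not per table, so they cannot be scoped to particular databases/tables; the filtering has to happen in the consumer itself. Below is a minimal sketch under assumptions: it uses the kafka-python package (any Kafka client would do), a placeholder broker address, and a deliberately crude string match, because the exact Atlas notification JSON layout varies by version.

```python
import json
from kafka import KafkaConsumer  # assumption: the kafka-python client is available

WATCHED = {"db1.orders", "db1.customers"}  # hypothetical fully qualified table names

consumer = KafkaConsumer(
    "ATLAS_ENTITIES",                        # entity updates; ATLAS_HOOK carries the hook messages
    bootstrap_servers="kafka-broker:6667",   # placeholder broker address
    group_id="hive-metadata-watcher",        # hypothetical consumer group
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    # Crude filter: keep only notifications that mention a watched table anywhere
    # in the payload; a real consumer would inspect the entity's qualifiedName.
    payload_text = json.dumps(message.value).lower()
    if any(table in payload_text for table in WATCHED):
        print("Metadata change touching a watched table, offset", message.offset)
        # hand off to email / NiFi / any other notification channel here
```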
Labels: Apache Atlas, Apache Hive, Apache Kafka
03-19-2018
08:39 PM
1 Kudo
Hi guys, I am using HDP version 2.6.1.40-4 on our dev servers; the Hive version is 1.2.1. We use Hive tables as the source for our framework, in which we read different columns from different tables and then run Spark jobs to do the processing. We maintain a config table in Hive that specifies which columns we want from each source table. If someone changes a column name or adds new columns to their source table, we have to update this config table manually. Please share some ideas on the different existing ways to monitor/track in real time what is happening in the Hive metastore, and on the most suitable push-notification mechanism to alert us in any form. Thanks in advance.
Labels: Apache Hive
01-09-2018
09:05 PM
Solved it. The values for the RowKey were missing, as pointed out by the error:
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
I created the Row object including all dataframe columns and then it worked.
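For reference, a hedged sketch of a catalog in which the row key column keeps its own distinct entry. In the catalog from the question below, "individual_id" appears twice as a dict key, so Python silently keeps only the cf1 mapping and the rowkey mapping is dropped before json.dumps ever runs, which is consistent with the empty.tail error in initRowKey. The alias "id_key" is hypothetical, and the DataFrame being written would need a matching, non-null column.

```python
import json

# Hedged sketch: give the row key its own, distinct dict key ("id_key" is a
# hypothetical alias) so the "cf":"rowkey" mapping is not overwritten by a
# duplicate "individual_id" entry.
cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "id_key":              {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id":            {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id":       {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"},
    },
})

# The DataFrame must then carry a non-null "id_key" column (for example a copy
# of individual_id) alongside the other catalog columns before .save() is called.
```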
01-08-2018
11:01 PM
Hi guys, I am using Spark 1.6.3 and HBase 1.1.2 on HDP 2.6. I have to use Spark 1.6 and cannot move to Spark 2. The connector jar is shc-1.0.0-1.6-s_2.10.jar. I am writing to an HBase table from the PySpark dataframe:

cat = json.dumps({"table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
                  "rowkey": "key",
                  "columns": {"individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
                              "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
                              "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
                              "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"}}})
df.write.option("catalog", cat).format("org.apache.spark.sql.execution.datasources.hbase").save()
The error is: An error occurred while calling o202.save.
: java.lang.UnsupportedOperationException: empty.tail
at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$super$tail(ArraySeq.scala:45)
at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Please let me know if anyone has come across this.
Labels: Apache HBase, Apache Spark
01-08-2018
07:33 PM
Just got this done: I created an RDD of the dictionary instead of the string, and it worked:
json_rdd = sc.parallelize([event_dict])
Is there any way to maintain the same insertion order of columns in the RDD as in the dictionary?
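One hedged way to pin the column order (a sketch assuming Spark 1.6 on Python 2, where a plain dict does not preserve insertion order by itself): pass an explicit schema, so the DataFrame columns come out in exactly the order the schema lists them.

```python
from pyspark.sql.types import StructType, StructField, StringType

event_dict = {"event_ID": "MO1_B", "event_Name": "Model Consumption", "event_Type": "Begin"}

# The schema, not the dict, dictates the column order of the resulting DataFrame.
schema = StructType([
    StructField("event_ID", StringType(), True),
    StructField("event_Name", StringType(), True),
    StructField("event_Type", StringType(), True),
])

row = (event_dict["event_ID"], event_dict["event_Name"], event_dict["event_Type"])
event_df = sqlContext.createDataFrame([row], schema)  # sqlContext as created by the pyspark shell
event_df.show()
```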
01-06-2018
12:06 AM
Hi guys, I want to create a Spark dataframe from a Python dictionary, which will then be inserted into a Hive table. I have a dictionary like this: event_dict={"event_ID": "MO1_B", "event_Name": "Model Consumption", "event_Type": "Begin"} I tried creating an RDD and used hiveContext.read.json(rdd) to create a dataframe, but the rows end up holding one character at a time:

import json
json_rdd = sc.parallelize(json.dumps(event_dict))
event_df = hive.read.json(json_rdd)
event_df.show()

The output of the dataframe, which has a single column, looks something like this:

{
"
e

I also tried hiveContext.createDataFrame(event_dict), but it gave me the same output. Can you please suggest a trick to do this? I want to avoid creating a JSON file on local/HDFS and reading from it. Thanks
Labels: Apache Spark
11-01-2017
07:04 PM
Thanks @Ambud Sharma for your reply. The actual data will not be processed by or sent to the Kafka brokers; it stays in Hive only. I want to capture specific information about each job event, such as: 1. when the job started, 2. whether it failed due to an exception, 3. if not, how many records got processed. I want to send this data to Kafka. I could use simple log4j logging along with Splunk, but I would like to stay within the HDP stack. Please let me know what you think about this.
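A minimal sketch of what publishing such an event could look like, under assumptions: it uses the kafka-python producer, a placeholder broker address, and hypothetical topic and field names (none of these come from the original thread).

```python
import json
import time
from kafka import KafkaProducer  # assumption: the kafka-python client is available

producer = KafkaProducer(
    bootstrap_servers="kafka-broker:6667",  # placeholder broker address
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Hypothetical audit event emitted when a processing job finishes.
event = {
    "job_name": "hive_enrichment",   # hypothetical job identifier
    "status": "SUCCEEDED",           # or STARTED / FAILED plus an exception message
    "records_in": 150000,
    "records_loaded": 148200,
    "timestamp": int(time.time()),
}

producer.send("job-audit-events", value=event)  # hypothetical topic name
producer.flush()
```

A consumer on the same topic can then drive auditing or reporting without touching the data itself.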
10-31-2017
05:52 PM
Hi @Andy Liang, please let me know if you were able to solve this issue; I am facing the same thing. Did you have to change the Java package? Thanks in advance.
10-25-2017
05:03 PM
Hi guys, I wanted to know how well Kafka works for capturing events at various states during internal data processing. That information could be used for auditing or reporting purposes. Suppose data consumption has started and I want to know the number of input records processed and the number of records loaded into Hive. In Hive, some kind of enrichment is going on, and I want to know how many records got enriched. The plan is to eventually load them into HBase. Also, message volumes would be very low at this point in time. I just want to decouple these tasks from the other framework jobs. Please let me know if you have ever come across the idea of using pub/sub in this kind of scenario.
Labels: Apache Kafka
10-19-2017
08:40 PM
Thanks @Timothy Spann for your answer. These links are really helpful. I used python for Spark MLlib so will use the same for H2O as well.
10-19-2017
04:26 PM
Hi experts, I am just curious about the differences between Spark MLlib/ML and H2O in terms of algorithm implementations, performance, and usability, and which one is better for what kinds of use cases? Thanks a lot in advance.
Labels: Apache Spark
06-19-2017
11:24 PM
I am installing Hadoop 2.7.3.2.6.1.0-129 on Amazon EC2 instances running Ubuntu 16.04.2 LTS. This is a fresh install, and I am getting the following exception while installing the HDFS client. This is the stderr message:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 78, in <module>
HdfsClient().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 38, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 693, in install_packages
retry_count=agent_stack_retry_count)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/apt.py", line 53, in wrapper
return function_to_decorate(self, name, *args[2:])
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/apt.py", line 81, in install_package
self.checked_call_with_retries(cmd, sudo=True, env=INSTALL_CMD_ENV, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 86, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 100, in _call_with_retries
should_stop_retries = self._handle_retries(cmd, ex.code, ex.out, is_first_time, is_last_time)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 115, in _handle_retries
self._update_repo_metadata_after_bad_try(cmd, code, out)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 134, in _update_repo_metadata_after_bad_try
Logger.info("Execution of '%s' returned %d. %s" % (shell.string_cmd_from_args_list(cmd), code, out))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2600: ordinal not in range(128)
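For context, a small Python 2 illustration of the failure mode (the Ambari agent scripts in the traceback run under /usr/lib/python2.6): byte 0xe2 starts a UTF-8 multi-byte character, which the default ascii codec cannot decode. The sample bytes below are hypothetical, standing in for non-ASCII characters in the apt output being logged.

```python
# -*- coding: utf-8 -*-
# Python 2 sketch reproducing the UnicodeDecodeError above.
raw = '\xe2\x80\x98hadoop-hdfs\xe2\x80\x99'  # hypothetical apt output fragment with curly quotes

print(repr(raw.decode('utf-8')))  # fine: u'\u2018hadoop-hdfs\u2019'

try:
    raw.decode('ascii')           # reproduces: 'ascii' codec can't decode byte 0xe2
except UnicodeDecodeError as err:
    print(err)
```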
04-25-2017
01:35 AM
Hi guys, I am using the latest version of Pig (0.16.0), and the Tez version is 0.8.5. The Pig script runs fine on MapReduce, but not with Tez. Tez with MapReduce and Hive (2.1.1) is working fine. Could this be a version mismatch problem? Please have a look at the logs:

Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joined
at org.apache.pig.PigServer.openIterator(PigServer.java:1019)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:747)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:564)
at org.apache.pig.Main.main(Main.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias joined
at org.apache.pig.PigServer.storeEx(PigServer.java:1122)
at org.apache.pig.PigServer.store(PigServer.java:1081)
at org.apache.pig.PigServer.openIterator(PigServer.java:994)
... 13 more
Caused by: org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:137)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:78)
at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:198)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:308)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1474)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1459)
at org.apache.pig.PigServer.storeEx(PigServer.java:1118)
... 15 more
Caused by: java.lang.NoSuchMethodException: org.apache.tez.dag.api.DAG.setCallerContext(org.apache.tez.client.CallerContext)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:128)
... 21 more
================================================================================
Labels: Apache Hadoop, Apache Pig, Apache Tez
04-22-2017
10:25 PM
Got it working. Whenever I started my Hive shell, I was getting this warning: "Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases." So I installed Tez (version 0.8.5) and changed the Hive execution engine to Tez. Now all Hive queries that involve a MapReduce job are running. My Hive version is 2.1.1, which I guess does not work with MapReduce. As for the regex, thanks a lot @Ed Berezitsky; those regexes worked.
04-22-2017
07:19 PM
Just saw that any Hive query that involves a MapReduce job gives the same exception.
04-22-2017
07:17 PM
Thanks for the response. I am still getting the same exception while doing regexp_extract.
04-20-2017
06:11 AM
@gnovak
Thanks a lot, I guess I missed that point. That has to be the reason why there is nothing in the output.
04-19-2017
11:52 AM
I have also added these two lines in core-site.xml, but it still didn't work:
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
04-18-2017
07:49 PM
It is working finally. I just realized that there is no hive user or group on my system, so I changed those two properties in core-site.xml to hadoop.proxyuser.mrizvi.groups and hadoop.proxyuser.mrizvi.hosts. I know this does not look like an ideal setting, but for pseudo-distributed mode's sake this is fine for now. In a fully distributed cluster, I guess we would have to add all users to the hive group and change core-site.xml accordingly.
04-18-2017
05:57 PM
Hi guys, I am using a pseudo-distributed machine and want to use Beeline. I have started HiveServer2 with hiveserver2 start --hiveconf hive.root.logger=DEBUG,console. Everything looks fine; even the HiveServer2 HTTP UI starts on 10002. Now, when I try to connect from the Beeline shell, the debug logs return this:

WARN [HiveServer2-Handler-Pool: Thread-40] thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: mrizvi is not allowed to impersonate anonymous
I have disabled impersonation. My hive-site.xml:

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://ubuntu:3306/metastore_db?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>ubuntu</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs=false</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>false</value>
  </property>
</configuration>
Tags: beeline, Data Processing, hiveserver2, jdbc
Labels: Apache Hive
04-18-2017
01:50 AM
Hi guys, I have an input file which looks like:

1:Washington Berry Juice 1356:Carrington Frozen Corn-41 446:Red Wing Plastic Knives-39 1133:Tri-State Almonds-41 1252:Skinner Strawberry Drink-39 868:Nationeel Raspberry Fruit Roll-39 360:Carlson Low Fat String Cheese-38
2:Washington Mango Drink 233:Best Choice Avocado Dip-61 1388:Sunset Paper Plates-63 878:Thresher Semi-Sweet Chocolate Bar-63 529:Fast BBQ Potato Chips-62 382:Moms Roasted Chicken-631 191:Musial Tasty Candy Bar-62

This is the output from a user recommendation engine. The first pair is the main product ID and name. The next 6 are ProductId:Name:Count, and all 6 products are delimited by tabs. I want to load this data into a Hive table. As you can see, there are multiple delimiters, so I first created a temporary table with only one string column and inserted this file into it. Next, I created a final table with the correct attributes and data types. Now, when I insert the data using regular expressions by running the query:

insert overwrite table recommendation SELECT
regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) productId,
regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) productName,
regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) productId1,
regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) productName1,
regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) productCount1,
regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) productId2,
regexp_extract(col_value, '^(?:([^,]*),?){7}', 1) productName2,
regexp_extract(col_value, '^(?:([^,]*),?){8}', 1) productCount2,
regexp_extract(col_value, '^(?:([^,]*),?){9}', 1) productId3,
regexp_extract(col_value, '^(?:([^,]*),?){10}', 1) productName3,
regexp_extract(col_value, '^(?:([^,]*),?){11}', 1) productCount3,
regexp_extract(col_value, '^(?:([^,]*),?){12}', 1) productId4,
regexp_extract(col_value, '^(?:([^,]*),?){13}', 1) productName4,
regexp_extract(col_value, '^(?:([^,]*),?){14}', 1) productCount4,
regexp_extract(col_value, '^(?:([^,]*),?){15}', 1) productId5,
regexp_extract(col_value, '^(?:([^,]*),?){16}', 1) productName5,
regexp_extract(col_value, '^(?:([^,]*),?){17}', 1) productCount5,
regexp_extract(col_value, '^(?:([^,]*),?){18}', 1) productId6,
regexp_extract(col_value, '^(?:([^,]*),?){19}', 1) productName6,
regexp_extract(col_value, '^(?:([^,]*),?){20}', 1) productCount6
from temp_recommendation;

I am getting this exception:

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.mapreduce.v2.util.MRApps.addLog4jSystemProperties(Lorg/apache/hadoop/mapred/Task;Ljava/util/List;Lorg/apache/hadoop/conf/Configuration;)V
No logs are generated, and this is a pseudo-distributed machine. Is this method wrong for handling multiple delimiters, or is there a better way? Thanks in advance
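Not a replacement for the Hive approach, just a hedged illustration of how one record from the samples above breaks down: each recommendation looks like ProductId:Name-Count and the recommendations are tab-separated, while the regexp_extract patterns in the query split on commas, so it may help to spell the structure out first. Variable names are purely illustrative.

```python
# Illustrative parse of one recommendation record:
# main_id:main_name<TAB>rec_id:rec_name-rec_count (six recommendations per line).
line = ("1:Washington Berry Juice\t1356:Carrington Frozen Corn-41\t"
        "446:Red Wing Plastic Knives-39")

fields = line.split("\t")
main_id, main_name = fields[0].split(":", 1)

recommendations = []
for field in fields[1:]:
    rec_id, rest = field.split(":", 1)
    name, count = rest.rsplit("-", 1)  # the count follows the last hyphen
    recommendations.append((rec_id, name, count))

print(main_id, main_name, recommendations)
```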
Labels: Apache Hive
04-14-2017
10:26 AM
The Java program which I am running:

public Connection getConnection(){
Connection conn=null;
try {
conn=DriverManager.getConnection("jdbc:phoenix:ec2-52-205-23-80.compute-1.amazonaws.com:2181:/hbase");
} catch (SQLException ex) {
Logger.getLogger(DAO.class.getName()).log(Level.SEVERE, null, ex);
ex.printStackTrace();
}
return conn;
}

hbase-site.xml looks like:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://NameNode:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://NameNode:60000</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper-3.4.9/data</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>PRIVATE_IP</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
</configuration>

I have already added the Phoenix client jar to the Spring WAR classpath.
04-13-2017
11:20 PM
@Josh Elser, @Artem Ervits
Could you please have a look? I need some assistance here. Thanks a lot.
04-13-2017
11:18 PM
1 Kudo
I am able to get the connection with the steps below when compiling the code on the NameNode server; however, it throws an error on the Elastic Beanstalk instance. I need my UI layer, which runs on a different instance, to connect to my HBase tables.

$ javac test.java
$ java -cp "../phoenix-[version]-client.jar:." test

I have the Phoenix jar 4.10.0-HBase-1.2-Client.jar.

java.sql.SQLException: No suitable driver found for jdbc:phoenix:AWS-PUBLIC-DNS-IP:2181:/hbase
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at com.nyc.nyctaxi.dao.DAO.getConnection(DAO.java:40)
at com.nyc.nyctaxi.controller.HomeController.home(HomeController.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:676)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:509)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1104)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1524)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1480)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Labels: Apache HBase, Apache Phoenix
04-13-2017
12:24 AM
I am doing a sort-merge join with the Tez examples jar on Tez 0.7.1. Samples of the two files:

ISBN;"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher";"Image-URL-S";"Image-URL-M";"Image-URL-L"
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press";"http://images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
User-ID;"ISBN";"Book-Rating"
276725;"034545104X";"0"

The first file has 300 thousand records and the second has around 1 million, and the common attribute is the ISBN of a book. The DAG completes successfully, but there is no output; even the logs look fine. My understanding of a sort-merge join is that it sorts both datasets on the join attribute and then finds qualifying records by merging the two datasets. The sorting step groups all tuples with the same value in the join column together, which makes it easy to identify partitions or groups of tuples with the same value in the join column. I am referring to this link from the Tez examples. I just want to confirm how it decides the join attribute, which in this case should be ISBN. Please help.
Labels: Apache Hadoop, Apache Tez
04-12-2017
12:51 PM
Hi experts, I have installed a 5-node Hadoop cluster on AWS EC2. I have the data in Phoenix, and now I want to connect to it from my local Eclipse. Here is the connection string I am passing:

"jdbc:phoenix:XX.XX.XX.XX:2181:/hbase"
Error logs are: WARN : org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
ERROR: org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - ZooKeeper getChildren failed after 4 attempts
WARN : org.apache.hadoop.hbase.zookeeper.MetaTableLocator - Got ZK exception org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
WARN : org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

INFO: Illegal access: this web application instance has been stopped already. Could not load [com.google.common.cache.RemovalCause]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [com.google.common.cache.RemovalCause]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1353)

I am using the private IP as of now. Should I connect using the public IP? Please let me know if any other information is required.
Labels: Apache HBase, Apache Hive, Apache Phoenix