Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5129 | 09-21-2018 09:54 PM
 | 6495 | 03-31-2018 03:59 AM
 | 1969 | 03-31-2018 03:55 AM
 | 2179 | 03-31-2018 03:31 AM
 | 4833 | 03-27-2018 03:46 PM
09-06-2016
03:22 AM
1 Kudo
@WONHEE CHOI 1) Do me a favor: create the directory /usr/lib/sqoop/lib, place your odbc6.jar there, and try again. 2) Also, what is your SQOOP_HOME environment variable set to? 3) Sqoop also has a --driver option where you can set the driver explicitly. Under normal conditions, if the SQOOP_HOME environment variable is set for your sqoop user and the library is placed in the /lib folder, you shouldn't need step 3. A sketch of these steps is below. Let me know.
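A minimal sketch of the steps above, assuming the jar sits in your home directory and is in fact Oracle's JDBC driver (both are assumptions; adjust the paths, driver class, and connect string to your setup):

```bash
# 1) Create the lib directory and place the jar there.
mkdir -p /usr/lib/sqoop/lib
cp ~/odbc6.jar /usr/lib/sqoop/lib/

# 2) Verify SQOOP_HOME points at your Sqoop installation.
echo "$SQOOP_HOME"    # e.g. /usr/lib/sqoop

# 3) Normally unnecessary once the jar is in $SQOOP_HOME/lib, but the
#    driver class can be named explicitly (class and URL are assumptions).
sqoop list-tables \
  --driver oracle.jdbc.OracleDriver \
  --connect "jdbc:oracle:thin:@db-host:1521:ORCL" \
  --username myuser -P
```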
09-06-2016
03:17 AM
4 Kudos
@sanjeevan mahajan ... and to add to what Predrag stated based on the documentation, the same is true for all other databases, including Oracle, PostgreSQL, etc. The query needs to be rewritten to achieve the expected result: first find the min(from_date) per emp_no, then, in a second step, join with e.emp_no to retrieve the other needed fields as lookups. Try this (note the GROUP BY must use emp_no, not s.emp_no, since the alias s is not visible inside the subquery):

SELECT e.emp_no, e.birth_date, e.first_name, e.last_name, e.gender, s.min_from_date
FROM employees e,
     (SELECT emp_no, min(from_date) AS min_from_date
      FROM new2_salaries
      GROUP BY emp_no) s
WHERE s.emp_no = e.emp_no;

If any of the responses to your question addressed the problem, don't forget to vote and accept the answer. If you fix the issue on your own, don't forget to post the answer to your own question. A moderator will review it and accept it.
09-06-2016
02:49 AM
1 Kudo
@Diego Campo Stop your VM and try the following network settings. 1) Use Attached to: NAT (see attached screen-shot-2016-09-05-at-103517-pm.png). Enable Network Adapter and Cable Connected should both be checked for your adapter. Start your VM and test. If it does not work, go to 2). 2) Use Attached to: Bridged Adapter. Select the name of the internet adapter you are currently using on your host machine. Under Advanced, make sure the machine is using a Desktop adapter type (e.g., Intel PRO/1000 MT Desktop). Under Advanced, make sure Promiscuous Mode is set to Allow VMs. Under Advanced, make sure Cable Connected is checked. Hit OK to save your changes, then start your VM and test; a command-line sketch of the same settings follows below. If any of the responses to your question addressed the problem, don't forget to vote and accept the answer. If you fix the issue on your own, don't forget to post the answer to your own question. A moderator will review it and accept it.
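For reference, a rough VBoxManage equivalent of the settings above; the VM name "Hortonworks Sandbox" and the host adapter en0 are assumptions (run VBoxManage list vms and VBoxManage list bridgedifs to find yours):

```bash
# 1) Attach NIC 1 to NAT.
VBoxManage modifyvm "Hortonworks Sandbox" --nic1 nat

# 2) Or attach NIC 1 to a bridged adapter, with promiscuous mode set to
#    Allow VMs and the cable connected, as in the Advanced settings above.
VBoxManage modifyvm "Hortonworks Sandbox" \
  --nic1 bridged \
  --bridgeadapter1 en0 \
  --nicpromisc1 allow-vms \
  --cableconnected1 on
```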
09-06-2016
02:25 AM
1 Kudo
@WONHEE CHOI Place the odbc6.jar in /usr/lib/sqoop/lib and retry. If it does not pick up the jar file, restart the Sqoop server and try again. If any of the responses to your question addressed the problem, don't forget to vote and accept the answer. If you fix the issue on your own, don't forget to post the answer to your own question. A moderator will review it and accept it.
09-03-2016
12:03 AM
4 Kudos
@Thomas Larsson Use the Resource Manager UI. You can get to it in two ways: http://hostname:8088, where hostname is the host name of the server where the Resource Manager service runs; or, from the Ambari UI, click YARN (left bar), then Quick Links (top middle), then select Resource Manager. You will see the memory and CPU used for each container; one container is allocated per task. Good tutorial here: http://hadooptutorial.info/yarn-web-ui/ That covers the visual route. You could also build your own Grafana dashboard making calls to the REST API: https://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html A couple of starter calls are sketched below. Please don't forget to vote/accept the best answer for your question.
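As a starting point for such a dashboard, a hedged sketch of two REST calls; rm-host is an assumption, so substitute the host running the Resource Manager:

```bash
# Cluster-wide totals (allocatedMB, allocatedVirtualCores, containersAllocated, ...).
curl -s "http://rm-host:8088/ws/v1/cluster/metrics"

# Per-application resource usage for running applications.
curl -s "http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING"
```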
09-02-2016
08:28 PM
1 Kudo
@Madhu B There are several ways to skin this cat, but they would require some classpath tricks. Before going there, could you create a Hive view of that table, e.g. create view hbase_user_act_view as select * from hbase_user_act; and test with that? Use HiveContext, please; a sketch of the test is below. Let me know. If any of the responses in this thread addressed your issue, don't forget to vote and accept the best answer.
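A minimal sketch of that test, assuming the hive CLI and spark-shell are on the PATH and the table is registered in the Hive metastore (the table and view names come from this thread):

```bash
# Create the Hive view over the HBase-backed table.
hive -e "CREATE VIEW IF NOT EXISTS hbase_user_act_view AS SELECT * FROM hbase_user_act;"

# Then, inside spark-shell (Spark 1.x), query through HiveContext, not SQLContext:
#   scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
#   scala> hc.sql("SELECT * FROM hbase_user_act_view").show()
```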
09-02-2016
08:22 PM
5 Kudos
@sankar rao An archive has been corrupted. You probably store compressed files (e.g. gzip or lzo) in the Hive table directory, and at least one of those files is corrupted. I would start moving files out of that folder (in HDFS) in reverse chronological order and repeat the query until it succeeds; that way you can find the corrupted archive. There are other ways to test your archives directly, and you could do that too; see the sketch below. Try it and let me know. If this response or any response in this thread was helpful, please don't forget to vote and accept the best answer.
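A minimal sketch of both approaches, assuming gzip files; the table directory and file name below are placeholders for your own:

```bash
TABLE_DIR=/apps/hive/warehouse/mytable   # placeholder: your table's HDFS directory

# Test an archive in place: gzip -t validates integrity from stdin.
hdfs dfs -cat "$TABLE_DIR/part-00042.gz" | gzip -t || echo "corrupted archive"

# Or move the newest file aside, rerun the query, and repeat file by file
# in reverse chronological order until the query succeeds.
hdfs dfs -mkdir -p /tmp/quarantine
hdfs dfs -mv "$TABLE_DIR/part-00042.gz" /tmp/quarantine/
```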
09-02-2016
08:07 PM
4 Kudos
@chandramouli muthukumaran Just to clarify, SparkSQL does not access or use the Hive engine; it only consumes the metadata of Hive data structures. Assuming that both can execute the query functionally (SparkSQL is quite limited functionally compared with Hive), but the query will need to churn through 40 TB of data, I would say Hive on Tez is likely your optimal choice. That is also driven by the cost of the additional cluster RAM Spark requires on top of Hive's requirements, because I assume you will still have cases where running Hive is needed. I have noticed that when the amount of data is less than 1 TB, SparkSQL outperforms Hive on Tez. Anyhow, be aware that with HDP 2.5, LLAP is in Tech Preview and will soon be GA. If you were asking about Hive on LLAP vs. SparkSQL, I would say without hesitation: for most queries, Hive on LLAP. Again, for some sophisticated queries on a limited amount of data and with limited functionality, SparkSQL may be a winner, but in the big picture it is too expensive to maintain both approaches, and I would still consider Hive on Tez and LLAP over SparkSQL for most cases that deal with BIG DATA. Otherwise, 1 TB does not need Hadoop for fast queries. Read more about Hive on LLAP here: http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/ Give LLAP a shot before deciding to use SparkSQL, especially if you already have the queries written in HiveQL. If this response or any response in this thread was helpful, please don't forget to vote/accept it as the best answer.
09-01-2016
06:32 PM
@deepak sharma Crazy enough, I just reached out to this customer and a simple restart of the Kafka service addressed the issue. Kerberos was enabled recently and this service was probably not restarted. Not much to learn. Your symlink suggestion is an interesting approach which, while not applicable here, is worth remembering for other situations. Thank you for the suggestion.
09-01-2016
06:29 PM
4 Kudos
It seems that the data does not go to trash. A simple restart of the Kafka service addressed the issue. Kerberos was enabled recently and this service was probably not restarted. The symlink suggestion from deepak is an interesting approach which, while not applicable here, is worth remembering for other situations.