Member since: 09-17-2015
Posts: 70
Kudos Received: 79
Solutions: 20
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2399 | 02-27-2018 08:03 AM |
| | 2185 | 02-27-2018 08:00 AM |
| | 2639 | 10-09-2016 07:59 PM |
| | 912 | 10-03-2016 07:27 AM |
| | 989 | 06-17-2016 03:30 PM |
07-12-2016
04:44 PM
Hello Pooja, from your stack trace your table seems to be bucketed. Can you share your table definition? Could you also try running the query with the setting hive.auto.convert.join.noconditionaltask=false?
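If it helps, one way to test this without touching cluster-wide configuration is to override the property for a single run; this is just a sketch, and `your_query.sql` is a placeholder for the actual failing query:

```bash
# Minimal sketch: override the property for one Hive CLI run and re-execute.
# your_query.sql is a placeholder for the failing query from the stack trace.
hive --hiveconf hive.auto.convert.join.noconditionaltask=false -f your_query.sql
```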
06-21-2016
10:03 AM
Hello Michel, right now the Hive plan calculation does not reach out to fetch HBase statistics, so there is currently no added benefit from the HBase stats. That said, these questions are being worked on in different initiatives, so this will likely change in the future.
06-17-2016
03:30 PM
4 Kudos
Hello Timothy, there are multiple ways to integrate these three services. As a starting point, NiFi will probably be your ingestion flow. During this flow you could:
- put your data on Kafka and have Spark read from it (see the sketch below)
- push your NiFi data to Spark: https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark
- use an ExecuteScript processor and start a Pig job
In summary, you can have a push-and-forget connection, a push-to-a-service-and-pick-it-up-in-the-next-flow approach, or, as a corner case, execute directly in a processor. Hope this shares some insight.
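As a rough illustration of the first option, a Spark Streaming job can be submitted against the topic NiFi publishes to with its PutKafka/PublishKafka processor; the package version, job file, and the topic handling inside the job are assumptions that need to match your environment:

```bash
# Hedged sketch: have Spark read the Kafka topic that NiFi writes to.
# The package version must match your Spark/Kafka versions, and
# my_streaming_job.py is a placeholder for your own streaming job.
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 \
  my_streaming_job.py
```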
06-12-2016
11:56 AM
The "DFSInputStream has been closed already" message is only a warning and is fixed in Hadoop 2.7.2: https://issues.apache.org/jira/browse/HDFS-8099. Taking 10 minutes to submit the job seems to be a separate problem from your reduce issue. Do check what your available resources look like in YARN and how long it takes to get an ApplicationMaster; it would be interesting to see in the logs whether the job is waiting or what else is happening. You could also check that your timeline server is responding and not underwater, as that can have an impact. Some quick checks are sketched below.
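A few quick checks along those lines, assuming default ports on an HDP-style cluster (the timeline server host is a placeholder):

```bash
# Apps stuck in ACCEPTED are usually waiting for an ApplicationMaster container
yarn application -list -appStates ACCEPTED,RUNNING
# Available resources per NodeManager
yarn node -list
# Is the timeline server responding? 8188 is the usual default port.
curl -s http://<timeline-server-host>:8188/ws/v1/timeline
```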
06-12-2016
10:27 AM
Hello Venkadesh, it would be worth investigating why your reducer gets a timeout error and then gets completed. Do you have a slow node? Is it a code-related error? Are your reducers sitting around too long? Depending on the answers, several options are available (see the sketch after this list):
- you could increase the task timeout (mapred.task.timeout)
- you could force a higher number of reducers to get better distribution (mapred.reduce.tasks=# of reducers)
- you could configure reducers to start closer to the end of the map phase (mapred.reduce.slowstart.completed.maps)
- you could enable speculative execution to see if some nodes are faster than others
These are some ideas that come to mind; depending on a closer analysis there might also be other ways. Hope any of this helps.
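A minimal sketch of passing these settings per job with -D, assuming the job is launched through ToolRunner so generic options are parsed; the jar, class, and values are illustrative, and newer Hadoop releases use the equivalent mapreduce.* property names:

```bash
# Illustrative per-job overrides; adjust values to your workload.
hadoop jar my-job.jar MyJobClass \
  -D mapred.task.timeout=1200000 \
  -D mapred.reduce.tasks=20 \
  -D mapred.reduce.slowstart.completed.maps=0.90
```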
05-03-2016
07:33 AM
Hello Ethan, no, Kafka is not necessary when importing data into Atlas. Atlas will actually listen in on services like Hive, Sqoop, Falcon, etc. to automatically import data. You can also interact with the Atlas APIs, REST or otherwise, to import your own data, say tags for example. Kafka is very useful, for example, in the communication with Ranger for security policies: as you add tags to data in Atlas you want Ranger to pick them up as soon as possible, and Kafka is that gateway. In the not-so-distant future Kafka will also be a service monitored by Atlas, as it too is a gateway for data inside Hadoop and as such is a source Atlas should do governance for. Hope this helps.
05-03-2016
07:22 AM
1 Kudo
Hello Ethan, the difference is mainly batch versus real time. By this I mean the bridge will import all existing data, or rather metadata, from the Hive metastore, so all pre-existing tables and definitions, whereas the hook will listen in real time to events happening in Hive. The Atlas documentation explains this if you want a more detailed explanation: http://atlas.incubator.apache.org/Bridge-Hive.html
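For reference, the bridge-style batch import is normally kicked off with the import-hive.sh script that ships with the Atlas Hive hook; the paths below are assumptions and vary by distribution and Atlas version:

```bash
# Hedged sketch: one-off batch import of existing Hive metastore metadata.
# Both paths are assumptions; adjust to where Atlas and the Hive config live.
export HIVE_CONF_DIR=/etc/hive/conf
/usr/hdp/current/atlas-server/hook-bin/import-hive.sh
```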
05-02-2016
03:47 PM
Can you make sure that at the top of the tutorial page you open the gear icon, check that "hive %hive..." is blue, and click Save? If not, can you share the description and config of your Hive interpreter?
05-01-2016
08:17 AM
In order to use node labels you will first have to enable them in YARN (yarn.node-labels.enabled=true), then set up a label directory, create labels, and associate them with hosts and a queue. Labels are logically accessed through the capacity queue they are associated with, so in your case it would just be a matter of running your job in the right YARN capacity queue (a rough outline follows below). The documentation has an example that can help you: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/configuring_node_labels.html
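A rough outline of the CLI side once the label store directory is configured; the label, node, and queue names are just examples, and the exact -replaceLabelsOnNode syntax varies slightly between Hadoop versions:

```bash
# Create a cluster-level label and attach it to a node
yarn rmadmin -addToClusterNodeLabels "gpu"
yarn rmadmin -replaceLabelsOnNode "worker-node-01=gpu"
# After mapping the label to a capacity queue (per the linked docs),
# submit the job to that queue. Queue name and job are illustrative,
# and -D requires the job to use ToolRunner.
hadoop jar my-job.jar MyJobClass -D mapreduce.job.queuename=gpuqueue
```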
04-30-2016
09:24 AM
Hello Sumit, if your ulimit is already set to unlimited or a very high number, you can get insight into the number of open files with lsof | wc -l. You may need to increase the maximum number of file handles in the OS; check fs.file-max to see if this helps. That addresses the cause. An offlineMetaRepair / fix meta should help with the consequence (see the sketch below).
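The checks and the repair step, roughly; the fs.file-max value is illustrative, and OfflineMetaRepair should only be run while HBase is stopped:

```bash
ulimit -n            # per-process open-file limit for the current user
lsof | wc -l         # rough count of open file handles on the host
sysctl fs.file-max   # system-wide limit
# Raise the system-wide limit if it is the bottleneck (value is illustrative)
sysctl -w fs.file-max=500000
# Address the consequence: rebuild hbase:meta while HBase is stopped
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
```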