Member since: 09-17-2015
Posts: 70
Kudos Received: 79
Solutions: 20
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2399 | 02-27-2018 08:03 AM |
| | 2185 | 02-27-2018 08:00 AM |
| | 2639 | 10-09-2016 07:59 PM |
| | 912 | 10-03-2016 07:27 AM |
| | 989 | 06-17-2016 03:30 PM |
07-12-2016
04:44 PM
Hello Pooja, from your stack trace your table seems to be bucketed. Can you share your table definition? Could you also try running the query with the setting hive.auto.convert.join.noconditionaltask=false?
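If it helps, one way to test this without touching cluster-wide configuration is to override the property for a single run; this is just a sketch, and `your_query.sql` is a placeholder for the actual failing query:

```bash
# Minimal sketch: override the property for one Hive CLI run and re-execute.
# your_query.sql is a placeholder for the failing query from the stack trace.
hive --hiveconf hive.auto.convert.join.noconditionaltask=false -f your_query.sql
```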
06-21-2016
10:03 AM
Hello Michel, right now the Hive plan calculation does not reach out to fetch HBase statistics, so there is currently no added benefit from the HBase stats. That said, these questions are being worked on in different initiatives, so this will likely change in the future.
06-17-2016
03:30 PM
4 Kudos
Hello Timothy, there are multiple ways to integrate these three services. As a starting point, NiFi will probably be your ingestion flow. During this flow you could:
- put your data on Kafka and have Spark read from it (see the sketch below)
- push your NiFi data to Spark: https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark
- use an ExecuteScript processor and start a Pig job
In summary, you can have a push-and-forget connection, a push-to-a-service-and-pick-it-up-in-the-next-flow approach, or, as a corner case, execute directly in a processor. Hope this shares some insight.
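As a rough illustration of the first option, a Spark Streaming job can be submitted against the topic NiFi publishes to with its PutKafka/PublishKafka processor; the package version, job file, and the topic handling inside the job are assumptions that need to match your environment:

```bash
# Hedged sketch: have Spark read the Kafka topic that NiFi writes to.
# The package version must match your Spark/Kafka versions, and
# my_streaming_job.py is a placeholder for your own streaming job.
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 \
  my_streaming_job.py
```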
06-12-2016
11:56 AM
The "DFSInputStream has been closed already" message is only a warning and is fixed in Hadoop 2.7.2: https://issues.apache.org/jira/browse/HDFS-8099. Taking 10 minutes to submit the job seems to be a separate problem from your reduce issue. Do check what your available resources look like in YARN and how long it takes to get an ApplicationMaster; it would be interesting to see in the logs whether the job is waiting or what else is happening. You could also check that your timeline server is responding and not underwater, as that can have an impact. Some quick checks are sketched below.
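A few quick checks along those lines, assuming default ports on an HDP-style cluster (the timeline server host is a placeholder):

```bash
# Apps stuck in ACCEPTED are usually waiting for an ApplicationMaster container
yarn application -list -appStates ACCEPTED,RUNNING
# Available resources per NodeManager
yarn node -list
# Is the timeline server responding? 8188 is the usual default port.
curl -s http://<timeline-server-host>:8188/ws/v1/timeline
```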
06-12-2016
10:27 AM
Hello Venkadesh, it would be worth investigating why your reducer gets a timeout error and then gets completed. Do you have a slow node? Is it a code-related error? Are your reducers sitting around too long? Depending on the answers, several options are available (see the sketch after this list):
- you could increase the task timeout (mapred.task.timeout)
- you could force a higher number of reducers to get better distribution (mapred.reduce.tasks=# of reducers)
- you could configure reducers to start closer to the end of the map phase (mapred.reduce.slowstart.completed.maps)
- you could enable speculative execution to see if some nodes are faster than others
These are some ideas that come to mind; depending on a closer analysis there might also be other ways. Hope any of this helps.
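A minimal sketch of passing these settings per job with -D, assuming the job is launched through ToolRunner so generic options are parsed; the jar, class, and values are illustrative, and newer Hadoop releases use the equivalent mapreduce.* property names:

```bash
# Illustrative per-job overrides; adjust values to your workload.
hadoop jar my-job.jar MyJobClass \
  -D mapred.task.timeout=1200000 \
  -D mapred.reduce.tasks=20 \
  -D mapred.reduce.slowstart.completed.maps=0.90
```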
05-03-2016
07:33 AM
Hello Ethan, no, Kafka is not necessary when importing data into Atlas. Atlas will actually listen in on services like Hive, Sqoop, Falcon, etc. to automatically import data. You can also interact with the Atlas APIs, REST or otherwise, to import your own data, say tags for example. Kafka is very useful, for example, in the communication with Ranger for security policies: as you add tags to data in Atlas you want Ranger to pick them up as soon as possible, and Kafka is that gateway. In the not-so-distant future Kafka will also be a service monitored by Atlas, as it too is a gateway for data inside Hadoop and as such is a source Atlas should do governance for. Hope this helps.
05-03-2016
07:22 AM
1 Kudo
Hello Ethan, the difference is mainly batch versus real time. By this I mean the bridge will import all existing data, or rather metadata, from the Hive metastore, so all pre-existing tables and definitions, whereas the hook will listen in real time to events happening in Hive. The Atlas documentation explains this if you want a more detailed explanation: http://atlas.incubator.apache.org/Bridge-Hive.html
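For reference, the bridge-style batch import is normally kicked off with the import-hive.sh script that ships with the Atlas Hive hook; the paths below are assumptions and vary by distribution and Atlas version:

```bash
# Hedged sketch: one-off batch import of existing Hive metastore metadata.
# Both paths are assumptions; adjust to where Atlas and the Hive config live.
export HIVE_CONF_DIR=/etc/hive/conf
/usr/hdp/current/atlas-server/hook-bin/import-hive.sh
```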
05-02-2016
03:47 PM
Can you make sure that at the top of the tutorial page you open the gear icon, check that "hive %hive..." is blue, and click Save? If not, can you share the description and config of your Hive interpreter?
05-01-2016
08:17 AM
In order to use node labels you will first have to enable them in YARN (yarn.node-labels.enabled=true), then set up a label directory, create labels, and associate them with hosts and a queue. Labels are logically accessed through the capacity queue they are associated with, so in your case it would just be a matter of running your job in the right YARN capacity queue (a rough outline follows below). The documentation has an example that can help you: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/configuring_node_labels.html
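A rough outline of the CLI side once the label store directory is configured; the label, node, and queue names are just examples, and the exact -replaceLabelsOnNode syntax varies slightly between Hadoop versions:

```bash
# Create a cluster-level label and attach it to a node
yarn rmadmin -addToClusterNodeLabels "gpu"
yarn rmadmin -replaceLabelsOnNode "worker-node-01=gpu"
# After mapping the label to a capacity queue (per the linked docs),
# submit the job to that queue. Queue name and job are illustrative,
# and -D requires the job to use ToolRunner.
hadoop jar my-job.jar MyJobClass -D mapreduce.job.queuename=gpuqueue
```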
04-30-2016
09:24 AM
Hello Sumit, if your ulimit is already set to unlimited or a very high number, you can get insight into the number of open files with lsof | wc -l. You may need to increase the maximum number of file handles in the OS; check fs.file-max to see if this helps. That addresses the cause. An offlineMetaRepair / fix meta should help with the consequence (see the sketch below).
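The checks and the repair step, roughly; the fs.file-max value is illustrative, and OfflineMetaRepair should only be run while HBase is stopped:

```bash
ulimit -n            # per-process open-file limit for the current user
lsof | wc -l         # rough count of open file handles on the host
sysctl fs.file-max   # system-wide limit
# Raise the system-wide limit if it is the bottleneck (value is illustrative)
sysctl -w fs.file-max=500000
# Address the consequence: rebuild hbase:meta while HBase is stopped
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
```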