Member since: 09-24-2015
Posts: 178
Kudos Received: 113
Solutions: 28
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3376 | 05-25-2016 02:39 AM
 | 3590 | 05-03-2016 01:27 PM
 | 839 | 04-26-2016 07:59 PM
 | 14394 | 03-24-2016 04:10 PM
 | 2019 | 02-02-2016 11:50 PM
12-30-2015
02:57 AM
@Raja Sekhar Chintalapati - Sounds like the OS-level TCP/IP parameters are not tuned correctly. What OS are you using? For example, on CentOS and RHEL you can check the current values with:

sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_keepalive_probes
sysctl net.ipv4.tcp_keepalive_intvl

Other parameters can be tuned as well, but these three are the critical ones. For example, if tcp_keepalive_time is set to a high value, the OS holds onto the port even after the transaction is done (the SQL is processed and the results are returned). As you can see, this is not Ambari-specific; if it were an Ambari issue, the impact would be much wider. I recommend checking the TCP TIME_WAIT behavior and related parameters on both the source and destination servers. Here are the recommended values:

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
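To make those values survive a reboot, they can be persisted in the sysctl configuration; a minimal sketch, assuming a CentOS/RHEL-style /etc/sysctl.conf (applying it requires root):

```
# Append to /etc/sysctl.conf, then run `sysctl -p` as root to apply
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
```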
12-30-2015
02:47 AM
2 Kudos
At times, due to an incomplete shutdown, a previous process may still be using the port. So check which process is using the port, and (if it is the same process you are trying to start, which it is in 99.9% of cases) kill the previous process before starting:

netstat -lnap | grep <port>
# The output includes the process id
kill -9 <pid>
# After the process is killed, try starting the service.
12-29-2015
09:18 PM
1 Kudo
@siva - Hive 0.12 shipped with HDP 2.0, so take a look at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-... A word of caution: upgrading only one service from the stack while keeping the others at an older version is not recommended. Hive has dependencies on other components such as ZooKeeper and core Hadoop, so you should consider upgrading the whole stack.
12-24-2015
05:08 PM
1 Kudo
@Brian Ramsel Have you explored changing the memory settings for the reducers and the number of reducers? I am in total agreement with moving to 10GbE, but I am wondering whether there is an opportunity to improve performance with the current setup. In the recent past, working with a prospect on a POC, we were able to ingest a 600 GB file in about 30 minutes on a small 4-node cluster (64 GB RAM per node, 10GbE, plus other tuning at the app/service level). I am not sure how big this cluster is or what the hardware spec is, though.
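For reference, reducer count and memory are the kinds of knobs involved; a sketch of what this looks like in a Hive session on Hadoop 2.x (the values here are illustrative assumptions, not recommendations — they depend entirely on the cluster and the job):

```
-- In a Hive session (or the equivalents in mapred-site.xml):
SET mapreduce.job.reduces=32;              -- number of reducers
SET mapreduce.reduce.memory.mb=4096;       -- container memory per reducer
SET mapreduce.reduce.java.opts=-Xmx3276m;  -- JVM heap, ~80% of the container
```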
12-14-2015
06:04 PM
+1 on HDF/NiFi. Its graphical canvas for designing the flow can make the whole process really easy for you. For custom solutions, I can think of two high-level patterns: 1) index the docs as they are pushed to HDFS, or 2) run a job periodically that looks for new content and indexes it. The core logic of indexing a doc is the same in both cases and can make use of Solr's ExtractingRequestHandler.
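As a sketch of that core indexing call: Solr's ExtractingRequestHandler (Solr Cell) parses a binary document with Tika and indexes the extracted text. The Solr URL, core name, and file name below are hypothetical placeholders, so the snippet only prints the curl command rather than contacting a server:

```shell
# Hypothetical Solr core and document; adjust to your setup.
SOLR_CORE_URL="http://localhost:8983/solr/docs"
DOC="report.pdf"

# Print the extract-handler call (run it against a live Solr core):
printf 'curl "%s/update/extract?literal.id=%s&commit=true" -F "myfile=@%s"\n' \
  "$SOLR_CORE_URL" "$DOC" "$DOC"
```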
12-14-2015
03:59 AM
Also, I moved the question to the Data Processing category, as "Community Help" is for help with navigation and other aspects of this community forum. The title is confusing, and we are looking to change it soon.
12-14-2015
03:57 AM
Also, are you using the HDP Sandbox? Which version of HDP are you on? Have you confirmed via Ambari that Hive is up and running?
12-12-2015
02:42 AM
1 Kudo
@vagrawal - Are you running the Hive CLI on the same server that runs the Hue Server? That could be the issue: depending on how the UDF jar was added, it may not be accessible to all Hive clients. You could place the jar in HDFS and then register the UDF. This may be helpful too - https://community.hortonworks.com/questions/2390/methods-to-add-jars-to-hive.html
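A sketch of the HDFS-based registration, using the permanent-function syntax available from Hive 0.13 onward (the jar path, function name, and class name are hypothetical):

```sql
-- First put the jar somewhere all Hive clients can reach, e.g.:
--   hdfs dfs -put my-udf.jar /apps/hive/udfs/
CREATE FUNCTION my_udf AS 'com.example.hive.MyUDF'
  USING JAR 'hdfs:///apps/hive/udfs/my-udf.jar';
```

Because the jar lives in HDFS rather than on one client's local filesystem, the function resolves the same way from the Hive CLI, Beeline, or Hue.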
12-12-2015
02:35 AM
1 Kudo
@Pavel Benes This is an interesting idea. Typically there are various ways to ensure that a large quantity of data is ingested successfully: running frequent smaller updates, running more mappers during the Sqoop job, or increasing infrastructure (obviously a last resort). I am not able to see how we could reliably pull the data by putting a timer-based stop on the process, but this is definitely worth discussing. Could you open a new thread (as an idea rather than a question) with some details around the design? We should be able to brainstorm this.
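To illustrate the "more mappers" option: a Sqoop import sketch where `-m` sets the number of parallel mappers and `--split-by` names the column used to partition the work. The connection details, table, and column are hypothetical, so the command is printed rather than run:

```shell
# -m 8 runs eight parallel mappers; --split-by should be a column with an
# even value distribution so the splits stay balanced. Printed, not executed:
cat <<'EOF'
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --table orders \
  --split-by order_id \
  -m 8 \
  --target-dir /data/raw/orders
EOF
```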