Member since: 09-24-2015
Posts: 178
Kudos Received: 113
Solutions: 28
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3376 | 05-25-2016 02:39 AM
 | 3590 | 05-03-2016 01:27 PM
 | 839 | 04-26-2016 07:59 PM
 | 14394 | 03-24-2016 04:10 PM
 | 2019 | 02-02-2016 11:50 PM
12-30-2015
02:57 AM
@Raja Sekhar Chintalapati - Sounds like the OS-level TCP/IP parameters are not tuned correctly. What OS are you using? For example, on CentOS and RHEL you can check the current values with:

sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_keepalive_probes
sysctl net.ipv4.tcp_keepalive_intvl

Other parameters can be tuned as well, but these three are the critical ones. For example, if tcp_keepalive_time is set to a high value, the OS holds onto the port even after the transaction is done (the SQL is processed and the results are returned). As you can see, this is not Ambari-specific; if it were an Ambari issue, the impact would be much wider. I recommend checking the TCP TIME_WAIT behavior and related parameters on both the source and destination servers. Here are the recommended values:

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
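To make those values survive a reboot, they can be persisted in the sysctl configuration; a minimal sketch, assuming a CentOS/RHEL-style /etc/sysctl.conf (applying it requires root):

```
# Append to /etc/sysctl.conf, then run `sysctl -p` as root to apply
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
```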
12-30-2015
02:47 AM
2 Kudos
At times, due to an incomplete shutdown, a previous process may still be using the port. So check which process is using the port, and (if it is the same process you are trying to start, which it is in 99.9% of cases) kill the previous process before starting:

netstat -lnap | grep <port>
# The output includes the process id
kill -9 <pid>
# After the process is killed, try starting the service.
12-29-2015
09:18 PM
1 Kudo
@siva - Hive 0.12 shipped with HDP 2.0, so take a look at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-... A word of caution: upgrading only one service from the stack while keeping the others at an older version is not recommended. Hive has dependencies on other components such as ZooKeeper and core Hadoop, so you should consider upgrading the whole stack.
12-24-2015
05:08 PM
1 Kudo
@Brian Ramsel Have you explored changing the memory settings for the reducers and the number of reducers? I am in total agreement with moving to 10GbE, but I am wondering whether there is an opportunity to improve performance with the current setup. In the recent past, working with a prospect on a POC, we were able to ingest a 600 GB file in about 30 minutes on a small 4-node cluster (64 GB RAM per node, 10GbE, plus other tuning at the app/service level). I am not sure how big this cluster is or what the hardware spec is, though.
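For reference, reducer count and memory are the kinds of knobs involved; a sketch of what this looks like in a Hive session on Hadoop 2.x (the values here are illustrative assumptions, not recommendations — they depend entirely on the cluster and the job):

```
-- In a Hive session (or the equivalents in mapred-site.xml):
SET mapreduce.job.reduces=32;              -- number of reducers
SET mapreduce.reduce.memory.mb=4096;       -- container memory per reducer
SET mapreduce.reduce.java.opts=-Xmx3276m;  -- JVM heap, ~80% of the container
```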
12-14-2015
06:04 PM
+1 on HDF/NiFi. Its graphical canvas for designing the flow can make the whole process really easy for you. For custom solutions, I can think of two high-level patterns: 1) index the docs as they are pushed to HDFS, or 2) run a job periodically that looks for new content and indexes it. The core logic of indexing a doc is the same in both cases and can make use of Solr's ExtractingRequestHandler.
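As a sketch of that core indexing call: Solr's ExtractingRequestHandler (Solr Cell) parses a binary document with Tika and indexes the extracted text. The Solr URL, core name, and file name below are hypothetical placeholders, so the snippet only prints the curl command rather than contacting a server:

```shell
# Hypothetical Solr core and document; adjust to your setup.
SOLR_CORE_URL="http://localhost:8983/solr/docs"
DOC="report.pdf"

# Print the extract-handler call (run it against a live Solr core):
printf 'curl "%s/update/extract?literal.id=%s&commit=true" -F "myfile=@%s"\n' \
  "$SOLR_CORE_URL" "$DOC" "$DOC"
```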
12-14-2015
03:59 AM
Also, I moved the question to the Data Processing category, as "Community Help" is for help with navigation and other aspects of this community forum. The title is confusing, and we are looking to change it soon.
12-14-2015
03:57 AM
Also, are you using the HDP Sandbox? Which version of HDP are you on? Have you confirmed via Ambari that Hive is up and running?
12-12-2015
02:42 AM
1 Kudo
@vagrawal - Are you running the Hive CLI on the same server that runs the Hue Server? That could be the issue: depending on how the UDF jar was added, it may not be accessible to all Hive clients. You could place the jar in HDFS and then register the UDF. This may be helpful too - https://community.hortonworks.com/questions/2390/methods-to-add-jars-to-hive.html
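A sketch of the HDFS-based registration, using the permanent-function syntax available from Hive 0.13 onward (the jar path, function name, and class name are hypothetical):

```sql
-- First put the jar somewhere all Hive clients can reach, e.g.:
--   hdfs dfs -put my-udf.jar /apps/hive/udfs/
CREATE FUNCTION my_udf AS 'com.example.hive.MyUDF'
  USING JAR 'hdfs:///apps/hive/udfs/my-udf.jar';
```

Because the jar lives in HDFS rather than on one client's local filesystem, the function resolves the same way from the Hive CLI, Beeline, or Hue.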
12-12-2015
02:35 AM
1 Kudo
@Pavel Benes This is an interesting idea. Typically there are various ways to ensure that a large quantity of data is ingested successfully: running frequent smaller updates, running more mappers during the Sqoop job, or increasing infrastructure (obviously a last resort). I am not able to see how we could reliably pull the data by putting a timer-based stop on the process, but this is definitely worth discussing. Could you open a new thread (as an idea rather than a question) with some details around the design? We should be able to brainstorm this.
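To illustrate the "more mappers" option: a Sqoop import sketch where `-m` sets the number of parallel mappers and `--split-by` names the column used to partition the work. The connection details, table, and column are hypothetical, so the command is printed rather than run:

```shell
# -m 8 runs eight parallel mappers; --split-by should be a column with an
# even value distribution so the splits stay balanced. Printed, not executed:
cat <<'EOF'
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --table orders \
  --split-by order_id \
  -m 8 \
  --target-dir /data/raw/orders
EOF
```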