Member since: 09-02-2016

Posts: 523
Kudos Received: 89
Solutions: 42

        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| | 2723 | 08-28-2018 02:00 AM |
| | 2695 | 07-31-2018 06:55 AM |
| | 5677 | 07-26-2018 03:02 AM |
| | 2979 | 07-19-2018 02:30 AM |
| | 6459 | 05-21-2018 03:42 AM |

04-16-2018 04:05 AM

@null_pointer For some reason I cannot see the image you uploaded, but I think I get your point, so let me try to answer your question. We cannot always match/compare the memory usage reported by CM against what Linux reports, for a couple of reasons:

1. Yes, as you said, CM only counts the memory used by Hadoop components; it does not include any other applications running on the same Linux host, because CM is designed to monitor only Hadoop and its dependent services.

2. (I am not sure whether you are getting the CM report from the Host Monitor.) There are practical difficulties in getting the memory usage of every client node in a single report. For example, if you have 100+ nodes and each node has a different memory capacity (100 GB, 200 GB, 250 GB, 300 GB, etc.), it is difficult to generate one report that covers the memory usage of each of them.

Still, if the default reports available in CM do not meet your requirement, you can try building a custom chart from CM -> Charts (menu) -> your tsquery:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cluster_util_custom.html
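As a rough sketch of what such a custom chart query could look like (the metric names and the HOST category here are assumptions based on the tsquery documentation, so verify them against the metric browser in your CM version):

select physical_memory_used, physical_memory_total where category = HOST

A host-level query like this should chart overall host memory rather than only the Hadoop role memory, which is closer to what Linux itself reports.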

04-15-2018 09:19 AM
1 Kudo

@Aedulla Here you go:

http://www.bayareabikeshare.com/open-data
https://grouplens.org/datasets/movielens/
https://www.nyse.com/market-data/historical

You can also use the free Hue demo below (login uid: demo, pwd: demo), where you can find some pre-existing data for Hive, Impala, HBase, etc. Note: if you get any exception after login, please try again after some time or raise a ticket so that someone from the Hue team can fix the issue.

http://demo.gethue.com

04-11-2018 05:08 AM

@bukangarii As long as you have JDBC connectivity to your legacy system, it is possible to export the Parquet Hive table to it using Sqoop.

Please check the Sqoop user guide to understand the supported data types.
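A minimal sketch of what such an export could look like, assuming Sqoop's HCatalog integration is used to read the Parquet-backed Hive table (the JDBC URL, credentials, and table names are placeholders, and Parquet support through HCatalog can vary by Sqoop/Hive version, so test on a small table first):

# all connection details and table names below are placeholders
$ sqoop export \
    --connect "jdbc:oracle:thin:@//legacy-host:1521/LEGACYDB" \
    --username etl_user -P \
    --table TARGET_TABLE \
    --hcatalog-database default \
    --hcatalog-table my_parquet_table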

04-10-2018 10:55 PM
1 Kudo

@ludof No need to do it every time. In general, once you have done kinit, the ticket is valid for 24 hours (you can customize this if you want), so do it once a day manually, or automate it with a cron job in some scenarios, e.g. when you have jobs running round the clock, or when more than one user shares the same user/batch id for a project.
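A minimal sketch of such a cron entry, assuming the batch user has a keytab (the schedule, keytab path, and principal are placeholders):

# renew the Kerberos ticket every 12 hours; keytab path and principal are placeholders
0 */12 * * * kinit -kt /home/batchuser/batchuser.keytab batchuser@EXAMPLE.COM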

04-09-2018 11:58 AM

@hedy Can you try running the second pyspark command from a different user id?

It seems this is a known issue, according to the link below:

https://support.datastax.com/hc/en-us/articles/207356773-FAQ-Warning-message-java-net-BindException-Address-already-in-use-when-launching-Spark-shell
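If you want to confirm which process is already holding the port before retrying, a quick check like the one below may help (4040 is Spark's default UI port; substitute whichever port shows up in your BindException, and use ss -ltnp if netstat is not installed):

# show the process currently listening on port 4040
$ netstat -tlnp | grep :4040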

04-09-2018 11:46 AM
1 Kudo

@ludof All you have to do is run the kinit command and enter your Kerberos password before you start your Spark session, then continue with your steps; that should fix it.
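A minimal sketch of that flow, assuming a password-based kinit and a pyspark session (the principal is a placeholder):

# get a Kerberos ticket first (principal is a placeholder), then start the Spark session
$ kinit your_user@EXAMPLE.COM
$ pyspark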

04-09-2018 10:57 AM
1 Kudo

@hedy In general, one port allows one session (one connection) at a time, so your first session binds to the default port 4040, while your second session tries to bind to the same port, gets the bind exception, and then tries the next port, which is not working either.

There are two things you need to check:
1. Please make sure port 4041 is open.
2. For your second session, pass an available port as a parameter when you run pyspark. For example, a long time back I used spark-shell with a different port as a parameter; please try the similar option for pyspark (a pyspark version is sketched below):
session1: $ spark-shell --conf spark.ui.port=4040
session2: $ spark-shell --conf spark.ui.port=4041

If 4041 is not working, you can try up to 4057; I think these are the ports available to Spark by default.
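A rough sketch of the same idea with pyspark, assuming port 4041 is free (spark.ui.port is the same property the spark-shell examples above use):

# start the second pyspark session on an explicit, free UI port (4041 is just an example)
$ pyspark --conf spark.ui.port=4041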

04-09-2018 10:20 AM

@RajeshBodolla I am not sure I get your intention in having multiple DataNodes on the same machine.

If you want the DataNode to store data in different/multiple directories on the same machine, you can use CM -> HDFS -> Configuration -> dfs.datanode.data.dir (DataNode Data Directory) and specify your directories there.
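To confirm what the DataNode is currently using, something like the sketch below may help (the directory paths are placeholders; the value is a comma-separated list of local directories):

# print the directories the DataNode is configured to use
$ hdfs getconf -confKey dfs.datanode.data.dir
# a typical multi-directory value looks like: /data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn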

02-12-2018 08:07 PM

srinivas ?? 🙂

@Cloudera learning Is it stuck when only 1 or 2 blocks are left over?

As mentioned earlier, you can monitor this from CM -> HDFS -> Web UI -> NameNode Web UI -> a new window will open -> 'Datanodes' menu -> scroll down to Decommissioning (keep refreshing this page to track the progress; there is also a command-line check sketched below).

If your answer to my question above is yes: I have hit similar issues a few times and have overcome them as follows:

1. CM -> Hosts -> abort the decommission process
2. CM -> HDFS -> Instances -> node -> Stop
3. Try to decommission the same node again for the leftover blocks

Note: Sometimes it may get stuck again; retry a couple of times.
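If you prefer to check from the command line instead of the NameNode web UI, a report like the one below should list the DataNodes that are still decommissioning along with their remaining block counts (the -decommissioning filter is available on reasonably recent HDFS versions; treat the exact output fields as version-dependent):

# list only the DataNodes that are currently decommissioning
$ hdfs dfsadmin -report -decommissioning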