Member since: 09-02-2016
Posts: 523
Kudos Received: 89
Solutions: 42
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2724 | 08-28-2018 02:00 AM |
|  | 2697 | 07-31-2018 06:55 AM |
|  | 5688 | 07-26-2018 03:02 AM |
|  | 2988 | 07-19-2018 02:30 AM |
|  | 6466 | 05-21-2018 03:42 AM |
02-21-2017
08:50 AM
@Milanovo There are two questions here: 1. how to make Impala understand already-stored data, and 2. how to get the data stored.

Answer to the first question: Impala is a SQL-like query engine, so you can create an 'external' table on top of the existing data as follows:

CREATE EXTERNAL TABLE IF NOT EXISTS tblname (col1 datatype, col2 datatype)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hive/warehouse/.....';

Answer to the second question: once you specify the location at table creation, Impala handles the files in that directory internally. You can optionally use partitioning, but there is no need to alter/add/modify your location.
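A minimal sketch of how this might be checked from impala-shell once the external table exists (host, port, and table name are placeholders; the INVALIDATE METADATA step is only needed if the table was created outside Impala, e.g. through Hive):

```sh
# Make Impala pick up metadata created outside of it, then sample the data (placeholders throughout)
impala-shell -i impalad-host:21000 -q "INVALIDATE METADATA tblname"
impala-shell -i impalad-host:21000 -q "SELECT * FROM tblname LIMIT 5"
```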
02-17-2017
08:06 AM
@bushnoh That looks normal to me. The Impala daemon (impalad) generally runs on every worker node, while the server role runs on only one node (possibly more if you have HA), so there is no need to connect to every individual node in the distributed system.
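To illustrate (the hostname is a placeholder), a client can point impala-shell at any one daemon and still query the whole cluster:

```sh
# Connect to a single Impala daemon; it coordinates the query across the other nodes.
# "worker03" is a placeholder hostname; 21000 is the default impalad port.
impala-shell -i worker03:21000 -q "SELECT COUNT(*) FROM some_table"
```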
02-15-2017
12:51 PM
@donigrubbs Which link did you follow for the upgrade? https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_new_changed_features.html#concept_o5s_mfy_sm

According to the above link, there is no intermediate version between 5.5.4 and 5.6.0 (but you are referring to 5.5.5?). The page lists:
What's New in Cloudera Manager 5.6.0
What's New in Cloudera Manager 5.5.4
What's New in Cloudera Manager 5.5.3
What's New in Cloudera Manager 5.5.2
02-15-2017
07:01 AM
@gsalerno It seems Kerberos is enabled in your cluster and your Kerberos ticket is missing. After you log in, run kinit uid@REALM.COM, enter the Kerberos password, and then try to leave safemode with sudo. Thanks, Kumar
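Those steps on the command line might look roughly like this (the principal, realm, and the 'hdfs' superuser account are assumptions for a typical CDH setup):

```sh
kinit uid@REALM.COM                          # obtain a Kerberos ticket (placeholder principal/realm)
klist                                        # confirm the ticket was granted
sudo -u hdfs hdfs dfsadmin -safemode leave   # leave safemode as the HDFS superuser
```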
02-14-2017
11:26 AM
1 Kudo
@gimp077 To my knowledge, there are two ways for Spark and Hive to interact. At a very high level:

1. Run the query in Hive itself with the Spark execution engine. Log in to Hive and try the steps below:

# To check the current execution engine
hive> set hive.execution.engine;
hive.execution.engine=mr    # by default it is mr
# To set the execution engine to Spark. Note: this is session specific
hive> set hive.execution.engine=spark;
# To check the execution engine after the change
hive> set hive.execution.engine;
hive.execution.engine=spark
Run your queries now.

2. Log in to Spark and try the steps below:

> spark-shell
scala> sqlContext.sql("select * from tbl1").collect().foreach(println)    // an example
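For reference, a non-interactive sketch of both paths from the shell (assumes Hive on Spark is already configured, spark-shell is on the PATH, and tbl1 is a placeholder table):

```sh
# 1. One-off Hive query using the Spark engine for just this invocation (placeholder table name)
hive -e "set hive.execution.engine=spark; select count(*) from tbl1;"

# 2. Feed the same Spark 1.x snippet from the post into spark-shell via stdin
echo 'sqlContext.sql("select * from tbl1").show()' | spark-shell
```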
02-03-2017
12:02 PM
1 Kudo
@vsreddy You may need to follow the ACL concept. Please refer to the link below; it has very high-level information about security: https://community.cloudera.com/t5/Security-Apache-Sentry/Hadoop-Security-for-beginners/m-p/48576#M174 Thanks, Kumar
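For illustration only (the path, user name, and the assumption that ACLs are enabled are all made up), HDFS ACLs let you grant access beyond the basic owner/group permissions:

```sh
# Assumes dfs.namenode.acls.enabled=true on the NameNode; path and user are placeholders
hdfs dfs -setfacl -m user:analyst1:r-x /data/sales   # grant one extra user read/execute
hdfs dfs -getfacl /data/sales                        # inspect the resulting ACL entries
```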
02-02-2017
01:38 PM
@srirocky If your total memory is 24 GB, then you should not set your max memory allocation to 22 GB, because a job may use more than one container and the properties you are setting are per container.

1. As I mentioned above, please refer to the link I provided and search for these parameters; you will see that they apply to containers:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-mb

2. Now go to http://ipaddress:8088, run a job, and check how many "Containers Running" there are. A small job will try to use one container, but bigger jobs use more. Since you set 22 GB as the max memory allocation, a job that tries to use more than one container may end up with unnecessary errors (case by case), because your total memory itself is only 24 GB.

3. So the bottom line is that you cannot increase your min/max memory/core allocations with random numbers; you need to follow the link I provided to calculate them. As a starting point, you can set the minimum to 1 GB and the maximum to 4 GB (subject to change):
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb

Since the memory settings are per container, bigger jobs will try to use more containers, each up to the corresponding max memory. Hope this helps!! Thanks, Kumar
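To make the arithmetic concrete, here is a rough sizing sketch with assumed numbers (a 24 GB node with about 4 GB reserved for the OS and Hadoop daemons; the real values should come from the tuning guide linked below):

```sh
# Hypothetical per-node sizing for a 24 GB machine (assumed numbers, not recommendations)
TOTAL_MB=24576        # physical memory on the node
RESERVED_MB=4096      # kept aside for the OS and Hadoop daemons
NM_MB=$((TOTAL_MB - RESERVED_MB))
echo "yarn.nodemanager.resource.memory-mb  = ${NM_MB}"           # 20480
echo "yarn.scheduler.maximum-allocation-mb = 4096"               # the most one container may request
echo "max concurrent 4 GB containers       = $((NM_MB / 4096))"  # 5 per node
```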
02-02-2017
08:54 AM
@srirocky The image that you pasted is not visible (under "this is what yarn is reflecting"). In the meantime, please answer the following:
1. What is your cluster capacity?
2. Are you following the formulas from the link below to set up YARN, or increasing the sizes with random numbers?
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
Thanks, Kumar
02-02-2017
07:04 AM
@srirocky I think you are updating yarn-site.xml via the CLI. If you are using Cloudera Manager, then I would recommend updating yarn-site.xml via Cloudera Manager -> Yarn -> Configuration instead of the CLI, because Hadoop maintains yarn-site.xml in many places for many reasons, and updating it in one (wrong) place will not be reflected. After you make the change, CM -> Yarn will show a stale configuration; save it and restart YARN from within CM itself (instead of the CLI). Thanks, Kumar
01-31-2017
12:24 PM
1 Kudo
@AnisurRehman
1. Please refer to this official link to learn more about Sqoop; change the version in the URL to match your Sqoop version: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
2. Yes, bulk import is possible. Please refer to the "sqoop-import-all-tables" topic in the above link.
3. About incremental imports: please refer to the "incremental import" section of the above link (a rough sketch of both commands follows below).
4. About Impala for Sqoop:
a. Sqoop uses mappers from MapReduce (no reducers by default). It refers to the Hive db/table only to identify the target location and never uses the Hive/Impala engines to do the import, so specifying Impala vs. Hive makes no difference; that is why Sqoop provides the hive-import option by default. The bottom line is that you can continue to use the Hive options in your Sqoop script.
b. After the data import, it is up to you to use either Hive or Impala, depending on your requirement. But as you mentioned, you can use Impala in certain situations, so please use Impala only when it is necessary (for some priority tables).
Thanks, Kumar
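A rough sketch of what items 2 and 3 could look like as Sqoop commands (the JDBC URL, credentials, table, check column, and directories are all placeholders):

```sh
# 2. Bulk import of every table in a database into Hive (placeholder connection details)
sqoop import-all-tables \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --hive-import \
  --warehouse-dir /user/hive/warehouse/salesdb

# 3. Incremental append import of a single table, driven by an increasing id column
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 100000 \
  --target-dir /user/hive/warehouse/salesdb/orders
```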