Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4132 | 08-20-2018 08:26 PM |
| | 2006 | 08-15-2018 01:59 PM |
| | 2431 | 08-13-2018 02:20 PM |
| | 4235 | 07-23-2018 04:37 PM |
| | 5122 | 07-19-2018 12:52 PM |
11-29-2016
05:52 AM
@satya s That is totally fine; that is just the default. With Hive you are not locked into any format. First create a Hive table over the text data using this format:
CREATE EXTERNAL TABLE IF NOT EXISTS Cars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/<username>/visdata';
Then create another Hive table with a similar schema, but this time as an ORC table:
CREATE TABLE IF NOT EXISTS mycars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
STORED AS ORC;
Then simply load from your text-backed Hive table into the ORC table:
INSERT OVERWRITE TABLE mycars SELECT * FROM cars;
Now your data has been converted to ORC.
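If you want to sanity-check the conversion, something like the following should work; this is a minimal check I am adding for illustration, not part of the original answer, and it only assumes the two table names used above:
-- look for the ORC serde / input format in the storage section of the output
DESCRIBE FORMATTED mycars;
-- row counts in the text-backed table and the ORC table should match
SELECT COUNT(*) FROM cars;
SELECT COUNT(*) FROM mycars;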
11-28-2016
06:26 PM
Note that the above deletes older files based on file modification time, not based on the timestamp in the filename. I did use a filename with a timestamp, which probably makes the example confusing; that command could be used with any kind of file, such as keeping the last 5 copies of your backup files. Also, if you use logrotate (e.g. where log4j rolling files is not an option), you can use the maxage option, which also uses modified time. This is from the logrotate man page: maxage count
Remove rotated logs older than <count> days. The age is only checked if the logfile is to be rotated. The files are mailed to the configured address if maillast and mail are configured.
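As an illustration of the modification-time approach, here is a minimal sketch for keeping only the 5 newest backups; the directory and file name pattern are assumptions, so adjust them to your setup:
# keep the 5 most recently modified backups and delete the rest
# (selection is by modification time, not by any timestamp in the name)
cd /backups && ls -1t backup-*.tar.gz | tail -n +6 | xargs -r rm --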
11-30-2016
04:08 PM
1 Kudo
@Sunile Manjee This would really depend on the cluster size and the number of jobs running, so it is very hard to gauge. A 10-node cluster with around 15-20 components can easily generate 1 GB of audit logs PER DAY, but again it depends on the cluster activity. You could use this as a baseline, but it is really hard to gauge. Then again, consider this only if you are being forced to use a DB, and after strongly advising against using a DB as opposed to Solr for Ranger audits 🙂
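To put that baseline in rough numbers (my own illustration, assuming a 90-day retention window): 1 GB/day × 90 days ≈ 90 GB of audit data in the database, before indexes and any replication.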
12-31-2016
03:25 AM
@Binu Mathew Thanks for sharing the awesome article. Do you mind sharing the sample data?
11-17-2016
05:51 PM
1 Kudo
This is a known problem when Phoenix is enabled; see similar posts here: https://community.hortonworks.com/questions/57874/error-unable-to-find-orgapachehadoophbaseipccontro.html
That class is actually from Phoenix: https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/hadoop/hbase/ipc/controller/ServerRpcControllerFactory.java
It will be fixed in Apache NiFi 1.1 by allowing users to specify the path to the Phoenix client JAR. For now you can copy phoenix-client.jar to nifi_home/work/nar/extensions/nifi-hbase_1_1_2-client-service-nar-1.1.0-SNAPSHOT.nar-unpacked/META-INF/bundled-dependencies/, obviously adjusting the directories for your version.
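As a sketch of that copy on an HDP node (the source JAR path and the NiFi home directory shown here are assumptions; adjust both for your installation and versions):
# copy the Phoenix client JAR into the unpacked HBase client-service NAR
cp /usr/hdp/current/phoenix-client/phoenix-client.jar \
  /opt/nifi/work/nar/extensions/nifi-hbase_1_1_2-client-service-nar-1.1.0-SNAPSHOT.nar-unpacked/META-INF/bundled-dependencies/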
11-21-2016
02:18 PM
1 Kudo
Make sure that the updated hbase-site.xml is on your sqlline classpath for the properties to take effect.
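For example (the paths and the ZooKeeper connect string below are assumptions based on a typical HDP layout, not from the original answer), one way to point sqlline at the updated config is:
# directory that contains the updated hbase-site.xml
export HBASE_CONF_DIR=/etc/hbase/conf
/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure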
11-15-2016
03:56 PM
@Sunile Manjee The answer is yes. In HDP 2.5, Spark column security is available through the LLAP and Ranger integration. You get fine-grained, column-level access control for SparkSQL with fully dynamic per-user policies, and it doesn't require views; you use standard Ranger policies and tools to control access and masking policies. The flow:
1. SparkSQL gets data locations known as "splits" from HiveServer and plans the query.
2. HiveServer2 authorizes access using Ranger; per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP; filtering / masking is guaranteed by the LLAP server.
02-06-2017
02:50 AM
@Ancil McBarnett It has been a while since I have tried it. Here is the information: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_zeppelin-component-guide/content/zeppelin-with-hive.html
11-11-2016
10:10 PM
Ranger doesn't have the details of the Job Id / App Id, hence they are not in the audit logs.
01-30-2019
04:20 AM
1 Kudo
I know this is very old, but I also faced the same thing and sorted it out as below; posting to help anyone else facing the same issue. You are connecting to a different port. There are two ports you can connect to: 2122 (host SSH) and 2202 (sandbox SSH). The Web Shell at localhost:4200 connects to 2202, i.e. the sandbox SSH. So if you use port 2202 through WinSCP, Filezilla, or any other client, you will see the same files.
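For reference, connecting with a plain SSH client against the ports mentioned above would look roughly like this (using root as the login user is an assumption; use whatever account you normally log in with):
# sandbox SSH (same environment as the web shell on localhost:4200)
ssh root@localhost -p 2202
# host SSH
ssh root@localhost -p 2122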