Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4132 | 08-20-2018 08:26 PM |
| | 2006 | 08-15-2018 01:59 PM |
| | 2431 | 08-13-2018 02:20 PM |
| | 4235 | 07-23-2018 04:37 PM |
| | 5122 | 07-19-2018 12:52 PM |
11-29-2016
05:52 AM
@satya s That is totally fine; that is just the default. With Hive you are not locked into any format. First create a Hive table over the text data using this format:
CREATE EXTERNAL TABLE IF NOT EXISTS Cars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/<username>/visdata';
Then create another Hive table with a similar schema, but this time as an ORC table:
CREATE TABLE IF NOT EXISTS mycars(
Name STRING,
Miles_per_Gallon INT,
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight_in_lbs INT,
Acceleration DECIMAL,
Year DATE,
Origin CHAR(1))
COMMENT 'Data about cars from a public database'
STORED AS ORC;
Then simply load from your text-backed Hive table into the ORC table:
INSERT OVERWRITE TABLE mycars SELECT * FROM cars;
Now your data has been converted to ORC.
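If you want to sanity-check the conversion, something like the following should work; this is a minimal check I am adding for illustration, not part of the original answer, and it only assumes the two table names used above:
-- look for the ORC serde / input format in the storage section of the output
DESCRIBE FORMATTED mycars;
-- row counts in the text-backed table and the ORC table should match
SELECT COUNT(*) FROM cars;
SELECT COUNT(*) FROM mycars;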
11-28-2016
06:26 PM
Note that the above deletes older files based on file modification time, not based on the timestamp in the filename. I did use a filename with a timestamp, which probably makes the example confusing; that command could be used with any kind of file, such as keeping the last 5 copies of your backup files. Also, if you use logrotate (e.g. where log4j rolling files is not an option), you can use the maxage option, which also uses modified time. This is from the logrotate man page: maxage count
Remove rotated logs older than <count> days. The age is only checked if the logfile is to be rotated. The files are mailed to the configured address if maillast and mail are configured.
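As an illustration of the modification-time approach, here is a minimal sketch for keeping only the 5 newest backups; the directory and file name pattern are assumptions, so adjust them to your setup:
# keep the 5 most recently modified backups and delete the rest
# (selection is by modification time, not by any timestamp in the name)
cd /backups && ls -1t backup-*.tar.gz | tail -n +6 | xargs -r rm --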
11-30-2016
04:08 PM
1 Kudo
@Sunile Manjee This would really depend on the cluster size and the number of jobs running, so it is very hard to gauge. A 10-node cluster with around 15-20 components can easily generate 1 GB of audit logs PER DAY, but again it depends on the cluster activity. You could use this as a baseline, but it is really hard to gauge. Then again, consider this only if you are being forced to use a DB, and after strongly advising against using a DB as opposed to Solr for Ranger audits 🙂
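To put that baseline in rough numbers (my own illustration, assuming a 90-day retention window): 1 GB/day × 90 days ≈ 90 GB of audit data in the database, before indexes and any replication.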
12-31-2016
03:25 AM
@Binu Mathew Thanks for sharing the awesome article. Do you mind sharing the sample data?
11-17-2016
05:51 PM
1 Kudo
This is a known problem when Phoenix is enabled; see similar posts here: https://community.hortonworks.com/questions/57874/error-unable-to-find-orgapachehadoophbaseipccontro.html
That class is actually from Phoenix: https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/hadoop/hbase/ipc/controller/ServerRpcControllerFactory.java
It will be fixed in Apache NiFi 1.1 by allowing users to specify the path to the Phoenix client JAR. For now you can copy phoenix-client.jar to nifi_home/work/nar/extensions/nifi-hbase_1_1_2-client-service-nar-1.1.0-SNAPSHOT.nar-unpacked/META-INF/bundled-dependencies/, obviously adjusting the directories for your version.
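As a sketch of that copy on an HDP node (the source JAR path and the NiFi home directory shown here are assumptions; adjust both for your installation and versions):
# copy the Phoenix client JAR into the unpacked HBase client-service NAR
cp /usr/hdp/current/phoenix-client/phoenix-client.jar \
  /opt/nifi/work/nar/extensions/nifi-hbase_1_1_2-client-service-nar-1.1.0-SNAPSHOT.nar-unpacked/META-INF/bundled-dependencies/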
11-21-2016
02:18 PM
1 Kudo
Make sure that the updated hbase-site.xml is on your sqlline classpath for the properties to take effect.
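For example (the paths and the ZooKeeper connect string below are assumptions based on a typical HDP layout, not from the original answer), one way to point sqlline at the updated config is:
# directory that contains the updated hbase-site.xml
export HBASE_CONF_DIR=/etc/hbase/conf
/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure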
11-15-2016
03:56 PM
@Sunile Manjee The answer is yes. In HDP 2.5, Spark column security is available through the LLAP and Ranger integration. You get fine-grained, column-level access control for SparkSQL with fully dynamic per-user policies, and it doesn't require views; you use standard Ranger policies and tools to control access and masking policies. The flow:
1. SparkSQL gets data locations known as "splits" from HiveServer and plans the query.
2. HiveServer2 authorizes access using Ranger; per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP; filtering / masking is guaranteed by the LLAP server.
02-06-2017
02:50 AM
@Ancil McBarnett It has been a while since I have tried it. Here is the information: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_zeppelin-component-guide/content/zeppelin-with-hive.html
11-11-2016
10:10 PM
Ranger doesn't have the details of the Job Id / App Id, hence they are not in the audit logs.
01-30-2019
04:20 AM
1 Kudo
I know this is very old, but I also faced the same thing and sorted it out as below; posting to help anyone else facing the same issue. You are connecting to a different port. There are two ports you can connect to: 2122 (host SSH) and 2202 (sandbox SSH). The Web Shell at localhost:4200 connects to 2202, i.e. the sandbox SSH. So if you use port 2202 through WinSCP, Filezilla, or any other client, you will see the same files.
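For reference, connecting with a plain SSH client against the ports mentioned above would look roughly like this (using root as the login user is an assumption; use whatever account you normally log in with):
# sandbox SSH (same environment as the web shell on localhost:4200)
ssh root@localhost -p 2202
# host SSH
ssh root@localhost -p 2122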