Member since: 09-29-2015
Posts: 286
Kudos Received: 601
Solutions: 60
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 11462 | 03-21-2017 07:34 PM |
| | 2884 | 11-16-2016 04:18 AM |
| | 1608 | 10-18-2016 03:57 PM |
| | 4266 | 09-12-2016 03:36 PM |
| | 6214 | 08-25-2016 09:01 PM |
12-28-2015 07:23 PM
1 Kudo
See the following question: How to Write Cluster Startup and Shutdown Scripts with Ambari
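For reference, a minimal sketch of driving this through the Ambari REST API; the host, cluster name, and admin credentials below are placeholders:
# Stop all services by setting their desired state to INSTALLED
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services
# Start them again by setting the desired state to STARTED
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services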
12-16-2015 11:03 PM
No, I don't think that's possible. Admin authentication is scoped to a single cluster: when we authenticate, we have to specify which cluster we are authenticating against, and security policies and configuration are stored per cluster in the Ranger Admin database.
12-11-2015 04:25 AM
3 Kudos
For high-throughput use cases, Solr (specifically Solr in Cloud mode) should run on separate nodes. However, with HDFS-based indexes you may see a slight performance degradation. You can colocate Solr with the DataNodes, but you sacrifice latency, since Solr then competes with other workloads for resources. So, since you are also running Spark jobs, I would recommend SolrCloud on a couple of additional nodes.
12-10-2015 03:29 PM
2 Kudos
@Mehdi TAZI
Avro
CREATE EXTERNAL TABLE table_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/table/table_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/schemas/table_avro.avsc');
From Hive 0.14 and Up (Easier DDL)
CREATE TABLE kst (
  string1 string,
  string2 string,
  int1 int,
  boolean1 boolean,
  long1 bigint,
  float1 float,
  double1 double,
  inner_record1 struct<int_in_record1:int>,
  enum1 string,
  array1 array<string>,
  map1 map<string,int>,
  union1 uniontype<float,boolean,string>,
  fixed1 binary,
  null1 void,
  unionnullint int,
  bytes1 binary)
PARTITIONED BY (ds string)
STORED AS AVRO;
See the Apache Hive language docs for more examples on Avro and Parquet. However, to get the true performance benefits of Hive, with cost-based optimization (CBO) and vectorization, you should consider storing your Hive tables in the ORC format.
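As an illustrative sketch (the table and columns below are hypothetical), an ORC table uses the same DDL shape:
-- Hypothetical table stored as ORC, which enables the CBO and vectorization benefits
CREATE TABLE kst_orc (
  string1 string,
  int1 int)
PARTITIONED BY (ds string)
STORED AS ORC;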
12-10-2015 03:08 PM
4 Kudos
Generally, hard-coding PARALLEL in your Pig script is a bad idea. With PARALLEL 1 you effectively have one reducer performing the job, which can hurt scale and performance. I would allow default parallelism and use the hdfs dfs -getmerge option instead, as sketched below.
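A minimal sketch of the getmerge step; both paths below are placeholders:
# Concatenate all part files from the Pig output directory into one local file
hdfs dfs -getmerge /user/pig/output /tmp/output_merged.txt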
From an input point of view, here is a tip on how to combine small files.
12-10-2015 04:50 AM
1 Kudo
Similar. Some tips:
SSL encryption is supported from Hive 0.13 onward (see HIVE-5351). To enable it, set the following configurations in hive-site.xml, as sketched below:
hive.server2.use.SSL – set this to true.
hive.server2.keystore.path – set this to your keystore path.
hive.server2.keystore.password – set this to your keystore password.
In the ODBC driver, ENABLE SSL must be selected in the ODBC SSL Options window, and ensure "Allow Common Name Host Name Mismatch" is checked.
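A minimal hive-site.xml sketch; the keystore path and password below are hypothetical placeholders:
<!-- Enable SSL for HiveServer2 (Hive 0.13+); path and password are placeholders -->
<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.keystore.path</name>
  <value>/etc/hive/conf/hiveserver2.jks</value>
</property>
<property>
  <name>hive.server2.keystore.password</name>
  <value>changeit</value>
</property>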
12-10-2015 04:31 AM
1 Kudo
@Divya Gehlot
Since this is Scala and not Java, the following should work (note the SaveMode import and no trailing semicolon):
import org.apache.spark.sql.SaveMode
df.select("name", "age")
  .write
  .format("com.databricks.spark.csv")
  .mode(SaveMode.Append)
  .saveAsTable("PersonHiveTable")
12-10-2015 03:58 AM
Here is the Apache JIRA that discusses this; it is unresolved at the moment. It explains why the TRANSFORM clause is a security risk and why it is disabled when SQL standard based authorization is enabled in Hive: Server wide Control over Transform Clause in Hive
12-10-2015 03:57 AM
The Hive CLI is unsecured; you can run any script with it. To secure the Hive CLI, you need to secure the underlying HDFS files directly, as sketched below. See Section 3 in Best Practices for Hive Authorization using Apache Ranger.
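A rough sketch of that HDFS-level lockdown, assuming the default HDP warehouse location /apps/hive/warehouse (adjust the path, owner, and group for your cluster):
# Give the hive service account exclusive access to the warehouse files
hdfs dfs -chown -R hive:hadoop /apps/hive/warehouse
hdfs dfs -chmod -R 700 /apps/hive/warehouse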