Member since: 09-29-2015
Posts: 286
Kudos Received: 601
Solutions: 60
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 11462 | 03-21-2017 07:34 PM |
| | 2884 | 11-16-2016 04:18 AM |
| | 1608 | 10-18-2016 03:57 PM |
| | 4266 | 09-12-2016 03:36 PM |
| | 6214 | 08-25-2016 09:01 PM |
12-28-2015 07:23 PM
1 Kudo
See the following question: How to Write Cluster Startup and Shutdown Scripts with Ambari
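For reference, a minimal sketch of driving this through the Ambari REST API; the host, cluster name, and admin credentials below are placeholders:
# Stop all services by setting their desired state to INSTALLED
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services
# Start them again by setting the desired state to STARTED
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/services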
12-16-2015 11:03 PM
No, I don't think that's possible. Admin authentication is scoped to a single cluster: when we authenticate, we have to specify which cluster we are authenticating against, and security policies and configuration are stored per cluster in the Ranger Admin database.
12-11-2015 04:25 AM
3 Kudos
For high-throughput use cases, Solr (specifically Solr in Cloud mode) should run on separate nodes. However, with HDFS-based indexes you may see a slight performance degradation. You can colocate Solr with the DataNodes, but you sacrifice latency, since Solr then competes with other workloads for resources. So, since you are also running Spark jobs, I would recommend SolrCloud on a couple of additional nodes.
12-10-2015 03:29 PM
2 Kudos
@Mehdi TAZI
Avro
CREATE EXTERNAL TABLE table_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/table/table_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/schemas/table_avro.avsc');
From Hive 0.14 and Up (Easier DDL)
CREATE TABLE kst (
  string1 string,
  string2 string,
  int1 int,
  boolean1 boolean,
  long1 bigint,
  float1 float,
  double1 double,
  inner_record1 struct<int_in_record1:int>,
  enum1 string,
  array1 array<string>,
  map1 map<string,int>,
  union1 uniontype<float,boolean,string>,
  fixed1 binary,
  null1 void,
  unionnullint int,
  bytes1 binary)
PARTITIONED BY (ds string)
STORED AS AVRO;
See the Apache Hive language docs for more examples on Avro and Parquet. However, to get the true performance benefits of Hive, with cost-based optimization (CBO) and vectorization, you should consider storing your Hive tables in the ORC format.
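As an illustrative sketch (the table and columns below are hypothetical), an ORC table uses the same DDL shape:
-- Hypothetical table stored as ORC, which enables the CBO and vectorization benefits
CREATE TABLE kst_orc (
  string1 string,
  int1 int)
PARTITIONED BY (ds string)
STORED AS ORC;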
12-10-2015 03:08 PM
4 Kudos
Generally, hard-coding PARALLEL in your Pig script is a bad idea. With PARALLEL 1 you effectively have one reducer performing the job, which can hurt scale and performance. I would allow default parallelism and use the hdfs dfs -getmerge option instead, as sketched below.
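A minimal sketch of the getmerge step; both paths below are placeholders:
# Concatenate all part files from the Pig output directory into one local file
hdfs dfs -getmerge /user/pig/output /tmp/output_merged.txt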
From an input point of view, here is a tip on how to combine small files.
12-10-2015 04:50 AM
1 Kudo
Similar. Some tips:
SSL encryption is supported from Hive 0.13 onward (see HIVE-5351). To enable it, set the following configurations in hive-site.xml, as sketched below:
hive.server2.use.SSL – set this to true.
hive.server2.keystore.path – set this to your keystore path.
hive.server2.keystore.password – set this to your keystore password.
In the ODBC driver, ENABLE SSL must be selected in the ODBC SSL Options window, and ensure "Allow Common Name Host Name Mismatch" is checked.
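A minimal hive-site.xml sketch; the keystore path and password below are hypothetical placeholders:
<!-- Enable SSL for HiveServer2 (Hive 0.13+); path and password are placeholders -->
<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.keystore.path</name>
  <value>/etc/hive/conf/hiveserver2.jks</value>
</property>
<property>
  <name>hive.server2.keystore.password</name>
  <value>changeit</value>
</property>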
12-10-2015 04:31 AM
1 Kudo
@Divya Gehlot
Since this is Scala and not Java, the following should work (note the SaveMode import and no trailing semicolon):
import org.apache.spark.sql.SaveMode
df.select("name", "age")
  .write
  .format("com.databricks.spark.csv")
  .mode(SaveMode.Append)
  .saveAsTable("PersonHiveTable")
12-10-2015 03:58 AM
Here is the Apache JIRA that discusses this; it is unresolved at the moment. It explains why the TRANSFORM clause is a security risk and why it is disabled when SQL standard based authorization is enabled in Hive: Server wide Control over Transform Clause in Hive
12-10-2015 03:57 AM
The Hive CLI is unsecured; you can run any script with it. To secure the Hive CLI, you need to secure the underlying HDFS files directly, as sketched below. See Section 3 in Best Practices for Hive Authorization using Apache Ranger.
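A rough sketch of that HDFS-level lockdown, assuming the default HDP warehouse location /apps/hive/warehouse (adjust the path, owner, and group for your cluster):
# Give the hive service account exclusive access to the warehouse files
hdfs dfs -chown -R hive:hadoop /apps/hive/warehouse
hdfs dfs -chmod -R 700 /apps/hive/warehouse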