About james_bashforth

Matrix · ‎10-22-2020

I did as @ssubhas said, setting the attributes to false:

    spark.sql("SET hive.enforce.bucketing=false")
    spark.sql("SET hive.enforce.sorting=false")
    spark.sql("SET spark.hadoop.hive.exec.dynamic.partition = true")
    spark.sql("SET spark.hadoop.hive.exec.dynamic.partition.mode = nonstrict")
    newPartitionsDF.write.mode(SaveMode.Append).format("hive").insertInto(this.destinationDBdotTableName)

Spark can create the bucketed table in Hive with no issues. Spark inserted the data into the table, but it completely ignored the fact that the table is bucketed: when I open a partition, I see only 1 file.

When inserting, we should set hive.enforce.bucketing = true, not false. With that setting, you will face the following error in the Spark logs:

    org.apache.spark.sql.AnalysisException: Output Hive table `hive_test_db`.`test_bucketing` is bucketed but Spark currently does NOT populate bucketed output which is compatible with Hive.;

This means that Spark doesn't support insertion into bucketed Hive tables. The first answer in this Stack Overflow question explains that what @ssubhas suggested is a workaround that doesn't guarantee bucketing.

JonathanSneep · ‎08-03-2018

Awesome, glad you got it working now and thanks for clarifying how you got it up! 🙂

rlevas · ‎08-03-2018

There is a workaround for this issue. However, the results may not be what you want, since the CN will be a set of seemingly random characters.

The CN is set using the value calculated from the Velocity template specified in the kerberos-env/ad_create_attributes_template configuration. The default value of the template is:

    {
      "objectClass": ["top", "person", "organizationalPerson", "user"],
      "cn": "$principal_name",
      #if( $is_service )
      "servicePrincipalName": "$principal_name",
      #end
      "userPrincipalName": "$normalized_principal",
      "unicodePwd": "$password",
      "accountExpires": "0",
      "userAccountControl": "66048"
    }

As you can see, the CN value is set to the identity's principal name. This can be changed, but we need to make sure the value will be unique. There are several variables available to use in this template. See https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-security/content/customizing_the_attribute_template.html.

You can use one of the hashes to limit the size of the value and provide a reasonable probability of uniqueness:

    principal_digest (SHA1) - 40 characters
    principal_digest_256 (SHA256) - 64 characters
    principal_digest_512 (SHA512) - 128 characters

Since the maximum length for the CN attribute in Active Directory is 64 characters, I would suggest using principal_digest_256. For example:

    {
      "objectClass": ["top", "person", "organizationalPerson", "user"],
      "cn": "$principal_digest_256",
      #if( $is_service )
      "servicePrincipalName": "$principal_name",
      #end
      "userPrincipalName": "$normalized_principal",
      "unicodePwd": "$password",
      "accountExpires": "0",
      "userAccountControl": "66048"
    }

Notice the "cn" line was changed from "cn": "$principal_name" to "cn": "$principal_digest_256".

You can change this template from the Enable Kerberos Wizard if you open the Advanced kerberos-env tab on the Configure Kerberos page and look for the Account Attribute Template property.
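The digest lengths quoted above can be checked with a minimal sketch like the one below. It assumes the digests are hex-encoded, which matches the 40/64/128 character counts. The principal string and the names PrincipalDigestLengths, hexDigest and adCnLimit are hypothetical, chosen only for illustration, and the exact input string Ambari hashes is not shown here. This is not Ambari's code, just a way to see why only the SHA-1 and SHA-256 digests fit within a 64-character CN.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object PrincipalDigestLengths {
  // Hex-encode the digest of `input` under the given JCA algorithm name.
  def hexDigest(algorithm: String, input: String): String =
    MessageDigest
      .getInstance(algorithm)
      .digest(input.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  def main(args: Array[String]): Unit = {
    // Hypothetical principal name, used only to produce some input to hash.
    val principal = "nn/host1.example.com@EXAMPLE.COM"
    // CN length limit in Active Directory, as stated in the post above.
    val adCnLimit = 64

    for (algorithm <- Seq("SHA-1", "SHA-256", "SHA-512")) {
      val length = hexDigest(algorithm, principal).length
      println(s"$algorithm: $length hex characters, fits in CN: ${length <= adCnLimit}")
    }
  }
}
```

The digest length depends only on the algorithm, not on the principal, so a long principal name that would overflow the CN limit still yields a fixed-size value.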

Last Visited	‎08-22-2018 01:46 PM

Member Since	‎02-26-2018 11:54 AM
Posts	15

Cloudera Community

Re: Hive bucketed table from Spark 2.3

Re: Cloudbreak - kerberos-env json descriptor form...

Re: Cloubdreak on Azure Kerberos configuration hos...