Member since: 03-06-2017
Posts: 11
Kudos Received: 1
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 20138 | 10-22-2018 02:44 PM |
10-02-2019
06:18 AM
@lvazquez Maybe you can directly execute a "kinit" to submit your user's credentials to your LDAP. I managed to authenticate users from AD while the cluster is kerberized through a FreeIPA server. Here is a sample command:

%sh
echo "password" | kinit foo@hortonworks.local
hdfs dfs -ls /
Found 12 items
drwxrwxrwt - yarn hadoop 0 2019-10-02 13:53 /app-logs
drwxr-xr-x - hdfs hdfs 0 2019-10-01 15:27 /apps
drwxr-xr-x - yarn hadoop 0 2019-10-01 14:06 /ats
drwxr-xr-x - hdfs hdfs 0 2019-10-01 14:08 /atsv2
drwxr-xr-x - hdfs hdfs 0 2019-10-01 14:06 /hdp
drwx------ - livy hdfs 0 2019-10-02 11:35 /livy2-recovery
drwxr-xr-x - mapred hdfs 0 2019-10-01 14:06 /mapred
drwxrwxrwx - mapred hadoop 0 2019-10-01 14:08 /mr-history
drwxrwxrwx - spark hadoop 0 2019-10-02 15:08 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2019-10-01 15:31 /tmp
drwxr-xr-x - hdfs hdfs 0 2019-10-02 14:23 /user
drwxr-xr-x - hdfs hdfs 0 2019-10-01 15:14 /warehouse

I think this approach is really ugly, but at least it works. Do not forget to update the hadoop.security.auth_to_local rules in your core-site configuration:

RULE:[1:$1@$0](.*@HORTONWORKS.LOCAL)s/@.*//
RULE:[1:$1@$0](.*@IPA.HORTONWORKS.LOCAL)s/@.*//
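If you want to verify the mapping, one option (a minimal sketch, reusing the example principal from above) is Hadoop's built-in HadoopKerberosName utility, which prints how the auth_to_local rules translate a principal:

# Hedged example: check how the auth_to_local rules map a principal to a short name.
# foo@HORTONWORKS.LOCAL is the sample principal used above; replace it with yours.
hadoop org.apache.hadoop.security.HadoopKerberosName foo@HORTONWORKS.LOCAL
# Expected output, roughly: Name: foo@HORTONWORKS.LOCAL to foo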
04-24-2019
08:42 AM
I managed to retrieve the group named "ad_sshaccess_users" from the LDAP directory into Ambari, but the group shows 0 members. However, in Active Directory I created 2 users under this group, which is mapped in FreeIPA. Do you know whether Ambari can retrieve AD users through a FreeIPA server that handles the LDAP part? I'm not sure about that.
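For context, this is roughly how the group sync was triggered (a minimal sketch; the file name is illustrative and it assumes LDAP was already configured with ambari-server setup-ldap):

# Hedged example: re-run the Ambari LDAP sync for just this group and then
# check in the Ambari UI whether its members are picked up.
echo "ad_sshaccess_users" > groups.txt
ambari-server sync-ldap --groups groups.txt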
10-22-2018
02:44 PM
1 Kudo
A solution to import your data as Parquet files and correctly handle the TIMESTAMP and DATE types coming from an RDBMS such as IBM DB2 or MySQL is to import with sqoop import --as-parquetfile and map each TIMESTAMP and DATE column to the Java String type using --map-column-java (a sketch of such a command is given at the end of this post). After that, you should be able to query the Hive database through a SparkSession by setting spark.sql.hive.convertMetastoreParquet to false in the session configuration; Spark SQL will then use the Hive SerDe for reading Parquet tables instead of the built-in support.

spark.sql.hive.convertMetastoreParquet false

import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder()
.appName("test interrogate Hive parquet file using Spark")
.config("spark.sql.parquet.compression.codec", "snappy")
.config("spark.sql.warehouse.dir","/apps/hive/warehouse")
.config("hive.metastore.uris","thrift://sdsl-hdp-01.mycluster:9083")
.config("spark.sql.hive.convertMetastoreParquet", false)
.enableHiveSupport()
.getOrCreate()
import sparkSession.implicits._
import sparkSession.sql
val df = sql("SELECT CAST(COL1 AS TIMESTAMP), COL2, COL3, CAST(COL4 AS TIMESTAMP), COL5 FROM db.mytable")
df.printSchema
root
|-- COL1: timestamp (nullable = true)
|-- COL2: string (nullable = true)
|-- COL3: string (nullable = true)
|-- COL4: timestamp (nullable = true)
|-- COL5: integer (nullable = true)
df.show(5, false)
+--------------------------+--------+--------+--------------------------+------+
|COL1 |COL2 |COL3 |COL4 |COL5|
+--------------------------+--------+--------+--------------------------+------+
|2003-01-01 00:00:00.100001| |00001 |2003-01-01 00:00:00.10361 |1 |
|2003-01-01 00:00:00.100002| |00002 |2003-01-01 00:00:00.100002|2 |
|2003-01-01 00:00:00.100003| |00003 |2003-01-01 00:00:00.100003|3 |
|2003-01-01 00:00:00.100004| |00004 |2003-01-01 00:00:00.100004|4 |
|2003-01-01 00:00:00.100005| |00005 |2003-01-01 00:00:00.100005|5 |
+--------------------------+--------+--------+--------------------------+------+
only showing top 5 rows
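As referenced above, here is a minimal sketch of the Sqoop import step. The JDBC URL, credentials, and table names are placeholders, and COL1/COL4 stand for the TIMESTAMP columns from the schema shown above; adapt everything to your own source database.

# Hedged example: import as Parquet and force the TIMESTAMP columns to Java String.
# All connection details and names below are placeholders.
sqoop import \
  --connect "jdbc:db2://db2-host:50000/MYDB" \
  --username myuser -P \
  --table MYTABLE \
  --as-parquetfile \
  --map-column-java COL1=String,COL4=String \
  --hive-import \
  --hive-table db.mytable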