About marcel-jan

MarkLa · ‎07-30-2024

..for those who are interested...i figured out a solution...earlier i tried to use a "Classification" and use a Hyperlink for an Attribute there, and this does not work. But if you use "Business Metadata" and assign them to your data asset, it allows to define an Attribute with a Hyperlink...

marcel-jan · ‎05-16-2024

Because I ran into this thread when looking how to solve this error and because we found a solution, I thought it might still serve some people if I share what solution we found. We needed HWC to profile Hive managed + transactional tables from Ataccama (data quality solution). And we found someone who successfully got spark-submit working. We checked their settings and changed the spark-submit as follows: COMMAND="$SPARK_HOME/bin/$SPARK_SUBMIT \ --files $MYDIR/$LOG4J_FILE_NAME $SPARK_DRIVER_JAVA_OPTS $SPARK_DRIVER_OPTS \ --jars {{ hwc_jar_path }} \ --conf spark.security.credentials.hiveserver2.enabled=false \ --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@{{ ad_realm }}" \ --conf spark.dynamicAllocation.enable=false \ --conf spark.hadoop.metastore.catalog.default=hive \ --conf spark.yarn.maxAppAttempts=1 \ --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \ --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \ --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \ --conf spark.sql.legacy.timeParserPolicy=LEGACY \ --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true \ --conf spark.sql.parquet.int96TimestampConversion=true \ --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \ --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \ --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \ --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \ --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \ --class $CLASS $JARS $MYLIB $PROPF $LAUNCH $*"; exec $COMMAND Probably the difference was in the spark.hadoop.metastore.catalog.default=hive setting. In the above example are some Ansible variables: hwc_jar_path: "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar" ad_realm is our LDAP realm. Hope it helps anyone.

marcel-jan · ‎02-07-2019

This is now the winning REST API query: curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName%3D%27testdb.mytable.id@CLUSTERNAME%27" It gives a list of all columns for a table, including deleted ones. In my Python code I pick the column with ACTIVE state.

Shelton · ‎02-01-2019

@Sandeep Nemuri I think we responded at almost the same time, when some is clicking submit, there is no logic that checks whether a similar answer has already been give 🙂 Maybe you should have added that he needs to run the script as Atlas admin user as illustrated which he wasn't aware of 🙂

ricardo_larrana · ‎10-09-2018

Yes, you are right, That is what i meant. I haven't played with the sandbox, but the key would be to make sure that the user NiFi uses to run has access and permissions to the resources you are adding to the processor. Executestreamprocessor will have the same issue if the path is wrong or non existent in the server. I would first try to find out what user is used to run nifi (ps -ef would be your friend here). Then i would make sure that that user has access to the path in the console. (Use 'ls -l path') from the home directory. Path in this case would be both the path of the executable, and the path of the working directory (Make sure both are accessible) Lastly, try to execute your script from the command line. Thanks! Regards

JonathanSneep · ‎09-06-2018

Awesome, @Marcel-Jan Krijgsman, glad we got it working 🙂 and thank you for sharing the trimmed result!

marcel-jan · ‎08-07-2018

@Felix Albani Thanks for that answer. Looks like I stand for an interesting choice: Change hive.server2.enable.doAs=true and run Hive on HDFS as HiveServer2 process. But then I can restrict access to columns to users in Hive, without them getting access to the HDFS files. So the choice of Hive permissions I make will be much more important. Keep hive.server2.enable.doAs=false and I will not be able to do column based access in Hive. But be in the comfort that if someone gets access to Hive table without the HDFS access, they still can not get to the data. I'll have to think about this.

marcel-jan · ‎02-28-2018

It seems these messages only occur on my sandbox environment. On my customers HDP 2.6.3 environment I haven't seen any ATLAS-500-00-007 errors yet.

asirna · ‎12-14-2017

I tried this and it returns only the queried table curl -X GET \ 'http://sandbox.hortonworks.com:21000/api/atlas/v2/search/dsl?typeName=hive_table&query=where%20name%3D%22asteroids%22' \ -H 'authorization: Basic YWRtaW46YWRtaW4=' Since the thread was long. I put the correct answer separately. You will find an "Accept" button beside this answer. Please click on it to accept it to make it as Best answer. Thanks a lot.

Online	Offline
Last Visited	‎05-16-2024 09:09 AM

Member Since	‎03-01-2017 10:37 AM
Last Visited	‎05-16-2024 09:09 AM
Posts	62
Kudos received	7

Cloudera Community

Re: Atlas REST API: get the guid of a Hive column

Re: Hyperlinks from Atlas attributes?

Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.ta...

Re: Atlas REST API: get the guid of a Hive column

Re: Discovering existing Hive tables in Atlas

Re: Running a Python script from NiFi

Re: How to create Ranger policy on HDFS path via R...

Re: Ranger: is it possible to grant access to a se...

Re: Atlas entity bulk create results in ATLAS-500-...

Re: Atlas REST API search of a table fails