Member since: 08-05-2024
Posts: 21
Kudos Received: 12
Solutions: 0
08-20-2024
03:10 AM
1 Kudo
I just downloaded the same entity JSON, added a 'dummy description' under the entity's description field, and tried the same PUT command, and I got the same error.
08-20-2024
03:04 AM
1 Kudo
Thanks for getting back to me @shehbazk. Okay, so entities in Atlas remain in an ACTIVE state even after they have been deleted from Hive? And the only option is to delete them via the UI or REST API calls?
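For reference, a minimal sketch of deleting such a stale entity over the Atlas v2 REST API. The host, port, and GUID below are placeholders I made up, not values from this thread; authentication (e.g. Kerberos/negotiate, as in the curl examples elsewhere in this thread) is left out.

```python
# Sketch: delete a stale Atlas entity by GUID over the v2 REST API.
# ATLAS_BASE and the GUID are hypothetical placeholders.

def entity_guid_url(base_url: str, guid: str) -> str:
    """Build the Atlas v2 endpoint for a single entity GUID."""
    return f"{base_url.rstrip('/')}/api/atlas/v2/entity/guid/{guid}"

ATLAS_BASE = "https://atlas-host:21000"  # hypothetical host/port

url = entity_guid_url(ATLAS_BASE, "00000000-0000-0000-0000-000000000000")
# An HTTP DELETE on this URL soft-deletes the entity (its status becomes
# DELETED rather than the record being removed), e.g.:
#   curl --negotiate -u : -X DELETE "<url>"
print(url)
```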
08-16-2024
09:09 AM
@cravani @pvishnu - could you please advise here?
08-16-2024
09:06 AM
Hi, I have an Apache Atlas setup in place, and the Hive hooks are up and running. All Hive creates/updates, lineages, etc. are showing in Atlas. Now I have dropped a Hive table from Hive, but this same table's entity is still showing up in Apache Atlas as ACTIVE. Can someone advise whether this is expected, or do we need to explicitly delete this entity from Atlas?
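To check what Atlas currently thinks about the entity, one can read its lifecycle status from the GET-by-GUID response. A minimal sketch, assuming the usual AtlasEntityWithExtInfo response shape; the sample dict below is a trimmed, hypothetical example, not real data from this cluster:

```python
# Sketch: reading an entity's lifecycle status from an Atlas v2 GET response.
# `sample` is a trimmed, hypothetical example of what
# GET /api/atlas/v2/entity/guid/<guid> returns.

def entity_status(entity_response: dict) -> str:
    """Extract the ACTIVE/DELETED status from an AtlasEntityWithExtInfo payload."""
    return entity_response["entity"]["status"]

sample = {
    "entity": {
        "typeName": "hive_table",
        "guid": "00000000-0000-0000-0000-000000000000",
        "status": "ACTIVE",
        "attributes": {"name": "my_table"},
    },
    "referredEntities": {},
}
print(entity_status(sample))  # ACTIVE
```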
Labels:
- Apache Atlas
08-12-2024
02:02 AM
Sure, I will share. It seems this Hive staging directory is being created at the table level? I have tried adding partitions and ingesting into those partitions; my partition columns were (batch_name, date), so each Spark job writes to its respective batch_name partition. However, that also failed, since the Hive staging temp directory is created at the table level, not at the partition level.
08-12-2024
01:38 AM
Hi @ggangadharan, thanks for your reply. Basically, multiple Spark jobs write dataframes to the same Hive table via HWC. Each Spark job applies a different set of ingestions/transformations, and I am trying to write to a Hive table that serves as a kind of audit/log table for each Spark job's ingestion status, time, etc. These two tables are common across all Spark jobs. I am executing the Spark jobs via an Oozie workflow. This is the stack trace I can see:

Error creating/checking hive table
An error occurred while calling o117.save.
: org.apache.spark.SparkException: Writing job aborted.
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:142)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:170)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:167)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:142)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:91)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:704)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:704)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:280)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: File hdfs://HDFS-HA/warehouse/tablespace/managed/hive/my_db_name.db/my_log_table_name/.hive-staging_hive_2024-08-09_16-16-49_474_6465304774056330032-46249 does not exist.
    at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:232)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)
    ... 26 more
Caused by: java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: File hdfs://HDFS-HA/warehouse/tablespace/managed/hive/my_db_name.db/my_log_table_name/.hive-staging_hive_2024-08-09_16-16-49_474_6465304774056330032-46249 does not exist.
    at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:411)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:276)
    at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:101)
    at com.hortonworks.spark.sql.hive.llap.wrapper.PreparedStatementWrapper.execute(PreparedStatementWrapper.java:48)
    at com.hortonworks.spark.sql.hive.llap.JDBCWrapper.executeUpdate(HS2JDBCWrapper.scala:396)
    at com.hortonworks.spark.sql.hive.llap.DefaultJDBCWrapper.executeUpdate(HS2JDBCWrapper.scala)
    at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.handleWriteWithSaveMode(HiveWarehouseDataSourceWriter.java:345)
    at com.hortonworks.spark.sql.hive.llap.writers.HiveWarehouseDataSourceWriter.commit(HiveWarehouseDataSourceWriter.java:230)

Hive version: Hive 3.1.3000.7.1.8.55-1

This is how I am trying to ingest using Spark:

df.write.mode("append").format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR) \
    .option("table", table_name).option("database", database_name).save()
08-10-2024
04:53 PM
1 Kudo
Hi All, I am trying to write my Spark dataframe to a Hive table using HWC (Hive Warehouse Connector). My Spark application is in PySpark, and I have 5 concurrent Spark applications running at the same time, probably trying to write to the same Hive table at the same time. I am getting the following error:

Caused by: java.lang.RuntimeException: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: File hdfs://tablespace/managed/hive/my_db_name.db/hive_table_name/.hive-staging_hive_2024-08-05_15-55-26_488_5678420092852048777-45678

Does HWC not allow concurrent writes to the same Hive table, or is this a limitation of the Hive table itself?
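One plausible reading of the FileNotFoundException is that concurrent committers share the table-level .hive-staging directory, so one job's cleanup removes another job's staging files. Under that assumption, a workaround is to serialize the writes. Below is a minimal sketch using an advisory file lock; it only serializes jobs on the same host (jobs on different hosts would need a shared lock service such as ZooKeeper), and the lock path and usage are hypothetical, not a verified HWC fix.

```python
# Sketch: serialize HWC writes to one Hive table with an advisory file lock,
# assuming the failure comes from concurrent jobs sharing the table-level
# .hive-staging directory. Only works for jobs on the same host.
import fcntl
from contextlib import contextmanager

@contextmanager
def table_write_lock(lock_path: str):
    """Block until we hold an exclusive lock, then release it on exit."""
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Hypothetical usage around the HWC write from this thread:
# with table_write_lock("/tmp/my_db_name.my_log_table_name.lock"):
#     df.write.mode("append") \
#         .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR) \
#         .option("table", table_name).option("database", database_name) \
#         .save()
```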
Labels:
- Apache Hive
- Apache Spark
08-07-2024
01:04 AM
Thank you @DianaTorres. I will wait for their response.
08-06-2024
04:26 AM
1 Kudo
Hi All, I am getting an issue while updating an existing entity using PUT calls. I downloaded the entity JSON using GET, and I am trying to update the entity via PUT (I tried updating the description, and also tried PUTting the JSON back unchanged), but it fails with the same error: "errorCode":"ATLAS-400-00-023","errorMessage":"Attribute null not found for type hive_table". Could you please advise how to update the entity using a PUT command?

curl --negotiate -u : -X PUT -H "Content-Type: application/json" https://myURL:portNumber/api/atlas/v2/entity/guid/<myGuid> -d @downloaded_entity.json

I have referred to this post, however it does not show how to perform the PUT operation: Solved: Re: PARTIAL UPDATE APACHE ATLAS ENTITY GUI - Cloudera Community - 297845

So my question is: does the Atlas REST API allow updating an entity? My use case is that I want to update a hive_table entity. I have the hive_table / hive_column relationship in place, and now I want to update my hive_table entity by adding Hive columns under attributes: { columns: [...] }.
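As I understand the Atlas v2 API (this is a sketch of my understanding, not a verified fix for the ATLAS-400-00-023 error above), there are two update paths: a partial update of a single attribute via PUT on the per-GUID endpoint with a ?name= query parameter, and a full create-or-update via POST of the AtlasEntityWithExtInfo wrapper (the same shape GET returns) to /api/atlas/v2/entity. The base URL and GUID below are placeholders:

```python
# Sketch of the two Atlas v2 update paths; hosts/GUIDs are placeholders.
import json

def partial_update_url(base_url: str, guid: str, attr_name: str) -> str:
    """PUT on this URL updates one attribute; the request body is the
    new value for that attribute (e.g. a JSON string for description)."""
    return f"{base_url.rstrip('/')}/api/atlas/v2/entity/guid/{guid}?name={attr_name}"

def full_update_payload(get_response: dict, description: str) -> str:
    """For a full update, POST the AtlasEntityWithExtInfo wrapper -- the same
    shape GET returns, with a top-level 'entity' key -- to /api/atlas/v2/entity."""
    get_response["entity"]["attributes"]["description"] = description
    return json.dumps(get_response)

url = partial_update_url("https://myURL:portNumber", "<myGuid>", "description")
print(url)
```

One thing worth checking: if the downloaded JSON has the top-level "entity" wrapper, sending it unchanged to the per-GUID PUT endpoint (which expects a single attribute value) could plausibly produce an "Attribute null not found" style error; that is a guess, not a confirmed diagnosis.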
Labels:
- Apache Atlas
08-06-2024
04:23 AM
2 Kudos
Thank you for your suggestion @VidyaSargur. Will do that.