Member since: 07-12-2017
Posts: 7
Kudos Received: 0
Solutions: 0
02-05-2018 02:46 PM
The HCatOutputFormat class is in hive-hcatalog-core.jar. Try exporting that jar to HADOOP_CLASSPATH, and take a look at this link for more information: https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput
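For reference, a minimal sketch of pointing a MapReduce job at a Hive table through HCatOutputFormat, along the lines of the wiki page above (the database and table names are placeholders, not taken from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class HCatWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "hcat-write-sketch");
        // Declare the target Hive table via HCatalog; "default" and "my_table"
        // are placeholder names, and null means no static partition values.
        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create("default", "my_table", null));
        job.setOutputFormatClass(HCatOutputFormat.class);
        // ... configure mapper, input format, etc., then run the job:
        // job.waitForCompletion(true);
    }
}
```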
01-19-2018 01:19 PM
Thanks berry, I'm actually already doing this, but I was wondering if there are other "clean" solutions.
01-19-2018 10:31 AM
Hello everyone, I have to update and possibly delete some rows in existing Hive tables (ORC). Unfortunately, UPDATE and DELETE operations are blocked by the datalake management team, so I would like to know if there is any way to do the updates without direct ACID operations. Thanks
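A commonly used non-ACID workaround is to rebuild the affected data and overwrite it rather than updating rows in place. A rough Spark-in-Java sketch of that pattern, for illustration only; all table names, column names, and predicate values are placeholders, not taken from this thread:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

public class OverwriteUpdateSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orc-overwrite-update")
                .enableHiveSupport()
                .getOrCreate();

        // Read the current contents of the ORC table ("db.target" is a placeholder).
        Dataset<Row> current = spark.table("db.target");

        // "Delete": keep only the rows that should survive.
        // "Update": rewrite a column value for the matching rows.
        Dataset<Row> updated = current
                .filter(col("status").notEqual("expired"))
                .withColumn("amount",
                        when(col("id").equalTo(42), lit(0)).otherwise(col("amount")));

        // Write to a staging table and swap it in afterwards (e.g. via
        // INSERT OVERWRITE), since Spark cannot overwrite a table it is
        // reading from in the same job.
        updated.write().mode("overwrite").format("orc")
                .saveAsTable("db.target_staging");
    }
}
```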
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
10-23-2017 01:08 PM
Hello @kgautam, thanks for your answer. The validation and transformation consist of file name validation, column validation, datatype checks, and adding some additional columns (update date, expiration, ...). If all the checks are OK, the file is loaded into the ORC table. There are specific rules and transformations depending on whether a record is a creation, an update, or a delete (expiration and data validation process). I also need to use parameters to enable or disable one or more checks and to determine how many errors are allowed before stopping the process. The ingestion normally runs at the end of the day. Thanks
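A rough sketch of how toggle-able checks with an error threshold could look in Spark/Java; the flag, the threshold value, and the column name are illustrative assumptions only:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;

public class ValidationSketch {
    // In practice these would come from job parameters / configuration.
    static final boolean CHECK_DATATYPES = true;
    static final long MAX_ERRORS = 100;

    // Returns only the rows that pass the enabled checks; aborts the load
    // when the number of rejected rows exceeds the configured threshold.
    static Dataset<Row> validate(Dataset<Row> input) {
        Dataset<Row> valid = input;
        if (CHECK_DATATYPES) {
            // Example datatype check: "amount" must be castable to a number
            // (the column name is a placeholder).
            Dataset<Row> rejected =
                    valid.filter(col("amount").cast("double").isNull());
            if (rejected.count() > MAX_ERRORS) {
                throw new IllegalStateException(
                        "Too many datatype errors, stopping the process");
            }
            valid = valid.except(rejected);
        }
        return valid;
    }
}
```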
10-23-2017 12:57 PM
Hi @Abdelkrim Hadjidj, thank you for the answer. We have certain constraints related to the environment and to the team that manages the datalake. Apache NiFi is unfortunately not supported; the proposed solutions revolve around Java/Spark and Oozie. Thanks again, Réda
10-23-2017 12:43 PM
Hello, I want to load data from CSV files coming from different datasources in HDFS into an ORC table, including some data validations (business rules, ...) and transformations. Currently, my general process is to load all CSV files with the same structure from the same datasource into a single external table, then apply the validation rules and data transformations, and finally load the data into the ORC table. My question is: what is the best way to automate this process (loading, validation, and transformation) so that scheduling and monitoring are easy, and to handle all the CRUD problems? PS: I started working with Java Spark and Oozie for scheduling. Thanks
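For illustration, a minimal Spark-in-Java skeleton of such a load/validate/transform pipeline; the path, column names, and table name are assumptions, not the poster's actual code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.current_date;

public class CsvToOrcSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-to-orc-ingestion")
                .enableHiveSupport()
                .getOrCreate();

        // 1. Load: all CSV files with the same structure from one datasource
        //    ("hdfs:///data/in/source1" is a placeholder path).
        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/in/source1/*.csv");

        // 2. Validate: apply the business rules; here just a not-null
        //    check on a placeholder key column.
        Dataset<Row> valid = raw.filter(raw.col("id").isNotNull());

        // 3. Transform: add technical columns such as the load date.
        Dataset<Row> enriched = valid.withColumn("update_date", current_date());

        // 4. Write into the ORC target table ("db.target" is a placeholder).
        enriched.write().mode("append").format("orc").saveAsTable("db.target");
    }
}
```

A job like this can then be wrapped in an Oozie workflow (a Spark action plus a coordinator for the end-of-day schedule), which covers the scheduling and monitoring side of the question.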
Labels:
- Apache Spark