Member since: 08-16-2018
Posts: 55
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 804 | 10-16-2019 08:18 AM |
 | 4971 | 08-05-2019 06:38 AM |
03-09-2020
05:24 AM
@Gomathinayagam Thanks for your prompt response and clarification!
10-16-2019
08:18 AM
1 Kudo
Hello, Oracle Data Integrator connects to Hive by using JDBC and uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Source - HERE. The points you mentioned from the documentation are meant to block external applications and non-service users from accessing the Hive metastore directly. Since ODI connects to Hive using JDBC, it should connect to HiveServer2 as described in this documentation. Once connected, a query executed from ODI goes to HiveServer2, which then contacts the Hive Metastore to fetch the metadata of the table being queried and proceeds with the execution. It is not necessary for ODI to connect to the Hive Metastore directly. For details about Hive Metastore HA, please read HERE.
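For reference, a minimal standalone JDBC sketch of that same path (this is not ODI code; the host, port, credentials, and table name are placeholders). The client only talks to HiveServer2, which resolves the table metadata through the metastore on its own:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveServer2JdbcSketch {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver from the hive-jdbc artifact.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder host/port/database; a kerberized cluster would need extra URL properties.
        String url = "jdbc:hive2://hiveserver2-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive_user", "hive_password");
             Statement stmt = conn.createStatement();
             // HiveServer2 fetches the table metadata from the metastore on our behalf.
             ResultSet rs = stmt.executeQuery("DESCRIBE some_table")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}
```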
08-06-2019
08:23 AM
You can use a script like this to create snapshots of old and new files, i.e. search for files which are older than 3 days and for files which are newer than 3 days. Just make sure you use the correct path to the Cloudera jars; in the case of CDH 5.15:

#!/bin/bash
now=`date +"%Y-%m-%dT%H:%M:%S"`
# Clear out the previous snapshot files before writing new ones.
hdfs dfs -rm /data/cleanup_report/part=older3days/*
hdfs dfs -rm /data/cleanup_report/part=newer3days/*
# Find entries older than 3 days, prefix each line with the timestamp and a label, and write the result to HDFS.
hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /data -type d -mtime +3 | sed "s/^/${now}\tolder3days\t/" | hadoop fs -put - /data/cleanup_report/part=older3days/data.csv
# Same for entries newer than 3 days.
hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /data -type d -mtime -3 | sed "s/^/${now}\tnewer3days\t/" | hadoop fs -put - /data/cleanup_report/part=newer3days/data.csv

Then create an external table with partitions on top of this HDFS folder (sketched below).
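A hedged sketch of that external table, assuming the tab-separated timestamp/label/path lines written by the sed steps above (the column names are placeholders of my own):

```sql
-- Sketch only: column names are placeholders; the layout matches the
-- <timestamp>\t<label>\t<path> lines produced by the script above.
CREATE EXTERNAL TABLE cleanup_report (
  snapshot_ts STRING,
  age_label   STRING,
  hdfs_path   STRING
)
PARTITIONED BY (part STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/cleanup_report';

-- Register the two partition directories written by the script.
ALTER TABLE cleanup_report ADD IF NOT EXISTS PARTITION (part='older3days');
ALTER TABLE cleanup_report ADD IF NOT EXISTS PARTITION (part='newer3days');
```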
08-06-2019
06:05 AM
Thanks for the reply! The views are already created by joining many underlying tables, so joining the views again for data aggregation would cause performance issues. Here are the two approaches I came up with. The first one (sketched below):
1. Extract data from the Hive views into files.
2. Create intermediate Hive tables and load the data extracted from the views.
3. Join the new Hive tables to generate the final file.
The other approach is to use PySpark to read data from the views directly, aggregate and transform the data, and generate the final output file.
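A hedged HiveQL sketch of the intermediate-table approach (view, table, and column names are hypothetical; CREATE TABLE ... AS SELECT collapses the extract-and-load steps into one statement):

```sql
-- Materialize each view once so the final join does not re-run the view joins.
CREATE TABLE db.stage_a STORED AS PARQUET AS SELECT * FROM db.view_a;
CREATE TABLE db.stage_b STORED AS PARQUET AS SELECT * FROM db.view_b;

-- Join the materialized tables and write the aggregated output to HDFS as the final file.
INSERT OVERWRITE DIRECTORY '/tmp/final_output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT a.key_col, a.dim_col, SUM(b.amount) AS total_amount
FROM db.stage_a a
JOIN db.stage_b b ON a.key_col = b.key_col
GROUP BY a.key_col, a.dim_col;
```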
07-25-2019
03:28 AM
Got it working on my cluster. Thanks Lars and Eric. Cheers, Anand
06-03-2019
05:48 AM
I agree with you. The stack trace shows that the SQL misses the cast for the second record. Unfortunately I do not have any control over the effective query, since it is done via JDBC. This is code replicating the issue:

String sql = "INSERT INTO ST.X VALUES (?, CAST(? AS DECIMAL(16,2)))";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    // First row: both parameters bound, then added to the batch.
    ps.setString(1, id1);
    ps.setBigDecimal(2, amount1);
    ps.addBatch();
    // Second row: same bindings, added as a second batch entry.
    ps.setString(1, id2);
    ps.setBigDecimal(2, amount2);
    ps.addBatch();
    ps.executeBatch();
}

When I execute the same code by adding the batch just once, it works correctly. It seems like the driver "cleans up" the query, removing the cast for the second row:

INSERT INTO ST.X VALUES
('123456789-xyz', CAST(350000 AS DECIMAL(16,2))),
('321564987-zyx', 7896163500)
05-07-2019
03:58 AM
The same workaround worked for me too - I am also getting NULL when selecting from_unixtime(starttime), where starttime is a BIGINT. Is this a bug in Impala or ...? Also, the following query normally works in Hive:

SELECT cast(starttime as TIMESTAMP) FROM dynatracelogs ORDER BY starttime desc LIMIT 100

but in Impala it returns NULLs.
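If starttime holds an epoch value in milliseconds (an assumption on my part), that alone would explain the NULLs: Impala's CAST(... AS TIMESTAMP) and from_unixtime() expect seconds, so a millisecond value lands outside the supported timestamp range. A hedged check, using the table and column from the query above:

```sql
-- Sketch, assuming starttime is an epoch in milliseconds.
-- DIV is integer division, so the value is scaled back to seconds before converting.
SELECT starttime,
       CAST(starttime DIV 1000 AS TIMESTAMP) AS ts_if_millis,
       from_unixtime(starttime DIV 1000)     AS ts_string_if_millis
FROM dynatracelogs
ORDER BY starttime DESC
LIMIT 100;
```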
04-12-2019
08:08 PM
Hi, on top of what Gomath has suggested, I also have another idea (a hedged DDL sketch for step 2 follows the steps below):

1. Assuming that your current data on HDFS is:
/user/hive/warehouse/dbname/tablename/columnA=1
/user/hive/warehouse/dbname/tablename/columnA=2
/user/hive/warehouse/dbname/tablename/columnA=3
/user/hive/warehouse/dbname/tablename/columnA=4
...
and you want it to become:
/user/hive/warehouse/dbname/tablename/columnA=1/columnB=1
/user/hive/warehouse/dbname/tablename/columnA=2/columnB=1
/user/hive/warehouse/dbname/tablename/columnA=3/columnB=1
/user/hive/warehouse/dbname/tablename/columnA=4/columnB=1
...

2. Create a new table that has the same columns, but with an extra partition column columnB, assuming that it will live on HDFS under /user/hive/warehouse/dbname/new_tablename.

3. The new directory layout will then look like this:
/user/hive/warehouse/dbname/new_tablename/columnA=1
/user/hive/warehouse/dbname/new_tablename/columnA=2
/user/hive/warehouse/dbname/new_tablename/columnA=3
/user/hive/warehouse/dbname/new_tablename/columnA=4
...

4. Create the top-level partition directories for the new table:
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=1
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=2
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=3
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=4

5. Then MOVE the data from the old table into the new table's partitions:
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=1 /user/hive/warehouse/dbname/new_tablename/columnA=1/columnB=1
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=2 /user/hive/warehouse/dbname/new_tablename/columnA=2/columnB=1
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=3 /user/hive/warehouse/dbname/new_tablename/columnA=3/columnB=1
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=4 /user/hive/warehouse/dbname/new_tablename/columnA=4/columnB=1
...
The value for columnB is yours to decide, since it is a new partition.

6. Go back to Hive and run the command below; it will recover all the missing partitions:
MSCK REPAIR TABLE dbname.new_tablename;

7. Run SELECT * FROM dbname.new_tablename; to confirm that the data is correct.

This way, you do not need to duplicate any data on HDFS. Hope the above helps.
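For step 2, a hedged DDL sketch (the real column list and storage format must match your existing table; col1/col2 and PARQUET below are placeholders of my own):

```sql
-- Placeholder columns and file format; the point is only the extra partition column columnB.
CREATE EXTERNAL TABLE dbname.new_tablename (
  col1 STRING,
  col2 INT
)
PARTITIONED BY (columnA INT, columnB INT)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/dbname/new_tablename';
```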
04-03-2019
04:03 AM
Impala currently does not support Timestamp in AVRO format; please refer to the upstream Impala JIRA IMPALA-1871. Although upstream Hive does support Timestamp in AVRO via JIRA HIVE-8131, in order to keep compatibility between Hive and Impala, we decided to remove this feature from Hive until Impala supports such functionality.
04-01-2019
01:58 PM
Hi, the second command you mentioned (below) only lists the entries directly under the given directory and does not include its subdirectories.

hadoop fs -ls -S -h <path_to_hadoop_folder>

Can you double check and share the result you see? Thanks.
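In case a recursive view is what you are after, a hedged comparison (the path is a placeholder):

```bash
# Non-recursive: only the entries directly under the folder, sorted by size (-S), human readable (-h).
hadoop fs -ls -S -h /path/to/hadoop_folder

# Recursive: also descends into subdirectories.
hadoop fs -ls -R -h /path/to/hadoop_folder
```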