Member since: 08-16-2018
Posts: 55
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 804 | 10-16-2019 08:18 AM |
 | 4971 | 08-05-2019 06:38 AM |
03-09-2020
05:24 AM
@Gomathinayagam Thanks for your prompt response and clarification!
10-16-2019
08:18 AM
1 Kudo
Hello, Oracle Data Integrator connects to Hive by using JDBC and uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Source - HERE. The points you mentioned from the documentation are meant to block external applications and non-service users from accessing the Hive metastore directly. Since ODI connects to Hive using JDBC, it should connect to HiveServer2 as described in this documentation. Once connected, a query executed from ODI goes to HiveServer2, which then contacts the Hive Metastore to fetch the metadata of the table being queried and proceeds with the execution. It is not necessary for ODI to connect to the Hive Metastore directly. For details about Hive Metastore HA, please read HERE.
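For reference, a minimal standalone JDBC sketch of that same path (this is not ODI code; the host, port, credentials, and table name are placeholders). The client only talks to HiveServer2, which resolves the table metadata through the metastore on its own:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveServer2JdbcSketch {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver from the hive-jdbc artifact.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder host/port/database; a kerberized cluster would need extra URL properties.
        String url = "jdbc:hive2://hiveserver2-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive_user", "hive_password");
             Statement stmt = conn.createStatement();
             // HiveServer2 fetches the table metadata from the metastore on our behalf.
             ResultSet rs = stmt.executeQuery("DESCRIBE some_table")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}
```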
08-06-2019
08:23 AM
You can use a script like this to create snapshots of old and new files, i.e. search for files which are older than 3 days and for files which are newer than 3 days. Just make sure you use the correct path to the Cloudera jars; in the case of CDH 5.15:

#!/bin/bash
now=`date +"%Y-%m-%dT%H:%M:%S"`
# Clear out the previous snapshot files before writing new ones.
hdfs dfs -rm /data/cleanup_report/part=older3days/*
hdfs dfs -rm /data/cleanup_report/part=newer3days/*
# Find entries older than 3 days, prefix each line with the timestamp and a label, and write the result to HDFS.
hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /data -type d -mtime +3 | sed "s/^/${now}\tolder3days\t/" | hadoop fs -put - /data/cleanup_report/part=older3days/data.csv
# Same for entries newer than 3 days.
hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /data -type d -mtime -3 | sed "s/^/${now}\tnewer3days\t/" | hadoop fs -put - /data/cleanup_report/part=newer3days/data.csv

Then create an external table with partitions on top of this HDFS folder (sketched below).
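A hedged sketch of that external table, assuming the tab-separated timestamp/label/path lines written by the sed steps above (the column names are placeholders of my own):

```sql
-- Sketch only: column names are placeholders; the layout matches the
-- <timestamp>\t<label>\t<path> lines produced by the script above.
CREATE EXTERNAL TABLE cleanup_report (
  snapshot_ts STRING,
  age_label   STRING,
  hdfs_path   STRING
)
PARTITIONED BY (part STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/cleanup_report';

-- Register the two partition directories written by the script.
ALTER TABLE cleanup_report ADD IF NOT EXISTS PARTITION (part='older3days');
ALTER TABLE cleanup_report ADD IF NOT EXISTS PARTITION (part='newer3days');
```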
08-06-2019
06:05 AM
Thanks for the reply! The views are already created by joining many underlying tables, so joining the views again for data aggregation would cause performance issues. Here are the two approaches I came up with. The first one (sketched below):
1. Extract data from the Hive views into files.
2. Create intermediate Hive tables and load the data extracted from the views.
3. Join the new Hive tables to generate the final file.
The other approach is to use PySpark to read data from the views directly, aggregate and transform the data, and generate the final output file.
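A hedged HiveQL sketch of the intermediate-table approach (view, table, and column names are hypothetical; CREATE TABLE ... AS SELECT collapses the extract-and-load steps into one statement):

```sql
-- Materialize each view once so the final join does not re-run the view joins.
CREATE TABLE db.stage_a STORED AS PARQUET AS SELECT * FROM db.view_a;
CREATE TABLE db.stage_b STORED AS PARQUET AS SELECT * FROM db.view_b;

-- Join the materialized tables and write the aggregated output to HDFS as the final file.
INSERT OVERWRITE DIRECTORY '/tmp/final_output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT a.key_col, a.dim_col, SUM(b.amount) AS total_amount
FROM db.stage_a a
JOIN db.stage_b b ON a.key_col = b.key_col
GROUP BY a.key_col, a.dim_col;
```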
07-25-2019
03:28 AM
Got it working on my cluster. Thanks Lars and Eric. Cheers, Anand
06-03-2019
05:48 AM
I agree with you. The stack trace shows that the SQL misses the cast for the second record. Unfortunately I do not have any control over the effective query, since it is done via JDBC. This is code replicating the issue:

String sql = "INSERT INTO ST.X VALUES (?, CAST(? AS DECIMAL(16,2)))";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    // First row: both parameters bound, then added to the batch.
    ps.setString(1, id1);
    ps.setBigDecimal(2, amount1);
    ps.addBatch();
    // Second row: same bindings, added as a second batch entry.
    ps.setString(1, id2);
    ps.setBigDecimal(2, amount2);
    ps.addBatch();
    ps.executeBatch();
}

When I execute the same code by adding the batch just once, it works correctly. It seems like the driver "cleans up" the query, removing the cast for the second row:

INSERT INTO ST.X VALUES
('123456789-xyz', CAST(350000 AS DECIMAL(16,2))),
('321564987-zyx', 7896163500)
05-07-2019
03:58 AM
The same workaround worked for me too - I am also getting NULL when selecting from_unixtime(starttime), where starttime is a BIGINT. Is this a bug in Impala or ...? Also, the following query normally works in Hive:

SELECT cast(starttime as TIMESTAMP) FROM dynatracelogs ORDER BY starttime desc LIMIT 100

but in Impala it returns NULLs.
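If starttime holds an epoch value in milliseconds (an assumption on my part), that alone would explain the NULLs: Impala's CAST(... AS TIMESTAMP) and from_unixtime() expect seconds, so a millisecond value lands outside the supported timestamp range. A hedged check, using the table and column from the query above:

```sql
-- Sketch, assuming starttime is an epoch in milliseconds.
-- DIV is integer division, so the value is scaled back to seconds before converting.
SELECT starttime,
       CAST(starttime DIV 1000 AS TIMESTAMP) AS ts_if_millis,
       from_unixtime(starttime DIV 1000)     AS ts_string_if_millis
FROM dynatracelogs
ORDER BY starttime DESC
LIMIT 100;
```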
04-12-2019
08:08 PM
Hi, on top of what Gomath has suggested, I also have another idea (a hedged DDL sketch for step 2 follows the steps below):

1. Assuming that your current data on HDFS is:
/user/hive/warehouse/dbname/tablename/columnA=1
/user/hive/warehouse/dbname/tablename/columnA=2
/user/hive/warehouse/dbname/tablename/columnA=3
/user/hive/warehouse/dbname/tablename/columnA=4
...
and you want it to become:
/user/hive/warehouse/dbname/tablename/columnA=1/columnB=1
/user/hive/warehouse/dbname/tablename/columnA=2/columnB=1
/user/hive/warehouse/dbname/tablename/columnA=3/columnB=1
/user/hive/warehouse/dbname/tablename/columnA=4/columnB=1
...

2. Create a new table that has the same columns, but with an extra partition column columnB, assuming that it will live on HDFS under /user/hive/warehouse/dbname/new_tablename.

3. The new directory layout will then look like this:
/user/hive/warehouse/dbname/new_tablename/columnA=1
/user/hive/warehouse/dbname/new_tablename/columnA=2
/user/hive/warehouse/dbname/new_tablename/columnA=3
/user/hive/warehouse/dbname/new_tablename/columnA=4
...

4. Create the top-level partition directories for the new table:
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=1
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=2
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=3
hadoop fs -mkdir /user/hive/warehouse/dbname/new_tablename/columnA=4

5. Then MOVE the data from the old table into the new table's partitions:
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=1 /user/hive/warehouse/dbname/new_tablename/columnA=1/columnB=1
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=2 /user/hive/warehouse/dbname/new_tablename/columnA=2/columnB=1
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=3 /user/hive/warehouse/dbname/new_tablename/columnA=3/columnB=1
hadoop fs -mv /user/hive/warehouse/dbname/tablename/columnA=4 /user/hive/warehouse/dbname/new_tablename/columnA=4/columnB=1
...
The value for columnB is yours to decide, since it is a new partition.

6. Go back to Hive and run the command below; it will recover all the missing partitions:
MSCK REPAIR TABLE dbname.new_tablename;

7. Run SELECT * FROM dbname.new_tablename; to confirm that the data is correct.

This way, you do not need to duplicate any data on HDFS. Hope the above helps.
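For step 2, a hedged DDL sketch (the real column list and storage format must match your existing table; col1/col2 and PARQUET below are placeholders of my own):

```sql
-- Placeholder columns and file format; the point is only the extra partition column columnB.
CREATE EXTERNAL TABLE dbname.new_tablename (
  col1 STRING,
  col2 INT
)
PARTITIONED BY (columnA INT, columnB INT)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/dbname/new_tablename';
```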
04-03-2019
04:03 AM
Impala currently does not support Timestamp in AVRO format; please refer to the upstream Impala JIRA IMPALA-1871. Although upstream Hive does support Timestamp in AVRO via JIRA HIVE-8131, in order to keep compatibility between Hive and Impala, we decided to remove this feature from Hive until Impala supports such functionality.
04-01-2019
01:58 PM
Hi, the second command you mentioned (below) only lists the entries directly under the given directory and does not include its subdirectories.

hadoop fs -ls -S -h <path_to_hadoop_folder>

Can you double check and share the result you see? Thanks.
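In case a recursive view is what you are after, a hedged comparison (the path is a placeholder):

```bash
# Non-recursive: only the entries directly under the folder, sorted by size (-S), human readable (-h).
hadoop fs -ls -S -h /path/to/hadoop_folder

# Recursive: also descends into subdirectories.
hadoop fs -ls -R -h /path/to/hadoop_folder
```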