Member since: 08-16-2018
Posts: 55
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1207 | 10-16-2019 08:18 AM
 | 6391 | 08-05-2019 06:38 AM
03-09-2020
02:41 AM
@ARVINDR Yes, this is the default behaviour in Hive. When an empty value ( '' ) is passed for a STRING column, it is stored as-is. If you would like empty values in STRING columns to be treated as NULL instead, set the serialization.null.format property on the table of your choice. You can do this on a new table or on an existing table, for example:

New table:
CREATE TABLE null_test_1
(service1end string,
 service1start string,
 service2end string,
 service2start string,
 firstlinemaintcost double)
TBLPROPERTIES ('serialization.null.format'='');

Existing table:
ALTER TABLE null_test SET TBLPROPERTIES ('serialization.null.format'='');

Once that is done, empty values inserted into the STRING columns will be returned as NULL. Hope this helps!
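For illustration, here is a minimal sketch of how the behaviour changes; the inserted values are made up and assume the null_test_1 table defined above:

```sql
-- Insert a row where some STRING columns are empty strings (values are illustrative only).
INSERT INTO null_test_1
VALUES ('', '2019-01-01 10:00:00', '', '2019-01-01 11:00:00', 125.50);

-- With 'serialization.null.format'='' set on the table, the empty strings are read back as NULL.
SELECT service1end, service2end FROM null_test_1;
-- Expected result: NULL  NULL
```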
10-16-2019
08:18 AM
1 Kudo
Hello, Oracle Data Integrator connects to Hive using JDBC and uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs (source - HERE). The points you quoted from the documentation are intended to block external applications and non-service users from accessing the Hive Metastore directly. Since ODI connects to Hive over JDBC, it should connect to HiveServer2 as described in this documentation. Once connected, a query executed from ODI goes to HiveServer2; HiveServer2 then contacts the Hive Metastore to fetch the metadata of the table being queried and proceeds with the execution. It is not necessary for ODI to connect to the Hive Metastore directly. For details about Hive Metastore HA, please read HERE.
08-05-2019
06:38 AM
@smkmuthu If you are using the CDH distribution, you can use HdfsFindTool to accomplish this. Sample command to find files older than 3 days in the directory "/user/hive":

hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /user/hive -type f -mtime +3

(As with Unix find, -mtime +3 matches files older than 3 days, while -mtime -3 would match files modified within the last 3 days.) Please adjust the /opt/cloudera/parcels path in the command for the CDH version you are using, and the target directory as per the requirement. More details about HdfsFindTool can be found HERE. Hope it helps!
07-30-2019
03:45 AM
Hello, have you tried doing this with Hive queries? I think it would be possible:

1. CREATE a new, empty table whose columns have the correct datatypes as per the requirement (i.e., the final file's column structure).
2. INSERT data into this new table using a SELECT query that JOINs the data from both views.
3. The files written to the table's HDFS directory would then be your final file.

A sketch of these steps is shown below. Thanks!
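The following is a minimal sketch of the three steps above; all table, view, and column names (final_table, view_a, view_b, id, and so on) are hypothetical placeholders, not names from the original question:

```sql
-- Step 1: create an empty table matching the desired final column structure (names are illustrative).
CREATE TABLE final_table (
  id            INT,
  value_from_a  STRING,
  value_from_b  STRING
)
STORED AS TEXTFILE;

-- Step 2: populate it by joining the two views.
INSERT INTO TABLE final_table
SELECT a.id, a.value_from_a, b.value_from_b
FROM view_a a
JOIN view_b b
  ON a.id = b.id;

-- Step 3: the resulting data files sit in the table's HDFS directory,
-- typically under the warehouse path for the database (the exact location varies with configuration).
```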
07-23-2019
10:01 PM
Can you check whether short-circuit reads are configured as per the documentation HERE?
06-03-2019
05:02 AM
Hi @cittae No problem. It looks like the error you are facing is the same as before. Can you try inserting the values by modifying the query as below?

INSERT INTO ST.X VALUES
('123456789-xyz', CAST(350000 AS DECIMAL(16,2))),
('321564987-zyx', CAST(7896163500 AS DECIMAL(16,2)));

The INSERT query you used was missing a cast for the second record.
05-31-2019
08:01 AM
Hello @cittae This is expected behaviour. By default, if you specify a value such as 7896163500 to go into a DECIMAL column, Impala checks whether the column has enough precision to represent the largest value of that integer type, and raises an error if not. Here the literal 7896163500 is typed as a BIGINT (an 8-byte integer), and with the 2 fractional digits involved it cannot be guaranteed to fit, so you see the error below:

Expression '7896163500' (type: BIGINT) would need to be cast to DECIMAL(20,2) for column 'req_amount'

Therefore, use an explicit cast such as CAST(7896163500 AS DECIMAL(12,2)) when inserting into DECIMAL columns, as you have already done; see the example below. Hope this helps! For more: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_decimal.html Thanks!
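As an illustration, here is a minimal, hypothetical reproduction; the table name decimal_demo, its columns, and the DECIMAL(16,2) type are assumptions made for this sketch and may differ from the real table:

```sql
-- Hypothetical table; the real table and column definitions may differ.
CREATE TABLE decimal_demo (id STRING, req_amount DECIMAL(16,2));

-- This form is rejected: the literal 7896163500 is treated as a BIGINT, and Impala
-- refuses the implicit conversion because the largest BIGINT would not fit the column.
-- INSERT INTO decimal_demo VALUES ('321564987-zyx', 7896163500);

-- This form works: the explicit cast tells Impala the value fits the target precision and scale.
INSERT INTO decimal_demo VALUES ('321564987-zyx', CAST(7896163500 AS DECIMAL(16,2)));
```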
04-10-2019
10:31 AM
1 Kudo
If I understand your situation correctly, you have a table backed by 10 TB of data in HDFS, partitioned on Column A, and you want to also partition it on Column B. If Column B becomes the sub-partition, the HDFS directory would look like user/hive/warehouse/database/table/colA/colB, or /colB/colA otherwise (considering it a managed table). Restructuring the HDFS directories manually is not a great idea, because it would be a nightmare to scan the data in all the files and move each record into its corresponding folder. Below is one way of doing it.

1. Create a new table with the new structure, i.e. partitioned on both Col A and Col B:

CREATE TABLE NEWTABLE ( COLUMNS ... )
PARTITIONED BY ( COL_A INT, COL_B INT );

2.a. Insert data from the old table into the new table (created in step 1):

INSERT INTO NEWTABLE
SELECT * FROM OLDTABLE;

Yes, this step is going to consume a lot of resources during execution if not handled properly, plus HDFS space for storing the results as NEWTABLE data, and of course time.

OR

2.b. If you think HDFS will not have enough space to hold all the data, or you are facing a resource crunch, I would suggest doing this INSERT in batches and removing the already-copied data after each INSERT:

INSERT INTO NEWTABLE
SELECT * FROM OLDTABLE
WHERE COL_A='abc';

DELETE FROM OLDTABLE
WHERE COL_A='abc';

INSERT INTO NEWTABLE
SELECT * FROM OLDTABLE
WHERE COL_A='def';

DELETE FROM OLDTABLE
WHERE COL_A='def';

... and so on. This way you unload HDFS of the data that has already been handled and keep the space balanced. If you follow step 2.b., you can write a script to automate the process by passing the partition names (derived from SHOW PARTITIONS) dynamically for each run, as in the sketch below. But try the first few batches manually before automating, to make sure things go as expected.
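Here is a minimal sketch of one batch of step 2.b. with dynamic partitioning; NEWTABLE, OLDTABLE, the data columns col1/col2, and the partition value 'abc' are all placeholders, and it assumes NEWTABLE is partitioned by (COL_A, COL_B) while OLDTABLE is partitioned only by COL_A:

```sql
-- Allow Hive to derive the COL_A/COL_B partition values from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy one COL_A partition into the new, doubly-partitioned table.
-- Partition columns must be the last columns in the SELECT list.
INSERT INTO TABLE NEWTABLE PARTITION (COL_A, COL_B)
SELECT col1, col2, COL_A, COL_B
FROM OLDTABLE
WHERE COL_A = 'abc';

-- Reclaim space for the handled batch. If OLDTABLE is not a transactional (ACID) table,
-- DELETE will not work; dropping the partition achieves the same effect here.
ALTER TABLE OLDTABLE DROP PARTITION (COL_A = 'abc');
```

The same pair of statements would be repeated (or scripted) for each value returned by SHOW PARTITIONS on the old table.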
04-03-2019
04:03 AM
Impala currently does not support Timestamp in AVRO format; please refer to the Impala upstream JIRA: IMPALA-1871. Although upstream Hive does support Timestamp in AVRO via JIRA: HIVE-8131, in order to keep compatibility between Hive and Impala, we decided to remove this feature from Hive until Impala supports such functionality.
04-01-2019
01:58 PM
Hi, the second command you mentioned (below) lists only the contents of the given directory and does not recurse into its subdirectories (that would require the -R option).

hadoop fs -ls -S -h <path_to_hadoop_folder>

Can you double-check and share the result you see? Thanks.