Member since: 08-16-2018
Posts: 55
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1207 | 10-16-2019 08:18 AM
 | 6391 | 08-05-2019 06:38 AM
03-09-2020
02:41 AM
@ARVINDR Yes, this is the default behaviour in Hive. When an empty value ( '' ) is passed for a STRING column, it is stored as-is. If you would like empty values in STRING columns to be treated as NULL instead, set the serialization.null.format property on the table of your choice. You can do this on a new table or on an existing table, for example:

New table:
CREATE TABLE null_test_1
(service1end string,
 service1start string,
 service2end string,
 service2start string,
 firstlinemaintcost double)
TBLPROPERTIES ('serialization.null.format'='');

Existing table:
ALTER TABLE null_test SET TBLPROPERTIES ('serialization.null.format'='');

Once that is done, empty values inserted into the STRING columns will be returned as NULL. Hope this helps!
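For illustration, here is a minimal sketch of how the behaviour changes; the inserted values are made up and assume the null_test_1 table defined above:

```sql
-- Insert a row where some STRING columns are empty strings (values are illustrative only).
INSERT INTO null_test_1
VALUES ('', '2019-01-01 10:00:00', '', '2019-01-01 11:00:00', 125.50);

-- With 'serialization.null.format'='' set on the table, the empty strings are read back as NULL.
SELECT service1end, service2end FROM null_test_1;
-- Expected result: NULL  NULL
```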
10-16-2019
08:18 AM
1 Kudo
Hello, Oracle Data Integrator connects to Hive using JDBC and uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs (source - HERE). The points you quoted from the documentation are intended to block external applications and non-service users from accessing the Hive Metastore directly. Since ODI connects to Hive over JDBC, it should connect to HiveServer2 as described in this documentation. Once connected, a query executed from ODI goes to HiveServer2; HiveServer2 then contacts the Hive Metastore to fetch the metadata of the table being queried and proceeds with the execution. It is not necessary for ODI to connect to the Hive Metastore directly. For details about Hive Metastore HA, please read HERE.
08-05-2019
06:38 AM
@smkmuthu If you are using the CDH distribution, you can use HdfsFindTool to accomplish this. Sample command to find files older than 3 days in the directory "/user/hive":

hadoop jar /opt/cloudera/parcels/CDH/jars/search-mr-1.0.0-cdh5.15.1.jar org.apache.solr.hadoop.HdfsFindTool -find /user/hive -type f -mtime +3

(As with Unix find, -mtime +3 matches files older than 3 days, while -mtime -3 would match files modified within the last 3 days.) Please adjust the /opt/cloudera/parcels path in the command for the CDH version you are using, and the target directory as per the requirement. More details about HdfsFindTool can be found HERE. Hope it helps!
07-30-2019
03:45 AM
Hello, have you tried doing this with Hive queries? I think it would be possible:

1. CREATE a new, empty table whose columns have the correct datatypes as per the requirement (i.e., the final file's column structure).
2. INSERT data into this new table using a SELECT query that JOINs the data from both views.
3. The files written to the table's HDFS directory would then be your final file.

A sketch of these steps is shown below. Thanks!
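The following is a minimal sketch of the three steps above; all table, view, and column names (final_table, view_a, view_b, id, and so on) are hypothetical placeholders, not names from the original question:

```sql
-- Step 1: create an empty table matching the desired final column structure (names are illustrative).
CREATE TABLE final_table (
  id            INT,
  value_from_a  STRING,
  value_from_b  STRING
)
STORED AS TEXTFILE;

-- Step 2: populate it by joining the two views.
INSERT INTO TABLE final_table
SELECT a.id, a.value_from_a, b.value_from_b
FROM view_a a
JOIN view_b b
  ON a.id = b.id;

-- Step 3: the resulting data files sit in the table's HDFS directory,
-- typically under the warehouse path for the database (the exact location varies with configuration).
```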
07-23-2019
10:01 PM
Can you check whether short-circuit reads are configured as per the documentation HERE?
06-03-2019
05:02 AM
Hi @cittae No problem. It looks like the error you are facing is the same as before. Can you try inserting the values by modifying the query as below?

INSERT INTO ST.X VALUES
('123456789-xyz', CAST(350000 AS DECIMAL(16,2))),
('321564987-zyx', CAST(7896163500 AS DECIMAL(16,2)));

The INSERT query you used was missing a cast for the second record.
05-31-2019
08:01 AM
Hello @cittae This is expected behaviour. By default, if you specify a value such as 7896163500 to go into a DECIMAL column, Impala checks whether the column has enough precision to represent the largest value of that integer type, and raises an error if not. Here the literal 7896163500 is typed as a BIGINT (an 8-byte integer), and with the 2 fractional digits involved it cannot be guaranteed to fit, so you see the error below:

Expression '7896163500' (type: BIGINT) would need to be cast to DECIMAL(20,2) for column 'req_amount'

Therefore, use an explicit cast such as CAST(7896163500 AS DECIMAL(12,2)) when inserting into DECIMAL columns, as you have already done; see the example below. Hope this helps! For more: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_decimal.html Thanks!
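As an illustration, here is a minimal, hypothetical reproduction; the table name decimal_demo, its columns, and the DECIMAL(16,2) type are assumptions made for this sketch and may differ from the real table:

```sql
-- Hypothetical table; the real table and column definitions may differ.
CREATE TABLE decimal_demo (id STRING, req_amount DECIMAL(16,2));

-- This form is rejected: the literal 7896163500 is treated as a BIGINT, and Impala
-- refuses the implicit conversion because the largest BIGINT would not fit the column.
-- INSERT INTO decimal_demo VALUES ('321564987-zyx', 7896163500);

-- This form works: the explicit cast tells Impala the value fits the target precision and scale.
INSERT INTO decimal_demo VALUES ('321564987-zyx', CAST(7896163500 AS DECIMAL(16,2)));
```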
04-10-2019
10:31 AM
1 Kudo
If I understand your situation correctly, you have a table backed by 10 TB of data in HDFS, partitioned on Column A, and you want to also partition it on Column B. If Column B becomes the sub-partition, the HDFS directory would look like user/hive/warehouse/database/table/colA/colB, or /colB/colA otherwise (considering it a managed table). Restructuring the HDFS directories manually is not a great idea, because it would be a nightmare to scan the data in all the files and move each record into its corresponding folder. Below is one way of doing it.

1. Create a new table with the new structure, i.e. partitioned on both Col A and Col B:

CREATE TABLE NEWTABLE ( COLUMNS ... )
PARTITIONED BY ( COL_A INT, COL_B INT );

2.a. Insert data from the old table into the new table (created in step 1):

INSERT INTO NEWTABLE
SELECT * FROM OLDTABLE;

Yes, this step is going to consume a lot of resources during execution if not handled properly, plus HDFS space for storing the results as NEWTABLE data, and of course time.

OR

2.b. If you think HDFS will not have enough space to hold all the data, or you are facing a resource crunch, I would suggest doing this INSERT in batches and removing the already-copied data after each INSERT:

INSERT INTO NEWTABLE
SELECT * FROM OLDTABLE
WHERE COL_A='abc';

DELETE FROM OLDTABLE
WHERE COL_A='abc';

INSERT INTO NEWTABLE
SELECT * FROM OLDTABLE
WHERE COL_A='def';

DELETE FROM OLDTABLE
WHERE COL_A='def';

... and so on. This way you unload HDFS of the data that has already been handled and keep the space balanced. If you follow step 2.b., you can write a script to automate the process by passing the partition names (derived from SHOW PARTITIONS) dynamically for each run, as in the sketch below. But try the first few batches manually before automating, to make sure things go as expected.
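Here is a minimal sketch of one batch of step 2.b. with dynamic partitioning; NEWTABLE, OLDTABLE, the data columns col1/col2, and the partition value 'abc' are all placeholders, and it assumes NEWTABLE is partitioned by (COL_A, COL_B) while OLDTABLE is partitioned only by COL_A:

```sql
-- Allow Hive to derive the COL_A/COL_B partition values from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy one COL_A partition into the new, doubly-partitioned table.
-- Partition columns must be the last columns in the SELECT list.
INSERT INTO TABLE NEWTABLE PARTITION (COL_A, COL_B)
SELECT col1, col2, COL_A, COL_B
FROM OLDTABLE
WHERE COL_A = 'abc';

-- Reclaim space for the handled batch. If OLDTABLE is not a transactional (ACID) table,
-- DELETE will not work; dropping the partition achieves the same effect here.
ALTER TABLE OLDTABLE DROP PARTITION (COL_A = 'abc');
```

The same pair of statements would be repeated (or scripted) for each value returned by SHOW PARTITIONS on the old table.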
04-03-2019
04:03 AM
Impala currently does not support Timestamp in AVRO format; please refer to the Impala upstream JIRA: IMPALA-1871. Although upstream Hive does support Timestamp in AVRO via JIRA: HIVE-8131, in order to keep compatibility between Hive and Impala, we decided to remove this feature from Hive until Impala supports such functionality.
04-01-2019
01:58 PM
Hi, the second command you mentioned (below) lists only the contents of the given directory and does not recurse into its subdirectories (that would require the -R option).

hadoop fs -ls -S -h <path_to_hadoop_folder>

Can you double-check and share the result you see? Thanks.