Member since: 08-18-2017
Posts: 145
Kudos Received: 19
Solutions: 17
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1592 | 05-09-2024 02:50 PM |
 | 5108 | 09-13-2022 10:50 AM |
 | 2422 | 07-25-2022 12:18 AM |
 | 4538 | 06-24-2019 01:56 PM |
 | 2112 | 10-13-2018 04:40 AM |
09-12-2018 03:09 PM
I assume the raw data is in text format and you want to convert it and load it into Avro tables. If so, you can create another, identical text table and specify the delimiters used in the data, i.e.:
create table staging(id struct<tid:string,action:string,createdts:timestamp>, cid string, anumber string) row format delimited fields terminated by ',' collection items terminated by '|' stored as textfile;
Sample text data can look like this:
1|success|150987428888,3,12345
Then load the target table from the staging table:
insert into testtbl select * from staging;
If Kafka or Flume is generating Avro files directly, those files can be written straight into the table path. It's better to create an external table if the source files are written directly to the table path.
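To see how the two delimiters in the create table statement carve up the sample row, here is a minimal Python sketch. It only imitates the splitting, it is not how Hive is implemented: `split_row` and `STRUCT_KEYS` are hypothetical names, all values are kept as strings, and the cast Hive would apply to each column type (for example `createdts` to timestamp) is not modeled.

```python
# Illustration only: mimics "fields terminated by ','" and
# "collection items terminated by '|'" for the sample row.
FIELD_DELIM = ","   # separates the top-level columns
ITEM_DELIM = "|"    # separates the members of the struct column
STRUCT_KEYS = ("tid", "action", "createdts")  # struct members of column "id"


def split_row(line):
    """Split one text row into the three columns of the staging table."""
    id_part, cid, anumber = line.split(FIELD_DELIM)
    id_struct = dict(zip(STRUCT_KEYS, id_part.split(ITEM_DELIM)))
    return {"id": id_struct, "cid": cid, "anumber": anumber}


row = split_row("1|success|150987428888,3,12345")
print(row["id"])       # the struct column
print(row["cid"])      # second column
print(row["anumber"])  # third column
```

The first comma-separated field becomes the struct, and the pipe characters inside it separate the struct members in declaration order.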
09-12-2018 03:01 PM
It's not possible to use functions in an "insert into table ... values" statement.
09-12-2018 01:10 PM
Try the insert statement below:
0: jdbc:hive2://abcd:10000> with t as (select NAMED_STRUCT('tid','1','action','success', 'createdts',current_timestamp) as id ,'1' as cid,'12345' as anumber)
0: jdbc:hive2://abcd:10000> insert into testtbl select * from t;
No rows affected (20.464 seconds)
0: jdbc:hive2://abcd:10000> select * from testtbl;
+-----------------------------------------------------------------------+--------------+------------------+--+
| testtbl.id                                                            | testtbl.cid  | testtbl.anumber  |
+-----------------------------------------------------------------------+--------------+------------------+--+
| {"tid":"1","action":"success","createdts":"2018-09-12 15:06:27.075"}  | 1            | 12345            |
+-----------------------------------------------------------------------+--------------+------------------+--+
09-11-2018 07:14 AM
Can you check the value of tez.runtime.unordered.output.buffer.size-mb? I think it's configured to a higher value.
08-17-2018 03:01 PM
I am thinking about a solution for the JIRA. This needs to be implemented in code; there is no config to do this for now.
08-17-2018 07:09 AM
Concatenation depends on which files are chosen first. The ordering of the files is not deterministic with CombineHiveInputFormat, since grouping happens at the Hadoop layer. Concatenation will split or combine files depending on whether the ORC file size is greater or less than maxSplitSize.
For example, say you have 5 files (64MB, 64MB, 64MB, 64MB, 512MB) and mapreduce.input.fileinputformat.split.minsize=256mb. This can result in 2 files (256MB, 512MB), or it may result in 3 files (256MB, 256MB, 256MB). I raised a JIRA for this. An easy solution would be to add a path filter that skips files larger than maxSplitSize.
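The selection logic of such a path filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual fix, which would have to live in Hive's Java code. The function name `skip_oversized`, the file names, and the 256MB value used for maxSplitSize are all assumptions made for the example.

```python
# Illustration only: drop files larger than max_split_size before
# they are handed to concatenation, so they are never split or regrouped.
MB = 1024 * 1024


def skip_oversized(files, max_split_size):
    """Keep (path, size) pairs whose size does not exceed max_split_size."""
    return [(path, size) for path, size in files if size <= max_split_size]


# The five files from the example above (names are made up).
files = [
    ("f1.orc", 64 * MB),
    ("f2.orc", 64 * MB),
    ("f3.orc", 64 * MB),
    ("f4.orc", 64 * MB),
    ("f5.orc", 512 * MB),
]

candidates = skip_oversized(files, 256 * MB)
print([path for path, _ in candidates])
```

With the 512MB file filtered out, only the four 64MB files remain as candidates, so the outcome no longer depends on which file the grouping happens to pick first.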
08-16-2018 07:10 AM
In beeline or the CLI, after creating the table, you can run either show create table or describe formatted to find the table path in HDFS. After exiting beeline or the CLI, you can use the command below to see the table folder and the files inside it:
hadoop fs -ls -R <tablePath>
08-01-2018 07:18 AM
This validation was intentionally added in Spark with SPARK-15279, as it doesn't make sense to specify delimiters for ORC or Parquet files.
07-30-2018 05:37 AM
After creating the external table with a location, can you run "msck repair table data"? It should automatically update the Hive metadata with the partition information from the folder paths.
05-28-2018 04:26 PM
1 Kudo
@cskbhatt, I assume the external table location is "hdfs://<emr node>:8020/poc/test_table/". This issue occurs because hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file but exists inside the table folder. When the Hive ParquetRecordReader tries to read this file, it throws the above exception. Remove all non-Parquet files from the table location and retry your query.
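One way to spot such stray files is to check for the 4-byte magic string "PAR1" that unencrypted Parquet files carry at both the start and the end. Below is a rough Python sketch of that check. It assumes the table folder has been copied or mounted to a local directory (it does not talk to HDFS), and the helper names `is_parquet_file` and `find_non_parquet` are made up for the example.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of unencrypted Parquet files


def is_parquet_file(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def find_non_parquet(root):
    """Walk a local copy of the table folder and list files that fail the check."""
    bad = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if not is_parquet_file(full):
                bad.append(os.path.relpath(full, root))
    return sorted(bad)


# Small demo with a fake table folder (file contents are placeholders).
with tempfile.TemporaryDirectory() as table_dir:
    os.makedirs(os.path.join(table_dir, ".metadata"))
    with open(os.path.join(table_dir, "part-0.parquet"), "wb") as f:
        f.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
    with open(os.path.join(table_dir, ".metadata", "descriptor.properties"), "wb") as f:
        f.write(b"version=1\n")
    print(find_non_parquet(table_dir))
```

The magic-byte test only says a file looks like Parquet; it does not validate the footer or the schema, so treat the result as a list of candidates to review before deleting anything.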