Member since: 06-11-2016
Posts: 22
Kudos Received: 1
Solutions: 0
01-08-2018
11:33 PM
@Benoit Rousseau Thanks for looking into it. We don't intend to create another table; the idea is to do the conversion on the fly using PySpark. Nor do we want to use any other tool. Any help or suggestions on converting the data using PySpark would be appreciated.
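An on-the-fly conversion like this is usually a short PySpark job: read the Hive-managed ORC table into a DataFrame, then write it back out as Avro and as newline-delimited JSON. The sketch below is a minimal, hedged example — the database/table name and output paths are placeholders, and it assumes a Spark build with Hive support plus the external Avro data source on the classpath (e.g. `--packages org.apache.spark:spark-avro_2.11:2.4.0` on Spark 2.4; older Spark 1.6 clusters such as HDP 2.4.2 would use `format("com.databricks.spark.avro")` from the Databricks spark-avro package instead).

```python
# Sketch: convert a Hive ORC table to Avro and JSON with PySpark.
# "mydb.my_orc_table" and the /tmp output paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("orc-to-avro-json")
         .enableHiveSupport()      # read through the Hive metastore
         .getOrCreate())

# Hive resolves the ORC storage; we just read the table by name.
df = spark.table("mydb.my_orc_table")

# Avro output (directory of part files; needs the Avro data source package).
df.write.format("avro").mode("overwrite").save("/tmp/my_table_avro")

# JSON output (one JSON object per line per record).
df.write.mode("overwrite").json("/tmp/my_table_json")
```

Both writes produce a directory of part files rather than a single file; downstream consumers that need one file can coalesce first, at the cost of parallelism.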
01-08-2018
10:09 PM
Hi, our data resides in Hive in ORC format. We need to convert this data to Avro and JSON format. Is there a way to achieve this conversion?
Labels:
- Apache Hive
09-19-2017
04:02 AM
@mqureshi If they are moved, then the other applications may be affected. Is there any other way to resolve this without affecting the other applications?
09-19-2017
03:02 AM
Hi all, I am trying to execute an HQL script with the following properties set:
SET hive.execution.engine=tez;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.vectorized.execution.enabled = true;
SET hive.vectorized.execution.reduce.enabled = true;
SET hive.vectorized.execution.reduce.groupby.enabled = true;
SET mapred.job.queue.name=mtl;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
SET tez.am.container.reuse.enabled=false;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.reducers.bytes.per.reducer=524288000;
SET tez.queue.name= pete_spark;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
but I am getting the below error message:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hive/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hive/lib/spark-examples-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hive/lib/spark-hdp-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
OK
Is there any resolution for this issue? Your help or suggestions would be highly appreciated.
Labels:
- Apache Hive
09-19-2017
02:43 AM
@Bala Vignesh N V Along with the suggested properties/configurations, there are other properties/configurations being set to resolve the issue. I really appreciate your advice.
09-11-2017
03:11 AM
@Steven O'Neill Thank you for the suggestion. As of now, the other solution suggested by the DBAs is to create temporary tables, flatten the files and load them into these tables throughout the day, take a backup into different tables at the end of the day, and then delete the temporary tables. It doesn't seem like a good solution, just a workaround. I'd appreciate any further insight on this.
09-09-2017
06:58 AM
Hi, in our existing system around 4-6 million small files are generated per week. They are generated in different directories, and file sizes vary (<= 7 MB). This is creating a lot of unnecessary clutter and performance issues. Is there a way to resolve this?
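A common mitigation for the HDFS small-files problem is a periodic compaction job that rewrites many small files into a handful of large ones (for ORC-backed Hive tables, `ALTER TABLE ... CONCATENATE` does something similar). Below is a minimal, hedged PySpark sketch of that idea — the paths and the target partition count are placeholders that would need tuning to the real data volume and layout.

```python
# Sketch: compact one day's directory of many small files into a few
# larger part files. Paths are placeholders; assumes Spark with HDFS access.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

# Read all the small files for one directory/day in a single pass.
df = spark.read.text("/data/incoming/2017-09-09/*")

# Rewrite as ~8 part files; pick the count so each file lands well above
# the HDFS block size (e.g. 128-256 MB per file).
df.coalesce(8).write.mode("overwrite").text("/data/compacted/2017-09-09")
```

Running this on a schedule (and pointing consumers at the compacted path) reduces NameNode metadata pressure and per-file task overhead; the original directories can then be cleaned up once the compacted copy is verified.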
Labels:
- Apache Hive
07-03-2016
07:30 AM
@Michael Young Due to the complexity, going with Python would be better than Java. Thank you for the suggestion.
07-03-2016
07:24 AM
@rbiswas Thank you. As this involves a lot of complexity, the best solution as of now is to write a UDF.
07-01-2016
06:44 PM
@rbiswas Thank you for detailing things out. Yes, you are correct, there is a lot of complexity involved here, as the JSON itself is in a very complex format. After processing (in the code), every ID taken in will generate 100 to 5,000 records, which need to be captured in a JSON file and inserted back into Hive. The situation is that I have to choose between Python and Hive. So which of the two will be more helpful in terms of performance and complexity?
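The "one ID in, 100-5,000 records out" step can be prototyped in plain Python before committing to a Hive UDF or a PySpark job — the same function would later become the body of the UDF or a `flatMap`. The sketch below is a hedged illustration: the document shape (`{"id": ..., "entries": [...]}`) and the function name `explode_record` are hypothetical stand-ins for the real complex JSON.

```python
import json

def explode_record(raw):
    """Turn one complex JSON document into one flat dict per nested entry.

    Hypothetical input shape: {"id": ..., "entries": [{...}, ...]} --
    the traversal must be adapted to the real document structure.
    """
    doc = json.loads(raw)
    rows = []
    for entry in doc.get("entries", []):
        row = {"id": doc["id"]}   # carry the parent ID onto every record
        row.update(entry)          # lift the nested fields to the top level
        rows.append(row)
    return rows

# Each returned row can be re-serialized with json.dumps() as one JSON
# line, which loads cleanly into a Hive table backed by a JSON SerDe.
rows = explode_record('{"id": 7, "entries": [{"a": 1}, {"a": 2}]}')
```

In PySpark this plugs in as `rdd.flatMap(explode_record)`, which keeps the 100-5,000-records-per-ID fan-out parallel instead of serializing it through a single Hive UDF call path.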