Member since: 06-11-2016
Posts: 22
Kudos Received: 1
Solutions: 0
01-08-2018
11:33 PM
@Benoit Rousseau Thanks for looking into it. We don't intend to create another table; the idea is to do the conversion on the fly using PySpark. Nor do we want to use any other tool. Any help or suggestions on converting the data using PySpark would be appreciated.
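An on-the-fly conversion like this is usually a short PySpark job: read the Hive-managed ORC table into a DataFrame, then write it back out as Avro and as newline-delimited JSON. The sketch below is a minimal, hedged example — the database/table name and output paths are placeholders, and it assumes a Spark build with Hive support plus the external Avro data source on the classpath (e.g. `--packages org.apache.spark:spark-avro_2.11:2.4.0` on Spark 2.4; older Spark 1.6 clusters such as HDP 2.4.2 would use `format("com.databricks.spark.avro")` from the Databricks spark-avro package instead).

```python
# Sketch: convert a Hive ORC table to Avro and JSON with PySpark.
# "mydb.my_orc_table" and the /tmp output paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("orc-to-avro-json")
         .enableHiveSupport()      # read through the Hive metastore
         .getOrCreate())

# Hive resolves the ORC storage; we just read the table by name.
df = spark.table("mydb.my_orc_table")

# Avro output (directory of part files; needs the Avro data source package).
df.write.format("avro").mode("overwrite").save("/tmp/my_table_avro")

# JSON output (one JSON object per line per record).
df.write.mode("overwrite").json("/tmp/my_table_json")
```

Both writes produce a directory of part files rather than a single file; downstream consumers that need one file can coalesce first, at the cost of parallelism.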
01-08-2018
10:09 PM
Hi, our data resides in Hive in ORC format. We need to convert this data to Avro and JSON format. Is there a way to achieve this conversion?
Labels:
- Apache Hive
09-19-2017
04:02 AM
@mqureshi If they are moved, then the other applications may be affected. Is there any other way to resolve this without affecting the other applications?
09-19-2017
03:02 AM
Hi all, I am trying to execute an HQL script with the following properties set:
SET hive.execution.engine=tez;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.vectorized.execution.enabled = true;
SET hive.vectorized.execution.reduce.enabled = true;
SET hive.vectorized.execution.reduce.groupby.enabled = true;
SET mapred.job.queue.name=mtl;
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;
SET tez.am.container.reuse.enabled=false;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.reducers.bytes.per.reducer=524288000;
SET tez.queue.name= pete_spark;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
but I am getting the below error message:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hive/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hive/lib/spark-examples-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hive/lib/spark-hdp-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.2.0-258/0/hive-log4j.properties
OK
Is there any resolution for this issue? Your help or suggestions would be highly appreciated.
Labels:
- Apache Hive
09-19-2017
02:43 AM
@Bala Vignesh N V Along with the suggested properties/configurations, there are other properties/configurations being set to resolve the issue. I really appreciate your advice.
09-11-2017
03:11 AM
@Steven O'Neill Thank you for the suggestion. As of now, the other solution suggested by the DBAs is to create temporary tables, flatten the files and load them into these tables throughout the day, take a backup into different tables at the end of the day, and then delete the temporary tables. It doesn't seem like a good solution, just a workaround. I'd appreciate any further insight on this.
09-09-2017
06:58 AM
Hi, in our existing system around 4-6 million small files are generated per week. They are generated in different directories, and file sizes vary (<= 7 MB). This is creating a lot of unnecessary clutter and performance issues. Is there a way to resolve this?
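A common mitigation for the HDFS small-files problem is a periodic compaction job that rewrites many small files into a handful of large ones (for ORC-backed Hive tables, `ALTER TABLE ... CONCATENATE` does something similar). Below is a minimal, hedged PySpark sketch of that idea — the paths and the target partition count are placeholders that would need tuning to the real data volume and layout.

```python
# Sketch: compact one day's directory of many small files into a few
# larger part files. Paths are placeholders; assumes Spark with HDFS access.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

# Read all the small files for one directory/day in a single pass.
df = spark.read.text("/data/incoming/2017-09-09/*")

# Rewrite as ~8 part files; pick the count so each file lands well above
# the HDFS block size (e.g. 128-256 MB per file).
df.coalesce(8).write.mode("overwrite").text("/data/compacted/2017-09-09")
```

Running this on a schedule (and pointing consumers at the compacted path) reduces NameNode metadata pressure and per-file task overhead; the original directories can then be cleaned up once the compacted copy is verified.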
Labels:
- Apache Hive
07-03-2016
07:30 AM
@Michael Young Due to the complexity, going with Python would be better than Java. Thank you for the suggestion.
07-03-2016
07:24 AM
@rbiswas Thank you. As this involves a lot of complexity, the best solution as of now is to write a UDF.
07-01-2016
06:44 PM
@rbiswas Thank you for detailing things out. Yes, you are correct, there is a lot of complexity involved here, as the JSON itself is in a very complex format. After processing (in the code), every ID taken in will generate 100 to 5,000 records, which need to be captured in a JSON file and inserted back into Hive. The situation is that I have to choose between Python and Hive. So which of the two will be more helpful in terms of performance and complexity?
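The "one ID in, 100-5,000 records out" step can be prototyped in plain Python before committing to a Hive UDF or a PySpark job — the same function would later become the body of the UDF or a `flatMap`. The sketch below is a hedged illustration: the document shape (`{"id": ..., "entries": [...]}`) and the function name `explode_record` are hypothetical stand-ins for the real complex JSON.

```python
import json

def explode_record(raw):
    """Turn one complex JSON document into one flat dict per nested entry.

    Hypothetical input shape: {"id": ..., "entries": [{...}, ...]} --
    the traversal must be adapted to the real document structure.
    """
    doc = json.loads(raw)
    rows = []
    for entry in doc.get("entries", []):
        row = {"id": doc["id"]}   # carry the parent ID onto every record
        row.update(entry)          # lift the nested fields to the top level
        rows.append(row)
    return rows

# Each returned row can be re-serialized with json.dumps() as one JSON
# line, which loads cleanly into a Hive table backed by a JSON SerDe.
rows = explode_record('{"id": 7, "entries": [{"a": 1}, {"a": 2}]}')
```

In PySpark this plugs in as `rdd.flatMap(explode_record)`, which keeps the 100-5,000-records-per-ID fan-out parallel instead of serializing it through a single Hive UDF call path.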