Member since 05-17-2017

- 3 Posts
- 0 Kudos Received
- 1 Solution

        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
|  | 4384 | 05-23-2017 02:10 AM |

05-23-2017 02:10 AM

This bug was caused by a permission issue (reinstallation of the Ambari agent): the hive user was being denied access to /tmp/parquet-0.log and /tmp/parquet-0.log.lock.
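In case anyone else hits the same symptom, here is a minimal diagnostic sketch (plain Java NIO, with the two /tmp paths taken from this thread) that reports whether those files exist, who owns them, and whether they are writable; run it as the hive user:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Diagnostic sketch: report the owner and writability of the two Parquet log
// files that were denied to the hive user in this thread.
public class ParquetLogPermissionCheck {
    public static void main(String[] args) throws IOException {
        Path[] candidates = {
                Paths.get("/tmp/parquet-0.log"),
                Paths.get("/tmp/parquet-0.log.lock")
        };
        for (Path p : candidates) {
            if (!Files.exists(p)) {
                System.out.println(p + ": does not exist (nothing to clean up)");
                continue;
            }
            System.out.println(p
                    + " owner=" + Files.getOwner(p).getName()
                    + " writable=" + Files.isWritable(p));
        }
    }
}
```

If the files turn out to be owned by another account, deleting them or changing their ownership to the hive user should clear the access-denied error.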
						
					
05-22-2017 05:44 PM
							 
I've installed Hortonworks' Ambari package and it worked fine until today. Then I hit an error while running a query on Hive; I'm getting this warning and can't fix it:

May 22, 2017 5:26:52 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d)
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d) using format: (.+) version ((.*) )?\(build ?(.*)\)
	at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
	at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
	at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583)
	at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:120)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:83)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:694)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:332)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:458)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1765)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:237)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
From what I've found by googling, it's a bug in earlier versions of Apache Parquet, but I don't know how to update it.
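For context, the WARNING at the top is Parquet declining to trust the file's column statistics because it cannot parse the created_by string, which contains no " version " segment (see PARQUET-251). A stand-alone illustration of that parse failure, using only java.util.regex and the exact pattern quoted in the log:

```java
import java.util.regex.Pattern;

// Reproduce the created_by parse failure from the warning above: this writer
// string has no " version " segment, so the pattern cannot match it.
public class CreatedByParseDemo {
    public static void main(String[] args) {
        String createdBy =
                "parquet-mr (build 32c46643845ea8a705c35d4ec8fc654cc8ff816d)";
        Pattern format = Pattern.compile("(.+) version ((.*) )?\\(build ?(.*)\\)");
        System.out.println(format.matcher(createdBy).matches()); // prints false
        // A string such as "parquet-mr version 1.8.1 (build abc)" would match.
    }
}
```

So the warning by itself is benign; in this case the actual query failure turned out to be the permission problem described in the accepted solution above.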
						
					
Labels:
- Apache Hadoop
- Apache Hive
    
	
		
		
05-18-2017 04:39 PM
I'm designing the architecture for our company's data infrastructure. It will receive payment and user-behavior events from an HTTP producer (Node.js). I planned to use Kafka, Hive, and Spark, with the Avro format for Kafka messages, passing them via Kafka Connect to HDFS and Hive. However, I've read that Avro is not the preferred format for processing with Spark and that I should use Parquet instead. The question is: how can I generate Parquet and pass it to/from Kafka? I'm confused by the many choices here. Any advice or resource is welcome. 🙂
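Not an authoritative answer, but one common pattern is to keep Avro on the Kafka topics (it works well as a per-message wire format) and convert to Parquet only when landing the data in HDFS, since Parquet is a columnar file format that suits large immutable files rather than individual messages. Kafka Connect's HDFS sink can reportedly do that conversion for you; the sketch below hand-rolls the same idea, assuming Confluent's Avro deserializer plus the parquet-avro writer. The broker address, schema-registry URL, topic name, and output path are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// Sketch only: consume one batch of Avro events from Kafka and write them to a
// single Parquet file on HDFS. A real pipeline would roll files by size/time or
// simply use Kafka Connect's HDFS sink instead of custom code.
public class AvroBatchToParquet {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");                          // placeholder
        props.put("group.id", "parquet-demo");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "io.confluent.kafka.serializers.KafkaAvroDeserializer");       // needs a schema registry
        props.put("schema.registry.url", "http://schema-registry:8081");         // placeholder

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payment-events"));     // placeholder topic
            ConsumerRecords<String, GenericRecord> batch = consumer.poll(Duration.ofSeconds(5));
            if (batch.isEmpty()) {
                return;
            }

            // Use the schema of the first record for the whole output file.
            GenericRecord first = batch.iterator().next().value();
            try (ParquetWriter<GenericRecord> writer =
                         AvroParquetWriter.<GenericRecord>builder(
                                 new Path("hdfs:///data/events/batch-0.parquet")) // placeholder path
                                 .withSchema(first.getSchema())
                                 .withCompressionCodec(CompressionCodecName.SNAPPY)
                                 .build()) {
                for (ConsumerRecord<String, GenericRecord> rec : batch) {
                    writer.write(rec.value());
                }
            }
        }
    }
}
```

Spark can then read the resulting Parquet files directly, and a Hive external table can be declared over the same HDFS directory.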
						
					
Labels:
- Apache Hive
- Apache Kafka
- Apache Spark
 
        





