Member since 05-16-2016
Posts: 785
Kudos Received: 114
Solutions: 39

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2324 | 06-12-2019 09:27 AM |
|  | 3568 | 05-27-2019 08:29 AM |
|  | 5720 | 05-27-2018 08:49 AM |
|  | 5236 | 05-05-2018 10:47 PM |
|  | 3112 | 05-05-2018 07:32 AM |
06-06-2017 07:30 AM
We are having an HDFS small-files problem, so we were using this code from GitHub for merging Parquet files: https://github.com/Parquet/parquet-mr/tree/master/parquet-tools

Step 1 - Perform a local Maven build:

mvn clean package

Step 2 - Run the merge command. Note on sizes: each file is about 50 KB, there are about 2,500 files, and the folder totals about 2.5 GB.

hadoop jar <jar file name> merge <input path> <output file>

hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge /user/hive/warehouse/final_parsing.db/02day02/ /user/hive/warehouse/final_parsing.db/02day02/merged.parquet

The HDFS fsck report is healthy:

 Total size:    1796526652 B
 Total dirs:    1
 Total files:    4145
 Total symlinks:        0
 Total blocks (validated):    4146 (avg. block size 433315 B)
 Minimally replicated blocks:    4146 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1
FSCK ended at Tue Jun 06 13:04:57 IST 2017 in 82 milliseconds
The filesystem under path '/user/hive/warehouse/final_parsing.db/02day02/' is HEALTHY

Below is our current client configuration. These configs didn't help us, so we bumped them up, but that didn't help either; same error.
dfs.blocksize = 134217728 (128 MB)
dfs.client-write-packet-size = 65536
dfs.client.read.shortcircuit.streams.cache.expiry.ms = 300000
dfs.stream-buffer-size = 4096
dfs.client.read.shortcircuit.streams.cache.size = 256
Linux ulimit: 3024

Error stack trace:

[UD1@slave1 target]# hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge /user/hive/warehouse/final_parsing.db/02day02/ /user/hive/warehouse/final_parsing.db/02day02/merged.parquet
17/06/06 12:48:16 WARN hdfs.BlockReaderFactory: BlockReaderFactory(fileName=/user/hive/warehouse/final_parsing.db/02day02/part-r-00000-377a9cc1-841a-4ec6-9e0f-0c009f44f6b3.snappy.parquet, block=BP-1780335730-192.168.200.234-1492815207875:blk_1074179291_439068): error creating ShortCircuitReplica.
java.io.IOException: Illegal seek
	at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
	at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)
	at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:124)
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
	at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:619)
	at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:551)
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784)
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718)
	at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:484)
	at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:652)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:879)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:937)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:732)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.parquet.hadoop.util.H2SeekableInputStream.read(H2SeekableInputStream.java:64)
	at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:480)
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:580)
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:565)
	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:496)
	at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:494)
	at org.apache.parquet.tools.command.MergeCommand.execute(MergeCommand.java:79)
	at org.apache.parquet.tools.Main.main(Main.java:223)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
17/06/06 12:48:16 WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x32442dd0): failed to load 1074179291_BP-1780335730-192.168.200.234-1492815207875
17/06/06 12:48:16 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketException: Too many open files
	at sun.nio.ch.Net.socket0(Native Method)
	at sun.nio.ch.Net.socket(Net.java:423)
	at sun.nio.ch.Net.socket(Net.java:416)
	at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
	at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:142)
	at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
	at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3526)
	at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:840)
	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:755)
	at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:652)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:879)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:937)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:732)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.parquet.hadoop.util.H2SeekableInputStream.read(H2SeekableInputStream.java:64)
	at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:480)
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:580)
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:565)
	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:496)
	at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:494)
	at org.apache.parquet.tools.command.MergeCommand.execute(MergeCommand.java:79)
	at org.apache.parquet.tools.Main.main(Main.java:223)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Correct me if I am wrong, but I believe this happens because the cache entry expires before the client comes back to check the block.

Could anyone recommend a solution for this?
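Since the second warning ends in java.net.SocketException: Too many open files, and the merge opens every one of the roughly 2,500 input files plus sockets, the open-file limit of 3024 looks like a plausible bottleneck. A minimal sketch of checking and raising that limit before re-running the merge; the 65536 value and the user name in the comments are assumptions, not values from this setup:

```bash
# Check the current open-file limit of the shell that launches the merge
ulimit -n

# Raise the soft limit for this session only (65536 is an assumed value)
ulimit -n 65536

# To make it persistent, add entries to /etc/security/limits.conf
# (the 'hdfs' user name is an assumption; use whichever user runs the job):
#   hdfs  soft  nofile  65536
#   hdfs  hard  nofile  65536

# Re-run the merge from the same shell
hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge \
  /user/hive/warehouse/final_parsing.db/02day02/ \
  /user/hive/warehouse/final_parsing.db/02day02/merged.parquet
```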
						
					
Labels: HDFS
    
	
		
		
05-29-2017 04:25 AM
When you create the external table, specify LOCATION '...' (i.e., the default location of the Hive table is overridden by LOCATION). Then load data from HDFS using INPATH. If you later drop the table, Hive only removes the metadata pointer; the data in HDFS is not deleted.

CREATE EXTERNAL TABLE text1 (wban INT, date STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hive/data/text1';

LOAD DATA INPATH 'hdfs:/data/2000.txt' INTO TABLE text1;
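A quick way to verify the drop-keeps-data behavior described above; this sketch assumes beeline can reach HiveServer2 at the default local port:

```bash
# Drop the external table: only the metastore entry is removed
# (the HiveServer2 URL is an assumed default; adjust host/port as needed)
beeline -u "jdbc:hive2://localhost:10000" -e "DROP TABLE text1;"

# The underlying files should still be present in HDFS afterwards
hdfs dfs -ls /hive/data/text1
```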
						
					
05-28-2017 06:24 AM
I believe these ZooKeeper timeout properties should be put inside conf/zoo.cfg; correct me if I am wrong. Also, can you tell me which of these properties should be doubled to avoid the timeouts? I am a newbie when it comes to ZooKeeper.

tickTime=2000
initLimit=5
syncLimit=2
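For context: tickTime is ZooKeeper's base time unit in milliseconds, while initLimit and syncLimit are counted in ticks; client session timeouts are negotiated between 2x and 20x tickTime by default. A sketch of what doubling the follower limits in conf/zoo.cfg might look like; the doubled values are an illustration, not a confirmed fix:

```
# Base time unit in milliseconds; the other limits are multiples of this
tickTime=2000
# Ticks a follower may take to connect and sync to the leader (doubled: 10 ticks = 20 s)
initLimit=10
# Ticks a follower may lag behind the leader (doubled: 4 ticks = 8 s)
syncLimit=4
```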
						
					
05-27-2017 07:04 PM
As you said, yes, it is a data flow language rather than a query language, because you write a series of statements where each one defines a relation, and each relation performs a new transformation on the data. To put it simply, it expresses how to retrieve the data step by step, rather than just declaring what data you want and leaving the "how" to a query optimizer, as SQL does. It is mainly used in ETL to ingest external data into Hadoop.

Hope this suffices.
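A minimal sketch of that dataflow style in Pig Latin; the file path and schema here are made up purely for illustration:

```
-- each statement names a new relation derived from the previous one
logs    = LOAD '/data/logs.csv' USING PigStorage(',')
          AS (user:chararray, bytes:int);
grouped = GROUP logs BY user;
totals  = FOREACH grouped GENERATE group AS user, SUM(logs.bytes) AS total_bytes;
-- nothing executes until an output is requested
STORE totals INTO '/data/log_totals';
```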
						
					
05-22-2017 04:00 AM
What configuration are you going to perform, a single-node cluster or a multi-node cluster? Usually you can define your own IPs for your hosts in the /etc/hosts file (using the vi editor), for example as below. Note that the IP address comes first, then the hostname:

192.168.200.221 master
192.168.200.222 slave1
192.168.200.223 slave2
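To verify the mapping works before wiring up the cluster, you can query the resolver directly; a small sketch:

```bash
# getent uses the same lookup path (/etc/hosts first, by default) as the services
getent hosts master slave1 slave2

# confirm basic reachability of one node
ping -c 1 master
```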
						
					
05-22-2017 03:58 AM
Run systemctl status firewalld; if you see "active", your firewall is on. To switch the firewall off, perform:

systemctl stop firewalld

Finally, perform this so it stays disabled even after the operating system reboots:

systemctl disable firewalld
						
					
05-21-2017 03:38 AM
You might have corrupt metadata. Stop the HBase server and run the meta repair. Read up on what the command does before running it:

hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
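Before running the offline repair, it may be worth seeing what hbck reports first; in HBase 1.x the default invocation is read-only:

```bash
# report inconsistencies without modifying anything
hbase hbck

# optionally limit the check to one table ('mytable' is a placeholder)
hbase hbck mytable
```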
						
					
05-21-2017 03:30 AM
It looks like you are passing the wrong credentials, or the SQL server isn't accepting your Windows account authentication. Configure the server to accept both SQL server accounts and Windows accounts (mixed-mode authentication), and check which user is actually being passed under the hood.
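If the connection goes through JDBC (for example via Sqoop), one way to rule out the Windows-authentication path is to pass an explicit SQL login; a sketch with made-up host, database, and credentials:

```bash
# probe the server with an explicit SQL login instead of integrated auth
sqoop list-databases \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=testdb" \
  --username sqluser --password 'secret'
```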
						
					
05-19-2017 06:25 PM
Could you post the full installer log for more digging? One thing I assume it might be is a permission issue: did you kick off the installation as root? Finally, let me know whether you can reach the NameNode UI at hostname:50070.
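A quick command-line check of the NameNode UI, in case a browser is not handy (replace hostname with your actual NameNode host):

```bash
# an HTTP 200 here means the NameNode web UI is up and listening
curl -s -o /dev/null -w "%{http_code}\n" http://hostname:50070
```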
						
					
05-13-2017 01:57 AM
Please check your Hive metastore URL in /etc/hue/conf/hue.ini. Could you let me know whether you are using a shared metastore for Hive and Impala, and if so, which database? Also check these parameters:

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://localhost:8020

name=Hive
# The backend connection to use to communicate with the server.
interface=hiveserver2

[[[impala]]]
name=Impala
interface=hiveserver2
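To separate a bad hue.ini entry from a HiveServer2 problem, you can probe the endpoint directly; the URL below is an assumed default, so adjust host and port to your setup:

```bash
# if this lists databases, HiveServer2 is healthy and the issue is on the Hue side
beeline -u "jdbc:hive2://localhost:10000" -e "SHOW DATABASES;"
```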
						
					