Member since: 05-16-2016
Posts: 785
Kudos Received: 114
Solutions: 39
My Accepted Solutions
| Views | Posted |
|---|---|
| 2325 | 06-12-2019 09:27 AM |
| 3568 | 05-27-2019 08:29 AM |
| 5721 | 05-27-2018 08:49 AM |
| 5236 | 05-05-2018 10:47 PM |
| 3113 | 05-05-2018 07:32 AM |
06-06-2017
07:30 AM
We are having an HDFS small-files problem, so we were using this code from GitHub for merging Parquet files: https://github.com/Parquet/parquet-mr/tree/master/parquet-tools

Step 1 - Performed a local Maven build: mvn clean package

Step 2 - Ran the merge command. Note: the files are about 50 KB each (2,500 files), and the total folder size is 2.5 GB.

hadoop jar <jar file name> merge <input path> <output file>
hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge /user/hive/warehouse/final_parsing.db/02day02/ /user/hive/warehouse/final_parsing.db/02day02/merged.parquet

The HDFS fsck report is healthy:

Total size: 1796526652 B
Total dirs: 1
Total files: 4145
Total symlinks: 0
Total blocks (validated): 4146 (avg. block size 433315 B)
Minimally replicated blocks: 4146 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jun 06 13:04:57 IST 2017 in 82 milliseconds
The filesystem under path '/user/hive/warehouse/final_parsing.db/02day02/' is HEALTHY

Below is our current client configuration. We figured we should bump these values up, but that did not help either - we get the same error with the settings below:
- dfs.blocksize = 134217728 (128 MB)
- dfs.client-write-packet-size = 65536
- dfs.client.read.shortcircuit.streams.cache.expiry.ms = 300000
- dfs.stream-buffer-size = 4096
- dfs.client.read.shortcircuit.streams.cache.size = 256
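If it helps, the values the HDFS client actually picks up can be confirmed with hdfs getconf (a quick check; property names as listed above):

# print the effective value of each client-side property
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey dfs.client-write-packet-size
hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.expiry.ms
hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.size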
Linux ulimit: 3024

Error stack trace:

[UD1@slave1 target]# hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge /user/hive/warehouse/final_parsing.db/02day02/ /user/hive/warehouse/final_parsing.db/02day02/merged.parquet
17/06/06 12:48:16 WARN hdfs.BlockReaderFactory: BlockReaderFactory(fileName=/user/hive/warehouse/final_parsing.db/02day02/part-r-00000-377a9cc1-841a-4ec6-9e0f-0c009f44f6b3.snappy.parquet, block=BP-1780335730-192.168.200.234-1492815207875:blk_1074179291_439068): error creating ShortCircuitReplica.
java.io.IOException: Illegal seek
at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:124)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:619)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:551)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:484)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:652)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:879)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:937)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:732)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.parquet.hadoop.util.H2SeekableInputStream.read(H2SeekableInputStream.java:64)
at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:480)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:580)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:565)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:496)
at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:494)
at org.apache.parquet.tools.command.MergeCommand.execute(MergeCommand.java:79)
at org.apache.parquet.tools.Main.main(Main.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
17/06/06 12:48:16 WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x32442dd0): failed to load 1074179291_BP-1780335730-192.168.200.234-1492815207875
17/06/06 12:48:16 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:423)
at sun.nio.ch.Net.socket(Net.java:416)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
at java.nio.channels.SocketChannel.open(SocketChannel.java:142)
at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3526)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:840)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:755)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:652)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:879)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:937)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:732)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.parquet.hadoop.util.H2SeekableInputStream.read(H2SeekableInputStream.java:64)
at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:480)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:580)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:565)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:496)
at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:494)
at org.apache.parquet.tools.command.MergeCommand.execute(MergeCommand.java:79)
at org.apache.parquet.tools.Main.main(Main.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Correct me if I am wrong - I think this happens because the cache expires before the client comes back to check the block. Could anyone recommend a solution for this?
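For reference, the second failure is "Too many open files", and our ulimit of 3024 is below the 4145 files that fsck reports, so this is how the limit can be checked and raised per session (a sketch; 16384 is just an example value):

# current per-process open-file limit for the shell running the merge
ulimit -n

# raise it for this session before re-running the merge
# (a persistent change would go in /etc/security/limits.conf)
ulimit -n 16384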
Labels: HDFS
05-29-2017
04:25 AM
When you create the external table, specify LOCATION '' (i.e., the default location of the Hive table is overridden by using LOCATION). Then load the data from HDFS using INPATH. If you drop the table, it only removes the pointer (the metastore entry) and does not delete the data in HDFS.

CREATE EXTERNAL TABLE text1 (wban INT, date STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hive/data/text1';

LOAD DATA INPATH 'hdfs:/data/2000.txt' INTO TABLE text1;
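As a quick check (assuming the table above), the data should survive a drop:

# after running `DROP TABLE text1;` in Hive, the files are still listed
hdfs dfs -ls /hive/data/text1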
05-28-2017
06:24 AM
I believe these ZooKeeper timeout properties should be put inside conf/zoo.cfg - correct me if I am wrong. Also, can you tell me which of these properties should be doubled to avoid the timeouts? I am a newbie when it comes to ZooKeeper.

tickTime=2000
initLimit=5
syncLimit=2
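From what I have read, client session timeouts are negotiated as a multiple of tickTime (by default between 2x and 20x tickTime), so doubling tickTime would be my first guess - a sketch of the resulting zoo.cfg, keeping the other values as above:

# zoo.cfg sketch: tickTime doubled; session timeout bounds scale with it
tickTime=4000
initLimit=5
syncLimit=2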
05-27-2017
07:04 PM
As you said, yes, it is a dataflow language rather than a query language, because you write a series of declarative statements that define relations, where each relation applies a new transformation to the data. To put it simply, it describes how to retrieve the data rather than just what data you want - you are, in effect, spelling out the plan a query optimizer would produce. It is mainly used in ETL to ingest external data into Hadoop. Hope this suffices.
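For illustration only (hypothetical paths and schema), a short Pig Latin sketch of that style - each statement names a new relation and the whole script reads as a pipeline:

-- each line defines a relation; together they form a dataflow pipeline
raw = LOAD '/data/logs' USING PigStorage(',') AS (user:chararray, bytes:int);
big = FILTER raw BY bytes > 1024;
grp = GROUP big BY user;
out = FOREACH grp GENERATE group AS user, SUM(big.bytes) AS total_bytes;
STORE out INTO '/data/out';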
05-22-2017
04:00 AM
What configuration are you going to perform - a single-node cluster or a multi-node cluster? You can usually define your own IPs for your hosts in the /etc/hosts file (edited with vi), for example as below. Note the format is IP address first, then hostname:

192.168.200.221 master
192.168.200.222 slave1
192.168.200.223 slave2
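A quick way to confirm the entries resolve as intended (a sketch):

# resolve each name through the system resolver (which consults /etc/hosts)
getent hosts master slave1 slave2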
05-22-2017
03:58 AM
Run systemctl status firewalld - if you see "active", then your firewall is on. To switch off the firewall, perform:

systemctl stop firewalld

Finally, perform this to keep it disabled even across reboots of your operating system:

systemctl disable firewalld
05-21-2017
03:38 AM
You might have corrupt metadata. Stop the HBase server and run the meta repair. Read up on what it does before running the command below:

hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
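Before the offline repair, a read-only consistency check can confirm what is actually broken (assuming an HBase version that still ships hbck):

# reports inconsistencies without modifying anything by default
hbase hbck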
05-21-2017
03:30 AM
It looks like you are passing the wrong credentials, or the SQL Server is not accepting your Windows account authentication. Configure the server to accept both SQL Server accounts and Windows accounts (mixed-mode authentication), and check which user is being passed under the hood.
05-19-2017
06:25 PM
Could you post the full installer log for more digging? One thing I suspect is a permission issue - did you kick off the installation as root? Finally, let me know if you are able to reach hostname:50070, the NameNode UI.
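A quick way to test that UI from a shell (a sketch; substitute your actual NameNode host):

# should print an HTTP status line if the NameNode web UI is up
curl -sI http://<hostname>:50070 | head -n 1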
05-13-2017
01:57 AM
Please check your Hive metastore URL in /etc/hue/conf/hue.ini. Could you let me know whether you are using a shared metastore for Hive and Impala, and if so, which database? Also check these parameters:

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://localhost:8020

[[[hive]]]
name=Hive
# The backend connection to use to communicate with the server.
interface=hiveserver2

[[[impala]]]
name=Impala
interface=hiveserver2