Member since: 05-16-2016
Posts: 785
Kudos Received: 114
Solutions: 39
My Accepted Solutions
| Views | Posted |
|---|---|
| 2325 | 06-12-2019 09:27 AM |
| 3568 | 05-27-2019 08:29 AM |
| 5721 | 05-27-2018 08:49 AM |
| 5236 | 05-05-2018 10:47 PM |
| 3113 | 05-05-2018 07:32 AM |
06-06-2017
07:30 AM
We are having an HDFS small-files problem, so we were using this code from GitHub for merging Parquet files: https://github.com/Parquet/parquet-mr/tree/master/parquet-tools

Step 1 - Performed a local Maven build: mvn clean package

Step 2 - Ran the merge command. Note: the files are about 50 KB each (2,500 files), and the total folder size is 2.5 GB.

hadoop jar <jar file name> merge <input path> <output file>
hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge /user/hive/warehouse/final_parsing.db/02day02/ /user/hive/warehouse/final_parsing.db/02day02/merged.parquet

The HDFS fsck report is healthy:

Total size: 1796526652 B
Total dirs: 1
Total files: 4145
Total symlinks: 0
Total blocks (validated): 4146 (avg. block size 433315 B)
Minimally replicated blocks: 4146 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Jun 06 13:04:57 IST 2017 in 82 milliseconds
The filesystem under path '/user/hive/warehouse/final_parsing.db/02day02/' is HEALTHY

Below is our current client configuration. We figured we should bump these values up, but that did not help either - we get the same error with the settings below:
- dfs.blocksize = 134217728 (128 MB)
- dfs.client-write-packet-size = 65536
- dfs.client.read.shortcircuit.streams.cache.expiry.ms = 300000
- dfs.stream-buffer-size = 4096
- dfs.client.read.shortcircuit.streams.cache.size = 256
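If it helps, the values the HDFS client actually picks up can be confirmed with hdfs getconf (a quick check; property names as listed above):

# print the effective value of each client-side property
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey dfs.client-write-packet-size
hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.expiry.ms
hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.size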
Linux ulimit: 3024

Error stack trace:

[UD1@slave1 target]# hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar merge /user/hive/warehouse/final_parsing.db/02day02/ /user/hive/warehouse/final_parsing.db/02day02/merged.parquet
17/06/06 12:48:16 WARN hdfs.BlockReaderFactory: BlockReaderFactory(fileName=/user/hive/warehouse/final_parsing.db/02day02/part-r-00000-377a9cc1-841a-4ec6-9e0f-0c009f44f6b3.snappy.parquet, block=BP-1780335730-192.168.200.234-1492815207875:blk_1074179291_439068): error creating ShortCircuitReplica.
java.io.IOException: Illegal seek
at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:124)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:619)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:551)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:484)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:652)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:879)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:937)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:732)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.parquet.hadoop.util.H2SeekableInputStream.read(H2SeekableInputStream.java:64)
at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:480)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:580)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:565)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:496)
at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:494)
at org.apache.parquet.tools.command.MergeCommand.execute(MergeCommand.java:79)
at org.apache.parquet.tools.Main.main(Main.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
17/06/06 12:48:16 WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x32442dd0): failed to load 1074179291_BP-1780335730-192.168.200.234-1492815207875
17/06/06 12:48:16 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:423)
at sun.nio.ch.Net.socket(Net.java:416)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:104)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
at java.nio.channels.SocketChannel.open(SocketChannel.java:142)
at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3526)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:840)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:755)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:652)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:879)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:937)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:732)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.parquet.hadoop.util.H2SeekableInputStream.read(H2SeekableInputStream.java:64)
at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:480)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:580)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:565)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:496)
at org.apache.parquet.hadoop.ParquetFileWriter.appendFile(ParquetFileWriter.java:494)
at org.apache.parquet.tools.command.MergeCommand.execute(MergeCommand.java:79)
at org.apache.parquet.tools.Main.main(Main.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Correct me if I am wrong - I think this happens because the cache expires before the client comes back to check the block. Could anyone recommend a solution for this?
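For reference, the second failure is "Too many open files", and our ulimit of 3024 is below the 4145 files that fsck reports, so this is how the limit can be checked and raised per session (a sketch; 16384 is just an example value):

# current per-process open-file limit for the shell running the merge
ulimit -n

# raise it for this session before re-running the merge
# (a persistent change would go in /etc/security/limits.conf)
ulimit -n 16384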
Labels: HDFS
05-29-2017
04:25 AM
When you create the external table, specify LOCATION '' (i.e., the default location of the Hive table is overridden by using LOCATION). Then load the data from HDFS using INPATH. If you drop the table, it only removes the pointer (the metastore entry) and does not delete the data in HDFS.

CREATE EXTERNAL TABLE text1 (wban INT, date STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hive/data/text1';

LOAD DATA INPATH 'hdfs:/data/2000.txt' INTO TABLE text1;
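As a quick check (assuming the table above), the data should survive a drop:

# after running `DROP TABLE text1;` in Hive, the files are still listed
hdfs dfs -ls /hive/data/text1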
05-28-2017
06:24 AM
I believe these ZooKeeper timeout properties should be put inside conf/zoo.cfg - correct me if I am wrong. Also, can you tell me which of these properties should be doubled to avoid the timeouts? I am a newbie when it comes to ZooKeeper.

tickTime=2000
initLimit=5
syncLimit=2
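From what I have read, client session timeouts are negotiated as a multiple of tickTime (by default between 2x and 20x tickTime), so doubling tickTime would be my first guess - a sketch of the resulting zoo.cfg, keeping the other values as above:

# zoo.cfg sketch: tickTime doubled; session timeout bounds scale with it
tickTime=4000
initLimit=5
syncLimit=2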
05-27-2017
07:04 PM
As you said, yes, it is a dataflow language rather than a query language, because you write a series of declarative statements that define relations, where each relation applies a new transformation to the data. To put it simply, it describes how to retrieve the data rather than just what data you want - you are, in effect, spelling out the plan a query optimizer would produce. It is mainly used in ETL to ingest external data into Hadoop. Hope this suffices.
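For illustration only (hypothetical paths and schema), a short Pig Latin sketch of that style - each statement names a new relation and the whole script reads as a pipeline:

-- each line defines a relation; together they form a dataflow pipeline
raw = LOAD '/data/logs' USING PigStorage(',') AS (user:chararray, bytes:int);
big = FILTER raw BY bytes > 1024;
grp = GROUP big BY user;
out = FOREACH grp GENERATE group AS user, SUM(big.bytes) AS total_bytes;
STORE out INTO '/data/out';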
05-22-2017
04:00 AM
What configuration are you going to perform - a single-node cluster or a multi-node cluster? You can usually define your own IPs for your hosts in the /etc/hosts file (edited with vi), for example as below. Note the format is IP address first, then hostname:

192.168.200.221 master
192.168.200.222 slave1
192.168.200.223 slave2
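A quick way to confirm the entries resolve as intended (a sketch):

# resolve each name through the system resolver (which consults /etc/hosts)
getent hosts master slave1 slave2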
05-22-2017
03:58 AM
Run systemctl status firewalld - if you see "active", then your firewall is on. To switch off the firewall, perform:

systemctl stop firewalld

Finally, perform this to keep it disabled even across reboots of your operating system:

systemctl disable firewalld
05-21-2017
03:38 AM
You might have corrupt metadata. Stop the HBase server and run the meta repair. Read up on what it does before running the command below:

hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
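Before the offline repair, a read-only consistency check can confirm what is actually broken (assuming an HBase version that still ships hbck):

# reports inconsistencies without modifying anything by default
hbase hbck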
05-21-2017
03:30 AM
It looks like you are passing the wrong credentials, or the SQL Server is not accepting your Windows account authentication. Configure the server to accept both SQL Server accounts and Windows accounts (mixed-mode authentication), and check which user is being passed under the hood.
05-19-2017
06:25 PM
Could you post the full installer log for more digging? One thing I suspect is a permission issue - did you kick off the installation as root? Finally, let me know if you are able to reach hostname:50070, the NameNode UI.
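A quick way to test that UI from a shell (a sketch; substitute your actual NameNode host):

# should print an HTTP status line if the NameNode web UI is up
curl -sI http://<hostname>:50070 | head -n 1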
05-13-2017
01:57 AM
Please check your Hive metastore URL in /etc/hue/conf/hue.ini. Could you let me know whether you are using a shared metastore for Hive and Impala, and if so, which database? Also check these parameters:

[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://localhost:8020

[[[hive]]]
name=Hive
# The backend connection to use to communicate with the server.
interface=hiveserver2

[[[impala]]]
name=Impala
interface=hiveserver2