Member since: 11-07-2016
Posts: 637
Kudos Received: 253
Solutions: 144

My Accepted Solutions
Title | Views | Posted
---|---|---
| 2193 | 12-06-2018 12:25 PM
| 2222 | 11-27-2018 06:00 PM
| 1726 | 11-22-2018 03:42 PM
| 2775 | 11-20-2018 02:00 PM
| 5005 | 11-19-2018 03:24 PM
10-17-2018
04:08 AM
@Anpan K, Yes. You can read it like below:
%pyspark
content = sc.textFile("file:///path/example.txt")
If the file scheme is not given, it defaults to HDFS.
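A quick way to sanity-check where an unqualified path would resolve (a minimal sketch; /path/example.txt is just a placeholder): without a scheme, the path is looked up against fs.defaultFS, which is normally HDFS.
# hdfs dfs -ls /path/example.txt ---> checks the path on HDFS (what an unqualified path resolves to)
# ls -l /path/example.txt ---> checks the same path on the local filesystem (what file:// points to)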
10-16-2018
04:06 PM
@Michael Bronson, Are you running the command from the same node where ZooKeeper is running? Can you please paste the command that you are running? Also, can you try passing the proper hostname instead of localhost:2181 when running the command?
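For example (a sketch; zk-host-1.example.com is a placeholder for a real ZooKeeper hostname from your cluster):
# zookeeper-client -server zk-host-1.example.com:2181
Once the prompt shows CONNECTED instead of CONNECTING, you can list the root znodes:
[zk: zk-host-1.example.com:2181(CONNECTED) 0] ls /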
10-16-2018
03:53 PM
@Michael Bronson, Is your cluster Kerberized? If it is not Kerberized, you may not have kinit installed. In that case, you can just run these commands:
# su hdfs
# zookeeper-client -server {zk-host}:2181
## [zk: zkhost-1(CONNECTED) 1] ls /
If this doesn't work, try restarting the ZooKeeper server and then try again.
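If you want to check whether ZooKeeper itself is healthy before restarting it, a rough sketch (the host is a placeholder and the zkServer.sh path varies by distribution):
# echo ruok | nc {zk-host} 2181 ---> ZooKeeper's built-in health check; it replies "imok" when the server is up
# /usr/hdp/current/zookeeper-server/bin/zkServer.sh status ---> shows whether this server is leader, follower, or standalone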
10-16-2018
03:45 PM
@HENI MAHER, Looks like ResourceManager is not running. Please start ResourceManager and try again.
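If you prefer to confirm this from the command line rather than Ambari, a rough sketch (Hadoop 3.x syntax; older releases use yarn-daemon.sh start resourcemanager instead):
# ps -ef | grep -i resourcemanager | grep -v grep ---> check whether a ResourceManager process is running on the RM host
# su - yarn -c "yarn --daemon start resourcemanager" ---> start the ResourceManager as the yarn user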
10-16-2018
03:38 PM
I am facing an issue while starting the Spark Thrift Server when NameNode HA is enabled. I have 2 NameNodes, on host1 and host2. The Thrift Server starts when the NameNode on host1 is active, and fails to start when the NameNode on host1 is standby. Below is the stack trace:
Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1952)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1423)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3085)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:966)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
);
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:79)
at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Pasting the contents of spark-thrift-sparkconf.conf:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 0
spark.dynamicAllocation.maxExecutors 10
spark.dynamicAllocation.minExecutors 0
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraJavaOptions -XX:+UseNUMA
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.hadoop.cacheConf false
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 7d
spark.history.fs.cleaner.maxAge 90d
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.io.compression.lz4.blockSize 128kb
spark.master yarn-client
spark.scheduler.allocation.file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-fairscheduler.xml
spark.scheduler.mode FAIR
spark.shuffle.file.buffer 1m
spark.shuffle.io.backLog 8192
spark.shuffle.io.serverThreads 128
spark.shuffle.service.enabled true
spark.shuffle.unsafe.file.output.buffer 5m
spark.sql.autoBroadcastJoinThreshold 26214400
spark.sql.hive.convertMetastoreOrc true
spark.sql.hive.metastore.jars /usr/hdp/3.0.0.0-1634/spark2/standalone-metastore/standalone-metastore-1.21.2.3.0.0.0-1634-hive3.jar
spark.sql.hive.metastore.version 3.0
spark.sql.orc.filterPushdown true
spark.sql.orc.impl native
spark.sql.statistics.fallBackToHdfs true
spark.sql.warehouse.dir /apps/spark/warehouse
spark.unsafe.sorter.spill.reader.buffer.size 1m
spark.yarn.executor.failuresValidityInterval 2h
spark.yarn.maxAppAttempts 1
spark.yarn.queue default
I checked core-site.xml and hdfs-site.xml on the node where the Spark Thrift Server is running. fs.defaultFS has the proper value (i.e. hdfs://namespace). I am guessing that it is picking up the host1 value from some config file, but I am not sure which one. Please let me know any other places to look. Thanks
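For reference, a few commands that can show which configuration the node actually resolves (a sketch; nn1/nn2 are the logical NameNode IDs from dfs.ha.namenodes.<nameservice> in hdfs-site.xml, and the conf paths are the usual HDP locations):
# hdfs getconf -confKey fs.defaultFS ---> default filesystem the Hadoop client on this node resolves
# hdfs haadmin -getServiceState nn1 ---> reports active or standby for the first NameNode
# hdfs haadmin -getServiceState nn2 ---> reports active or standby for the second NameNode
# grep -R "host1" /etc/spark2/conf /etc/hadoop/conf ---> look for a hard-coded host1 reference in the client configs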
Labels:
- Apache Hadoop
- Apache Spark
10-16-2018
03:24 PM
@Michael Bronson, From the logs it looks like the client is not yet connected to the server: [zk: localhost:2181(CONNECTING) 0]. If it were connected, you would see CONNECTED instead of CONNECTING. If your cluster is Kerberized, you need to run kinit before connecting with the ZooKeeper client. You can run the steps below:
# kinit -kt /etc/security/keytabs/hdfs.headless.keytab {principal}
# zookeeper-client -server {zk-host}:2181
## [zk: zkhost-1(CONNECTED) 1] ls /
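If you are not sure which principal to pass to kinit, it can be read from the keytab itself (the principal shown below is only an example):
# klist -kt /etc/security/keytabs/hdfs.headless.keytab ---> lists the principals stored in the keytab
# kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-mycluster@EXAMPLE.COM ---> authenticate with one of the listed principals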
10-16-2018
03:27 AM
2 Kudos
If you have erasure coded a directory and performed some operations on it, you might have observed WARN messages like the ones below:
WARN erasurecode.ErasureCodeNative: Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
These WARN messages are due to the ISA-L library not being present on the node. Below are the steps to enable the library.
1) Clone the isa-l GitHub repository
# git clone https://github.com/01org/isa-l.git
2) Go to the cloned directory
# cd isa-l
3) Install yasm if you do not have it already
# yum install -y yasm ---> CentOS
# apt-get install yasm ---> Ubuntu
4) Build the library
# make -f Makefile.unx
5) Copy the library files to the lib directory
# cp bin/libisal.so bin/libisal.so.2 /lib64
6) Verify that the ISA-L library is enabled properly
# hadoop checknative
Expected output
18/10/12 10:20:03 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/10/12 10:20:03 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/hdp/3.0.0.0-1634/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
zstd : false
snappy: true /usr/hdp/3.0.0.0-1634/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /lib64/libcrypto.so
ISA-L: true /lib64/libisal.so.2 -------------> Shows that ISA-L is loaded.
If step 6 shows the /usr/lib64 directory instead of /lib64, you need to copy the .so files from step 5 to the /usr/lib64 directory. Perform these steps on all DataNode and NameNode hosts, or copy the .so files from the above node to the /lib64 directory of all other nodes. Hope this helps 🙂
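Once the library is loaded, a quick way to exercise an erasure-coded path and confirm the WARN messages are gone (a sketch; /data/ec and example.txt are hypothetical):
# hdfs ec -listPolicies ---> lists the erasure coding policies known to the cluster
# hdfs ec -getPolicy -path /data/ec ---> shows which policy is applied to the directory
# hdfs dfs -get /data/ec/example.txt /tmp/ ---> reading a file back should no longer print the "using builtin-java codec" warning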
10-15-2018
03:47 PM
@Madhura Mhatre, You can install all these components on some other node and then stop and delete them from the old node.
1) Go to Ambari -> Hosts
2) Select the host you want to move these components to
3) Click on the +ADD button and select the component you want to install (Spark2 History Server, Livy for Spark2 Server, etc.)
4) Start the components on the new host after installing them
5) Click on the old host, stop all the Spark components, and delete them.
-Aditya
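The same add/install step can also be scripted through Ambari's REST API if you prefer (a rough sketch; admin:admin, ambari-host, CLUSTER_NAME, NEW_HOST and the SPARK2_JOBHISTORYSERVER component name are placeholders to adapt to your cluster):
# curl -u admin:admin -H 'X-Requested-By: ambari' -X POST http://ambari-host:8080/api/v1/clusters/CLUSTER_NAME/hosts/NEW_HOST/host_components/SPARK2_JOBHISTORYSERVER ---> registers the component on the new host
# curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari-host:8080/api/v1/clusters/CLUSTER_NAME/hosts/NEW_HOST/host_components/SPARK2_JOBHISTORYSERVER ---> triggers the install on that host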
10-11-2018
04:09 AM
1 Kudo
@vamsi valiveti, You need at least the class name to get all the methods available in a class. I can think of a couple of solutions without needing to Google it.
Method 1: Run this command (replace {jar-path} with the real jar path):
jar -tf {jar-path} | grep -i class | sed -e 's/\//./g' | sed -e 's/\.class//g' | xargs javap -classpath {jar-path}
Method 2: Open the jar file, check the list of classes, and then list the methods in the class you want.
1) Check the class names using vim (not vi)
vim Piggybank.jar
2) Take the class name in which you want to list the methods (copy the path including the package name) and run
javap -classpath {path-to-jar-file} {full-class-name-including-package-name}
ex: javap -classpath example.jar org.apache.hadoop.xyz.Abc (Abc is the class name)
If this helps, please take a moment to login and accept the answer.
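To make Method 1 easier to follow, here is what each stage of the pipeline does, followed by the complete command again ({jar-path} is still a placeholder for your jar):
1) jar -tf {jar-path} ---> lists every entry in the jar
2) grep -i class ---> keeps only the .class entries
3) sed -e 's/\//./g' ---> turns path separators into package dots
4) sed -e 's/\.class//g' ---> strips the .class suffix, leaving fully qualified class names
5) xargs javap -classpath {jar-path} ---> prints the methods and fields of each class
# jar -tf {jar-path} | grep -i class | sed -e 's/\//./g' -e 's/\.class//g' | xargs javap -classpath {jar-path}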
10-10-2018
06:18 PM
@Sami Ahmad, You can run the command below:
set;
For example, if you want to check the params that are set to true, you can run
hive -e 'set;' | grep true
-Aditya
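If you only need one specific property, set also accepts a property name (hive.exec.parallel below is just an example property):
# hive -e 'set hive.exec.parallel;' ---> prints the current value of a single Hive property
You can also run set hive.exec.parallel; from inside the Hive shell.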