Member since: 09-09-2016
Posts: 31
Kudos Received: 5
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 2510 | 01-10-2018 02:25 PM |
09-05-2018
08:39 PM
Thanks Andrew. I thought that was probably the answer. I was hoping there was a workaround.
08-30-2018
02:28 PM
Adding to this: obviously, the reason I want to use the example.com DNS name instead of a specific server's FQDN is fault tolerance. But out of curiosity, what would happen to our environment if we set up the LDAPS/Kerberos integration using one of the AD server FQDNs and that server was later removed from the cluster or crashed down the road?
08-29-2018
07:39 PM
I'm setting up Kerberos with an existing Active Directory as the KDC and I'm having an issue communicating with the LDAPS server. We have a cluster of AD servers, let's say server1.example.com, server2.example.com, and server3.example.com, and the company just uses example.com to connect. I've set up LDAP integration with Ambari for user access to the portal via ambari-server setup-ldap, but I did it without SSL, and using ldap://example.com as the LDAP server works fine.

With LDAPS, however, ldaps://example.com:636 doesn't work. I get an error in ambari-server.log: "java.security.cert.CertificateException: No subject alternative DNS name matching example.com found". I have imported the CA cert and each individual server's certificate into my keystore and put the CA in /etc/pki/ca-trust/source/anchors/activedirectory.pem, but I still can't get it to work for example.com. It works for server1.example.com and the others individually, just not for the example.com DNS name.

I don't have control over certificate creation on the AD LDAPS side; these certs were self-signed by the AD servers and each server has its own certificate. Is there any way to tell Ambari to accept invalid certs for the Kerberos wizard, or any other way to get the broader domain name to work? Thanks in advance for any help.
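For reference, a quick way to confirm which DNS names a given AD certificate actually covers is to dump its subject alternative names. This is only a minimal sketch (the certificate file name is a placeholder), but it shows what the CertificateException above is complaining about:

    import java.io.FileInputStream
    import java.security.cert.{CertificateFactory, X509Certificate}
    import scala.collection.JavaConverters._

    object ShowSans {
      def main(args: Array[String]): Unit = {
        // Path to an exported AD server certificate (placeholder name)
        val in = new FileInputStream("server1.cer")
        val cert = CertificateFactory.getInstance("X.509")
          .generateCertificate(in).asInstanceOf[X509Certificate]
        in.close()

        println(s"Subject: ${cert.getSubjectX500Principal}")
        // getSubjectAlternativeNames returns null when the cert has no SAN extension
        val sans = Option(cert.getSubjectAlternativeNames).map(_.asScala).getOrElse(Nil)
        sans.foreach { entry =>
          val pair = entry.asScala.toList // List(type: Integer, value)
          println(s"SAN type=${pair.head} value=${pair(1)}")
        }
      }
    }

Unless example.com shows up as one of the DNS SAN entries, hostname verification against ldaps://example.com:636 will fail even though each server's own FQDN works.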
Labels:
- Apache Ambari
01-24-2018
08:30 PM
@wyang Do you have any insight into why I can't get hbase-spark to work with Spark 2.2?
01-10-2018
03:28 PM
I'm using HDP 2.6.3 with Spark 2.2 (not HDP Cloud) and I'm trying to write to S3 from an IntelliJ project. I have no problems writing to the S3 bucket from the shell, but when I try to test my app on my local machine in IntelliJ I get odd errors after adding the hadoop-aws and aws-java-sdk dependency jars. Depending on where I place them in the ordering of the dependencies in my POM file I get different errors. When I put the Spark dependencies at the top I get "ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.MetricsServlet cannot be instantiated". If I take the hadoop-aws dependency out and then invalidate the cache, everything runs fine except saving to S3, where I get a class not found error for org/apache/http/message/TokenParser.

My POM file is pasted below. I have been playing around with putting the hadoop-aws dependency in different positions in my POM, but I haven't been able to get it to work. If I put it above my Spark dependencies, I get class not found errors for Spark classes. I set the configuration for accessing s3a through sc.hadoopConfiguration.set for the fs.s3a.impl, fs.s3a.access.key, and fs.s3a.secret.key properties. Again, I have no problems saving to S3 from the shell when I set those properties the same way. Are there dependencies that hadoop-aws must come before or after? I wasn't aware that the ordering mattered, but apparently it does. I'm guessing there might be some conflicting classes between the hadoop-aws jar and one of the other Hadoop or Spark jars. Any help would be greatly appreciated.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.lendingtree.data_lake</groupId>
  <artifactId>Spark2_DL_ETL</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2015</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.8</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
    <spark.version>2.2.0.2.6.3.0-235</spark.version>
    <kafka.version>0.10.1</kafka.version>
    <hbase.version>1.1.2.2.6.3.0-235</hbase.version>
    <hadoop.version>2.7.3.2.6.3.0-235</hadoop.version>
    <zookeeper.version>3.4.6</zookeeper.version>
    <shc.version>1.1.0.2.6.3.0-235</shc.version>
  </properties>
  <repositories>
    <repository>
      <id>hortonworks</id>
      <name>hortonworks repo</name>
      <url>http://repo.hortonworks.com/content/repositories/releases/</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency> <groupId>org.scala-lang</groupId> <artifactId>scala-library</artifactId> <version>${scala.version}</version> </dependency>
    <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency>
    <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency>
    <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql-kafka-0-10_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency>
    <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-hive_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency>
    <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>${hadoop.version}</version> <!--<scope>provided</scope>--> </dependency>
    <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version>${hadoop.version}</version> <!--<scope>provided</scope>--> </dependency>
    <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-nfs</artifactId> <version>${hadoop.version}</version> <!--<scope>provided</scope>--> </dependency>
    <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-auth</artifactId> <version>${hadoop.version}</version> <!--<scope>provided</scope>--> </dependency>
    <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-mapreduce-client-core</artifactId> <version>${hadoop.version}</version> <!--<scope>provided</scope>--> </dependency>
    <dependency> <groupId>org.apache.hive</groupId> <artifactId>hive-hbase-handler</artifactId> <version>1.2.1</version> </dependency>
    <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-spark</artifactId> <version>${hbase.version}</version> </dependency>
    <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-common</artifactId> <version>${hbase.version}</version> </dependency>
    <dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-server</artifactId> <version>${hbase.version}</version> </dependency>
    <dependency> <groupId>com.amazonaws</groupId> <artifactId>aws-java-sdk</artifactId> <version>1.10.6</version> </dependency>
    <!--
    <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-aws</artifactId> <version>${hadoop.version}</version> </dependency>
    -->
    <!-- Test -->
    <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.11</version> <scope>test</scope> </dependency>
    <dependency> <groupId>org.specs2</groupId> <artifactId>specs2-core_${scala.compat.version}</artifactId> <version>2.4.16</version> <scope>test</scope> </dependency>
    <dependency> <groupId>org.specs2</groupId> <artifactId>specs2-junit_${scala.compat.version}</artifactId> <version>2.4.16</version> <scope>test</scope> </dependency>
    <dependency> <groupId>org.scalatest</groupId> <artifactId>scalatest_${scala.compat.version}</artifactId> <version>2.2.4</version> <scope>test</scope> </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <!--<arg>-make:transitive</arg>-->
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issue like NoDefClassError,... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
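For context, this is roughly the s3a setup described above, as a minimal sketch; the bucket name, keys, and the trivial write at the end are placeholders, not the real job:

    import org.apache.spark.sql.SparkSession

    // Sketch of the s3a configuration described above (placeholder values).
    val spark = SparkSession.builder()
      .appName("s3a-write-test")
      .master("local[*]")
      .getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    hadoopConf.set("fs.s3a.access.key", "MY_ACCESS_KEY")
    hadoopConf.set("fs.s3a.secret.key", "MY_SECRET_KEY")

    // A tiny write just to exercise the hadoop-aws / aws-sdk classpath
    spark.range(10).write.mode("overwrite").parquet("s3a://my-bucket/tmp/classpath-check")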
Labels:
- Apache Spark
01-10-2018
02:25 PM
I created a Hive table with HBase integration and was able to read from that table in my Spark job to resolve this for now.
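In case it helps anyone else, the workaround looks roughly like the sketch below; the table and column names are placeholders, only the HBaseStorageHandler pattern is the point:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hbase-via-hive")
      .enableHiveSupport()
      .getOrCreate()

    // An HBase-backed Hive table created once in Hive, e.g.:
    //   CREATE EXTERNAL TABLE hbase_events (rowkey STRING, payload STRING)
    //     STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    //     WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:payload")
    //     TBLPROPERTIES ("hbase.table.name" = "events");

    // Spark 2.2 then reads it like any other Hive table
    val df = spark.table("hbase_events")
    df.show(5)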
01-02-2018
08:50 PM
@Dongjoon Hyun What dependency are you using to get it to work with 2.2? I'm getting a "missing or invalid dependency detected while loading class file HBaseContext.class" error. Looking at the Hortonworks repo (http://repo.hortonworks.com/content/repositories/releases/), it looks like version 1.1.0.2.6.3.0-235 is built for Spark 2.2, but the matching hbase-spark dependency POM still lists Spark 2.1.1 as the Spark version. I'm guessing that's probably my issue, but if you were able to get it to work, maybe I'm just doing something wrong.
01-02-2018
07:15 PM
Looking for some suggestions on how to read HBase tables using Spark 2.2. I currently have HDP 2.6.3 installed and have started using Spark 2.2. We had been using Spark 1.6.3 with the Spark HBase connector and that worked all right, but it doesn't seem to work with Spark 2. I also see a lot of references to using Phoenix, but Phoenix doesn't support Spark 2 until version 4.10 and HDP is still on 4.7. Does anyone have any suggestions or examples of how they are interacting with HBase from Spark 2?
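To frame the question a bit, this is the kind of usage I'm after, written against the Hortonworks spark-hbase (SHC) connector. It's only a sketch: the namespace, table, and column names are placeholders, and it assumes the shc-core artifact from the Hortonworks repo is on the classpath:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    val spark = SparkSession.builder().appName("shc-read").getOrCreate()

    // SHC catalog mapping HBase cells to DataFrame columns (placeholder names)
    val catalog =
      """{
        |  "table":{"namespace":"default", "name":"events"},
        |  "rowkey":"key",
        |  "columns":{
        |    "rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
        |    "payload":{"cf":"cf1", "col":"payload", "type":"string"}
        |  }
        |}""".stripMargin

    val df = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()

    df.show(5)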
Labels:
- Apache HBase
- Apache Spark
12-12-2017
04:46 PM
I do have HADOOP_USER_CLASSPATH_FIRST set to true. How do I find where the Hadoop classpath is set? In the hadoop-env file it's just set as HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}
12-08-2017
05:10 PM
I'm getting a vertex failed error when I try to run a query using Hive interactive (LLAP). The actual error is a NoSuchMethodError surfaced through org.apache.hadoop.ipc.RemoteException, but I'm not sure if that's the real root cause. The query joins three large tables together. It works fine if I just query one of the tables, but as soon as I join them it fails with the error below. Most of the vertex failed questions I've found online have to do with memory, but those error messages mention memory explicitly and mine does not, and I've tried all of the recommendations from those issues without any change in the result. The query with the joins does work if I turn off LLAP, but it takes a really long time and I would like to use this feature if possible. Does anyone know what the issue might be? I'm stuck on this one.

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 3, vertexId=vertex_1512748246177_0021_2_01, diagnostics=[Task failed, taskId=task_1512748246177_0021_2_01_000001, diagnostics=[TaskAttempt 0 failed, info=[org.apache.hadoop.ipc.RemoteException(java.lang.NoSuchMethodError): org.apache.log4j.MDC.put(Ljava/lang/String;Ljava/lang/String;)V
at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:214)
at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:547)
at org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
at org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
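In case this is a log4j jar conflict on the LLAP daemon's classpath, here is a small, generic diagnostic sketch (not specific to this cluster) that prints which jar a given class is actually loaded from; run it with the same classpath the failing daemon uses:

    // Minimal diagnostic: print which jar supplies org.apache.log4j.MDC
    object WhichJar {
      def main(args: Array[String]): Unit = {
        val cls = Class.forName("org.apache.log4j.MDC")
        val location = Option(cls.getProtectionDomain.getCodeSource)
          .map(_.getLocation.toString)
          .getOrElse("bootstrap / unknown")
        println(s"${cls.getName} loaded from: $location")
      }
    }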
Labels:
- Apache Hive
- Apache Tez
10-01-2017
08:33 PM
2 Kudos
I want to be able to promote code through our environments without having to manually update any of the processors in the UI after deployment. So far I've seen people talk about copying the flow.xml.gz file over and restarting the service, but I don't see this working in a real-world scenario, because you would have to promote the entire environment and not just the code changes that are ready to be deployed.

Another option, which I think is more likely, is to create a template, export it, and import it into the next environment. This works well, and I can use the custom properties file with the Expression Language to handle most of the configuration changes, but when I import a template that references a controller service, it always creates a new controller service instead of using the existing one. This happens even when I copy and paste a process group within the same environment or create a new process group from a template within the same environment. I end up having to delete the controller service that is created automatically and update the processor to use the existing controller service.

What I would like to do is export the template from UAT, for example, and have a script that uses a mapping of controller service ids from UAT to the matching controller services in PROD and updates those values in the XML, so that when I import the template it finds the existing controller service instead of creating a new one. But when I update the controller service id and then import the template, NiFi changes the id again automatically and creates a new controller service instead of using the existing one. Has anyone dealt with this type of issue before, and do you have any ideas for workarounds?
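For concreteness, the id-mapping pass I'm describing would look something like the sketch below; the file names and the UAT-to-PROD id map are placeholders, and (as noted above) NiFi may still regenerate the ids on import, which is exactly the problem:

    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Paths}

    object RemapControllerServiceIds {
      // Placeholder mapping of UAT controller service ids to their PROD equivalents
      val idMap = Map(
        "1111-uat-controller-service-id" -> "2222-prod-controller-service-id"
      )

      def main(args: Array[String]): Unit = {
        val template = new String(
          Files.readAllBytes(Paths.get("uat-template.xml")), StandardCharsets.UTF_8)
        // Straight text substitution of each UAT id with the matching PROD id
        val remapped = idMap.foldLeft(template) { case (xml, (uatId, prodId)) =>
          xml.replace(uatId, prodId)
        }
        Files.write(Paths.get("prod-template.xml"),
          remapped.getBytes(StandardCharsets.UTF_8))
      }
    }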
Labels:
- Apache NiFi
07-17-2017
03:30 PM
I like to use SQL Workbench/J. We use this tool for querying Amazon Redshift and Phoenix as well. It's free and I haven't had any problems with it. I tried SQuirreL, but it was finicky and I didn't like it much. I haven't tried DbVisualizer. http://www.sql-workbench.net/
05-11-2017
06:59 PM
Thank you @Andy LoPresto! That helps a lot with the details. I'm waiting on DevOps to supply me with the private key, so I haven't been able to try this yet, but it seems pretty straightforward now.
05-09-2017
05:52 PM
1 Kudo
I am trying to enable SSL on my NiFi instance, and I had our DevOps team get me a certificate from a trusted CA (we use Comodo for our corporate certificates). They gave me a .cer file (i.e. mt-ssl-cert.cer). I've looked at all of the posts and the documentation on how to do this, but it seems I am missing something. All of the posts and documentation say that if you are using an existing certificate, you should copy the certificate to the NiFi conf directory and then enter the location, type, and passwords for the truststore and keystore. Where does one get this info? Do I need to create my own keystore and truststore on the server and import the certificate I got? If so, can someone point me in the direction of some instructions? Configuring SSL is foreign to me and I've never had to do anything with it before. Most of the information I find online refers to self-signed certificates, but I can't seem to find any details on how to do this in a corporate infrastructure. Thanks in advance for your help.
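To show what I mean by "create my own", here is a minimal sketch of building a truststore for nifi.properties from the CA-issued cert; the alias, output path, and password are placeholders, and the keystore side would also need the server's private key (which keytool or openssl would normally handle):

    import java.io.{FileInputStream, FileOutputStream}
    import java.security.KeyStore
    import java.security.cert.CertificateFactory

    object BuildTruststore {
      def main(args: Array[String]): Unit = {
        // Load the .cer file provided by the CA
        val certIn = new FileInputStream("mt-ssl-cert.cer")
        val cert = CertificateFactory.getInstance("X.509").generateCertificate(certIn)
        certIn.close()

        // Create an empty JKS truststore and add the certificate under an alias
        val truststore = KeyStore.getInstance("JKS")
        truststore.load(null, null)
        truststore.setCertificateEntry("corp-ca", cert)

        val out = new FileOutputStream("truststore.jks")
        truststore.store(out, "changeit".toCharArray)
        out.close()
      }
    }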
Labels:
- Apache NiFi
03-24-2017
02:15 PM
@Robert Levas Would you recommend setting up the MIT KDC on its own dedicated VM or on one of the masters? What kind of resources does it require?
02-16-2017
07:51 PM
@cduby All of the examples I see are for doing a spark-submit. I'm trying to run the code within the IntelliJ IDE by right-clicking on the class and selecting Run. I was doing this by setting the master in the code to local, and it worked fine up until the point where I wanted to use Hive; then I started getting errors, so I tried to set the master to my remote cluster instead. Is there a better way to do this, so you can actually run your code and check the logs to see if it worked before you package it and submit it?
02-16-2017
06:08 PM
I have been able to fix the invalid class exception on the spark.rdd class, but now I'm getting errors for other classes and anonymous functions in my class. I created a new project with Maven dependencies instead of sbt and followed the instructions for pointing to the Hortonworks repo, as @cduby suggested. I used the full Spark version in my dependencies (1.6.2.2.5.3.0-37). I then removed the Spark I had downloaded locally and instead downloaded the 1.6.2 source code and built it with Scala 2.10.5. Now, when I run my Kafka streaming Spark program from the IDE using sparkConf().setMaster("spark://myRemotecluster.domain.com:7077"), I get a class not found exception for org.apache.spark.streaming.kafka.KafkaReceiver instead. I tried running the SparkPi example class from the Spark examples the same way I ran my streaming project, to test whether the core Spark libraries would run, and I got a class not found exception for com.mycompany.scala.SparkPi$$anonfun$1, so I'm thinking this is a problem with my classes needing to be available on the remote cluster. Is there something I'm missing, or am I just going about this the wrong way? If I could run in local mode and still interact with Hive, HBase, and Hadoop, I wouldn't care so much about running my program from the IDE before submitting it to the server.
02-14-2017
08:10 PM
@Timothy Spann I'm using the Java 1.8 JDK, compiling on Windows. I'm pretty sure I'm building and pushing to 1.6.2. I have a couple of versions of Spark on my Windows machine, but my SPARK_HOME is set to the 1.6.2 version. I tried submitting it from the command line on my local machine with the --master option set to my remote cluster and received a class not found error, but this time it was for KafkaUtils and not the org.apache.spark.rdd.RDD class that I get when I try to run it from IntelliJ.
02-14-2017
05:54 PM
Thank you @Timothy Spann. I've had problems testing in Zeppelin before when I need to import Kafka streaming, but I'll give it another go. I've been able to test using the local master for the most part, but now I'm trying to test out some interaction with Hive and possibly HBase, and that's why I have been trying to run the code against my cluster, where the Hive and HBase services are running.
02-14-2017
04:25 PM
@cduby I created a Maven project instead and didn't get the dependency resolution errors, but I'm getting class not found errors when I try to run the program. Is the only way to run your code on a cluster to do a spark-submit? I was really hoping to debug my code as I add to it, to make sure it behaves as expected, before moving it to the cluster and submitting it as a job.
02-14-2017
04:20 PM
To clarify: I want to run the code from the IDE, not through spark-submit. I want to test and debug my code before I do a spark-submit.
02-14-2017
03:10 PM
@cduby I haven't tried to do a spark-submit command line yet. I'm just trying to get the program to compile without error right now.
02-14-2017
02:55 PM
@cduby Tried it, but it doesn't change anything.
02-14-2017
02:39 PM
@cduby here's a screenshot buildsbt.png
02-14-2017
02:19 PM
Thank you @cduby. I'm using sbt, but maybe I should use a Maven project instead. I tried to translate the examples in the link you shared into sbt dependencies, and it seemed to work for all of the Apache dependencies, but then I get an error for an unresolved dependency on "org.mortbay.jetty#jetty;6.1.26.hwx" and "org.mortbay.jetty#jetty-util;6.1.26.hwx", which I didn't have as a dependency in my project. I tried adding library dependencies for them in my build.sbt, but I still get the error. I looked on the repo site and all I found was a /org/mortbay/jetty/project/6.1.26.hwx directory. I don't understand why I have a dependency on this or how to resolve it. Do you know how to resolve this error? I may try creating a Maven project and see if I still get the error message. Thanks again for your help.
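One thing I'm planning to try (an assumption on my part, not something I've confirmed) is that the *.hwx jetty artifacts resolve from the Hortonworks "public" group repo rather than only the releases repo, so adding both as sbt resolvers in build.sbt:

    // build.sbt sketch: resolvers only, URLs are the ones referenced above plus the
    // public group repo, which I'm assuming hosts the 6.1.26.hwx jetty artifacts
    resolvers ++= Seq(
      "Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/",
      "Hortonworks Public"   at "http://repo.hortonworks.com/content/groups/public/"
    )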
02-13-2017
08:18 PM
2 Kudos
What is the best way to develop Spark applications on your local computer? I'm using IntelliJ and trying to set the master, just for debugging purposes, to my remote HDP cluster so I can test code against Hive and other resources on the cluster. I'm on HDP 2.5.3 and I've added the Spark libraries for Scala 2.10 and Spark 1.6.2 from the Maven repository. I've set my build.sbt scalaVersion to 2.10.5 and added the library dependencies. As far as I can tell, I have exactly the same versions in my project as are running in HDP 2.5.3, but when I try to run the application pointing the SparkConf at my remote Spark master I get the following incompatible class error:

java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = 5009924811397974881, local class serialVersionUID = 7185378471520864965

Is there something I'm missing, or is there a better way to develop and test against the remote cluster?
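For concreteness, this is roughly the Spark 1.6-style skeleton I'm running from IntelliJ (names are placeholders); it works with a local master for pure-Spark logic, and switching the master to the remote cluster is what triggers the InvalidClassException above:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object LocalDevSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("local-dev-test")
          // local[*] works for plain Spark code; pointing this at the remote
          // master is where the serialVersionUID mismatch appears
          .setMaster("local[*]")

        val sc = new SparkContext(conf)
        val sqlContext = new HiveContext(sc)

        sqlContext.sql("SHOW TABLES").show()
        sc.stop()
      }
    }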
Labels:
- Apache Spark
01-27-2017
09:40 PM
Thank you Binu. I was thinking that was probably the answer, but I was hoping there was a way to get Hive to work for me. Now, off to figure out HBase...
01-27-2017
04:33 PM
Hi all, I'm new to the Hadoop world and I have a general question about how others are storing data from Spark Streaming jobs. I'm working on a proof of concept that uses Spark Streaming to stream data from Kafka and do a streaming ETL job. The job will process and store data in near real time. Along the way I want to persist the data at different stages of the transformation and also do lookups against other tables. One of the basic cases would be to take a record, check whether it exists in the data store (which I originally thought might be a Hive table), and insert it if it doesn't. I've looked at Hive Streaming, but I don't see any discussion of Spark Streaming integration, and all of the research I've done on inserting into Hive warns that many small files get created and cause problems. My question is: what are other people doing to store their data from Spark Streaming? Should I be using HBase or something else for this instead of Hive? Thanks in advance for your responses.
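For concreteness, the HBase flavour of this would look roughly like the sketch below; the table, column family, and record type are placeholders, and the Kafka source is assumed to already produce the DStream. The checkAndPut with a null expected value covers the "insert it only if it doesn't exist" case:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.streaming.dstream.DStream

    case class Event(id: String, payload: String)

    def writeIfAbsent(events: DStream[Event]): Unit = {
      events.foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          // one HBase connection per partition, not per record
          val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("events"))
          partition.foreach { e =>
            val put = new Put(Bytes.toBytes(e.id))
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("payload"), Bytes.toBytes(e.payload))
            // null expected value = only apply the Put if the cell does not already exist
            table.checkAndPut(Bytes.toBytes(e.id), Bytes.toBytes("cf1"),
              Bytes.toBytes("payload"), null, put)
          }
          table.close()
          conn.close()
        }
      }
    }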
Labels:
- Apache Spark