Member since: 10-17-2016
93 Posts
10 Kudos Received
3 Solutions

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 2660 | 09-28-2017 04:38 PM
 | 4433 | 08-24-2017 06:12 PM
 | 940 | 07-03-2017 12:20 PM
08-05-2017
12:58 PM
I just installed HDF 3.0. I see the following error in the NiFi bulletin board:
12:49:49 UTC
ERROR
3b80ba0f-a6c0-48db-b721-4dbc04cef28e
sandbox-hdf.hortonworks.com:19090
AmbariReportingTask[id=3b80ba0f-a6c0-48db-b721-4dbc04cef28e] Error running task AmbariReportingTask[id=3b80ba0f-a6c0-48db-b721-4dbc04cef28e] due to javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
I added the port forwarding rule in VirtualBox and then tried to navigate to the following URL: http://localhost:6188/ws/v1/timeline/metrics, where I get the "site can't be reached" message. Does anyone have any idea how to resolve this? Also, what dependencies does Ambari Metrics have that I need to make sure are up and running? I have made no changes to the sandbox and everything is set to default settings.
I also notice that the dashboard shows no data available for most of the metrics. I started the HDFS service and see only a few metrics displayed, like "under replicated blocks", "HDFS disk usage", NameNode RPC, etc., but not all of them.
@Pierre Villard I am trying to follow your tutorial on monitoring NiFi. Unfortunately I cannot move forward unless I resolve this.
08-04-2017
10:24 PM
I am also facing the same issue! Does anyone know how to resolve it?
08-04-2017
10:03 PM
I just installed HDF and am following the tutorial at HDF Getting Started. The command ssh root@127.0.0.1 -p 12222 works fine, but here I am not asked to reset the password. When I try ssh root@127.0.0.1 -p 2222, I get a connection refused error. I am running Ubuntu and running HDF inside VirtualBox. In Ubuntu (my local OS running VirtualBox), the /etc/hosts file looks like:
127.0.0.1 localhost localhost.localdomain
127.0.1.1 arsalan-Lenovo-IdeaPad-Y410P
127.0.0.1 sandbox.hortonworks.com
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.0.1 sandbox.hortonworks.com
Also, when I try the shell web client at localhost:14200 and am asked for the sandbox login, I get: ssh: Could not resolve hostname sandbox.hortonworks.com: Name or service not known. Any idea how to resolve these issues?
Labels:
- Cloudera DataFlow (CDF)
08-02-2017
07:50 AM
@anaik I am running Atlas on Ubuntu 16.04. I followed the instructions at the following link: http://atlas.apache.org/InstallationSteps.html
07-28-2017
08:18 AM
Hi, I am not using HDP. I built and installed Apache Atlas using the following link: Atlas Installation. When I start Atlas using the atlas_start.py script, the web UI gives the 503 error. I have checked the logs, which are as follows:
2017-07-18 21:03:27,477 WARN - [main:] ~ Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.context.event.internalEventListenerProcessor': BeanPostProcessor before instantiation of bean failed; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'graphTransactionAdvisor' defined in URL [jar:file:/home/arsalan/atlas/distro/target/apache-atlas-0.9-incubating-SNAPSHOT-bin/apache-atlas-0.9-incubating-SNAPSHOT/server/webapp/atlas/WEB-INF/lib/atlas-repository-0.9-incubating-SNAPSHOT.jar!/org/apache/atlas/GraphTransactionAdvisor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'graphTransactionInterceptor' defined in URL [jar:file:/home/arsalan/atlas/distro/target/apache-atlas-0.9-incubating-SNAPSHOT-bin/apache-atlas-0.9-incubating-SNAPSHOT/server/webapp/atlas/WEB-INF/lib/atlas-repository-0.9-incubating-SNAPSHOT.jar!/org/apache/atlas/GraphTransactionInterceptor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'get' defined in org.apache.atlas.repository.graph.AtlasGraphProvider: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.apache.atlas.repository.graphdb.AtlasGraph]: Factory method 'get' threw exception; nested exception is com.thinkaurelius.titan.core.TitanException: Could not open global configuration (AbstractApplicationContext:550)
2017-07-18 21:03:27,478 ERROR - [main:] ~ Context initialization failed (ContextLoader:350)
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.context.event.internalEventListenerProcessor': BeanPostProcessor before instantiation of bean failed; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'graphTransactionAdvisor' defined in URL [jar:file:/home/arsalan/atlas/distro/target/apache-atlas-0.9-incubating-SNAPSHOT-bin/apache-atlas-0.9-incubating-SNAPSHOT/server/webapp/atlas/WEB-INF/lib/atlas-repository-0.9-incubating-SNAPSHOT.jar!/org/apache/atlas/GraphTransactionAdvisor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'graphTransactionInterceptor' defined in URL [jar:file:/home/arsalan/atlas/distro/target/apache-atlas-0.9-incubating-SNAPSHOT-bin/apache-atlas-0.9-incubating-SNAPSHOT/server/webapp/atlas/WEB-INF/lib/atlas-repository-0.9-incubating-SNAPSHOT.jar!/org/apache/atlas/GraphTransactionInterceptor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'get' defined in org.apache.atlas.repository.graph.AtlasGraphProvider: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.apache.atlas.repository.graphdb.AtlasGraph]: Factory method 'get' threw exception; nested exception is com.thinkaurelius.titan.core.TitanException: Could not open global configuration
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:479)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:761)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:443)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:325)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:107)
at org.apache.atlas.web.setup.KerberosAwareListener.contextInitialized(KerberosAwareListener.java:31)
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:800)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:444)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:791)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:294)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1349)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1342)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:505)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at org.eclipse.jetty.server.Server.start(Server.java:387)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:354)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.atlas.web.service.EmbeddedServer.start(EmbeddedServer.java:95)
at org.apache.atlas.Atlas.main(Atlas.java:118)
Application logs:
########################################################################################
Atlas Server (STARTUP)
project.name: apache-atlas
project.description: Metadata Management and Data Governance Platform over Hadoop
build.user: arsalan
build.epoch: 1500400082909
project.version: 0.9-incubating-SNAPSHOT
build.version: 0.9-incubating-SNAPSHOT-r377fe19f023a4036f9fa1503dec6ea698fdb214b
vc.revision: 377fe19f023a4036f9fa1503dec6ea698fdb214b
vc.source.url: scm:git:git://git.apache.org/incubator-atlas.git/atlas-webapp
######################################################################################## (Atlas:185)
2017-07-18 21:31:08,525 INFO - [main:] ~ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Atlas:186)
2017-07-18 21:31:08,525 INFO - [main:] ~ Server starting with TLS ? false on port 21000 (Atlas:187)
2017-07-18 21:31:08,525 INFO - [main:] ~ <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< (Atlas:188)
2017-07-18 21:31:13,670 INFO - [main:] ~ No authentication method configured. Defaulting to simple authentication (LoginProcessor:102)
2017-07-18 21:31:13,767 INFO - [main:] ~ Logged in user root (auth:SIMPLE) (LoginProcessor:77)
2017-07-18 21:31:14,320 INFO - [main:] ~ Not running setup per configuration atlas.server.run.setup.on.start. (SetupSteps$SetupRequired:189)
2017-07-18 21:31:14,936 INFO - [main:] ~ Instantiated HBase compatibility layer supporting runtime HBase version 1.1.2: com.thinkaurelius.titan.diskstorage.hbase.HBaseCompat1_1 (HBaseCompatLoader:69)
2017-07-18 21:31:14,962 INFO - [main:] ~ Copied host list from root.storage.hostname to hbase.zookeeper.quorum: localhost (HBaseStoreManager:320)
2017-07-18 21:31:15,045 WARN - [main-SendThread(localhost:2181):] ~ Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (ClientCnxn$SendThread:1102)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-07-18 21:31:15,151 WARN - [main:] ~ Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid (RecoverableZooKeeper:275)
2017-07-18 21:31:16,149 WARN - [main-SendThread(localhost:2181):] ~ Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (ClientCnxn$SendThread:1102)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-07-18 21:31:16,250 WARN - [main:] ~ Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid (RecoverableZooKeeper:275)
Any suggestions on how to resolve this issue? Thanks
Labels:
- Apache Atlas
07-23-2017
04:13 PM
@Ashutosh Mestry thanks for the instant reply. I will look into the links you have shared. Is there any guide on how to run this tutorial, with some explanation? All I get from the page is to run the script and it will set up the metadata information. I guess in HDP it is located under '/usr/hdp/current/atlas-server/bin/quick_start.py'. I did load the data and it is visible in Atlas. Is there any explanation of the example?
07-23-2017
02:51 PM
Hi, I want to learn Atlas and eventually have a system (Spark) export metadata information to Atlas (including lineage). I have downloaded HDP 2.6. I tried to follow the Cross Component Lineage tutorial. Unfortunately, the download link in section 3.2 of the tutorial does not work anymore, and the directory and scripts in section 4.1 do not exist anymore. Can anyone help me get started with Atlas and point out what I should look into to be able to import metadata information from Spark into Atlas? Would I need to modify Atlas in any way, or would it be more a matter of defining entities in Atlas? Thank you
Labels:
- Apache Atlas
07-23-2017
01:18 PM
@Eyad Garelnabi https://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/ Also, I am using HDP 2.6; I think the directory in section 4.1 is no longer there.
07-20-2017
10:48 AM
Can't find the scripts in section 3.2 of the tutorial. Any help there?
Labels:
- Hortonworks Data Platform (HDP)
07-06-2017
06:43 PM
Hi, I was working with Spark running locally in IntelliJ (sending data from NiFi to Spark Streaming via site-to-site). Now I have set up a Spark standalone cluster and want to run my application on it. I simply changed the master URL from local[*] to .setMaster("spark://localhost:7077"). That seems to be fine, but obviously it throws a class-not-found error because the executors do not have the NiFi jars. One possible way is to create a standalone jar and then use the spark-submit script along with the fat jar to run the application. Is it still possible to run the application via IntelliJ somehow? Can I set any of the following in SparkConf().set to make it work?
- SparkConf().setJars
- spark.driver.extraClassPath
- spark.jars
- spark.jars.packages
Can I create a fat jar and pass it to spark.driver.extraClassPath? Thanks
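One option I want to try (a rough sketch only: the jar paths below are placeholders for whatever the project actually depends on) is to keep running from IntelliJ but list the dependency jars, or a single fat jar, via SparkConf.setJars so they get shipped to the executors:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
val config = new SparkConf()
  .setAppName("Nifi_Spark_Data")
  .setMaster("spark://localhost:7077")
  // setJars ships these jars to the executors; a single fat jar also works here
  .setJars(Seq(
    "/home/arsalan/NiFiSparkJars/nifi-spark-receiver-1.2.0.jar",
    "/home/arsalan/NiFiSparkJars/nifi-site-to-site-client-1.2.0.jar"
  ))
val ssc = new StreamingContext(config, Seconds(1))
// ... build the NiFi receiver stream as before ...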
Labels:
- Apache NiFi
- Apache Spark
07-03-2017
12:20 PM
Hi, after a bit of searching I found that I can write each DStream RDD to a specified path using the saveAsTextFile method within the foreachRDD action. The problem is that this writes the RDD's partitions to that location: if the RDD has 3 partitions, you get something like part-0000, part-0001, part-0002, and these are overwritten when the next batch starts. That means if the following batch has 1 partition, part-0001 and part-0002 are deleted and part-0000 is overwritten with the new data. I have seen that people have written code to merge these files. Since I wanted the data for each batch and did not want to lose it, I specified the path as follows:
fileIDs.foreachRDD(rdd => rdd.saveAsTextFile("/home/arsalan/SparkRDDData/" + ssc.sparkContext.applicationId + "/" + System.currentTimeMillis()))
This way it creates a new folder for each batch. Later I can get the data for each batch and don't have to worry about finding ways to avoid the files being overwritten.
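As a small variant (an assumption on my part, not something taken from the docs): foreachRDD can also hand you the batch time directly, which can be used for the folder name instead of System.currentTimeMillis(), and empty batches can be skipped so no empty folders get written:
fileIDs.foreachRDD { (rdd, batchTime) =>
  if (!rdd.isEmpty()) { // skip batches with no records
    val outDir = "/home/arsalan/SparkRDDData/" +
      ssc.sparkContext.applicationId + "/" + batchTime.milliseconds
    rdd.saveAsTextFile(outDir)
  }
}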
06-30-2017
07:58 AM
Hi, I am using NiFi to stream CSV files to Spark Streaming. Within Spark I register an overridden streaming listener to get batch-related information, which I write to file: Spark Streaming Listener. So for each batch I know the start time, end time, scheduling delay, processing time, number of records, etc. What I want to know is exactly which files were processed in a batch, so I would like to output the batch info mentioned above together with an array of UUIDs for all files processed in that batch (the UUIDs can be a file attribute or, if need be, inside the content of the file as well). I don't think I can pass the DStream RDD to the listener. Any suggestions?
Thanks
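One idea I am toying with (just a rough sketch under my own assumptions, not a working solution): rather than passing anything to the listener, collect the NiFi "uuid" flowfile attribute inside foreachRDD, where the batch time is available, and log it next to the listener's batch metrics. Here lines is the DStream returned by the NiFi receiver:
lines.foreachRDD { (rdd, batchTime) =>
  // pull the uuid attribute of every flowfile that ended up in this batch
  val uuids = rdd.map(packet => packet.getAttributes.get("uuid")).collect()
  // write/append this next to the StreamingListener output, keyed by batch time
  println(s"batch=${batchTime.milliseconds} uuids=${uuids.mkString(",")}")
}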
Labels:
- Apache Spark
06-26-2017
02:43 PM
@jfrazee thanks for the reply. I am using Spark Streaming, which processes data in batches. I want to know how long it takes to process a batch for a given application (keeping factors like the number of nodes in the cluster constant) at a given data rate (records/batch). I eventually want to check an SLA to make sure that the end-to-end delay still falls within the SLA, therefore I want to gather historic data from the application runs and make predictions for the time to process a batch: before starting a new batch, you can already predict whether it would violate the SLA. I will have a look into your suggestions. Thanks
06-26-2017
01:42 PM
Hi, I am totally new to SparkML. I capture the batch processing information for Spark Streaming and write it to file. I capture the following information per batch (FYI, each batch in Spark is a jobset, meaning a set of jobs):
- BatchTime
- BatchStarted
- FirstJobStartTime
- LastJobCompletionTime
- FirstJobSchedulingDelay
- TotalJobProcessingTime (time to process all jobs in a batch)
- NumberOfRecords
- SubmissionTime
- TotalDelay (total execution time for a batch from the time it is submitted, scheduled and processed)
Let's say I want to predict what the total delay will be when the number of records in a batch is X. Can anyone suggest which machine learning algorithm is applicable in this scenario (linear regression, classification, etc.)? Of course the most important parameters would be scheduling delay, total delay, number of records and job processing time. Thanks
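To make the regression option concrete, this is the kind of thing I have in mind (a hedged sketch only: the CSV path and the choice of feature columns are assumptions, and the column names are the ones listed above):
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession
object BatchDelayRegression {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("BatchDelayRegression").master("local[*]").getOrCreate()
    // assumed: the per-batch metrics were written out as a CSV with a header row
    val df = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("/home/arsalan/batch_metrics.csv")
    // predict TotalDelay from the batch size (and optionally the scheduling delay)
    val assembler = new VectorAssembler()
      .setInputCols(Array("NumberOfRecords", "FirstJobSchedulingDelay"))
      .setOutputCol("features")
    val lr = new LinearRegression().setLabelCol("TotalDelay").setFeaturesCol("features")
    val model = lr.fit(assembler.transform(df))
    println(s"coefficients=${model.coefficients} intercept=${model.intercept}")
    spark.stop()
  }
}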
Labels:
- Apache Spark
06-23-2017
09:01 PM
Hi, I want to extract provenance event data for a specific flow. I have NiFi running locally on my laptop. There are a number of flows within my main flow; I am interested in extracting provenance information for one specific flow and ignoring all the others. After a bit of googling I know there is a reporting task and the provenance API, but I couldn't find much information on the REST API anywhere. I want to extract the provenance events of a flow and write them to file to process later. Do I also get the data along with the events? I want to track this information for a flow delivering streaming data to Spark Streaming. I might only need the metadata and not the data itself, because saving the data might have a huge overhead for streaming jobs running over a long period of time.
I want to check an SLA requirement (say, end-to-end delay). I would like to know what time the data entered NiFi and what time it left NiFi. This could mean I ingest a file which is split into multiple records before I send it to Spark Streaming; these records are then processed in batches in Spark, and I track how long each batch took. I now want to extract, for a record in a Spark Streaming batch, when it entered and left NiFi:
End-to-end time = (NiFi exit time - NiFi enter time) + Spark batch processing time
I might also need lineage for back-tracking. Any help, including pointers to existing links, is greatly appreciated. Thanks
Labels:
- Apache NiFi
06-21-2017
01:24 PM
@Marco Gaido Thanks for your answer. I also found the answer on slide 23 here: Deep Dive with Spark Streaming. I do agree: you can get the number of blocks, which represent the partitions, and total tasks per job = number of stages in the job * number of partitions. I was also wondering what happens when the data rate varies considerably: will we have uneven blocks, meaning that tasks will have uneven workloads? I am a bit confused now. In plain Spark used for batch processing (Spark Architecture), you have jobs, each job has stages, and stages have tasks. Here, whenever you have an action a new stage is created, so in such a case the number of stages depends on the number of actions; a stage contains all transformations until an action is performed (or output). In the case of Spark Streaming, we have one job per action. How many stages will there be if you have a separate job whenever you perform an action? Thanks
06-21-2017
07:36 AM
I am running NiFi locally on my laptop... Breaking the file up in steps works... thanks... will consider Kafka 🙂
06-20-2017
02:11 PM
Hi, I am currently running NiFi on my laptop and not in HDP. I have a huge CSV file (2.7 GB). The CSV file contains New York taxi events, with basic information about each taxi trip. The header is as follows:
vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,pickup_longitude,pickup_latitude,rate_code,store_and_fwd_flag,dropoff_longitude,dropoff_latitude,payment_type,fare_amount,surcharge,mta_tax,tip_amount,tolls_amount,total_amount
I ran the following command to get a subset of the file (Linux command line): head -n 50000 taxi_sorted.csv > NifiSparkSample50k.csv, and the resulting file size is 5.6 MB. The full file therefore contains approximately (2.5 GB * 1024 / 5.6) * 50000 = 22857142 records. What I do is read the file, split it per record, and then send the records to Spark Streaming via the NiFi Spark receiver. Unfortunately (and obviously) the system gets stuck and NiFi does not respond. I want to have the data streamed to Spark. What could be a better way to do this? Instead of splitting the file per record, should I split the file into chunks of, say, 50,000 records and then send those files to Spark?
Regards
The following is the log entry:
2017-06-20 16:00:11,363 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.SplitText SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] failed to process due to java.lang.NullPointerException; rolling back session: {}
java.lang.NullPointerException: null
2017-06-20 16:00:11,380 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.SplitText SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] failed to process session due to java.lang.NullPointerException: {}
java.lang.NullPointerException: null
2017-06-20 16:00:11,381 WARN [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.SplitText SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] Processor Administratively Yielded for 1 sec due to processing failure
2017-06-20 16:00:11,381 WARN [Timer-Driven Process Thread-4] o.a.n.c.t.ContinuallyRunProcessorTask Administratively Yielding SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] due to uncaught Exception: java.lang.NullPointerException
2017-06-20 16:00:11,381 WARN [Timer-Driven Process Thread-4] o.a.n.c.t.ContinuallyRunProcessorTask
java.lang.NullPointerException: null
2017-06-20 16:00:41,385 ERROR [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.SplitText SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] failed to process due to java.lang.NullPointerException; rolling back session: {}
java.lang.NullPointerException: null
2017-06-20 16:00:41,385 ERROR [Timer-Driven Process Thread-10] o.a.nifi.processors.standard.SplitText SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] SplitText[id=d8251eb7-102e-115c-ee77-109159d0b9ea] failed to process session due to java.lang.NullPointerException: {}
java.lang.NullPointerException: null
Labels:
- Apache NiFi
- Apache Spark
06-20-2017
09:26 AM
Hi, is there a way to determine how many jobs will eventually be created for a batch in Spark Streaming? Spark captures all the events within a window called the batch interval. Apart from this, we also have a block interval, which divides the batch data into blocks. Example: batch interval 5 seconds, block interval 1 second. This means you will have 5 blocks in a batch. Each block is processed by a task, therefore you will have a total of 5 tasks. How can I find the number of jobs that will be in a batch? In a Spark application you have jobs, each of which consists of a number of sequential stages, and each stage consists of a number of tasks (mentioned above).
Labels:
- Apache Spark
06-20-2017
07:59 AM
Hi @bkosaraju, thanks for the reply. I do have the History Server configured and running, capturing the events to the specified directory (also, I am not using HDP; I am using Spark standalone and running Spark from IntelliJ). The issue is that the streaming events, i.e. the details for the batches, are not captured by the History Server. I have, however, overridden the streaming event listener OnBatchSubmit and added code to write to a log file.
06-19-2017
07:55 AM
Hi, in Spark you have the ability to log events to file, and this information is later read by the History Server for viewing in the browser. The list of events that are logged is mentioned here: https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.scheduler.SparkListener
When using Spark Streaming, an additional tab becomes available in the Web UI which gives information regarding the micro-batches: processing time, number of records, etc. Unfortunately, this information is not available in the logs, nor does the History Server display it on replay of the logs. I have not found any configuration option to enable logging of the streaming events. How can I capture this information and, say, write it to a log file? Thanks, Arsalan
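One workaround I am considering (a minimal sketch under my own assumptions: the output path is just a placeholder, and this only covers the fields I care about) is to register a custom StreamingListener and append the batch information to a CSV file myself:
import java.io.{FileWriter, PrintWriter}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
// Appends one line per completed batch: batch time, record count and the delays
// that the Streaming tab normally shows.
class BatchFileLogger(path: String) extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    val out = new PrintWriter(new FileWriter(path, true)) // append mode
    try {
      out.println(Seq(
        info.batchTime.milliseconds,
        info.numRecords,
        info.schedulingDelay.getOrElse(-1L),
        info.processingDelay.getOrElse(-1L),
        info.totalDelay.getOrElse(-1L)
      ).mkString(","))
    } finally out.close()
  }
}
// usage: ssc.addStreamingListener(new BatchFileLogger("/home/arsalan/streaming-batches.csv"))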
Labels:
- Apache Spark
06-09-2017
09:00 AM
To run the code in IntelliJ, the above code is fine! I only needed to add ssc.awaitTermination() after ssc.start().
To run it in the shell, I needed to create a fat jar (uber jar / standalone jar). The missing import org.apache.nifi.events._ was available in nifi-framework-api-1.2.0.jar.
I used Maven to create the fat jar using the maven-assembly-plugin.
06-08-2017
03:35 PM
Hi George, I am not using the sandbox but rather have standalone installations of Spark and NiFi on my PC. I am using Apache NiFi 1.2.0 and I have followed the entire tutorial. I get an error on import org.apache.nifi.events._:
<console>:38: error: object events is not a member of package org.apache.nifi
import org.apache.nifi.events._
I have included all the relevant jars that you mentioned:
- nifi-site-to-site-client-1.2.0.jar
- nifi-spark-receiver-1.2.0.jar
- nifi-api-1.2.0.jar
- nifi-utils-1.2.0.jar
- nifi-client-dto-1.2.0.jar
I opened all the jars and, sure enough, there is no org.apache.nifi.events package in any of them. How can I find this missing import? Also, when I run the code in IntelliJ I don't get any errors, but I do get the following warning:
17/06/08 18:16:14 INFO ReceiverSupervisorImpl: Stopping receiver with message: Registered unsuccessfully because Driver refused to start receiver 0
I copied the following code into IntelliJ (I commented out the last line):
import org.apache.nifi._
import java.nio.charset._
import org.apache.nifi.spark._
import org.apache.nifi.remote.client._
import org.apache.spark._
import org.apache.nifi.events._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.nifi.remote._
import org.apache.nifi.remote.client._
import org.apache.nifi.remote.protocol._
import org.apache.spark.storage._
import org.apache.spark.streaming.receiver._
import java.io._
import org.apache.spark.serializer._
object SparkNiFiAttribute {
def main(args: Array[String]) {
/*
import java.util
val additionalJars = new util.ArrayList[String]
additionalJars.add("/home/arsalan/NiFiSparkJars/nifi-site-to-site-1.2.0.jar")
*/
val config = new SparkConf().setAppName("Nifi_Spark_Data")
// .set("spark.driver.extraClassPath","/home/arsalan/NiFiSparkJars/nifi-site-to-site-client-1.2.0.jar:/home/arsalan/NiFiSparkJars/nifi-spark-receiver-1.2.0.jar:/home/arsalan/nifi-1.2.0/lib/nifi-api-1.2.0.jar:/home/arsalan/nifi-1.2.0/lib/bootstrap/nifi-utils-1.2.0.jar:/home/arsalan/nifi-1.2.0/work/nar/framework/nifi-framework-nar-1.2.0.nar-unpacked/META-INF/bundled-dependencies/nifi-client-dto-1.2.0.jar")
.set("spark.driver.allowMultipleContexts", "true")
.setMaster("local[*]")
// Build a Site-to-site client config with NiFi web url and output port name[spark created in step 6c]
val conf = new SiteToSiteClient.Builder().url("http://localhost:8080/nifi").portName("Data_to_Spark").buildConfig()
// Set an App Name
// Create a StreamingContext
val ssc = new StreamingContext(config, Seconds(1))
ssc.sparkContext.getConf.getAll.foreach(println)
// Create a DStream using a NiFi receiver so that we can pull data from specified Port
val lines = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_ONLY))
// Map the data from NiFi to text, ignoring the attributes
val text = lines.map(dataPacket => new String(dataPacket.getContent, StandardCharsets.UTF_8))
// Print the first ten elements of each RDD generated
text.print()
// Start the computation
ssc.start()
}
}
//SparkNiFiAttribute.main(Array())
06-05-2017
09:16 AM
Thank you for your reply. I see that you are using version 1.3.0, which I do not have. I tried to import the template but I get an error saying the UpdateRecord processor is not known. Is the NAR file available?
06-04-2017
04:50 PM
Hi, I have multiple CSV files where each file contains the values of one attribute over time. There are a total of 60 files (60 different attributes). These are basically Spark's metrics dump. The file name is the name of the application followed by the attribute name. For the example below, the application name is local-1495979652246 and the attribute for the first file is BlockManager.disk.diskSpaceUsed_MB:
local-1495979652246.driver.BlockManager.disk.diskSpaceUsed_MB.csv
local-1495979652246.driver.BlockManager.memory.maxMem_MB.csv
local-1495979652246.driver.BlockManager.memory.memUsed_MB.csv
Each file contains values like:
t | value
1496588167 | 0.003329809088456
1496588168 | 0.00428465362778284
The file name specifies the name of the attribute. The first thing I need to do is rename the CSV header field called value to the attribute name taken from the file name:
t | BlockManager.disk.diskSpaceUsed_MB
1496588167 | 0.003329809088456
The next thing would be to merge all files for the same application based on the value of the field t. Eventually I should have one CSV file per application containing the values for all the attributes, like:
t | BlockManager.disk.diskSpaceUsed_MB | BlockManager.memory.maxMem_MB | BlockManager.memory.memUsed_MB | more attributes...
1496588167 | 0.003329809088456 | some value | some value | some value
1496588168 | 0.00428465362778284 | some value | some value | ...
Any suggestions?
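For comparison (this is not a NiFi answer, just a rough Spark sketch of the target transformation so the desired output is concrete; the directory and the way the attribute name is derived from the file name are assumptions):
import org.apache.spark.sql.SparkSession
object MergeMetricCsvs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("MergeMetricCsvs").master("local[*]").getOrCreate()
    val appId = "local-1495979652246"
    val dir = "/home/arsalan/metrics/" // assumed location of the metric dump
    val files = Seq(
      s"$appId.driver.BlockManager.disk.diskSpaceUsed_MB.csv",
      s"$appId.driver.BlockManager.memory.maxMem_MB.csv",
      s"$appId.driver.BlockManager.memory.memUsed_MB.csv"
    )
    // read each file and rename "value" to the attribute part of the file name
    val frames = files.map { f =>
      val attr = f.stripPrefix(appId + ".driver.").stripSuffix(".csv")
      spark.read.option("header", "true").csv(dir + f).withColumnRenamed("value", attr)
    }
    // outer-join everything on t so each row carries every attribute for that timestamp
    val merged = frames.reduce((a, b) => a.join(b, Seq("t"), "outer"))
    merged.coalesce(1).write.option("header", "true").csv(dir + appId + "-merged")
    spark.stop()
  }
}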
Labels:
- Apache NiFi
05-14-2017
06:55 PM
Hi, I know there are a number of threads posted about how to run a Spark job from NiFi, but most of them explain a setup on HDP. I am using Windows and have Spark and NiFi installed locally. Can anyone explain how I can configure the ExecuteProcess processor to run the following command (which works when I run it on the command line)?
spark-submit2.cmd --class "SimpleApp" --master local[4] file:///C:/Simple_Project/target/scala-2.10/simple-project_2.10-1.0.jar
Labels:
- Apache NiFi
- Apache Spark
05-10-2017
03:51 PM
Hi @Matt Burgess
I tried the technique mentioned below. It works. I get the database result as the content of the flow file and the CSV data as the attributes of the file... eventually I can also move the data into the attributes and then I have all the content in one place, hence merged. I am not sure if this is the ideal solution, but it works! Thoughts?
05-10-2017
02:15 PM
Thanks @Matt Burgess! I am wondering: ExecuteSQL accepts an incoming connection. After getting the CSV, converting it to JSON, and then extracting all the information into attributes, those attributes will be available to the ExecuteSQL processor if I direct the flow to it. Can I then use them to perform a query like:
SELECT `driverinfo`.`idDriverInfo`,
`driverinfo`.`DriverName`,
`driverinfo`.`DriverAge`,
`driverinfo`.`DriverLicence`,
`driverinfo`.`DriverCar`,
`driverinfo`.`DriverNationality`,
`driverinfo`.`DriverRating`
FROM `driverdb`.`driverinfo` WHERE `driverinfo`.`idDriverInfo` = ${id};
where ${id} is an attribute of the incoming flow file. If the query is successful, will a new flow file be generated? What will happen to the original flow file?
04-20-2017
06:43 AM
Any suggestions, guys?