Member since: 03-31-2017
Posts: 57
Kudos Received: 1
Solutions: 0
07-05-2018
09:51 AM
Hi @Felix Albani, thanks.
06-28-2018
10:20 AM
Hi,
I want to fetch stock exchange data from the Alpha Vantage API using Spark Streaming.
I am using the API below, which returns data in JSON format: https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=TCS&interval=1min&apikey=apikey
How can I fetch a continuous stream of stock exchange data using the Spark Streaming Java API?
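There is no built-in Alpha Vantage source in Spark, so one possible approach (a minimal, untested sketch rather than a verified solution) is a custom receiver that polls the REST endpoint and hands each JSON response to Spark Streaming. The class names here are placeholders, YOUR_API_KEY must be filled in, the spark-streaming dependency is assumed to be on the classpath, and the JSON parsing and sink logic are left out:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.receiver.Receiver;

public class AlphaVantageStreaming {

    // Custom receiver: polls the REST endpoint once per minute and stores the raw JSON response.
    public static class AlphaVantageReceiver extends Receiver<String> {
        private final String endpoint;

        public AlphaVantageReceiver(String endpoint) {
            super(StorageLevel.MEMORY_AND_DISK());
            this.endpoint = endpoint;
        }

        @Override
        public void onStart() {
            new Thread(this::poll).start();
        }

        @Override
        public void onStop() {
            // the polling loop checks isStopped(), nothing else to clean up
        }

        private void poll() {
            while (!isStopped()) {
                try {
                    HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
                    StringBuilder body = new StringBuilder();
                    try (BufferedReader reader =
                             new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            body.append(line);
                        }
                    }
                    store(body.toString());   // hand the JSON payload to Spark Streaming
                    Thread.sleep(60_000);     // the endpoint serves 1min bars, so poll once a minute
                } catch (Exception e) {
                    restart("Error polling Alpha Vantage", e);
                    return;                   // let the framework restart the receiver
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("AlphaVantageStreaming");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));

        String url = "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY"
                   + "&symbol=TCS&interval=1min&apikey=YOUR_API_KEY";
        JavaReceiverInputDStream<String> quotes = jssc.receiverStream(new AlphaVantageReceiver(url));

        quotes.print();   // replace with JSON parsing and whatever sink you need
        jssc.start();
        jssc.awaitTermination();
    }
}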
Labels:
- Apache Spark
06-14-2018
07:44 AM
Hi @Felix Albani, I set the driver memory to 20 GB. I tried the spark-submit parameters below:

./bin/spark-submit --driver-memory 20g --executor-cores 3 --num-executors 20 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.yarn.driver.memoryOverhead=1024 --class org.apache.TransformationOper --master yarn-cluster /home/hdfs/priyal/spark/TransformationOper.jar

The cluster configuration is 1 master node (r3.xlarge) and 1 worker node (r3.xlarge): 4 vCPUs, 30 GB memory, 40 GB storage each. I am still getting the same issue: the Spark job stays in the RUNNING state and YARN memory is 95% used.
06-13-2018
01:39 PM
Hi @Vinicius Higa Murakami, @Felix Albani, I have set spark.yarn.driver.memoryOverhead=1 GB, spark.yarn.executor.memoryOverhead=1 GB, and spark_driver_memory=12 GB. I have set the storage level to MEMORY_AND_DISK_SER(). The Hadoop cluster configuration is 1 master node (r3.xlarge) and 1 worker node (m4.xlarge).
Here is the spark-submit command:

./bin/spark-submit --driver-memory 12g --executor-cores 2 --num-executors 3 --executor-memory 3g --class org.apache.TransformationOper --master yarn-cluster /spark/TransformationOper.jar

The Spark job entered the RUNNING state, but it has been executing for the last hour and has still not completed.
06-11-2018
07:07 AM
Hi @Vinicius Higa Murakami, I want to process a 4 GB file, so I configured the executor memory to 10 GB and the number of executors to 10 in the spark-env.sh file. Here are the spark-submit parameters:

./bin/spark-submit --class org.apache.TransformationOper --master local[2] /root/spark/TransformationOper.jar /Input/error.log

I also tried to set the configuration manually using the spark-submit parameters below:

./bin/spark-submit --driver-memory 5g --num-executors 10 --executor-memory 10g --class org.apache.TransformationOper --master local[2] /root/spark/TransformationOper.jar

I tried setting the master to yarn-cluster as well and still got the OutOfMemoryError.
06-08-2018
11:48 AM
@Jay Kumar SenSharma, thanks.
06-08-2018
11:17 AM
Hi, I have created HDP 2.6 on AWS with 1 master node and 4 worker nodes. I am using Ambari as the cluster management tool.
I have configured the spark-env.sh file on the master node, and now I want to apply those settings to all worker nodes in the cluster. How do I refresh the cluster configuration so that the latest configs are reflected on all nodes?
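If Spark is managed by Ambari, the usual approach is to change the settings under Spark > Configs in the Ambari UI, which pushes them to every host and prompts for the required restarts. For a hand-edited spark-env.sh outside Ambari, a rough, untested sketch of pushing the file to the workers is below; the worker hostnames are placeholders and passwordless SSH is assumed:

# copy the locally edited file to each worker's Spark conf directory
for host in worker1 worker2 worker3 worker4; do
  scp /usr/hdp/current/spark-client/conf/spark-env.sh "$host":/usr/hdp/current/spark-client/conf/
done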
Tags:
- aws
- hadoop
- Hadoop Core
Labels:
- Apache Hadoop
06-08-2018
11:11 AM
Hi, I have created HDP 2.6 on AWS with a master node (m4.2xlarge) and 4 worker nodes (m4.xlarge).
I want to process a 4 GB log file with a Spark job, but I get the error below while executing it:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)

I have configured the spark-env.sh file on the master node with:

SPARK_EXECUTOR_MEMORY="5G"
SPARK_DRIVER_MEMORY="5G"

but it throws the same error. I also configured the worker nodes with those settings and increased the Java heap size for the Hadoop client, ResourceManager, NodeManager, and YARN, yet the Spark job is still aborted. Thanks,
Labels:
- Apache Spark
04-17-2018
07:58 AM
Hi @Marcos Jimenez Rodriguez, have you tried executing the Oozie job as the admin user from Ambari? Check that the admin user has write permissions:

hdfs dfs -chown -R admin:hadoop /user/admin
04-05-2018
09:22 AM
@Aditya Sirna I want to post Pig relation output to an external service using curl. How do I pass Pig relation values into the curl command below?

curl -X POST http://xxx.xx.xxx.xx/services//api/data -H "accept: application/json" -H "authorization: authorization-token value" -H "cache-control: no-cache" -H "content-type: multipart/form-data;boundary=----xxxxxxxxxxxxxxxxx" -H "postman-token:postman-token value" -F "title=the value which I want to fetch from the Pig relation" -F "description=description"

I want to fetch the value for title from the Pig relation.
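Pig does not call curl natively, so one workaround (a rough, untested sketch) is to STORE the relation to HDFS first, for example with STORE G INTO '/tmp/pig_out' USING PigStorage(',');, and then read each record back in a shell loop and POST it. The /tmp/pig_out path, the comma delimiter, and the single title field are assumptions; the URL and header values are the placeholders from the command above:

# read each stored record and POST the first field as "title"
hdfs dfs -cat /tmp/pig_out/part-* | while IFS=',' read -r title rest; do
  curl -X POST "http://xxx.xx.xxx.xx/services//api/data" \
    -H "accept: application/json" \
    -H "authorization: authorization-token value" \
    -F "title=${title}" \
    -F "description=description"
done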
04-03-2018
09:28 AM
Hi, I want to post data to an external service using a curl command from Pig. Is it possible to run a curl command from Pig?
Labels:
- Apache Pig
03-31-2018
06:04 AM
@Rahul Soni Hi, I already tried the script you sent, but it inserts NULL values; I mentioned that in my previous comment. The column datatypes in Pig are (id:int, name:chararray, salary:float) and in MSSQL are (id int, name varchar, salary float). I tried different datatypes as well, but it still inserts only NULL values. I am not able to fetch the values from the Pig relation G that I mentioned in my question.
03-30-2018
01:19 PM
@schhabra Hi Shubham, I am able to DUMP the data successfully, but I get an error with the STORE function. I tried the script below:

STORE G INTO 'emp' USING org.apache.pig.piggybank.storage.DBStorage('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'jdbc:sqlserver://xxx.x.xx.xx:1433;databaseName=test', 'username', 'password', 'INSERT INTO emp (id,name,email) VALUES (?,?,?)');

It stores the exact row count of my input file in MSSQL, but with NULL values, so the problem seems to be fetching the values from the relation in Pig. I also tried the script below, but it did not work:

STORE G INTO 'emp' USING org.apache.pig.piggybank.storage.DBStorage('com.microsoft.sqlserver.jdbc.SQLServerDriver','jdbc:sqlserver://xxx.x.xx.xx:1433;databaseName=test','username','password','INSERT INTO emp (id,name,email) VALUES (G.id,G.name,G.email)');
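For reference, a hedged, untested sketch of how this usually has to look with DBStorage: the SQL must keep the ? placeholders, because DBStorage binds values through a JDBC prepared statement, and the fields need real types by the time they reach STORE, which is easiest to guarantee by declaring the schema on LOAD (one common cause of all-NULL inserts is fields still being bytearrays). Paths, connection string, and credentials below are the placeholders from the original script:

REGISTER /usr/hdp/2.5.0.0-1245/pig/lib/piggybank.jar;
REGISTER /usr/hdp/2.5.0.0-1245/pig/lib/sqljdbc41.jar;

-- declare types on LOAD so id/name/email are not left as bytearrays
A = LOAD '/user/Employee.csv' USING PigStorage(',')
    AS (id:int, name:chararray, email:chararray);
G = FOREACH A GENERATE id, name, email;

-- DBStorage fills the ? placeholders from the tuple fields, in order
STORE G INTO 'emp' USING org.apache.pig.piggybank.storage.DBStorage(
    'com.microsoft.sqlserver.jdbc.SQLServerDriver',
    'jdbc:sqlserver://xxx.x.xx.xx:1433;databaseName=test',
    'username', 'password',
    'INSERT INTO emp (id, name, email) VALUES (?, ?, ?)');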
03-29-2018
06:22 AM
Hi, I want to store Pig output to an MSSQL server using DBStorage. I tried the script below:

REGISTER /usr/hdp/2.5.0.0-1245/pig/lib/piggybank.jar;
REGISTER /usr/hdp/2.5.0.0-1245/pig/lib/sqljdbc41.jar;
A= LOAD '/user/Employee.csv' USING PigStorage(',') ;
G = FOREACH A GENERATE $0 as id:int,$1 as name:chararray,$2 as email:chararray;
STORE G INTO 'emp' USING org.apache.pig.piggybank.storage.DBStorage('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'jdbc:sqlserver://xxx.x.xx.xx:1433;databaseName=test', 'username', 'password', 'INSERT INTO emp (id,name,email) VALUES (G.id,G.name,G.email)');

But it throws the following error:

Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function.java.lang.RuntimeException: JDBC error
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:148)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)

Input(s): Failed to read data from "/user/Employee.csv"
Output(s): Failed to produce result in "hdfs://xxxxxxxxxx:8020/user/root/emp"
Labels:
- Apache Pig
03-23-2018
09:50 AM
@Rahul Soni, hi, I edited the comment. Please check it.
03-23-2018
05:41 AM
@Rahul Soni, thanks. It was actually a typo: I edited my question and found that I had forgotten to close ' ')) '. I want to fetch the following values: [/aLog/transaction], POST, [application/vnd.app.v1+json || application/json]
I tried the script below:

extract = FOREACH matched GENERATE FLATTEN(REGEX_EXTRACT_ALL(logmessage,'^(\\S+)\\s+"(\\{(\\S+),.*=(.*),.*=(.*)\\})"+\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+).*
(t1:chararray,t2:chararray,t3:chararray,t4:chararray,url:chararray,type:chararray,produces:chararray,t5:chararray,t6:chararray,classes:chararray,throw:chararray,exception:chararray);

Output:

(Mapped,{[/auditConfirmation/businessDates],methods=[GET],produces=[application/vnd.app.v1+json || application/json]},[/auditConfirmation/businessDates],[GET],[application/vnd.app.v1+json || application/json],onto,public,java.lang.String,com.fhlb.controllers.rest.auditconfirmation.AuditConfirmationRestService.getCloseOFBusinessDates(java.lang.String),throws,com.fhlb.commons.CustomException)

I fetched the output I want, but I am getting one extra captured group. Could you help me with a regex that extracts only the expected output? I want to remove "{[/auditConfirmation/businessDates],methods=[GET],produces=[application/vnd.app.v1+json || application/json]}" from the output. I got the expected output using the script below:

output = FOREACH extract GENERATE $4 as url,$5 as requesttype,$6 as produces;
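One hedged alternative (untested against the full log) is to capture only the three bracketed values instead of the whole "{...}" group, so no extra group appears and the second FOREACH becomes unnecessary. REGEX_EXTRACT_ALL has to match the entire line, hence the leading and trailing .*; the relation and field names are the ones from the script above:

extract = FOREACH matched GENERATE FLATTEN(
    REGEX_EXTRACT_ALL(logmessage,
        '.*\\{\\[([^\\]]+)\\],methods=\\[([^\\]]+)\\],produces=\\[([^\\]]+)\\]\\}.*'))
    AS (url:chararray, requesttype:chararray, produces:chararray);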
03-22-2018
01:29 PM
Hi, I want to fetch the url, methods, and class from the line below using a Pig script:

Mapped "{[/aLog/transaction],methods=[POST],produces=[application/vnd.app.v1+json || application/json]}" onto public org.springframework.http.ResponseEntity<java.lang.Object> com.fhlb.user.controller.rest.ALogService.aTransactionDetails(com.fhlb.user.beans.TansactionReportRequest,javax.servlet.http.HttpServletRequest) throws com.fhlb.commons.CustomException,java.io.FileNotFoundException

Here is my Pig script:

extract = FOREACH logs_entry GENERATE FLATTEN(REGEX_EXTRACT_ALL(logmessage,'^(Mapped)\\"(\\{+(\\[+([^/].*)+\\]),methods=(\\[+([A-Z].*)+\\]),produces=(\\[+([^ ].*)+\\])+\\}\\)"\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(throws)\\s+(.*)
AS (t1:chararray,url:chararray,type:chararray,produces:chararray,t2:chararray,t3:chararray,classes:chararray,throw:chararray,exception:chararray);

But I got the error below:

ERROR 1200: <line 9, column 229> mismatched input 'AS' expecting RIGHT_PAREN

I am not good at regex, so please help me find the solution. Thanks,
Labels:
- Apache Pig
03-15-2018
09:56 AM
Hi, I want to create lineage in Atlas for a flow that reads data from AWS S3, processes it with a Pig script, and stores the processed data into a Hive table, but I can only see the Hive lineage in Atlas. Is it possible to create lineage for Pig, S3, and Talend in Atlas? Thanks, Priyal
Labels:
- Apache Pig
03-01-2018
07:46 AM
@Geoffrey Shelton Okot @Sharmadha Sainath It turned out my Atlas service is running on a worker node, but I had registered the Atlas port to the master node. After registering Atlas port 21000 to the worker node, the Atlas UI works fine. Thanks.
03-01-2018
06:51 AM
@Geoffrey Shelton Okot I have set 777 permissions on /var/log/atlas. I was starting the Atlas service from the Ambari UI.

$ ps aux | grep -i Atlas
atlas 1351 6.3 3.9 5748700 655316 ? Sl 06:36 0:14 /usr/lib/jvm/java/bin/java -Datlas.log.dir=/var/log/atlas -Datlas.log.file=application.log -Datlas.home=/usr/hdp/2.6.1.4-2/atlas -Datlas.conf=/usr/hdp/current/atlas-server/conf -Xms2048m -Xmx2048m -XX:MaxNewSize=600m -XX:MetaspaceSize=100m -XX:MaxMetaspaceSize=512m -server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/atlas/atlas_server.hprof -Xloggc:/var/log/atlas/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -Dlog4j.configuration=atlas-log4j.xml -classpath /usr/hdp/current/atlas-server/conf:/usr/hdp/current/atlas-server/server/webapp/atlas/WEB-INF/classes:/usr/hdp/current/atlas-server/server/webapp/atlas/WEB-INF/lib/*:/usr/hdp/2.6.1.4-2/atlas/libext/*:/etc/hbase/conf org.apache.atlas.Atlas -app /usr/hdp/current/atlas-server/server/webapp/atlas
1003 1763 0.0 0.0 110456 2188 pts/0 S+ 06:40 0:00 grep --color=auto -i Atlas

$ netstat -an | grep 21000 | grep -i listen
tcp 0 0 0.0.0.0:21000 0.0.0.0:* LISTEN

I have created HDP 2.6 on AWS. I have registered the Atlas port to the master node's public DNS name, and my Atlas service is on a worker node. I added the sample data from the worker node successfully.
03-01-2018
06:40 AM
@Sharmadha Sainath here is the application log:

2018-03-01 06:27:48,736 WARN - [pool-1-thread-1:] ~ Failed to remove shutdown hook (StandardTitanGraph:194)
java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:82)
at java.lang.Runtime.removeShutdownHook(Runtime.java:239)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.removeHook(StandardTitanGraph.java:192)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.shutdown(StandardTitanGraph.java:160)
at org.apache.atlas.repository.graphdb.titan0.Titan0Graph.shutdown(Titan0Graph.java:180)
at org.apache.atlas.web.listeners.GuiceServletConfig.contextDestroyed(GuiceServletConfig.java:177)
at org.eclipse.jetty.server.handler.ContextHandler.callContextDestroyed(ContextHandler.java:808)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextDestroyed(ServletContextHandler.java:457)
at org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:842)
at org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:215)
at org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:529)
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:143)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:162)
at org.eclipse.jetty.server.handler.AbstractHandler.doStop(AbstractHandler.java:73)
at org.eclipse.jetty.server.Server.doStop(Server.java:456)
at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
at org.apache.atlas.web.service.EmbeddedServer.stop(EmbeddedServer.java:104)
at org.apache.atlas.Atlas.shutdown(Atlas.java:73)
at org.apache.atlas.Atlas.access$100(Atlas.java:42)
at org.apache.atlas.Atlas$1.run(Atlas.java:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2018-03-01 06:27:48,737 INFO - [pool-1-thread-1:] ~ Shutting down log4j (/:2052)
02-28-2018
01:35 PM
Hi, I have created HDP 2.6 on AWS. I have registered Atlas port 21000 to the master node and set atlas.server.bind.address to the host on which the Atlas service is running, but the Atlas UI is not working. I have loaded sample data to the Atlas server successfully, and the Atlas UI still does not work.
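A hedged first check (it assumes the default admin/admin login is still in place): hit the Atlas REST API directly on the host where the Atlas Metadata Server actually runs. If this returns version JSON but the browser cannot open the UI on the same host and port, the usual culprits are the AWS security group not exposing port 21000, or the UI being opened against a different host than the one Atlas is bound to.

# <atlas-host> is a placeholder for the node running the Atlas Metadata Server
curl -u admin:admin http://<atlas-host>:21000/api/atlas/admin/version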
Labels:
- Apache Atlas
02-09-2018
08:01 AM
Hi @Ashutosh Mestry, I found the solution. I added the sample data to the Atlas server successfully using the command below:

sudo su atlas -c '/usr/hdp/current/atlas-server/bin/quick_start.py'

Thank you for your help.
02-07-2018
05:21 AM
Hi @Ashutosh Mestry,
I am not able to truncate the HBase table, nor even disable it. When I tried to truncate the HBase table, it threw this error:

Truncating 'ATLAS_ENTITY_AUDIT_EVENTS' table (it may take a while): ERROR: Unknown table ATLAS_ENTITY_AUDIT_EVENTS!

I have created HDP on AWS, and when I tried to disable the HBase table it threw a permission issue:

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'cloudbreak' (action=create)

I tried the command below as the atlas user:

su atlas -c '/usr/hdp/current/atlas-server/bin/quick_start.py'

I only have the Atlas UI user (admin) and password (admin); I am not sure which password I should use here.
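A hedged sketch of what sometimes helps here (untested on this cluster): open the HBase shell as the hbase superuser, confirm the exact table name (it may sit under a namespace, which would explain the "Unknown table" message), and grant the failing user rights before disabling or truncating. The user and table names below are taken from the errors above.

sudo su hbase -c 'hbase shell'
# inside the shell:
#   list                                        (confirm the exact table name / namespace)
#   grant 'cloudbreak', 'RWXCA', 'ATLAS_ENTITY_AUDIT_EVENTS'
#   disable 'ATLAS_ENTITY_AUDIT_EVENTS'
#   truncate 'ATLAS_ENTITY_AUDIT_EVENTS'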
02-06-2018
07:55 AM
Hi, I have created HDP on AWS, but the Atlas web UI is not working. I have installed Atlas, HBase, Kafka, and Ambari Infra (Solr). I tried to load the sample model and data using:

bin/quick_start.py http://localhost:21000/

But it throws an exception:

Creating sample types:
Exception in thread "main" org.apache.atlas.AtlasServiceException: Metadata service API org.apache.atlas.AtlasBaseClient$APIInfo@5d534f5d failed with status 409 (Conflict) Response Body ({"errorCode":"ATLAS-409-00-001","errorMessage":"Given type Dimension already exists"})
at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:337)
at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:287)
at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:429)
at org.apache.atlas.AtlasClientV2.createAtlasTypeDefs(AtlasClientV2.java:217)
at org.apache.atlas.examples.QuickStartV2.createTypes(QuickStartV2.java:191)
at org.apache.atlas.examples.QuickStartV2.runQuickstart(QuickStartV2.java:147)
at org.apache.atlas.examples.QuickStartV2.main(QuickStartV2.java:132)
No sample data added to Apache Atlas Server.
Labels:
- Apache Atlas
06-21-2017
07:34 AM
I am using Flume on Ambari. I want to fetch data from Facebook, but I am confused about choosing between the Avro source and the HTTP source. Which source should I use for fetching data from Facebook? Can you please provide an example of the Avro source and the HTTP source?
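For what it's worth, Flume has no built-in Facebook source; both the Avro source and the HTTP source only receive events that something else sends to them (an Avro client or upstream agent, or an HTTP POST). Below is a hedged example of an agent with an HTTP source; the agent name a1, port 44444, channel sizing, and HDFS path are placeholders, and an external script would have to pull data from the Facebook API and POST it as JSON events to this port.

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# HTTP source: accepts JSON-formatted events POSTed to port 44444
a1.sources.r1.type = http
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/facebook

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1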
Labels:
- Apache Flume
- Apache Hadoop
06-02-2017
06:38 AM
I am using Ambari. I run the command below:

sqoop import --connect "jdbc:sqlserver://localhost:1433;database=db_name;username=user_name;pasword=password" --table table_name --target-dir /Sqoop/output --append --incremental append --check-column ID --last-value 100

While executing this command, it loads the newly added rows into the existing directory, but it also shows duplicated records. How can I avoid these duplicated records using Sqoop?
Labels:
- Apache Hadoop
- Apache Sqoop
06-02-2017
06:37 AM
yarn logs -applicationId application_1496289796598_0013 > appln_logs.txt
yarn logs -applicationId application_1496289796598_0013
06-01-2017
09:46 AM
I am using Ambari. I want to export data from HDFS to SQL Server. I have installed the JDBC driver and created a table in SQL Server which has no primary key. I execute this command:

sqoop export --connect "jdbc:sqlserver://localhost:1433;database=db_name;username=user_name;password=*****" --table table_name --export-dir /SqoopData/output -m 1

But it displays a message like "Error during export: Export job failed!".
Which parameter should we use in the export command so that the data is loaded successfully into the table (which has no primary key)?
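A hedged note: sqoop export does not require a primary key on the target table, so the failure is usually a field-parsing or type mismatch rather than the missing key. Declaring the field delimiter explicitly and reading the failed map task's log often reveals the real cause; the comma delimiter below is an assumption about how the HDFS files are stored, and the application ID is a placeholder:

sqoop export \
  --connect "jdbc:sqlserver://localhost:1433;database=db_name;username=user_name;password=*****" \
  --table table_name \
  --export-dir /SqoopData/output \
  --input-fields-terminated-by ',' \
  -m 1

# inspect the actual export failure
yarn logs -applicationId <application_id_of_the_failed_export_job>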
Labels:
- Apache Hadoop
- Apache Sqoop
05-30-2017
10:58 AM
I am working with Sqoop. I want to get both newly added and updated records using a single command. Is it possible using the --incremental import option?
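A hedged example: --incremental lastmodified picks up rows that were inserted or updated after --last-value, provided the table has a timestamp column that is maintained on every insert and update (LAST_UPD below is a placeholder), and --merge-key folds updated rows into the existing files instead of duplicating them. The connection details are placeholders in the same style as the earlier posts:

sqoop import \
  --connect "jdbc:sqlserver://localhost:1433;database=db_name;username=user_name;password=password" \
  --table table_name \
  --target-dir /Sqoop/output \
  --incremental lastmodified \
  --check-column LAST_UPD \
  --last-value "2017-05-01 00:00:00" \
  --merge-key ID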
Labels:
- Apache Hadoop
- Apache Sqoop