Member since: 07-04-2016
Posts: 63
Kudos Received: 141
Solutions: 10
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 640 | 01-23-2018 11:47 AM |
| | 1682 | 01-02-2018 02:01 PM |
| | 1219 | 01-02-2018 12:23 PM |
| | 551 | 12-26-2017 05:09 PM |
| | 480 | 06-23-2017 08:59 AM |
08-06-2018
04:07 AM
@Aravind Yarram Which version of HDP/Zeppelin are you using? In the meantime, as a workaround, can you try restarting the Zeppelin server and see if that resolves the issue?
06-21-2018
01:32 AM
@Venkat You should get both the header and the data with this command; I have just added "hive.cli.print.header=true" to print the header along with the data.
hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv
What result do you see if you just run "select * from your_Table"? Does the table have data?
06-20-2018
05:33 AM
1 Kudo
@Abhinav Kumar spark-submit is the right way to submit a Spark application, as spark-submit sets up the correct classpaths for you. If you run it as a plain Java program, you have to take care of all that setup yourself, which becomes tricky, and I suspect the issue you are facing now is due to incorrect jars on the classpath. Also, please help me understand the use case: what is the purpose of launching the Spark job using the YARN REST API? If you do a simple spark-submit, it will take care of negotiating resources from YARN and running the application.
06-20-2018
02:52 AM
2 Kudos
@Mr. Davy Jones I found this article useful for solving the above problem efficiently: https://ragrawal.wordpress.com/2015/08/25/pyspark-top-n-records-in-each-group/
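For reference, here is a minimal PySpark sketch of the windowing approach that the article describes. The DataFrame, column names, and the choice of N=2 are illustrative assumptions, not part of the original question.

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("top-n-per-group").getOrCreate()

# Hypothetical data: one row per (user, item, rating).
df = spark.createDataFrame(
    [("u1", "a", 5), ("u1", "b", 3), ("u1", "c", 1), ("u2", "d", 4), ("u2", "e", 2)],
    ["user", "item", "rating"])

# Rank rows within each user by rating and keep the top 2 per user.
w = Window.partitionBy("user").orderBy(F.col("rating").desc())
top2 = df.withColumn("rn", F.row_number().over(w)).where(F.col("rn") <= 2).drop("rn")
top2.show()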
06-20-2018
02:32 AM
1 Kudo
@Abhinav Kumar How are you submitting the Spark job? Are you using "spark-submit"? Please provide the command used to submit the job.
06-18-2018
05:18 AM
2 Kudos
@Venkat Please try this:
hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv
Original answer: https://stackoverflow.com/questions/17086642/how-to-export-a-hive-table-into-a-csv-file
02-19-2018
09:22 AM
@rmr1989 Since you say you don't see anything in the server logs, I suspect an intermittent network issue. As a workaround, can you try 1) refreshing the page and, if that doesn't work, 2) restarting the Ambari server?
02-19-2018
09:17 AM
I haven't used RStudio. But if you are looking for a notebook to launch Spark jobs, you can give Apache Zeppelin a try. HDP comes bundled with Zeppelin, and you can install it as a service on the cluster. Please use https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_zeppelin-component-guide/content/ch_installation.html to install Zeppelin using Ambari. Once you have Zeppelin, you can use the 'R interpreter' and start interacting with Spark. Steps to configure the R interpreter: https://zeppelin.apache.org/docs/0.6.2/interpreter/r.html
02-19-2018
09:02 AM
2 Kudos
@Anurag Mishra Please find the AMS architecture in the docs below: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.1.0/bk_ambari-operations/content/ams_architecture.html https://cwiki.apache.org/confluence/display/AMBARI/Metrics As you can see, the 'Metrics Collector' is a daemon which gathers metrics from the 'Metrics Monitors' and 'Hadoop Sinks'. So, as far as the 'Metrics Collector' is concerned, the 'Metrics Monitors' and 'Hadoop Sinks' are its data sources. The Metrics Monitor log can be found under /grid/0/log/metric_monitor/ambari-metrics-monitor.out. But I am not sure whether the Metrics Monitor and Hadoop Sinks write this metric data to any file (as far as I know, they do not).
01-31-2018
09:44 AM
1 Kudo
@Gerald BIDAULT
Is it feasible to install Python 2.7 on your CentOS 6 cluster? If you can, then modify spark-env.sh to use Python 2.7 by changing the properties below:
export PYSPARK_PYTHON=<path to python 2.7>
export PYSPARK_DRIVER_PYTHON=python2.7
Steps for changing spark-env.sh: 1) Log in to Ambari. 2) Navigate to the Spark service. 3) Under 'Advanced spark2-env', modify 'content' to add the properties described above. Attaching a screenshot: spark-changes.png
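As a quick sanity check after restarting Spark (this snippet is my own addition, not part of the original answer), the driver and executor Python versions printed below should both report 2.7:

import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("python-version-check").getOrCreate()

# Version used by the driver process.
print("driver python:", sys.version.split()[0])

# Version used by the executors, collected from a trivial task.
print("executor python:",
      spark.sparkContext.parallelize([0], 1).map(lambda _: __import__("sys").version.split()[0]).collect())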
01-31-2018
09:07 AM
1 Kudo
@Long M Can you try the Ambari quick link to access the Zeppelin UI? Attaching a screenshot so that it's more evident: zeppelin-quicklink.jpg
01-31-2018
08:49 AM
1 Kudo
@Michael Bronson Please refer to https://cwiki.apache.org/confluence/display/AMBARI/Blueprints#Blueprints-Step1:CreateBlueprint for all the required details regarding blueprints.
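As a rough illustration only (the Ambari host, credentials, blueprint name, and the minimal component layout below are my own placeholders, not from the wiki), registering a blueprint is a single REST call; a real blueprint usually needs a fuller component set to pass Ambari's topology validation:

import requests

ambari = "http://ambari.example.com:8080/api/v1"
auth = ("admin", "admin")
headers = {"X-Requested-By": "ambari"}

# A deliberately tiny, hypothetical blueprint body.
blueprint = {
    "Blueprints": {"blueprint_name": "single-node", "stack_name": "HDP", "stack_version": "2.6"},
    "host_groups": [{"name": "host_group_1",
                     "components": [{"name": "NAMENODE"}, {"name": "DATANODE"}],
                     "cardinality": "1"}],
}

# Register the blueprint under the name "single-node".
r = requests.post(ambari + "/blueprints/single-node", json=blueprint, auth=auth, headers=headers)
r.raise_for_status()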
01-31-2018
08:41 AM
2 Kudos
@Gerald BIDAULT I guess this is not possible. If the driver and workers use two different Python versions, the application will fail with the exception "Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set." You can also refer to this question: https://community.hortonworks.com/questions/101952/zeppelin-pyspark-cannot-run-with-different-minor-v.html
01-23-2018
12:35 PM
1 Kudo
@Ravikiran Dasari Please accept the answer if it addresses your query 🙂 or let me know if you need any further information.
01-23-2018
11:47 AM
3 Kudos
@Ravikiran Dasari Yes, Ambari 2.5.2 does have the Oozie view, which is also called the Workflow Designer. Please follow these steps to enable the view: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_workflow-management/content/config_wfm_view.html Also use https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html to get started with the Workflow Designer.
01-02-2018
02:01 PM
3 Kudos
@Michael Bronson I guess you would be using curl, so here is the example in curl. This one is the PUT call:
[root@ctr-e136-1513029738776-28711-01-000002 ~]# curl -XPUT -u admin:admin --header X-Requested-By:ambari http://172.27.67.14:8080/api/v1/clusters/cl1/hosts/ctr-e136-1513029738776-28711-01-000002.hwx.site/host_components -d ' {"RequestInfo":{"context":"Stop All Host Components","operation_level":{"level":"HOST","cluster_name":"cl1","host_names":"ctr-e136-1513029738776-28711-01-000002.hwx.site"},"query":"HostRoles/component_name.in(JOURNALNODE,SPARK_JOBHISTORYSERVER)"},"Body":{"HostRoles":{"state":"STARTED"}}}'
"http://<ambari_server_host>:8080/api/v1/clusters/cl1/hosts/<host_name>/host_components" is the GET call; it does not require any request body.
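If curl is inconvenient, the same GET can be issued from Python. The host names and credentials below are placeholders of mine, and the fields parameter simply asks Ambari to include each component's state:

import requests

ambari = "http://ambari.example.com:8080/api/v1"
auth = ("admin", "admin")
headers = {"X-Requested-By": "ambari"}

url = ambari + "/clusters/cl1/hosts/worker1.example.com/host_components?fields=HostRoles/state"
resp = requests.get(url, auth=auth, headers=headers)
resp.raise_for_status()

# Print each component on the host together with its current state.
for item in resp.json().get("items", []):
    print(item["HostRoles"]["component_name"], item["HostRoles"]["state"])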
01-02-2018
12:23 PM
6 Kudos
@Michael Bronson You can use configs.py to achieve this. Run below command on ambari server host /var/lib/ambari-server/resources/scripts/configs.py --action get --host localhost --port <ambari_server_host> --protocol <ambari_protocol> --cluster <cluster_name> --config-type yarn-site (/var/lib/ambari-server/resources/scripts/configs.py --action get --host localhost --port 8080 --protocol http --cluster cl1 --config-type yarn-site) This will return the results in JSON format which are key value pairs. You can use this result to find the value of yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs Example : [root@ctr-e136-1513029738776-28711-01-000002 ~]# /var/lib/ambari-server/resources/scripts/configs.py --action get --host localhost --port 8080 --protocol http --cluster cl1 --config-type yarn-site 2018-01-02 12:14:46,879 INFO ### Performing "get" content: 2018-01-02 12:14:46,902 INFO ### on (Site:yarn-site, Tag:82207fb3-2c26-47fb-a092-d0b88e19fa66) { "properties": { "yarn.rm.system-metricspublisher.emit-container-events": "true", "yarn.timeline-service.http-authentication.kerberos.keytab": "/etc/security/keytabs/spnego.service.keytab", "yarn.timeline-service.http-authentication.signer.secret.provider.object": "", "yarn.resourcemanager.hostname": "ctr-e136-1513029738776-28711-01-000004.hwx.site", "yarn.node-labels.enabled": "false", "yarn.resourcemanager.scheduler.monitor.enable": "false", "yarn.nodemanager.aux-services.spark2_shuffle.class": "org.apache.spark.network.yarn.YarnShuffleService", "yarn.timeline-service.http-authentication.signature.secret.file": "", "yarn.timeline-service.bind-host": "0.0.0.0", "hadoop.registry.secure": "true", "yarn.resourcemanager.ha.enabled": "true", "hadoop.registry.dns.bind-port": "5353", "yarn.nodemanager.runtime.linux.docker.privileged-containers.acl": "", "yarn.timeline-service.webapp.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8188", "yarn.nodemanager.principal": "nm/_HOST@EXAMPLE.COM", "yarn.timeline-service.enabled": "false", "yarn.nodemanager.recovery.enabled": "true", "yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath": "{\"HDP\":\"/usr/hdp\"}/${hdp.version}/spark/hdpLib/*", "yarn.timeline-service.http-authentication.type": "kerberos", "yarn.nodemanager.container-metrics.unregister-delay-ms": "60000", "yarn.nodemanager.keytab": "/etc/security/keytabs/nm.service.keytab", "yarn.timeline-service.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:10200", "yarn.timeline-service.entity-group-fs-store.summary-store": "org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore", "yarn.timeline-service.entity-group-fs-store.app-cache-size": "10", "yarn.nodemanager.aux-services.spark2_shuffle.classpath": "{{stack_root}}/${hdp.version}/spark2/aux/*", "yarn.resourcemanager.webapp.spnego-principal": "HTTP/_HOST@EXAMPLE.COM", "yarn.resourcemanager.am.max-attempts": "20", "\nyarn.webapp.api-service.enable\n": "true", "yarn.nodemanager.log-aggregation.debug-enabled": "false", "yarn.timeline-service.http-authentication.proxyuser.*.users": "", "yarn.timeline-service.http-authentication.proxyuser.*.hosts": "", "yarn.scheduler.maximum-allocation-vcores": "1", "yarn.resourcemanager.system-metrics-publisher.enabled": "true", "yarn.nodemanager.vmem-pmem-ratio": "2.1", "yarn.resourcemanager.nodes.exclude-path": "/etc/hadoop/conf/yarn.exclude", "yarn.timeline-service.http-authentication.cookie.path": "", "yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size": "10", "yarn.log.server.url": 
"http://ctr-e136-1513029738776-28711-01-000004.hwx.site:19888/jobhistory/logs", "yarn.nodemanager.webapp.spnego-principal": "HTTP/_HOST@EXAMPLE.COM", "yarn.timeline-service.keytab": "/etc/security/keytabs/yarn.service.keytab", "\nyarn.nodemanager.runtime.linux.docker.allowed-container-networks\n": "host,none,bridge", "yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled": "false", "hadoop.registry.dns.domain-name": "hwx.site", "yarn.timeline-service.entity-group-fs-store.active-dir": "/ats/active/", "\nyarn.nodemanager.runtime.linux.docker.default-container-network\n": "host", "yarn.resourcemanager.principal": "rm/_HOST@EXAMPLE.COM", "yarn.nodemanager.local-dirs": "/grid/0/hadoop/yarn/local", "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage": "false", "yarn.nodemanager.remote-app-log-dir-suffix": "logs", "yarn.log.server.web-service.url": "http://ctr-e136-1513029738776-28711-01-000004.hwx.site:8188/ws/v1/applicationhistory", "\nyarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users\n": "false", "yarn.resourcemanager.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8050", "yarn.resourcemanager.zk-num-retries": "1000", "yarn.timeline-service.http-authentication.token.validity": "", "yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election", "yarn.resourcemanager.proxy-user-privileges.enabled": "true", "yarn.application.classpath": "$HADOOP_CONF_DIR,{{hadoop_home}}/*,{{hadoop_home}}/lib/*,{{stack_root}}/current/hadoop-hdfs-client/*,{{stack_root}}/current/hadoop-hdfs-client/lib/*,{{stack_root}}/current/hadoop-yarn-client/*,{{stack_root}}/current/hadoop-yarn-client/lib/*", "yarn.timeline-service.ttl-ms": "2678400000", "yarn.timeline-service.http-authentication.proxyuser.ambari-server.hosts": "ctr-e136-1513029738776-28711-01-000002.hwx.site", "yarn.nodemanager.container-monitor.interval-ms": "3000", "yarn.node-labels.fs-store.retry-policy-spec": "2000, 500", "yarn.resourcemanager.zk-acl": "sasl:rm:rwcda", "yarn.timeline-service.leveldb-state-store.path": "/grid/0/hadoop/yarn/timeline", "hadoop.registry.jaas.context": "Client", "yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled": "false", "yarn.resourcemanager.webapp.https.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8088", "yarn.log-aggregation-enable": "true", "yarn.nodemanager.delete.debug-delay-sec": "3600", "yarn.resourcemanager.bind-host": "0.0.0.0", "yarn.timeline-service.store-class": "org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore", "yarn.resourcemanager.webapp.spnego-keytab-file": "/etc/security/keytabs/spnego.service.keytab", "yarn.timeline-service.client.retry-interval-ms": "1000", "yarn.system-metricspublisher.enabled": "true", "yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes": "org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl", "hadoop.registry.zk.quorum": "ctr-e136-1513029738776-28711-01-000007.hwx.site:2181,ctr-e136-1513029738776-28711-01-000003.hwx.site:2181,ctr-e136-1513029738776-28711-01-000006.hwx.site:2181,ctr-e136-1513029738776-28711-01-000005.hwx.site:2181", "yarn.nodemanager.aux-services": "mapreduce_shuffle", "\nyarn.nodemanager.runtime.linux.allowed-runtimes\n": "default,docker", "yarn.timeline-service.http-authentication.proxyuser.ambari-server.groups": "*", "yarn.nodemanager.aux-services.mapreduce_shuffle.class": "org.apache.hadoop.mapred.ShuffleHandler", "hadoop.registry.dns.enabled": "true", 
"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage": "90", "yarn.resourcemanager.zk-timeout-ms": "10000", "yarn.resourcemanager.fs.state-store.uri": " ", "yarn.nodemanager.linux-container-executor.group": "hadoop", "yarn.nodemanager.remote-app-log-dir": "/app-logs", "yarn.nodemanager.aux-services.spark_shuffle.classpath": "{{stack_root}}/${hdp.version}/spark/aux/*", "yarn.resourcemanager.keytab": "/etc/security/keytabs/rm.service.keytab", "yarn.timeline-service.ttl-enable": "true", "yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds": "3600", "yarn.resourcemanager.fs.state-store.retry-policy-spec": "2000, 500", "yarn.timeline-service.generic-application-history.store-class": "org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore", "yarn.resourcemanager.webapp.address.rm1": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8088", "hadoop.registry.dns.zone-mask": "255.255.255.0", "yarn.nodemanager.disk-health-checker.min-healthy-disks": "0.25", "yarn.resourcemanager.state-store.max-completed-applications": "${yarn.resourcemanager.max-completed-applications}", "yarn.resourcemanager.webapp.address.rm2": "ctr-e136-1513029738776-28711-01-000003.hwx.site:8088", "yarn.resourcemanager.work-preserving-recovery.enabled": "true", "yarn.resourcemanager.resource-tracker.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8025", "yarn.nodemanager.health-checker.script.timeout-ms": "60000", "yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler", "yarn.nodemanager.resource.memory-mb": "12288", "yarn.timeline-service.http-authentication.kerberos.name.rules": "", "yarn.nodemanager.resource.cpu-vcores": "1", "yarn.timeline-service.http-authentication.signature.secret": "", "yarn.scheduler.maximum-allocation-mb": "12288", "yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round": "0.17", "yarn.nodemanager.resource.percentage-physical-cpu-limit": "80", "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb": "1000", "yarn.resourcemanager.proxyuser.*.groups": "", "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds": "3600", "yarn.timeline-service.principal": "yarn/_HOST@EXAMPLE.COM", "yarn.timeline-service.state-store-class": "org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore", "yarn.node-labels.fs-store.root-dir": "/system/yarn/node-labels", "yarn.resourcemanager.hostname.rm1": "ctr-e136-1513029738776-28711-01-000004.hwx.site", "yarn.resourcemanager.hostname.rm2": "ctr-e136-1513029738776-28711-01-000003.hwx.site", "yarn.resourcemanager.proxyuser.*.hosts": "", "yarn.resourcemanager.webapp.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8088", "yarn.scheduler.minimum-allocation-vcores": "1", "yarn.nodemanager.health-checker.interval-ms": "135000", "yarn.nodemanager.admin-env": "MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX", "yarn.nodemanager.vmem-check-enabled": "false", "yarn.acl.enable": "true", "yarn.timeline-service.leveldb-timeline-store.read-cache-size": "104857600", "yarn.nodemanager.log.retain-seconds": "604800", "yarn.client.nodemanager-connect.max-wait-ms": "60000", "yarn.timeline-service.http-authentication.simple.anonymous.allowed": "true", "\nyarn.nodemanager.runtime.linux.docker.privileged-containers.allowed\n": "false", "yarn.scheduler.minimum-allocation-mb": "1024", "yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size": "10000", 
"yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor": "1", "yarn.resourcemanager.ha.rm-ids": "rm1,rm2", "yarn.timeline-service.http-authentication.signer.secret.provider": "", "yarn.resourcemanager.connect.max-wait.ms": "900000", "yarn.resourcemanager.proxyuser.*.users": "", "yarn.timeline-service.http-authentication.cookie.domain": "", "yarn.timeline-service.http-authentication.proxyuser.*.groups": "", "yarn.http.policy": "HTTP_ONLY", "yarn.nodemanager.runtime.linux.docker.capabilities": "\nCHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,\nSETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE", "yarn.timeline-service.version": "2.0", "yarn.resourcemanager.zk-address": "ctr-e136-1513029738776-28711-01-000007.hwx.site:2181,ctr-e136-1513029738776-28711-01-000006.hwx.site:2181,ctr-e136-1513029738776-28711-01-000005.hwx.site:2181", "yarn.nodemanager.recovery.dir": "{{yarn_log_dir_prefix}}/nodemanager/recovery-state", "yarn.nodemanager.container-executor.class": "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor", "yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore", "yarn.timeline-service.entity-group-fs-store.retain-seconds": "604800", "yarn.nodemanager.webapp.spnego-keytab-file": "/etc/security/keytabs/spnego.service.keytab", "yarn.resourcemanager.recovery.enabled": "true", "yarn.timeline-service.leveldb-timeline-store.path": "/grid/0/hadoop/yarn/timeline", "hadoop.registry.system.accounts": "sasl:yarn,sasl:jhs,sasl:hdfs,sasl:rm,sasl:hive", "yarn.timeline-service.client.max-retries": "30", "yarn.resourcemanager.scheduler.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8030", "yarn.log-aggregation.retain-seconds": "2592000", "yarn.nodemanager.address": "0.0.0.0:25454", "hadoop.registry.rm.enabled": "false", "yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms": "300000", "yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms": "10000", "yarn.resourcemanager.zk-state-store.parent-path": "/rmstore", "yarn.nodemanager.log-aggregation.compression-type": "gz", "yarn.timeline-service.http-authentication.kerberos.principal": "HTTP/_HOST@EXAMPLE.COM", "yarn.nodemanager.log-aggregation.num-log-files-per-app": "30", "hadoop.registry.client.auth": "kerberos", "yarn.timeline-service.recovery.enabled": "true", "yarn.nodemanager.bind-host": "0.0.0.0", "yarn.resourcemanager.zk-retry-interval-ms": "1000", "manage.include.files": "false", "yarn.nodemanager.recovery.supervised": "true", "yarn.admin.acl": "yarn,dr.who", "yarn.resourcemanager.cluster-id": "yarn-cluster", "yarn.nodemanager.log-dirs": "/grid/0/hadoop/yarn/log", "yarn.timeline-service.entity-group-fs-store.scan-interval-seconds": "60", "yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size": "10000", "yarn.nodemanager.aux-services.spark_shuffle.class": "org.apache.spark.network.yarn.YarnShuffleService", "hadoop.registry.dns.zone-subnet": "172.17.0.0", "yarn.client.nodemanager-connect.retry-interval-ms": "10000", "yarn.resourcemanager.admin.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8141", "yarn.timeline-service.webapp.https.address": "ctr-e136-1513029738776-28711-01-000004.hwx.site:8190", "yarn.resourcemanager.connect.retry-interval.ms": "30000", "yarn.timeline-service.entity-group-fs-store.done-dir": "/ats/done/" } }
01-02-2018
10:10 AM
3 Kudos
@Michael Bronson Use the API below to stop all the components on a host:
PUT : http://<ambari_server_host>:8080/api/v1/clusters/cl1/hosts/<host_name>/host_components
Body : {"RequestInfo":{"context":"Stop All Host Components","operation_level":{"level":"HOST","cluster_name":"cl1","host_names":"<host_name>"},"query":"HostRoles/component_name.in(DATANODE,HBASE_REGIONSERVER,JOURNALNODE,METRICS_MONITOR,NFS_GATEWAY,SPARK_JOBHISTORYSERVER)"},"Body":{"HostRoles":{"state":"INSTALLED"}}}
Use the API below to start all the components on a host:
PUT : http://<ambari_server_host>:8080/api/v1/clusters/cl1/hosts/<host_name>/host_components
Body : {"RequestInfo":{"context":"Start All Host Components","operation_level":{"level":"HOST","cluster_name":"cl1","host_names":"<host_name>"},"query":"HostRoles/component_name.in(DATANODE,HBASE_REGIONSERVER,JOURNALNODE,METRICS_MONITOR,NFS_GATEWAY,SPARK_JOBHISTORYSERVER)"},"Body":{"HostRoles":{"state":"STARTED"}}}
Note: the component list in "HostRoles/component_name.in" should be replaced with the components present on the specific host. You can obtain that list using the API: http://<ambari_server_host>:8080/api/v1/clusters/cl1/hosts/<host_name>?fields=host_components
If you are planning to write a script to automatically start/stop components on a DATANODE host, you can follow these steps (a rough sketch follows below):
1) Use GET http://<ambari_server_host>:8080/api/v1/clusters/cl1/services/HDFS/components/DATANODE/?fields=host_components to find all the hosts which have a DATANODE installed.
2) Then call the PUT APIs above to STOP/START in a loop, performing the required operation on all the respective hosts.
Please let me know if you have any questions.
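Here is a rough Python sketch of steps 1) and 2). The Ambari host, credentials, and the decision to stop every component on each host are assumptions; adapt the request body (for example, add a component query) as needed:

import requests

ambari = "http://ambari.example.com:8080/api/v1/clusters/cl1"
auth = ("admin", "admin")
headers = {"X-Requested-By": "ambari"}

# 1) Find every host that runs a DATANODE.
r = requests.get(ambari + "/services/HDFS/components/DATANODE?fields=host_components",
                 auth=auth, headers=headers)
r.raise_for_status()
hosts = [hc["HostRoles"]["host_name"] for hc in r.json()["host_components"]]

# 2) Ask Ambari to stop (state INSTALLED) the components on each of those hosts.
for host in hosts:
    body = {"RequestInfo": {"context": "Stop All Host Components",
                            "operation_level": {"level": "HOST",
                                                "cluster_name": "cl1",
                                                "host_names": host}},
            "Body": {"HostRoles": {"state": "INSTALLED"}}}
    resp = requests.put("{}/hosts/{}/host_components".format(ambari, host),
                        json=body, auth=auth, headers=headers)
    print(host, resp.status_code)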
12-26-2017
05:09 PM
5 Kudos
@stanley tsao Please find the steps to enable AMS HA : https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/ams_high_availability.html. Steps to enable distributed mode : https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
09-19-2017
09:52 AM
7 Kudos
@Andy Liang If you are using Ambari, you can refer to the 'Admin > Stack and Version' page (<ambari_url>/#/main/admin/stack/services) to see the version of each service. Attaching a screenshot: stack-version.jpg
09-19-2017
09:40 AM
6 Kudos
@Mrinmoy Choudhury The TPC-H benchmarks are relatively simple, and you can use https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-build.sh and https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-setup.sh to build and set up the data. Also, https://catalog.data.gov/dataset/college-scorecard has a good amount of data that you can use to populate Hive tables.
09-19-2017
09:13 AM
6 Kudos
@Ajit Sonawane Which version of Ambari are you using? What steps did you follow to kerberize the cluster? I hope you followed the steps mentioned in https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/configuring_amb_hdp_for_kerberos.html. If not, please follow the doc to kerberize the cluster. If you have followed the steps in the doc, can you check /etc/security/keytabs/ and ensure that all the service keytabs have been generated?
06-23-2017
08:59 AM
2 Kudos
@Pradarttana Panda If you need this for testing purposes and want to ensure that the operation succeeds every time, you can use the action that refreshes the client configs; i.e., you can perform the 'Refresh Configs' operation on Slider or Kerberos components, and it will succeed in the majority of cases.
05-08-2017
09:06 AM
12 Kudos
In this article I will explain how to create an Oozie custom action node that clones a git repository to a required file path. The git repo location and file path are taken as input when running the workflow, and we will use Ambari's Workflow Manager view to build a workflow using the newly created action node. If you are new to Workflow Manager, this would be a good starting point. Details about custom action nodes can be found in the Oozie documentation. We need to complete a few prerequisites before we jump to the Workflow Manager view and start using custom action nodes.
Step 1 : Implement the Oozie custom action handler. For this article, I will create a custom action which clones a git repository. The implementation extends the ActionExecutor class (provided by Oozie), follows the Oozie documentation, and overrides all required methods.
import java.io.File;
import org.apache.oozie.ErrorCode;
import org.apache.oozie.action.ActionExecutor;
import org.apache.oozie.action.ActionExecutorException;
import org.apache.oozie.action.ActionExecutorException.ErrorType;
import org.apache.oozie.client.WorkflowAction;
import org.apache.oozie.util.XmlUtils;
import org.eclipse.jgit.api.Git;
import org.jdom.Element;
import org.jdom.Namespace;
public class GitActionExecutor extends ActionExecutor {
private static final String NODENAME = "git";
private static final String SUCCEEDED = "OK";
private static final String FAILED = "FAIL";
private static final String KILLED = "KILLED";
public GitActionExecutor() {
super(NODENAME);
}
@Override
public void check(Context context, WorkflowAction action)
throws ActionExecutorException {
// Should not be called for a synchronous action
throw new UnsupportedOperationException();
}
@Override
public void end(Context context, WorkflowAction action)
throws ActionExecutorException {
String externalStatus = action.getExternalStatus();
WorkflowAction.Status status = externalStatus.equals(SUCCEEDED) ? WorkflowAction.Status.OK
: WorkflowAction.Status.ERROR;
context.setEndData(status, getActionSignal(status));
}
@Override
public boolean isCompleted(String arg0) {
return true;
}
@Override
public void kill(Context context, WorkflowAction action)
throws ActionExecutorException {
context.setExternalStatus(KILLED);
context.setExecutionData(KILLED, null);
}
@Override
public void start(Context context, WorkflowAction action)
throws ActionExecutorException {
// Get parameters from Node configuration
try {
Element actionXml = XmlUtils.parseXml(action.getConf());
Namespace ns = Namespace
.getNamespace("uri:custom:git-action:0.1");
String repository = actionXml.getChildTextTrim("repository", ns);
File filePath = new File(actionXml.getChildTextTrim("filePath", ns));
cloneRepo(repository, filePath);
context.setExecutionData(SUCCEEDED, null);
} catch (Exception e) {
context.setExecutionData(FAILED, null);
throw new ActionExecutorException(ErrorType.FAILED,
ErrorCode.E0000.toString(), e.getMessage());
}
}
// Clone the given git repository into the target directory
public void cloneRepo(String repository, File filePath) throws Exception {
Git.cloneRepository()
.setURI(repository)
.setDirectory(filePath)
.call();
}
}
A no-arguments constructor is required for any custom action handler. This constructor registers the action handler name (by invoking super with the action name) that will be used inside the workflow XML.
The initActionType method can be used to register possible exceptions while executing the action, along with their type and error message, and to do initial initialization for the executor itself.
The start method starts the action execution. Because we have implemented a synchronous action, the whole action is executed here. It is invoked by Oozie with two parameters, Context and WorkflowAction. Context provides access to the Oozie workflow execution context which, among other things, contains workflow variables and provides very simple APIs (set, get) for manipulating them. WorkflowAction provides Oozie's definition of the current action.
The check method is used by Oozie to check the action's status; it should never be called for synchronous actions.
The kill method is used to kill the running job or action.
The end method is used for any cleanup or processing which may need to be done after completion of the action; it also has to set the result of the execution.
Step 2 : Define the XML schema for the newly created git action : <?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:git="uri:custom:git-action:0.1"
elementFormDefault="qualified"
targetNamespace="uri:custom:git-action:0.1">
<xs:complexType name="GIT">
<xs:sequence>
<xs:element name="repository" type="xs:string" />
<xs:element name="filePath" type="xs:string" />
</xs:sequence>
</xs:complexType>
<xs:element name="git" type="git:GIT"></xs:element>
</xs:schema>
This takes 'repository' and 'filePath' as mandatory input values before running the workflow.
Step 3 : Register the custom executor with the Oozie runtime. This is done by extending oozie-site.xml with: oozie.service.ActionService.executor.ext.classes=GitActionExecutor
Step 4 : Add the XML schema for the new action: oozie.service.SchemaService.wf.ext.schemas=gitAction.xsd
Step 5 : Package the action code and XML schema into a single jar file, upload the jar to '/usr/hdp/current/oozie-server/libext', and restart the Oozie server.
Step 6 : Navigate to the Workflow Manager view and open a new workflow window.
Step 7 : Select the new custom action node from the action node list.
Step 8 : Define the custom XML with the path to the git repository and the file path where the repository should be cloned. <git
xmlns="uri:custom:git-action:0.1">
<repository>https://github.com/cartershanklin/structor.git</repository>
<filePath>/tmp/newDir/</filePath>
</git>
Step 9 : Preview the workflow XML and confirm that the workflow is created with the custom action node 'git'.
Step 10 : With everything set up, we can now run the workflow. On successful completion of the run, the git repository will be cloned to the given input path /tmp/newDir/.
This project can be downloaded from https://github.com/ssharma555/oozie-git-clone.git
Reference : https://www.infoq.com/articles/ExtendingOozie
04-10-2017
10:19 AM
2 Kudos
@mayki wogno From the error logs, it looks like this is caused by the known bug https://issues.apache.org/jira/browse/AMBARI-19207 (this issue is hit across different views, very intermittently). As a workaround, please create a new Workflow Manager view instance and check whether you are able to submit the workflow. Another thing I observed: you do not have to provide the 'WebHDFS authorization' property while creating the views (this was needed only until Ambari 2.2.2).
03-27-2017
06:35 AM
1 Kudo
@Anand Umbare There is a MongoDB connector available for Hadoop. The MongoDB Connector for Hadoop is a plugin that provides the ability to use MongoDB as an input source and/or an output destination. Please refer to: https://docs.mongodb.com/ecosystem/tools/hadoop/ For connecting to Hive: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
03-21-2017
06:05 AM
1 Kudo
@Nelson KA Rajendran Define the table using the 'external' keyword, which leaves the files in place but creates the table definition in the Hive metastore:
create external table test_daily_load_table ( id int, myfields string )
row format delimited fields terminated by ','
location '/user/demo/path/';
Also refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable for the CREATE TABLE DDL.
03-21-2017
03:14 AM
@Joseph Hawkins Can you please tell me which action/operation you are trying to perform?
03-21-2017
03:08 AM
1 Kudo
@Elvis Zhang This could be due to system slowness. Can you please try refreshing the browser? (Your previous actions will be retained by Ambari, and it will load the 'Configure Identities' page.) Also, please tail the Ambari server log (/var/log/ambari-server/ambari-server.log) while refreshing and check whether you see any errors in the log.
03-20-2017
09:26 AM
4 Kudos
@Nandini Bhattacharjee Can you try writing a Hive UDF? https://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java shows a UDF that generates a sequence of numbers; you can follow the same approach to generate a series of dates. Also refer to: https://community.hortonworks.com/questions/20168/sequence-number-generation-in-hive.html
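For reference, a plain Python sketch of the kind of series such a UDF would emit (the start date and length are made-up inputs):

from datetime import date, timedelta

def date_series(start, days):
    # Return `days` consecutive dates beginning at `start`.
    return [start + timedelta(days=i) for i in range(days)]

# Example: ten consecutive days starting 2017-03-01.
for d in date_series(date(2017, 3, 1), 10):
    print(d.isoformat())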