Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
02-22-2017
09:35 PM
5 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will demonstrate an awesome feature of WFM: publishing and importing assets to and from HDFS for easy reuse. What this means is that any user on the WFM View node can author an action node and publish it for later use by themselves or a colleague. This allows for collaboration and action reuse. Say you author an action node that contains long, complicated URL strings or the like: save it to HDFS or the local database and off you go. Yes, you can publish assets to HDFS as well as to the local Ambari database. I'll demonstrate HDFS, but the steps are identical for the database. Publishing to HDFS has the benefit of sharing across the entire organization rather than a single Ambari Views instance.

Let's start by creating an email action. Fill out your email address, subject and body. At this point you can submit and see if it works; in my case it worked and I received an email. Now I'd like to publish this action node to HDFS. It is the last button on the right; you can see there are buttons to publish and to import, for the database and HDFS respectively. Since I've chosen HDFS, I need to supply a path. WFM saves assets with a .wfasset extension; feel free to publish to a globally shared directory.

[centos@aervits-hdp0 ~]$ hdfs dfs -ls assets/
Found 1 items
-rw-r--r-- 3 centos hdfs 212 2017-02-22 20:56 assets/asset.wfasset

Let's now create a new workflow where I'm going to execute a Shell action and then reuse this email action to send myself an email when the job finishes successfully. Once finished with the shell node, click on the arrow between the shell and end nodes and click the plus sign to add another action node. This time, instead of picking one of the available actions, hit "import asset from HDFS". Then go ahead and enter the HDFS path where the asset is located. Once done, change the action node names to something that makes sense. Finally, you can execute the workflow. Another reason to rename action nodes to meaningful names is the ability to identify them quickly in the Oozie/WFM dashboard; notice the Transition column. Let's look at the job result and, what do you know, I received an email. This is by no means the best way to act on a job result; handling job outcomes via email is better done through a decision node or a kill node. This example is meant to demonstrate the flexibility of WFM and also show something that is not available in Oozie itself. This is one of many features that will set WFM apart from plain Oozie going forward. Until next time!
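Earlier I mentioned that publishing to HDFS lets you share assets across the whole organization. A minimal sketch of setting up such a directory; the /shared/wfm-assets path and the hadoop group are assumptions, adjust to your environment:

# create a world-readable, group-writable location for published assets
hdfs dfs -mkdir -p /shared/wfm-assets
hdfs dfs -chown :hadoop /shared/wfm-assets   # assumption: hadoop is the shared group
hdfs dfs -chmod 775 /shared/wfm-assets       # group members can publish, everyone can import
# publish assets to /shared/wfm-assets/<name>.wfasset in WFM, then import them by the same path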
02-22-2017
09:20 PM
@Vladislav Falfushinsky by the way, there is an early version of the Workflow Manager Technical Preview in Ambari 2.4.2. You can start playing around with it now; keep in mind it's an early release and may contain bugs. It is also a lot less aesthetically appealing than the GA version in Ambari 2.5. We haven't added any docs yet, but there is a note at http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-views/content/ch_workflow_designer_view.html
02-22-2017
07:43 PM
@Gunjan Dhawas Please open this question as a new thread; good practice on HCC is to create a separate question once a thread already has an accepted answer.
02-22-2017
04:08 PM
3 Kudos
@Divakar Annapureddy yes, via WebHCat and WebHDFS.
WebHCat
# this will execute a hive query and save the result to an hdfs file called output in your home directory
curl -s -d execute="select+*+from+sample_08;" \
-d statusdir="output" \
'http://localhost:50111/templeton/v1/hive?user.name=root'
# if you ls on the directory, it will have two files, stderr and stdout
hdfs dfs -ls output
# if the job succeeded, you can cat the stdout file and view the results
hdfs dfs -cat output/stdout
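The WebHCat submit call above returns a JSON body with a job id; besides checking statusdir, you can poll that id directly. A minimal sketch, the job id below is only a placeholder:

# poll the Templeton jobs endpoint until the job completes
curl -s 'http://localhost:50111/templeton/v1/jobs/job_1487791111111_0001?user.name=root'
# the "status" section of the response shows the job state (RUNNING, SUCCEEDED, FAILED)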
WebHDFS
# list the output directory, notice the webhdfs port
curl -i "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/?op=LISTSTATUS"
# read the output file
curl -i -L "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/stdout?op=OPEN"
# rename a file; if you get the "dr.who" (anonymous user) error, add &user.name=root or any other user in the context
curl -i -X PUT "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/stdout?op=RENAME&user.name=root&destination=/user/root/newname"
# read the output of the new file
curl -i -L "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/newname?op=OPEN"
02-20-2017
05:02 PM
The Avro bundled with Hive 1.1 is Avro 1.7.4, which does not support date types; those were introduced in Avro 1.8, and that explains why they are not showing. You can confirm by looking in your Hive lib directory for the specific Avro version. https://issues.apache.org/jira/browse/AVRO-739 I haven't tried it myself, but if you can coerce Sqoop into using Avro 1.8.x you may be able to achieve what you want with the date type; maybe try adding it to the Sqoop lib folder? If my answers were at all useful, please consider accepting the answer as best.
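To check the bundled Avro version, a minimal sketch; the lib paths are assumptions since they differ between CDH and HDP installs:

ls /usr/lib/hive/lib/avro-*.jar                  # typical CDH package layout
ls /usr/hdp/current/hive-client/lib/avro-*.jar   # HDP layout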
02-20-2017
03:18 PM
@Ganeshbabu Ramamoorthy can you remove the double quotes and try again? Also, you mentioned CDH; what version of Hive is it? The date type was only introduced in Hive 0.12, so there is a chance your CDH version does not include it. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Date/TimeTypes
--map-column-hive F_INI_APPEARANCE_DATE=date
I also recommend looking at the Java class code generated after you run this job, in the directory you execute sqoop from; in there you should be able to find the type it generates.
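For reference, a minimal sketch of how the override looks without the quotes; the connection details, table and target names are placeholders, not a tested command for your environment:

sqoop import \
  --connect jdbc:mysql://dbhost/dbname \
  --username someuser -P \
  --table SOURCE_TABLE \
  --hive-import --hive-table target_table \
  --map-column-hive F_INI_APPEARANCE_DATE=date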
02-18-2017
07:36 PM
5 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will walk you through creating a Sqoop action using WFM on HDP 2.5+. First we need a table; we're going to use MySQL as the source database and table.

create table imported (rowkey int, value varchar(25));
insert into imported (rowkey, value) values (1, "john doe");
insert into imported (rowkey, value) values (2, "jane doe");
I want to make sure that all cluster nodes can access this table, so I'm going to grant access to user centos on the LAN. You may have different restrictions on your network, so by all means consult your DBAs.

GRANT ALL PRIVILEGES ON *.* TO 'centos'@'172.22.65.%'
IDENTIFIED BY 'password'
WITH GRANT OPTION;
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'centos'@'localhost'
IDENTIFIED BY 'password'
WITH GRANT OPTION;
FLUSH PRIVILEGES;
I want to make sure user centos can access the table.

mysql -u centos -p
Enter password:
mysql> select * from test.imported;
+--------+----------+
| rowkey | value |
+--------+----------+
| 1 | john doe |
| 2 | jane doe |
+--------+----------+
Finally, I'd like to test that my Sqoop setup works.

sqoop list-tables --connect jdbc:mysql://source-1/test --username centos --password password
17/02/18 15:13:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
imported
Also, in the case of Oozie with HCatalog and Sqoop, every node that will execute job attempts must have the HCat and Sqoop clients installed. I want to save the password in a file so that I can access it without a prompt and without passing it in clear text on the command line.

echo -n "password" > .password
hdfs dfs -put .password /user/$USER/
hdfs dfs -chmod 400 /user/$USER/.password
rm .password
[centos@source-1 ~]$ hdfs dfs -ls
Found 1 items
-r-------- 3 centos hdfs 8 2017-02-18 15:13 .password
[centos@source-1 ~]$ hdfs dfs -cat .password
password[centos@source-1 ~]$
Let's run the list command again, referencing the file instead of the --password argument.

sqoop list-tables --connect jdbc:mysql://source-1/test --username centos --password-file /user/centos/.password
17/02/18 15:14:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
imported
You can find more details in our comprehensive documentation on data movement: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/sqoop_hcatalog_integration.html

Also, make sure mysql-connector-java is at an appropriate version. RHEL 6 bundles version 5.1.17, which does not work with later versions of HDP; we bundle 5.1.37 in HDP-UTILS, and the only way to activate it is to run the following:

yum downgrade mysql-connector-java

Then, in your /usr/share/java directory, you should be able to see the correct connectors:

lrwxrwxrwx. 1 root root 31 Feb 18 15:29 jdbc-mysql.jar -> mysql-connector-java-5.1.37.jar
lrwxrwxrwx. 1 root root 31 Feb 18 15:29 mysql-connector-java.jar -> mysql-connector-java-5.1.37.jar
You have a choice: update the Oozie sharelib with this connector, or bundle it as part of the workflow's lib directory. I'm going to do the latter for time's sake. Before I start authoring a workflow, I'd like to confirm my Sqoop import works, so I will execute it on the command line first.

sqoop import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/$USER/.password --table imported --hcatalog-table imported --create-hcatalog-table --hcatalog-storage-stanza "STORED AS ORCFILE" --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey
I'm choosing the HCatalog import as it is more efficient than --hive-import: with the latter, Sqoop needs an extra step of moving the imported data from a staging directory into Hive, spawning an extra container. With --hcatalog-table, everything happens in one shot. Another benefit is that you can create an ORC table from the command line instead of going into Hive and altering a table to set it to ORC. Let's see what we got as a result.

Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=213
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=837
CPU time spent (ms)=8890
Physical memory (bytes) snapshot=718036992
Virtual memory (bytes) snapshot=9154256896
Total committed heap usage (bytes)=535298048
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Transferred 628 bytes in 70.6267 seconds (8.8918 bytes/sec)
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Retrieved 2 records.
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
Let's see what it looks like in Hive.

[centos@source-1 ~]$ beeline
Beeline version 1.2.1000.2.6.0.0-493 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 "" ""
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 1.2.1000.2.6.0.0-493)
Driver: Hive JDBC (version 1.2.1000.2.6.0.0-493)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> select * from default.imported;
+------------------+-----------------+--+
| imported.rowkey | imported.value |
+------------------+-----------------+--+
| 1 | john doe |
| 2 | jane doe |
+------------------+-----------------+--+
2 rows selected (6.414 seconds)
Let's truncate the table to prepare for the Oozie imports, and additionally describe the table to demonstrate it is in fact stored as ORC.

0: jdbc:hive2://localhost:10000> truncate table default.imported;
No rows affected (0.4 seconds)
0: jdbc:hive2://localhost:10000> describe formatted imported;
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
I'm ready to start working on a workflow. Let's add a Sqoop action node and save the workflow to create a directory for it. I want to make sure I have a valid directory so that I can upload a few files that are necessary for this to complete successfully.

hdfs dfs -mkdir /user/centos/sqoop/lib
hdfs dfs -put /usr/share/java/mysql-connector-java-5.1.37.jar /user/centos/sqoop/lib/
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/centos/sqoop/lib/
hdfs dfs -put /etc/tez/conf/tez-site.xml /user/centos/sqoop/lib/
I'm going to use my own MySQL driver rather than the one in the sharelib, and therefore I'm uploading it to my workflow's lib directory; again, if you update the sharelib with the associated jar, you don't have to do that. Secondly, I'm including hive-site.xml and tez-site.xml. Until HDP 2.5 you only needed hive-site.xml, but now we also need tez-site.xml. It is a small fact that will save you many hours of debugging, trust me, I know. Your workflow lib directory should look like so:

hdfs dfs -ls /user/centos/sqoop/lib/
Found 3 items
-rw-r--r-- 3 centos hdfs 19228 2017-02-18 15:38 /user/centos/sqoop/lib/hive-site.xml
-rw-r--r-- 3 centos hdfs 977873 2017-02-18 15:37 /user/centos/sqoop/lib/mysql-connector-java-5.1.37.jar
-rw-r--r-- 3 centos hdfs 6737 2017-02-18 15:38 /user/centos/sqoop/lib/tez-site.xml
Finally, I want to modify my Sqoop command: I no longer need the --create-hcatalog-table argument, and I want to replace the $USER argument with my username (you can also use Oozie EL functions for string replacement).

import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/centos/.password --table imported --hcatalog-table imported --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey

That's what my command looks like in Oozie; notice the missing "sqoop" keyword, it is inferred when you select a Sqoop action in WFM. Edit the Sqoop action on the WFM canvas and enter the command. We are working on refreshing the UI before WFM is released, so your dialog box may look slightly different, but the fields should remain the same. Let's tell WFM that we also expect the tez-site.xml and hive-site.xml files. Finally, we need to tell Oozie that we will pull in the HCatalog and Hive jars for this to work. At this point my workflow is finished; let's inspect the XML. When you submit the job, it should succeed and you can look at the results. Again, this approach is more efficient and actually works on HDP 2.5+; I highly recommend checking out WFM and the hcatalog options in Sqoop.
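On that last point about pulling in the HCatalog and Hive jars, one way to express it is through the Oozie sharelib property at submission time. A minimal sketch, assuming a hand-written job.properties; WFM generates its own configuration when you submit, and the same keys can be added there:

# pull the hive and hcatalog sharelibs in addition to sqoop for the sqoop action
cat >> job.properties <<'EOF'
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
EOF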
02-17-2017
11:40 PM
Take a look here; MariaDB for Hive with HDP 2.5.3 is supported: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/database_requirements.html
02-17-2017
04:31 PM
1 Kudo
@john doe use STORED AS ORC instead of RCFILE. Also, use CTAS to create a table from the first table, then run an INSERT from the second table:

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] (z,y) select_statement1 FROM from_statement;

See if that improves performance. I know it's two steps, but it is an interesting use case.
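A minimal sketch of the two-step approach through beeline; the table and column names are placeholders, not taken from your schema:

beeline -u jdbc:hive2://localhost:10000 -n hive -e "
  CREATE TABLE combined STORED AS ORC AS SELECT z, y FROM table_one;
  INSERT INTO TABLE combined SELECT z, y FROM table_two;
"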
02-17-2017
04:27 PM
@Baruch AMOUSSOU DJANGBAN check /var/log/ambari-agent/ambari-agent.log on the host that failed and /var/log/ambari-server/ambari-server.log on the ambari server node.
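A minimal sketch for narrowing the failure down quickly; the grep patterns are only a starting point:

tail -n 200 /var/log/ambari-agent/ambari-agent.log | grep -iE "error|fail"
tail -n 200 /var/log/ambari-server/ambari-server.log | grep -iE "error|exception"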