Member since: 03-29-2016
Posts: 46
Kudos Received: 25
Solutions: 4

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 904 | 06-12-2017 09:28 PM
 | 1626 | 03-30-2017 12:06 PM
 | 1215 | 03-28-2017 04:12 PM
 | 3041 | 12-16-2016 06:34 PM
12-16-2016
06:34 PM
These links from previous questions may help:
https://community.hortonworks.com/questions/58916/utf-8-hive.html
https://community.hortonworks.com/articles/58548/processing-files-in-hive-using-native-non-utf8-cha.html
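As a quick, hedged illustration of the approach those links cover: for files in a non-UTF-8 character set, the table's SerDe can usually be told which encoding to read. The table name and charset below are only placeholders, not something from your question, and the JDBC URL is just the sandbox default.

    # Minimal sketch via beeline; adjust the URL, table name, and charset
    # (ISO-8859-1 is only an example) to match your environment.
    beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000" \
      -e "ALTER TABLE my_latin1_table SET SERDEPROPERTIES ('serialization.encoding'='ISO-8859-1');"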
12-16-2016
12:04 PM
Quick question, curious about perspectives: does it make sense to use Falcon snapshot support just to manage snapshots for a single cluster, and not necessarily for the DR replication aspects?
09-29-2016
07:11 PM
1 Kudo
Nice step-by-step; this will save people a bunch of time.
09-26-2016
11:21 AM
1 Kudo
Hi Vasilis, thanks for getting the topic going. An article that would help for sure, and maybe I can help when I get back, is one showing the taxonomy aspects and adding terms to assets. The earlier versions of the tech preview had samples already running.
09-10-2016
12:52 PM
Hi, I am having the same issue. I followed the steps in the JIRA https://issues.apache.org/jira/browse/AMBARI-18046?jql=project%20%3D%20AMBARI and am still getting "This host-level alert is triggered if the Falcon Server Web UI is unreachable" - 503. Any insights on what you did to get around this? I am running Ambari 2.4 on the HDP 2.5 tech preview. The steps from the JIRA are listed here. STR:
1) Upgrade Ambari from 2.2.1 to 2.4.0
2) Delete Falcon
3) Add Falcon
Result: Falcon UI is unavailable. From the Falcon logs:
java.lang.RuntimeException: org.apache.falcon.FalconException: Unable to get instance for org.apache.falcon.atlas.service.AtlasService
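For anyone else hitting this, here is a hedged way I have been trying to narrow it down. The paths are the usual HDP 2.5 locations, so treat them as assumptions rather than a confirmed fix:

    # Is an Atlas-related service referenced in Falcon's startup configuration?
    # (typical HDP config location; adjust if your layout differs)
    grep -n "Atlas" /etc/falcon/conf/startup.properties

    # Did any Atlas hook jars actually land under the Falcon server install?
    find /usr/hdp/current/falcon-server -name "*atlas*.jar"

If the service is listed but no hook jars show up, that would at least point at a classpath problem rather than an Ambari configuration problem.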
09-01-2016
01:53 PM
2 Kudos
Self Service Hadoop – well, some starting points

Say you want to get started with Big Data and, concurrently, want to start to empower your relatively savvy end users who have been stuck in frustrating desktop data management land for a long time. This very simple article will hopefully help with a few options.

The diagram below gives a little perspective on some of the sources and mechanisms for ingesting, manipulating, and using your data assets. Following the numbers from 1 to 6, you can see that you have many options for working with your data. This article will concentrate on a simple end-user example of ingesting data, understanding it, and using it from a self-service approach. It is really an approach to get started and to help promote some of the value of a modern data architecture as your team matures.

An end user wants to self-service some data from their desktop/server into HDFS, and to be able to query and understand that data from their existing tools, as well as work with it in conjunction with what the tech staff is ingesting through other vehicles. This quick example will show how to use the Ambari Hive view to upload data, provide some structure, and create a Hive table that can be used by many available tools. It will also give a very brief starting thought on how Atlas can be used to help organize and track the what, where, how, etc. of your assets.

1. Go to the Ambari Hive view – on the right side of the Ambari dashboard, clicking the table-looking icon at the top lists the views. (There are also other views for the HDFS file view, Zeppelin, etc.) (Screenshot: where you select the Ambari view.)
2. Once in the Ambari view, click on the Upload Table tab. (Screenshot: what the Ambari view looks like; there are a lot of options here, some more tech focused, but very functional.)
3. Within that tab you can select a CSV, with or without headers, from local storage or HDFS.
4. Then you can change the column names and/or types if necessary.
5. Then you create the Hive table in the Hive database you want. (Screenshot: the Table tab where I selected a CSV (geolocation) from my hard drive; it had headers.)
6. Once the Hive table is created you can use any third-party tool (Tableau), the Ambari Hive view, Excel, Zeppelin, etc. to work with the table; there is a small query sketch after this list. (Screenshot: the Hive table geolocation (stored in ORC format) in the default Hive database, queried in the Hive view.)
7. OK, one more detail that may help you. Once the geolocation table is created from the Hive view upload, there is no reason why you cannot go out and tie it into a taxonomy in Atlas, tag columns, add details, see lineage, etc. A few screen prints give perspective. This is a larger topic, but it will help locate, organize, secure, and track data assets for the team. (Screenshot: bottom part of the Atlas screen.)
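As a hedged follow-up to step 6, here is about the smallest possible command-line check that the uploaded table is queryable outside the Hive view. The JDBC URL is the sandbox default used elsewhere in my posts, and the database and table name assume the upload went to the default database as in the screenshots.

    # Minimal sanity check of the uploaded table; add -n/-p if your cluster needs credentials.
    beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000" \
      -e "SELECT * FROM default.geolocation LIMIT 10;"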
A good understanding of the latest Atlas release can be found in the Hadoop Summit presentations listed below. Here is the link to all the sessions if interested: http://hadoopsummit.org/san-jose/agenda/

Atlas – three sessions at Hadoop Summit that will help:

a. What the #$* is a Business Catalog and why you need it
Video - https://www.youtube.com/watch?v=BtAkztkcZwU
Slides - http://www.slideshare.net/HadoopSummit/what-the-is-a-business-catalog-and-why-you-need-it

b. Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise
Video - https://www.youtube.com/watch?v=ID6qnoLCQzk
Slides - http://www.slideshare.net/HadoopSummit/top-three-big-data-governance-issues-and-how-apache-atlas-resolves-it-for-the-enterprise

c. Extend Governance in Hadoop with Atlas Ecosystem
Video - https://www.youtube.com/watch?v=7nx6hzhM4Xs
Slides - http://www.slideshare.net/HadoopSummit/extend-governance-in-hadoop-with-atlas-ecosystem-waterline-attivo-trifacta
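One last hedged aside on the step 7 idea of finding the table in Atlas: the Atlas versions from that era exposed a DSL search over REST. The host, port, credentials, and even the endpoint path below are assumptions from memory, so treat this purely as a placeholder to adapt against your Atlas version's docs.

    # Hypothetical DSL search for the geolocation table; 21000 is the usual Atlas port
    # and admin:admin the usual sandbox login, but verify both before relying on this.
    curl -u admin:admin -G "http://sandbox.hortonworks.com:21000/api/atlas/discovery/search/dsl" \
      --data-urlencode "query=hive_table where name='geolocation'"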
07-07-2016
03:10 PM
Hi Josh, how does this tie in: https://issues.apache.org/jira/browse/HBASE-1936? Is this what you meant by copying out the jars?
06-15-2016
01:21 AM
3 Kudos
Hi all, here are a few steps for getting a quick example working of sqooping some Oracle data into HDFS and a Hive table, using the Oracle Developer VM and the Hortonworks Sandbox. It is very simple, but may provide some help for people just starting out. I will be using VirtualBox for this walkthrough, so I am assuming you already have it installed. Also, the Oracle VM will require about 2 GB of memory and the Sandbox about 8 GB, so you will need a machine with a decent amount of memory to give this a try. I used a Mac with 16 GB and it ran fine.

Step 1: Download the Hortonworks Sandbox and import the OVA into VirtualBox: http://hortonworks.com/products/sandbox/

Step 2: Download the Oracle Developer VM (this may require setting up a free Oracle account) and import the OVA into VirtualBox: http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html

Step 3: Set up the two VMs so they can communicate with each other. There are many options here, but I set up a NAT Network on the second network adapter within both VMs for this test. A few diagrams below help. Basically, set up a new NAT Network in VirtualBox under the VirtualBox Preferences menu: select the Network icon and add a new NAT Network (in the display below I called it DensNetwork). Then go into the settings for both VMs, go to Network, click on the 2nd adapter, and follow the diagrams below. (Screenshots: VB NAT Network diagram; VB settings – Sandbox; VB settings – Oracle VM.)

Step 4: Fire up the VMs, open a terminal session, and ssh into the sandbox:
ssh -p 2222 root@sandbox.hortonworks.com
or
ssh -p 2222 root@127.0.0.1
(password: hadoop)

Step 5: You can read up a little on Oracle CDB and PDB; it will help with understanding the JDBC connection if needed. The Oracle VM database will have a SID of orcl12c and a pluggable DB of orcl; all passwords will be oracle. http://docs.oracle.com/database/121/CNCPT/cdbovrvw.htm#CNCPT89236

Step 6: Sqoop will need ojdbc6.jar in order to run correctly; I put mine in /usr/share/java/ojdbc6.jar.
Step 7: Sqoop list-tables and sqoop the employees table into HDFS.

List system tables:
1. sqoop list-tables --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521:orcl12c

List PDB tables:
2. sqoop list-tables --driver oracle.jdbc.driver.OracleDriver --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521/orcl --username system --password oracle

Import the employee table (clean up, sqoop, then check out what you sqooped):
1. hadoop fs -rm -R /user/hive/data/employees/
2. sqoop import --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521/orcl --username system --password oracle --table HR.EMPLOYEES --target-dir /user/hive/data/employees
3. hadoop fs -ls /user/hive/data/employees
4. hadoop fs -cat /user/hive/data/employees/part-m-00000
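A hedged aside before Step 8: Sqoop can also create and populate a Hive table in the same run with --hive-import, which would replace the manual beeline steps below. The flags are standard Sqoop options, but I have not re-run this exact combination against the Oracle VM, so treat it as a sketch; the Hive table name and target directory are placeholders I made up.

    # Alternative to Steps 7/8: import straight into a Hive table (default database).
    # Reuses the same connection details as the plain import above.
    sqoop import \
      --connect jdbc:oracle:thin:system/oracle@10.11.12.5:1521/orcl \
      --username system --password oracle \
      --table HR.EMPLOYEES \
      --hive-import --create-hive-table \
      --hive-table employees_sqooped \
      --target-dir /user/hive/data/employees_staging

Note the different --target-dir so it does not collide with the directory used above.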
Step 8: Add and load a Hive table using beeline:
1. Enter beeline in a sandbox terminal window.
2. Connect with !connect jdbc:hive2://sandbox.hortonworks.com:10000 (using the sandbox password), or !connect jdbc:hive2://127.0.0.1:10000
3. create database HR;
4. USE HR;
5. CREATE TABLE employees (employee_id int, first_name varchar(20), last_name varchar(25), email varchar(25), phone_number varchar(20), hire_date date, job_id varchar(10), salary int, commission_pct int, manager_id int, department_id int) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;
6. LOAD DATA INPATH '/user/hive/data/employees/' OVERWRITE INTO TABLE employees;
7. Go ahead and query the table; a small example follows.
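As a hedged example for step 7, here is a quick aggregate that only touches columns defined in the CREATE TABLE above. The JDBC URL is the sandbox default from step 2; you may need to add -n/-p for credentials.

    # Count employees per department as a sanity check on the load.
    beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000" \
      -e "SELECT department_id, COUNT(*) AS headcount FROM HR.employees GROUP BY department_id ORDER BY headcount DESC;"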
06-09-2016
01:19 PM
Great demo Valdimir, much appreciated. I am just digging into the Banana facet code a little; it looks like the filter-by query has one patient, "cunningham", in the query string. If I figure it out I will post, lol. Ahh, you just need to X out "cunningham" and then the list appears; got it, after a little Solr learning.
05-31-2016
01:46 PM
Hi Ryan, nice demo. It seems some of the confusion, when you look through the lineage-type questions, is where lineage begins. This is a loaded question, but why would lineage not begin with the initial input of data to a table, say through the Hive view off of Ambari, or a beeline script, etc.? Curious about your thoughts.