Member since 05-30-2018
Posts: 1322
Kudos received: 715
Solutions: 148
My Accepted Solutions
| Views | Posted |
|---|---|
| 4067 | 08-20-2018 08:26 PM |
| 1963 | 08-15-2018 01:59 PM |
| 2390 | 08-13-2018 02:20 PM |
| 4138 | 07-23-2018 04:37 PM |
| 5045 | 07-19-2018 12:52 PM |
02-26-2017
04:03 AM
1 Kudo
Here is the source for the Atlas DSL query parser: https://github.com/apache/incubator-atlas/blob/master/repository/src/main/scala/org/apache/atlas/query/QueryParser.scala I do not see any regex or pattern-matching support in it.
02-26-2017
03:52 AM
@Reddy The easiest way, in my opinion, is via NiFi. Ingest your file with NiFi and do a SplitText, essentially creating a flow file for each line in the file. Load your sensitive keywords into the NiFi distributed map cache (DMC). Look up each value in the row against the DMC, which stores your sensitive keywords. If any of the fields match a sensitive keyword, you can route on text and do whatever you wish, i.e. store that record in an HDFS location. Instead of storing individual records (the ones with sensitive keywords) on HDFS, you can also use MergeContent to merge x records into a file and then store that file on HDFS.
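The routing logic of that flow can be sketched in plain Python. This is not NiFi code and uses no NiFi API: the hypothetical `SENSITIVE_KEYWORDS` set stands in for the keywords held in the distributed map cache, `route_records` plays the role of SplitText plus the lookup and route-on-text steps, and `merge_batches` mimics MergeContent bundling records before they go to HDFS. The comma delimiter, the case-insensitive exact-match comparison, and the batch size are assumptions.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for the sensitive keywords loaded into the
# NiFi distributed map cache (DMC). Keep them lowercase: fields are
# lowercased before comparison.
SENSITIVE_KEYWORDS: Set[str] = {"ssn", "password", "credit_card"}


def route_records(lines: Iterable[str], keywords: Set[str],
                  delimiter: str = ",") -> Tuple[List[str], List[str]]:
    """Treat each line as one record (like SplitText), look up every field
    in the keyword set (like the DMC lookup), and route the record to
    'sensitive' or 'clean' (like routing on text)."""
    sensitive: List[str] = []
    clean: List[str] = []
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # skip blank lines
        fields = [f.strip().lower() for f in record.split(delimiter)]
        if any(field in keywords for field in fields):
            sensitive.append(record)
        else:
            clean.append(record)
    return sensitive, clean


def merge_batches(records: List[str], batch_size: int) -> List[str]:
    """Bundle up to batch_size records into one newline-joined 'file',
    similar in spirit to MergeContent before writing to HDFS."""
    return ["\n".join(records[i:i + batch_size])
            for i in range(0, len(records), batch_size)]


if __name__ == "__main__":
    data = ["alice,ssn,123", "bob,city,boston", "carol,password,x", ""]
    hits, rest = route_records(data, SENSITIVE_KEYWORDS)
    print(hits)   # ['alice,ssn,123', 'carol,password,x']
    print(rest)   # ['bob,city,boston']
    print(len(merge_batches(hits, 2)))  # 1
```

In NiFi itself the batch size corresponds to how many flow files you let MergeContent bundle; the sketch only shows why records are routed before merging.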
02-26-2017
03:47 AM
1 Kudo
Apache Atlas does not support HBase metadata at the moment. The HBase entity types do exist, however, so you can use the Atlas REST API to publish tables and columns yourself.
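A minimal sketch of publishing an HBase table through the Atlas REST API, using only the Python standard library. It assumes the Atlas v2 entity endpoint (`POST /api/atlas/v2/entity`), an entity type named `hbase_table`, and a `qualifiedName` of the form `namespace:table@cluster`. Type names, required attributes, host, port, and credentials vary by Atlas version and installation, so check the type definitions on your cluster (for example with `GET /api/atlas/v2/types/typedefs`) first. The request is built but not sent.

```python
import base64
import json
import urllib.request


def build_hbase_table_request(atlas_url: str, namespace: str, table: str,
                              cluster: str, user: str,
                              password: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates an HBase table entity.

    Assumptions: the v2 entity endpoint, a type named 'hbase_table', and a
    'namespace:table@cluster' qualifiedName convention.
    """
    payload = {
        "entity": {
            "typeName": "hbase_table",  # assumed type name; verify on your cluster
            "attributes": {
                "qualifiedName": f"{namespace}:{table}@{cluster}",  # assumed format
                "name": table,
            },
        }
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{atlas_url.rstrip('/')}/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and credentials.
    req = build_hbase_table_request("http://atlas-host:21000", "default",
                                    "customers", "cl1", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Columns would be published the same way with their own entity type, referencing the table; the exact column and column-family types depend on your Atlas version.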
02-26-2017
03:37 AM
2 Kudos
In my previous life I was part of a data modeling team. For Hive I used Embarcadero Data Architect for physical data modeling (PDM). Your logical data model (LDM) should not be impacted by the physical implementation. This is what the gods of LDM tell you, and I tend to agree. You should continue to version-control your LDM just as you do today, since the LDM has no bearing on the PDM. Version control for the PDM is similar to how you do it on an RDBMS: store all your DDLs in sequential order, or invest in source control. Again, no different from what you do today for an RDBMS.

From a modeling perspective, I believe the challenge is how to apply modeling principles to Hive. Using 3NF is not the right approach; the Kimball approach (data marts) does apply in this space. Those who do not want to invest time in thinking through a data model will tell you to hyper-denormalize in Hive. That generally means they don't have a clue about what a data model is and why one matters in an enterprise. Hive LLAP and tools such as AtScale allow us to model in a more "native" format.

Important: for NoSQL data models, it is best to use domain-driven design practices. This is the closest one will get to a PDM. NoSQL does not follow relational theory, so modeling it using core PDM values is a nugatory exercise. Apache Phoenix, which is a SQL skin on top of HBase, changes that: you can apply some relational-theory PDM rules there.
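Keeping DDLs "in sequential order" works best with an explicit numbering convention. The sketch below assumes a hypothetical Flyway-style naming scheme (`V<number>__<description>.sql`) and returns the scripts in the order they should be applied, sorting numerically so that `V10` comes after `V2` and rejecting duplicate version numbers. It only orders file names; actually running the DDL against Hive (for example through Beeline) is left out.

```python
import re
from typing import List

# Hypothetical naming convention: V<version>__<description>.sql
DDL_NAME = re.compile(r"^V(\d+)__([A-Za-z0-9_]+)\.sql$")


def ordered_ddl_scripts(filenames: List[str]) -> List[str]:
    """Return DDL scripts sorted by numeric version.

    Raises ValueError for names that do not follow the convention or
    for duplicate version numbers (two scripts claiming the same step).
    """
    versioned = []
    for name in filenames:
        match = DDL_NAME.match(name)
        if match is None:
            raise ValueError(f"not a versioned DDL script: {name}")
        versioned.append((int(match.group(1)), name))
    versions = [version for version, _ in versioned]
    if len(versions) != len(set(versions)):
        raise ValueError("duplicate DDL version numbers")
    # Numeric sort, so V10 follows V2 (a plain string sort would not).
    return [name for _, name in sorted(versioned)]


if __name__ == "__main__":
    scripts = ["V10__add_sales_fact.sql", "V2__create_date_dim.sql",
               "V1__create_database.sql"]
    print(ordered_ddl_scripts(scripts))
    # ['V1__create_database.sql', 'V2__create_date_dim.sql', 'V10__add_sales_fact.sql']
```

Committing these files to source control gives you both the sequence and the history of the PDM.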
02-26-2017
03:08 AM
I don't consider this a hierarchy of encryption; it is more a matter of encryption plus authorization on those zones.
02-26-2017
02:44 AM
1 Kudo
With Ranger you encrypt folders (HDFS encryption zones), and those with access to those folders will be able to view (decrypt) the data. You can have user B's and user C's folders encrypted and grant user A access to those folders; user A will then be able to view (decrypt) the data in them.
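To illustrate "encryption plus authorization" rather than a hierarchy of encryption, here is a toy model in Python. It is not the Ranger, Hadoop KMS, or HDFS API: the class, folder paths, key names, and user names are hypothetical. It encodes one idea: a user sees plaintext in an encryption zone only when granted access to the folder and to that zone's key (in Ranger terms, roughly an HDFS policy on the folder plus a Ranger KMS policy on the key).

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyClusterPolicy:
    """Toy model of encryption zones plus folder and key grants (not a real API)."""
    zone_keys: Dict[str, str] = field(default_factory=dict)           # folder -> key
    folder_grants: Set[Tuple[str, str]] = field(default_factory=set)  # (user, folder)
    key_grants: Set[Tuple[str, str]] = field(default_factory=set)     # (user, key)

    def create_zone(self, folder: str, key: str) -> None:
        """Mark a folder as an encryption zone protected by the given key."""
        self.zone_keys[folder] = key

    def grant(self, user: str, folder: str) -> None:
        """Grant access to the folder and, if it is a zone, to its key."""
        self.folder_grants.add((user, folder))
        if folder in self.zone_keys:
            self.key_grants.add((user, self.zone_keys[folder]))

    def can_read_plaintext(self, user: str, folder: str) -> bool:
        """True if the user is authorized on the folder and can decrypt it."""
        if (user, folder) not in self.folder_grants:
            return False  # authorization on the folder fails
        key = self.zone_keys.get(folder)
        # Unencrypted folders need only the folder grant.
        return key is None or (user, key) in self.key_grants


if __name__ == "__main__":
    policy = ToyClusterPolicy()
    policy.create_zone("/data/userb", "keyB")
    policy.create_zone("/data/userc", "keyC")
    for folder in ("/data/userb", "/data/userc"):
        policy.grant("userA", folder)
    print(policy.can_read_plaintext("userA", "/data/userb"))  # True
    print(policy.can_read_plaintext("userD", "/data/userc"))  # False
```

Each zone is encrypted independently; what changes who can read it is the set of grants, not any nesting of keys.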
02-26-2017
01:33 AM
1 Kudo
Take a look at the Ambari auto-start services feature in Ambari 2.4.x. Here is the documentation: https://cwiki.apache.org/confluence/display/AMBARI/Recovery%3A+auto+start+components#Recovery:autostartcomponents-240Onwards
02-26-2017
01:25 AM
1 Kudo
Ambari provides many out-of-the-box Kafka monitoring capabilities. A good article to read is https://community.hortonworks.com/articles/36725/kafka-monitoring-per-topic-and-per-broker.html Kafka also relies on the page cache (OS cache), so it is important to monitor system metrics; a good place to do this is the Ambari Grafana dashboards. More info here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.18/bk_ambari-user-guide/content/_using_grafana.html Troubleshooting is a practice you will learn over time by using Kafka; there is no real "if you get this error, do this" playbook. All logs are now available via the Ambari Log Search service, which is where you should start reviewing any errors you receive from Kafka. Another source for troubleshooting issues is this forum (HCC).
02-25-2017
10:38 PM
2 Kudos
Go to this link: https://hortonworks.com/downloads/#sandbox Then, under "Hortonworks Sandbox in the Cloud", you will see "Hortonworks Sandbox Archive". Expand that and you will see the archive of HDP sandboxes.
02-25-2017
12:34 PM
1 Kudo
Closing the loop on this question: you can also start and stop any service managed by Ambari from the command line, using the Ambari REST API. Info here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=41812517
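A minimal sketch of the REST call behind starting or stopping a service, built with the Python standard library. It follows the commonly documented Ambari pattern: a `PUT` to `/api/v1/clusters/<cluster>/services/<service>` with `ServiceInfo.state` set to `STARTED` (start) or `INSTALLED` (stop), plus an `X-Requested-By` header. The host, port 8080, cluster name, service name, and credentials below are placeholders, and the request is built but not sent.

```python
import base64
import json
import urllib.request

# Ambari represents a stopped service as INSTALLED and a running one as STARTED.
TARGET_STATES = {"start": "STARTED", "stop": "INSTALLED"}


def build_service_state_request(ambari_url: str, cluster: str, service: str,
                                action: str, user: str,
                                password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that asks Ambari to start or stop a service."""
    if action not in TARGET_STATES:
        raise ValueError("action must be 'start' or 'stop'")
    body = {
        "RequestInfo": {"context": f"{action.capitalize()} {service} via REST"},
        "Body": {"ServiceInfo": {"state": TARGET_STATES[action]}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{ambari_url.rstrip('/')}/api/v1/clusters/{cluster}/services/{service}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Requested-By": "ambari",  # Ambari requires this on modifying requests
            "Authorization": f"Basic {token}",
        },
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder host, cluster, and credentials.
    req = build_service_state_request("http://ambari-host:8080", "mycluster",
                                      "KAFKA", "stop", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The same request shape works from `curl` if you prefer a one-liner in a shell script; Ambari returns a request resource you can poll to see when the start or stop completes.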