Member since
07-25-2018
174
Posts
29
Kudos Received
5
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5512 | 03-19-2020 03:18 AM |
| | 3533 | 01-31-2020 01:08 AM |
| | 1408 | 01-30-2020 05:45 AM |
| | 2655 | 06-01-2016 12:56 PM |
| | 3136 | 05-23-2016 08:46 AM |
12-27-2016
11:11 AM
lineage.png

Hi All,

I have downloaded the HDP 2.5 sandbox. This sandbox ships with the latest Apache Atlas version, which is already configured to use HBase as the metadata store for all entities (columns, tables, and lineage metadata). Assume I am already connected to the HBase shell.

Hive tables:
Input: patient_info_raw
Output: patient_validated_dataset

My first questions are:
1) Can we add additional metadata for an entity directly in the HBase storage?
2) How can I view the Atlas metadata from HBase?

Problem statement: Consider two Hive tables named "patient_info_raw" and "patient_validated_dataset", as shown in the attached diagram. One of these tables (patient_info_raw) has already been created, and its metadata is already present in HBase. My requirement is to link this table to the "patient_validated_dataset" table, so that the lineage shows in the Atlas UI, just by inserting metadata into the HBase storage. I do not want to execute any Hive query (such as CREATE TABLE AS SELECT ..., CREATE TABLE <tablename>) on the input table (patient_info_raw). The lineage must be reflected in the Atlas UI purely by inserting lineage metadata into the HBase tables to create the link between these two tables.

There are two Atlas tables in HBase:
1) ATLAS_ENTITY_AUDIT_EVENTS
2) atlas_titan

Can this be done in Apache Atlas? If yes, what are the steps? Please keep in mind that I am not going to create the output table by executing a Hive query; Atlas should show the metadata and lineage of the output table in the Atlas UI just through metadata insertion.
Labels:
- Apache Atlas
- Apache HBase
11-25-2016
12:39 PM
Hi Everyone,

I have read many blogs and documents about Apache Atlas and Apache Falcon, and I have also done some POCs with both tools. However, I don't understand what the actual difference between them is. As I understand it, both tools aim to provide data lifecycle management and data governance features, so they look very similar to me.

Since both provide lineage, which tool should I use for data governance? More generally, where does each of these tools fit in a data governance use case?

Thanks in advance.
Labels:
- Apache Atlas
- Apache Falcon
11-16-2016
05:17 PM
Thank you Sagar. Do you mean that I need to pull the audit data from the Ranger audit database? If so, how do I do that? Or is there another approach to pull the audit data?
11-16-2016
12:52 PM
1 Kudo
Hi Everyone,

I want to fetch Apache Ranger audit data using a REST API, specifically the data shown in the Access tab. Does Apache Ranger support such a REST API call?

Please see the attachment, where I have marked the data I want to retrieve.

apache-ranger-audit-tab.png

Thanks in advance.
Labels:
- Apache Ranger
10-17-2016
09:39 AM
Hi Guys,

I am new to machine learning. I have a dataset of clinical trials that contains both textual and numerical data (I have converted all the textual features to numeric ones using the Divectorization library of Python). I have attached the dataset CSV file and a Jupyter Python notebook; please check them. The data is public data from the clinicaltrials.gov website. For a description of the dataset fields, please visit the link below.

https://clinicaltrials.gov/ct2/about-studies/glossary

Problem statement: The dataset contains an "ENROLLMENT" column, which gives the number of participants required for a clinical study. I want my algorithm to predict "ENROLLMENT" based on the training data.

Before opening the attachments, please rename ct_gov_results from .txt to .csv and temporary_notebook from .txt to .ipynb.

Issue: I am getting an RMSE of around 3000, which does not look like a good value. As far as I know, it should be in the range 0 to 1. I don't understand how to reduce it so that my algorithm works well on my data.

Please do respond; your reply will be very valuable to me. Thanks in advance.
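For reference on the RMSE question: RMSE is expressed in the same units as the target column, so for a participant count it is not limited to the range 0 to 1 unless it is normalized. Below is a minimal sketch of both calculations in plain Python. The enrollment numbers are hypothetical and not taken from the attached dataset, and dividing by the range of the actual values is only one common normalization, assumed here for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (one possible normalization)."""
    spread = max(actual) - min(actual)
    return rmse(actual, predicted) / spread


# Hypothetical enrollment counts, for illustration only.
actual = [100, 2500, 40000, 600]
predicted = [300, 2000, 37000, 900]

print("RMSE:", rmse(actual, predicted))
print("Normalized RMSE:", normalized_rmse(actual, predicted))
```

With targets that span tens of thousands of participants, an RMSE in the thousands can correspond to a small normalized error, so the raw number should be judged against the scale of ENROLLMENT.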
09-21-2016
03:06 AM
Thanks Rajkumar. Do we need any extra configuration to set up Spark Job Server? If yes, what are the steps? Can we pass command-line options in the curl call when submitting a Spark job?
09-21-2016
02:52 AM
Hi guys,

I have written a Spark job that reads a CSV file and calls the DataFrame API on top of the CSV data to extract profile information. Currently I submit the job manually with the spark-submit command, but I want to put a Jersey REST API on top of the Spark job as a wrapper. I have written the REST API, but I don't know how to launch the Spark job from the REST service method.

I am serving the REST API from Tomcat, so is it possible to launch a Spark job from a Tomcat web application? If not, is there another way to achieve this?

Thanks in advance.
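One possible approach, sketched here in Python rather than Jersey, is to have the service method start spark-submit as a child process. The jar name, main class and input path below are hypothetical, and the `build_spark_submit` and `launch` helpers are illustrative names, not part of Spark. Only the standard `--class` and `--master` options of spark-submit are used.

```python
import subprocess


def build_spark_submit(app_jar, main_class, master="yarn", app_args=()):
    """Build the spark-submit argument list for a packaged job."""
    return ["spark-submit", "--class", main_class, "--master", master, app_jar, *app_args]


def launch(cmd):
    """Start the job as a child process without blocking the caller."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)


# Hypothetical jar, class and input path, for illustration only.
cmd = build_spark_submit(
    "profile-job.jar",
    "com.example.ProfileJob",
    app_args=["/data/input.csv"],
)
print(" ".join(cmd))

# A REST handler would call launch(cmd) and return immediately,
# then report the job status on a later request.
```

The same idea carries over to a Java web application through `ProcessBuilder`, or through Spark's own `org.apache.spark.launcher.SparkLauncher` class, provided the Tomcat host has a Spark client installed and configured.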
Labels:
- Apache Spark
09-09-2016
08:21 AM
1 Kudo
Hi,

I want to profile data stored in SQL-type databases (such as Hive or MySQL). Does anyone know which tool is suitable for this? A tool from the Hortonworks community would be best for me. I have already tried the pyxplorer Python library, but it didn't work for me because of an installation problem.

Please send me a link to the tool. Everywhere I look I see Talend recommended for data profiling, but does Talend provide all the features related to data quality and data profiling (for example counting the number of tables and columns, distinct column values, min/max column values, etc.)?
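To make the requested profiling features concrete, here is a minimal sketch of column-level profiling with plain SQL aggregates. It uses Python's built-in sqlite3 module as a stand-in for Hive or MySQL, which is an assumption for illustration; the `patients` table and the `profile_column` helper are hypothetical, and identifier quoting differs between databases.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return row count, distinct count, min and max for one column."""
    # Identifiers cannot be bound as query parameters, so pass only trusted names.
    query = (
        f'SELECT COUNT(*), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    )
    rows, distinct, lowest, highest = conn.execute(query).fetchone()
    return {"rows": rows, "distinct": distinct, "min": lowest, "max": highest}


# Hypothetical in-memory table standing in for a Hive/MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, 34), (2, 51), (3, 34), (4, 78)],
)

stats = profile_column(conn, "patients", "age")
print(stats)
```

The same aggregate query can be issued against Hive or MySQL through their own client libraries; only the connection setup and identifier quoting would change.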
Labels:
- Apache Hive
09-07-2016
04:51 AM
Thanks Vadim, this works for me.

But suppose I want to clear all metadata, including tag metadata, Hive-related metadata, and so on. Is that possible in Atlas? I don't want to re-install Atlas; I only want to clear the metadata.

I have configured the "Berkeley database" for storing the metadata. Do you know how to access this graph-based database, and can we delete metadata by accessing it directly? If you know, could you please send me the steps and any additional software required to access the graph database?

Thank you in advance.
09-01-2016
03:26 PM
Hi Vadim,

You suggest deleting the input and output entities, but how do I delete them using the REST API? Is there a REST API available for that?