Member since: 09-14-2015
Posts: 79
Kudos Received: 91
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1041 | 01-25-2017 04:43 PM |
| | 755 | 11-23-2016 05:56 PM |
| | 3218 | 11-11-2016 02:44 AM |
| | 742 | 10-26-2016 01:50 AM |
| | 5730 | 10-19-2016 10:22 PM |
04-24-2017
11:07 PM
Hi @Todd Wilson, It looks like your filenames and directories are clashing with leftovers from previous failed jobs. Can you try the following:
1. Delete the underlying directory under /user/cdwadmin that was used for temp space. This looks to be /user/cdwadmin/temp_022543.
2. Assuming you are doing a full import from scratch, go into Hive and drop the sql_class.Addresses7 table so the import can start from a clean slate.
Let us know how it goes.
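Roughly, the two steps look like this (the path and table name are taken from your output above, so adjust if they differ on your side):

```bash
# Remove the leftover temp/staging directory from the failed job
hdfs dfs -rm -r /user/cdwadmin/temp_022543

# Drop the partially imported Hive table so the next import starts from scratch
hive -e "DROP TABLE IF EXISTS sql_class.Addresses7;"
```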
... View more
02-14-2017
05:22 PM
Thanks! I wanted to confirm before taking such a drastic move 🙂 It worked perfectly.
... View more
02-14-2017
05:08 PM
I am aware that there is no way to delete Atlas tags via the UI or REST endpoints. However, I am wondering if there is a simple way to truncate the underlying database or wipe it so we start with what is essentially a fresh Atlas installation?
... View more
Labels:
- Apache Atlas
02-02-2017
08:58 PM
Turns out my initial intuition was correct. I spent more time hands-on with the cluster and found that there were two instances of MySQL running on two different nodes. One was properly backing the Hive metastore, so any interaction through beeline or through Spark via Zeppelin would see those tables. The other was running locally on the node where the Hive CLI was executed, which caused a split-brain type situation: since the Hive configuration's JDBC URL pointed to localhost, each client was connecting to whichever MySQL instance happened to be local. We corrected the JDBC URL and everything is functioning properly again.
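For anyone hitting the same thing, a quick way to check which metastore database a node's Hive clients will talk to (the property name is standard; the config path assumes a typical HDP layout):

```bash
# Show the metastore JDBC URL that the local Hive configuration points at
grep -A1 "javax.jdo.option.ConnectionURL" /etc/hive/conf/hive-site.xml

# A localhost URL here (jdbc:mysql://localhost/...) is the red flag; it should
# reference the single shared MySQL host backing the metastore.
```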
... View more
02-01-2017
11:57 PM
1 Kudo
I have an HDP 2.5 cluster (no Ranger) where I can launch the Hive CLI and create a table, call it X. Then when I connect via beeline table X is not visible. The opposite is also true: if I create a table Y via beeline then the table is not visible via the Hive CLI. Under the covers, the corresponding directory structure exists in HDFS for both (i.e., I see X.db and Y.db under /apps/hive/warehouse). Anyone seen such a thing before? It seems like beeline and Hive CLI are using different metastores.
... View more
Tags:
- Data Processing
- Hive

Labels:
- Apache Hive
01-27-2017
01:52 PM
Hi @Prasanna G, Yes, the Sandbox should work with your version of VMware Workstation. Just be sure you download the version labeled for VMware here. After that you should be able to import the virtual image into your Workstation instance and launch the machine.
... View more
01-25-2017
04:43 PM
Hi @Devpriyo Bhattacharya, You cannot install HDP natively on Windows 10. You can, however, run HDP on your Windows 10 laptop via Docker or virtual machines, although I expect the behavior to be very unpredictable: you are going to be *very* resource constrained and will likely experience occasional component failures and/or slow response times. That said, I would typically recommend the dockerized Sandbox for this situation. However, if you wish to go through the process of a customized installation, you can take the whole thing for a test drive using Docker for Windows. To do this you will need to:
1. Download and install Docker for Windows.
2. Launch a CentOS 7 container with the appropriate ports opened - most importantly 8080 for Ambari and 10000 for HiveServer2 - you may find later that you need others open for various UIs and connectivity (e.g., 50070 for the HDFS UI).
3. Connect to the CentOS 7 container and run through the standard Ambari installation process to install your custom single-node HDP cluster. I recommend installing the bare minimum number of components due to your resource limitation.
This will get you a single-node HDP installation running on your laptop that you can use for basic functionality testing. It will be similar to the Sandbox, with the exception that you have hand-selected the components you wish to install.
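As a rough sketch of step 2 (the container name, hostname, and port list are just examples, and depending on your Docker version you may also need to mount cgroups for systemd to run in the container):

```bash
# Pull a CentOS 7 image and start it with the ports you know you will need:
# 8080 (Ambari UI), 10000 (HiveServer2), 50070 (HDFS NameNode UI)
docker pull centos:7
docker run -d --name hdp-single-node --hostname hdp.example.com --privileged \
  -p 8080:8080 -p 10000:10000 -p 50070:50070 \
  centos:7 /usr/sbin/init

# Open a shell in the container to run through the Ambari installation
docker exec -it hdp-single-node bash
```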
... View more
01-25-2017
04:21 PM
Hi @MPH, There is an article describing how to configure Spark to leverage LLAP here.
... View more
11-23-2016
07:23 PM
@Dagmawi Mengistu Happy to help. If you don't need any more detail then feel free to accept the answer so we can close out the issue. Thanks!
... View more
11-23-2016
05:56 PM
3 Kudos
Hi @Dagmawi Mengistu, We do not currently support start/stop of clusters created via HDC. The likely reason you are seeing the above error is that local instance storage was chosen to back HDFS at cluster creation time; that storage is short-lived and does not persist across stop/start of instances in EC2. In general, HDC clusters are intended for ephemeral workloads. If you want to start and stop the compute resources to control costs, I recommend creating a persistent metastore backed by RDS and providing it when you create new clusters. That way you can spin up and destroy clusters as you see fit, and the data is persisted via the shared metastore and S3. I hope this helps.
... View more
11-11-2016
02:44 AM
2 Kudos
Hi @Saminathan A One thing you can do is drop the SplitLine processor and go straight to the ExtractText processor, where you can use a single regex to pull out the first 5 lines. You can then reference the capture groups from that regex (i.e., the individual lines) in the UpdateAttribute processor. This regex should work for you: ^(.*)\n(.*)\n(.*)\n(.*)\n(.*)\n.*
... View more
11-02-2016
06:46 PM
There is also a good amount of detail covering all of the knobs and dials related to configuring the Capacity Scheduler here. Note that in the latest versions of Ambari there is a Capacity Scheduler View where you can configure the queues graphically instead of getting into the weeds of the XML.
... View more
10-26-2016
01:50 AM
1 Kudo
Hi @Houssam Manik, The big benefit of using snapshots with DistCp is that future copies of the snapshotted directory can be incremental, transferring only the differential between two snapshots. Jing provides some context around this in the second answer here. The work to complete this is discussed in HDFS-7535, where some more context is provided. This support was first pulled into Hadoop 2.7.0.
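As a rough sketch of the pattern (the directory, cluster, and snapshot names are illustrative; check the DistCp documentation for the exact preconditions on the target side):

```bash
# On the source cluster: make the directory snapshottable, snapshot it, and do the initial full copy
hdfs dfsadmin -allowSnapshot /data/source
hdfs dfs -createSnapshot /data/source s1
hadoop distcp /data/source/.snapshot/s1 hdfs://nn2:8020/data/backup

# On the target cluster: the copy must also be snapshottable with a matching s1 snapshot,
# and must stay unmodified, for -diff to be accepted
hdfs dfsadmin -allowSnapshot /data/backup
hdfs dfs -createSnapshot /data/backup s1

# Later, on the source: take a new snapshot and copy only the s1 -> s2 delta
hdfs dfs -createSnapshot /data/source s2
hadoop distcp -update -diff s1 s2 /data/source hdfs://nn2:8020/data/backup
```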
... View more
10-19-2016
10:22 PM
5 Kudos
Hi @Santhosh B Gowda, Assuming that this is happening on a single JournalNode, you can try the following:
1. As a precaution, stop HDFS. This will shut down all JournalNodes as well.
2. On the node in question, move the JournalNode edits directory (/hadoop/hdfs/journal/stanleyhotel/current) to an alternate location.
3. Copy the edits directory (/hadoop/hdfs/journal/stanleyhotel/current) from a functioning JournalNode to this node.
4. Start HDFS.
This should get the JournalNode back in line with the others and return you to a properly functioning HA state.
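Steps 2 and 3 roughly look like this (the healthy JournalNode host and the service user/group are examples; the edits path is the one from your logs):

```bash
# On the broken JournalNode, with HDFS stopped: set the existing edits directory aside
mv /hadoop/hdfs/journal/stanleyhotel/current /hadoop/hdfs/journal/stanleyhotel/current.bad

# Pull the edits directory from a healthy JournalNode
scp -r hdfs@jn2.example.com:/hadoop/hdfs/journal/stanleyhotel/current /hadoop/hdfs/journal/stanleyhotel/

# Make sure the copied files are owned by the HDFS service user
chown -R hdfs:hadoop /hadoop/hdfs/journal/stanleyhotel/current
```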
... View more
10-10-2016
08:05 PM
1 Kudo
Hi @Simran Kaur, You can still achieve this using out-of-the-box functions in Hive as you mentioned. You just missed getting the string in the right format. For clarity, the basic steps are:
1. Replace the 'T' in the string with a space so the date is in the format expected by Hive.
2. Convert the string to a Unix timestamp using the unix_timestamp function.
3. Convert the timestamp to your preferred date format using the from_unixtime function.

Here is a quick example you can run in Hive to see the result for the string you provided: select from_unixtime(unix_timestamp(regexp_replace('2016-09-13T06:03:51Z', 'T', ' ')), 'dd-MM-yyyy HH-mm-ss'); Notice that the only additional step is the replace operation.
... View more
10-06-2016
09:16 PM
Specifically, I am trying to remove the spark.yarn.principal and spark.yarn.keytab parameters from a configuration group so my Livy server can run successfully. I have been able to manually copy the configs, remove these parameters, and point Livy at the custom configs, and everything works. Simply setting them to empty strings does not work; they need to be fully removed. I would like Ambari to manage this instead of it being something I have to keep in sync whenever other Spark configs change. Is there a way to do this, either via the UI or down in the internals of Ambari?
... View more
Labels:
- Apache Ambari
09-01-2016
03:08 PM
3 Kudos
Hi @Aengus Rooney and @Tamas Bihari, I just ran into the same issue. I tried to deploy with two different unique stack names and both times got a failure during cbdinit. For giggles I tried again, but this time removed the non-alphanumeric characters ('&' and '*') from my password, and it succeeded. It seems there is an implicit character-set limit on the password. It would be nice to either remove this limit [preferable] or at least make a note on the form that only alphanumeric characters are acceptable.
... View more
08-30-2016
06:48 PM
Hi @Sami Ahmad, You can install NiFi using the steps documented in the installation guide here.
... View more
08-24-2016
08:40 PM
The title really says it all. Is there a point at which HBase will fall over if I take too many snapshots of an individual table, or a point at which I will notice issues restoring/exporting snapshots if I take too many? For example, if I take daily snapshots, do I need to start cleaning them up after 30 days, 6 months, a year?
... View more
Tags:
- Data Processing
- HBase

Labels:
- Apache HBase
08-24-2016
06:33 PM
1 Kudo
I know we can take an all-or-nothing approach to locking down the ResourceManager UI, but I am looking for slightly more granular control. I want to allow user X to see the UI but only be able to view their own running jobs. Is this possible through existing RM configurations? I have seen a related question, but it is actually the opposite of my requirement above.
... View more
Labels:
- Apache YARN
- Cloudera Manager
08-23-2016
09:47 PM
2 Kudos
Hi @Johnny Fugers, What you are talking about sounds like you are trying to build a recommender system. I assume what you are trying to do is understand that users that do X and Y also tend to do Z. From your example, customers that buy diapers and beer also tend to buy milk. Within HDP, you would be looking at Apache Spark and the machine learning capabilities that it offers. There is a clear example of leveraging Collaborative Filtering for this type of problem on the Spark site. You should be able to run this directly on your HDP sandbox or cluster.
... View more
08-22-2016
01:15 AM
2 Kudos
Hi @Smart Solutions, I think this would be sufficient to certify that the libraries are installed and that your applications will be able to find them. You can find several ready-to-run examples under /usr/hdp/current/spark-client/examples/src/main/python/mllib. You can substitute python with your preferred language to find examples for the corresponding API. In terms of optimized configurations, that is hard to tune upfront, as it is highly dependent on your application, dataset, and cluster.
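For example, assuming the stock kmeans.py example and its sample data file are present in your Spark client directory (layouts can vary between HDP versions), something like this should run end to end as a smoke test:

```bash
# List the bundled MLlib examples
ls /usr/hdp/current/spark-client/examples/src/main/python/mllib

# Run one of them locally (arguments: <input file> <k>)
cd /usr/hdp/current/spark-client
./bin/spark-submit --master local[2] \
  examples/src/main/python/mllib/kmeans.py data/mllib/kmeans_data.txt 2
```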
... View more
08-19-2016
03:39 PM
I think you should grant this role to the user BEFORE restarting HS2 with the new authorization enforced. I would revert your Hive configuration to the original settings, then execute the GRANT, and then re-enable SQL standard authorization via the configs and a service restart.
... View more
08-19-2016
03:37 PM
In the link I posted above there is a step requiring the admin user to grant admin privileges to itself: GRANT admin TO USER adf_admin;
... View more
08-19-2016
02:25 PM
Hi @John Smith, At first glance, I am wondering whether you granted the appropriate admin privilege to the adf_admin user. There are a few more details on setting up SQL standard authorization in a SysAdmin guide here. Note the step where the admin user must grant themselves admin rights.
... View more
08-18-2016
02:40 PM
Like I said, I do not believe you can connect from Linux to SQL Server using IWA with the sqlserver JDBC driver. I recommend that you drop the jTDS driver into your Sqoop lib dir and try using that driver instead.
... View more
08-18-2016
12:09 PM
Based on the error, it seems that you do not have a valid Kerberos ticket. Is the machine you are initiating the Sqoop job from integrated with your Windows AD via Kerberos?
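A quick check on that machine (the principal below is a placeholder for your AD account):

```bash
# See whether a valid ticket is already in the cache
klist

# If not, obtain one before launching the Sqoop job
kinit your_user@YOUR.AD.REALM
```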
... View more
08-17-2016
08:40 PM
1 Kudo
Hi @Andy Max, This should certainly be doable and should be relatively straightforward. In the end, I would recommend you stand up a small sandbox environment that mimics the current one, test out this process, and develop a concrete playbook. The rough steps that I would recommend you try are:
1. Stop all services on the existing cluster.
2. Use Apache Ambari to install a "dummy" HDP 2.4.x cluster on the current hardware: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Installing_HDP_AMB/content/ch_Getting_Ready.html
3. Install barebones with only HDFS, ZooKeeper, etc. Make sure that the HDFS NameNode and DataNode directories are dummy directories that do not point to your existing data and namenode directories.
4. Stop all services via Ambari.
5. In Ambari, change the data and namenode directories for HDFS to point to your old directories.
6. Start the services back up and verify that the data is available (a few sanity checks are sketched below).
This should work smoothly with HDP 2.4 because 2.4 also includes Apache Hadoop 2.7.1, so the file system version is identical. This all assumes that you are only using HDFS; it could get a bit hairier if you have Hive tables sitting on top with metadata that needs to be migrated. Cheers, Brandon
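For the verification in step 6, a few quick checks along these lines (run as the HDFS superuser) should confirm the old data came back cleanly:

```bash
# All DataNodes registered, capacity and used space look like the old cluster
hdfs dfsadmin -report

# No missing or corrupt blocks
hdfs fsck / | tail -n 20

# The old directory tree is visible
hdfs dfs -ls /
```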
... View more
08-17-2016
08:19 PM
1 Kudo
Hi @Khera What JDBC driver are you using to connect to SQL Server? The one provided by MS does not support Windows authentication. That said, you can grab another driver that does support it. You have a couple of options:
- Both Simba and DataDirect have drivers that support this authentication method. They have free trials but will ultimately require a license for repeated use.
- There is also jTDS, which is free and open source and claims to support Windows authentication, so we can take it for a spin if you would like. The rough JDBC URL you would need is: jdbc:jtds:sqlserver://123.123.123;instance=server1;databaseName=students;integratedSecurity=true;authenticationScheme=JavaKerberos
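As a rough sketch of what the Sqoop side would look like with jTDS (the jar version, table name, and target directory are only examples):

```bash
# Put the jTDS jar where Sqoop can find it
cp jtds-1.3.1.jar /usr/hdp/current/sqoop-client/lib/

# Import using the jTDS driver class and integrated (Kerberos) authentication
sqoop import \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect "jdbc:jtds:sqlserver://123.123.123;instance=server1;databaseName=students;integratedSecurity=true;authenticationScheme=JavaKerberos" \
  --table grades \
  --target-dir /user/khera/grades \
  -m 1
```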
... View more
08-09-2016
03:26 PM
1 Kudo
It looks like you are missing the rpm-python package. If yum is still working, you can try re-installing rpm-python with yum. Alternatively, depending on the version of CentOS or RHEL you are using, you can find the appropriate RPM in the OS archives; for example, the CentOS 6 RPMs are here. Ctrl-F and search for rpm-python to find the package.
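Something like the following, with the exact RPM file name depending on your CentOS/RHEL release (the one below is only an example for CentOS 6):

```bash
# If yum itself still works, simply reinstall the package
yum reinstall -y rpm-python

# Otherwise, fetch the matching RPM from the OS archive and install it directly
rpm -ivh rpm-python-4.8.0-59.el6.x86_64.rpm
```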
... View more