Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 15068 | 09-01-2018 01:27 AM |
| | 1888 | 09-01-2018 01:18 AM |
| | 5609 | 08-20-2018 09:39 PM |
| | 952 | 07-20-2018 04:51 PM |
| | 2490 | 07-16-2018 09:41 PM |
02-02-2018
09:45 PM
1 Kudo
I believe you meant the Spark "Thrift Server," @kgautam.
https://community.hortonworks.com/articles/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
An alternative would be to use Apache Livy: http://livy.apache.org/docs/latest/programmatic-api.html
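For the Livy route, here is a minimal sketch of its REST API; the host name "livy-host", the default port 8998, and the session id are all assumptions, and the `spark` variable assumes a Spark 2.x session:

```
# Hypothetical sketch: create an interactive Scala session on a Livy
# server (host name and port are placeholders) ...
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"kind": "spark"}' \
  http://livy-host:8998/sessions

# ... then submit a statement to it (session id 0 is an assumption;
# use the "id" field returned by the call above)
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"code": "spark.sql(\"SELECT 1\").show()"}' \
  http://livy-host:8998/sessions/0/statements
```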
02-02-2018
04:07 PM
@Carlton Patterson, you don't need to understand beeline in depth; the command that was given to you is mostly copy-paste. The JDBC URL is even on the Ambari dashboard for you to copy exactly.

beeline -u <URL> --outputformat=<FORMAT> -f <YOUR_SCRIPT> > <DESTINATION_FILE_ON_LOCAL_DISK>

The only potentially confusing part, if you are unfamiliar with shell commands, is the output redirection to a file. The rest is very similar to any terminal-based execution of a SQL script. As far as I know, this is the only free (as in money) way to get the data out to a file in full. The alternative solution is to download a trial version of RazorSQL or pay for a tool like Tableau that can export SQL results. Depending on your data size, Excel or LibreOffice might also work.
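Filled in, the command might look like the sketch below; the HiveServer2 host, port, and file names are placeholders, not values from your cluster:

```
# Hypothetical example: run a HiveQL script and redirect the results
# to a CSV file on the local disk. Host, port, and paths are placeholders.
beeline -u 'jdbc:hive2://hiveserver-host:10000/default' \
  --outputformat=csv2 \
  -f my_query.sql > /tmp/results.csv
```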
02-02-2018
03:56 PM
Use "-getmerge" to combine all files into one.
01-30-2018
09:31 PM
"Add Service" in Ambari creates multiple Zeppelin Servers. You would need an external load balancer like HAProxy, Nginx, etc to get a single URL to switch between all instances. Cluster work loads are typically running in YARN, and should be distributed on their own, with or without Zeppelin.
01-26-2018
11:05 PM
Did you mean version 0.7? https://cwiki.apache.org/confluence/display/RANGER/Support+for+%24username+variable
01-20-2018
04:04 AM
Ambari monitoring is only okay if the agents are healthy and responding. You will at least need something like Nagios to check when services are down, disks are dead or full, a fan stopped working, RAM is bad, etc.

Personally, I'm a big fan of Ansible for running distributed SSH commands across the entire cluster. Ansible uses Jinja2 templates, just like Ambari, for templating out config files; it can start/stop services, sync files across machines, etc. Much better than ssh-ing to each host one by one. With the recent release of Ansible Tower, you can make a centralized location for all your Ansible scripts. Alternative tools such as Puppet and Chef exist, and many older infrastructures already have those tools in place elsewhere. If you have RHEL, then Satellite might be worth using.

For tracing problems, you absolutely need some log collection framework, plus JMX enabled on every single Java/Hadoop process. You can pay for Splunk, or you can roll your own setup using Solr or Elasticsearch. Ambari recently added Ambari Infra and Log Search, which are backed by Solr. Lucidworks has a project named Banana that adds a nice dashboarding UI on top of Solr, although Grafana is also nice for dashboarding. If you go with Elasticsearch, it offers the Logstash and Beats products, which integrate well with many other external systems.
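To give a feel for the Ansible approach, here are a couple of ad-hoc command sketches; the inventory group name "hadoop" is an assumption:

```
# Hypothetical ad-hoc commands against an inventory group named "hadoop".
# Check root-disk usage on every host in the cluster:
ansible hadoop -m shell -a 'df -h /'

# Restart the Ambari agent everywhere (-b escalates privileges):
ansible hadoop -b -m service -a 'name=ambari-agent state=restarted'
```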
01-17-2018
03:52 AM
1 Kudo
Hi Micheal. I trust your ability to make your own PowerPoint with the following information.

Most importantly, Ambari has nothing to do with Kafka. I strongly suggest you explain Kafka on its own, without ever mentioning Ambari.

Moving on: at a high-level view, there is the Ambari Server (the web UI you log in to) and the agents (the hosts on which you can add, manage, and monitor services). Ambari has no concept of workers. The Ambari Server requires a running relational database: PostgreSQL, MySQL, or Oracle. Perhaps you should start here, but I will try to continue. https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Design

Ambari uses widgets to display the dashboards and graphs. Services running on external systems are configured via communication with the Ambari agents. Ambari gives you a central location to define configuration files for any environment. Hadoop is not required for Ambari to work; while it is commonly used for Hadoop, Ambari is fully extendable via what are called "stacks." The HDP stack includes Hadoop, Hive, HBase, Pig, Spark, Ranger, etc.

When you first log in to a fresh Ambari server, you have a default login account, and you must define a cluster and add hosts before you can do anything useful with Ambari. It is preferred to let Ambari set up and manage services on new hosts rather than attempting to add existing hosts with pre-installed services to Ambari. For example, you should not install Hadoop with Puppet/Chef/Ansible and then add that server to Ambari; instead, use those tools to manage the Ambari Agent installation, then continue with a typical Ambari "Add Host" operation.

The agents communicate with the Ambari Server by periodically sending heartbeats to let it know that they are alive and able to accept requests.

Ambari offers different account access restrictions via its login methods. For example, you can allow administrators to change and restart services while read-only users view overall cluster usage or access the HDFS file system browser. Ambari also has "Ambari Views," which allow you to extend and expose your own kind of "web portal" to any system running in your environment.

Hope this gets you started; the Ambari wiki page is a fine resource for more information.
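If a live demo would help the presentation, here is a minimal sketch of the Ambari REST API; the host, the default credentials, and the cluster name are all assumptions:

```
# Hypothetical sketch: list the services Ambari manages in a cluster.
# Host, credentials, and cluster name are placeholders.
curl -s -u admin:admin \
  'http://ambari-host:8080/api/v1/clusters/mycluster/services'

# List the hosts whose agents have registered (and are heartbeating):
curl -s -u admin:admin 'http://ambari-host:8080/api/v1/hosts'
```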
01-17-2018
03:19 AM
@Tu Nguyen I suggest you post a new question rather than hijack this one. Your error does not relate directly to transactional tables, but rather to the OrcSplits generated for your table. Have you tried using spark.read.format("orc") directly against the filesystem? org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998) ... 111 more
Caused by: java.lang.NumberFormatException: For input string: "0248155_0000"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
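A minimal sketch of that direct read from spark-shell; the warehouse path is an assumption, and the `spark` session variable assumes Spark 2.x:

```
# Hypothetical sketch: read the table's ORC files straight off HDFS,
# bypassing Hive's OrcInputFormat split generation. The path is a placeholder.
spark-shell <<'EOF'
val df = spark.read.format("orc").load("/apps/hive/warehouse/mydb.db/mytable")
df.printSchema()
df.show(10)
EOF
```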
01-16-2018
02:47 AM
@Guillaume Roger And what will the end user do with those zipped CSV files once they get them? Load them into Excel? Surely you can expose some SQL interface or BI tool that allows the datasets to be queried and explored as they were meant to be within the Hadoop ecosystem.