Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 15068 | 09-01-2018 01:27 AM |
| | 1888 | 09-01-2018 01:18 AM |
| | 5609 | 08-20-2018 09:39 PM |
| | 952 | 07-20-2018 04:51 PM |
| | 2490 | 07-16-2018 09:41 PM |
02-02-2018
09:45 PM
1 Kudo
I believe you meant the Spark "Thrift Server," @kgautam.
https://community.hortonworks.com/articles/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
An alternative would be to use Apache Livy: http://livy.apache.org/docs/latest/programmatic-api.html
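For the Livy route, here is a minimal sketch of its REST API; the host name "livy-host", the default port 8998, and the session id are all assumptions, and the `spark` variable assumes a Spark 2.x session:

```
# Hypothetical sketch: create an interactive Scala session on a Livy
# server (host name and port are placeholders) ...
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"kind": "spark"}' \
  http://livy-host:8998/sessions

# ... then submit a statement to it (session id 0 is an assumption;
# use the "id" field returned by the call above)
curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"code": "spark.sql(\"SELECT 1\").show()"}' \
  http://livy-host:8998/sessions/0/statements
```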
02-02-2018
04:07 PM
@Carlton Patterson, you don't need to understand beeline in depth; the command that was given to you is mostly copy-paste. The JDBC URL is even on the Ambari dashboard for you to copy exactly.

beeline -u <URL> --outputformat=<FORMAT> -f <YOUR_SCRIPT> > <DESTINATION_FILE_ON_LOCAL_DISK>

The only potentially confusing part, if you are unfamiliar with shell commands, is the output redirection to a file. The rest is very similar to any terminal-based execution of a SQL script. As far as I know, this is the only free (as in money) way to get the data out to a file in full. The alternative solution is to download a trial version of RazorSQL or pay for a tool like Tableau that can export SQL results. Depending on your data size, Excel or LibreOffice might also work.
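Filled in, the command might look like the sketch below; the HiveServer2 host, port, and file names are placeholders, not values from your cluster:

```
# Hypothetical example: run a HiveQL script and redirect the results
# to a CSV file on the local disk. Host, port, and paths are placeholders.
beeline -u 'jdbc:hive2://hiveserver-host:10000/default' \
  --outputformat=csv2 \
  -f my_query.sql > /tmp/results.csv
```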
02-02-2018
03:56 PM
Use "-getmerge" to combine all files into one.
01-30-2018
09:31 PM
"Add Service" in Ambari creates multiple Zeppelin Servers. You would need an external load balancer like HAProxy, Nginx, etc to get a single URL to switch between all instances. Cluster work loads are typically running in YARN, and should be distributed on their own, with or without Zeppelin.
01-26-2018
11:05 PM
Did you mean version 0.7? https://cwiki.apache.org/confluence/display/RANGER/Support+for+%24username+variable
01-20-2018
04:04 AM
Ambari monitoring is only okay if the agents are healthy and responding. You will at least need something like Nagios to check when services are down, disks are dead or full, a fan stopped working, RAM is bad, etc.

Personally, I'm a big fan of Ansible for running distributed SSH commands across the entire cluster. Ansible uses Jinja2 templates, just like Ambari, for templating out config files; it can start/stop services, sync files across machines, etc. Much better than ssh-ing to each host one by one. With the recent release of Ansible Tower, you can make a centralized location for all your Ansible scripts. Alternative tools such as Puppet and Chef exist, and many older infrastructures already have those tools in place elsewhere. If you have RHEL, then Satellite might be worth using.

For tracing problems, you absolutely need some log collection framework, plus JMX enabled on every single Java/Hadoop process. You can pay for Splunk, or you can roll your own setup using Solr or Elasticsearch. Ambari recently added Ambari Infra and Log Search, which are backed by Solr. Lucidworks has a project named Banana that adds a nice dashboarding UI on top of Solr, although Grafana is also nice for dashboarding. If you go with Elasticsearch, it offers the Logstash and Beats products, which integrate well with many other external systems.
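To give a feel for the Ansible approach, here are a couple of ad-hoc command sketches; the inventory group name "hadoop" is an assumption:

```
# Hypothetical ad-hoc commands against an inventory group named "hadoop".
# Check root-disk usage on every host in the cluster:
ansible hadoop -m shell -a 'df -h /'

# Restart the Ambari agent everywhere (-b escalates privileges):
ansible hadoop -b -m service -a 'name=ambari-agent state=restarted'
```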
01-17-2018
03:52 AM
1 Kudo
Hi Micheal. I trust your ability to make your own PowerPoint with the following information.

Most importantly, Ambari has nothing to do with Kafka. I strongly suggest you explain Kafka on its own, without ever mentioning Ambari.

Moving on: at a high-level view, there is the Ambari Server (the web UI you log in to) and the agents (the hosts on which you can add, manage, and monitor services). Ambari has no concept of workers. The Ambari Server requires a running relational database: PostgreSQL, MySQL, or Oracle. Perhaps you should start here, but I will try to continue. https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Design

Ambari uses widgets to display the dashboards and graphs. Services running on external systems are configured via communication with the Ambari agents. Ambari gives you a central location to define configuration files for any environment. Hadoop is not required for Ambari to work; while it is commonly used for Hadoop, Ambari is fully extendable via what are called "stacks." The HDP stack includes Hadoop, Hive, HBase, Pig, Spark, Ranger, etc.

When you first log in to a fresh Ambari server, you have a default login account, and you must define a cluster and add hosts before you can do anything useful with Ambari. It is preferred to let Ambari set up and manage services on new hosts rather than attempting to add existing hosts with pre-installed services to Ambari. For example, you should not install Hadoop with Puppet/Chef/Ansible and then add that server to Ambari; instead, use those tools to manage the Ambari Agent installation, then continue with a typical Ambari "Add Host" operation.

The agents communicate with the Ambari Server by periodically sending heartbeats to let it know that they are alive and able to accept requests.

Ambari offers different account access restrictions via its login methods. For example, you can allow administrators to change and restart services while read-only users view overall cluster usage or access the HDFS file system browser. Ambari also has "Ambari Views," which allow you to extend and expose your own kind of "web portal" to any system running in your environment.

Hope this gets you started; the Ambari wiki page is a fine resource for more information.
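If a live demo would help the presentation, here is a minimal sketch of the Ambari REST API; the host, the default credentials, and the cluster name are all assumptions:

```
# Hypothetical sketch: list the services Ambari manages in a cluster.
# Host, credentials, and cluster name are placeholders.
curl -s -u admin:admin \
  'http://ambari-host:8080/api/v1/clusters/mycluster/services'

# List the hosts whose agents have registered (and are heartbeating):
curl -s -u admin:admin 'http://ambari-host:8080/api/v1/hosts'
```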
01-17-2018
03:19 AM
@Tu Nguyen I suggest you post a new question rather than hijack this one. Your error does not relate directly to transactional tables, but rather to the OrcSplits generated for your table. Have you tried using spark.read.format("orc") directly against the filesystem? org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998) ... 111 more
Caused by: java.lang.NumberFormatException: For input string: "0248155_0000"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
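A minimal sketch of that direct read from spark-shell; the warehouse path is an assumption, and the `spark` session variable assumes Spark 2.x:

```
# Hypothetical sketch: read the table's ORC files straight off HDFS,
# bypassing Hive's OrcInputFormat split generation. The path is a placeholder.
spark-shell <<'EOF'
val df = spark.read.format("orc").load("/apps/hive/warehouse/mydb.db/mytable")
df.printSchema()
df.show(10)
EOF
```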
01-16-2018
02:47 AM
@Guillaume Roger And what will the end user do with those zipped CSV files once they get them? Load them into Excel? Surely you can expose some SQL interface or BI tool that allows the datasets to be queried and explored as they were meant to be within the Hadoop ecosystem.