Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5105 | 09-21-2018 09:54 PM |
 | 6484 | 03-31-2018 03:59 AM |
 | 1964 | 03-31-2018 03:55 AM |
 | 2171 | 03-31-2018 03:31 AM |
 | 4800 | 03-27-2018 03:46 PM |
06-21-2016
08:24 PM
1 Kudo
@Pirlouis Pirlouis Yes. That is the same for RHEL 5.x. For the other question, I am not aware of any special issues.
06-21-2016
03:55 PM
2 Kudos
@Pirlouis Pirlouis If you are on a newer version of HDP (2.3.x or 2.4.x), we *really* suggest using ODBC driver 2.1.2+ and upgrading to RHEL 6.x. The last ODBC driver Hortonworks provided for CentOS 5.x was 2.0.0; nothing after 2.0.0 is supported on RHEL 5.x. This is the repo you need: http://public-repo-1.hortonworks.com/HDP/hive-odbc/2.0.0-1000/centos5/hive-odbc-native-2.0.0.1000-centos5.tar.gz, or you can go to http://hortonworks.com/downloads/#data-platform, expand the Archive section and scroll down to
HDP 2.2 Add-Ons --> Hortonworks ODBC Driver for Apache Hive (v2.0) --> CentOS 5.x (a short download sketch follows). If this is what you wanted, please vote the response and accept it as the best answer.
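For reference, fetching and unpacking that archive from the command line could look like this (a sketch only; the tarball name is taken from the URL above, and you would still follow the install instructions shipped inside it):

wget http://public-repo-1.hortonworks.com/HDP/hive-odbc/2.0.0-1000/centos5/hive-odbc-native-2.0.0.1000-centos5.tar.gz
tar -xzf hive-odbc-native-2.0.0.1000-centos5.tar.gz   # unpack the driver package for CentOS 5.x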
06-16-2016
11:33 PM
2 Kudos
@atul gupta For the Kafka log-miner use case (and not only that), see: https://github.com/linkedin/databus. Databus understands Oracle redo logs. For GoldenGate, that is an out-of-the-box capability. It is not clear why you would use customized open-source technologies to push data to GoldenGate; maybe you want to replace GoldenGate altogether. Your business case needs more clarification. If you like this response, please vote for it.
06-16-2016
11:07 PM
3 Kudos
@ammu ch This is a loaded question. It depends on the version of Spark you installed, the version of Ambari, the version of HDP, whether you want to use the power of YARN, and whether you want to keep track of configuration changes (keep in mind that Ambari provides configuration management, integration with dashboards, etc.). It can get complex, and Hortonworks does not support this approach. You must have a serious reason not to use "Add Service" from Ambari to install your new Spark cluster and to be willing to deal with all these complexities. You should save your Spark configurations and replicate them within Ambari. If you still want to explore the options, please be more specific about the versions of the above, plus the use of YARN, plus the topology of the Spark cluster you installed (multi-node?).
06-13-2016
02:41 PM
15 Kudos
Objective

Deploy a 4-node HDP 2.4.2 cluster with Apache Ambari 2.2.2, Vagrant and VirtualBox on an OS X host. This is helpful for development and proofs of concept.

Scope

This approach has been tested on an OS X host, but it should work in all supported Vagrant and VirtualBox environments.

Pre-requisites

Minimum 9 GB of RAM for the HDP 2.4.2 cluster
Download and install Vagrant for your host OS: https://www.vagrantup.com/downloads.html
Download and install VirtualBox for your host OS: https://www.virtualbox.org/wiki/Downloads
Download and install a git client for your host
Open a command shell, change to the folder where you plan to clone the GitHub repository, and clone it:

git clone https://github.com/cstanca1/hdp2_4_2-vagrant.git

Create and Start VMs

Change directory to hdp_2.4.2-vagrant, the cloned folder that includes the Vagrantfile, and create a data folder:

mkdir data

This data folder will be shared between the guest VMs and the host. Vagrant (via the Vagrantfile) is configured to use CentOS 6.7 as the base box and includes the pre-requisites for installing HDP. Four VMs will be created: 1 Ambari server (ambari1), 1 Hadoop master (master1) and 2 slaves (slave1, slave2).

vagrant up ambari1
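Before moving on to the Ambari setup, it can help to confirm the first VM is actually running; vagrant status and vagrant ssh are standard Vagrant commands, nothing specific to this repository:

vagrant status       # ambari1 should be reported as "running"
vagrant ssh ambari1  # optional: log in to the VM to verify, then type exit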
Install and Setup Ambari Server

Set a Local Reference to a Remote Ambari Repo

vagrant ssh ambari1
sudo su -
cd /etc/yum.repos.d
wget http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo

Setup SSH Access and Start the Other 3 VMs

At the same path as the files downloaded from the repository, add your id_rsa and id_rsa.pub keys (see https://wiki.centos.org/HowTos/Network/SecuringSSH, section 7, for instructions on CentOS). You can perform these steps on the ambari1 VM and copy the two files to your /vagrant_data folder, which shares data between guest and host (a sketch of these key-related commands appears at the end of this post). Only after you copy those two files, start the other three VMs:

vagrant up master1
vagrant up slave1
vagrant up slave2

Install Ambari Server

yum install ambari-server

Setup Ambari Server

Run the setup command to configure your Ambari Server, database, JDK, LDAP, and other options:

ambari-server setup

Start Ambari Server

ambari-server start

Deploy Cluster using Ambari Web UI

Open a web browser and go to http://ambari1:8080. Log in with username admin and password admin, then follow the on-screen instructions, using the hosts you created and selecting the services of interest. For more details, see "Automated Install" at: https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.2.2.0/index.html
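As promised above, a minimal sketch of the SSH key step, assuming you generate the keys inside the ambari1 VM and use the /vagrant_data shared folder to hand them to the host; the exact paths are assumptions, so adjust them to your environment:

vagrant ssh ambari1
sudo su -
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa                 # creates id_rsa and id_rsa.pub (empty passphrase, fine for a dev cluster)
cp /root/.ssh/id_rsa /root/.ssh/id_rsa.pub /vagrant_data/    # shared folder visible on the host, so the other VMs can pick the keys up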
06-11-2016
11:03 PM
@Armando Segnini Thank you so much for your review. Your findings were spot-on. I had a few typos and omitted a mv command. Excellent catches.
06-11-2016
10:46 PM
@Artem Ervits, @Deepesh, @Mike Riggs I'm trying this in the HDP 2.4 sandbox with SQL Server Express 2014. Connectivity is OK. Even if you add the driver jar to the /var/lib/sqoop/lib/ or /usr/lib/sqoop/lib folder, how do you get past the fact that a connection manager needs to be set to use a factory class in order to use the Microsoft driver? The error (even though it is only shown as a WARN) is:

WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.

Due to this, the command does not work. The error is thrown after executing a command like this:

sqoop list-databases --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --connect jdbc:sqlserver://10.226.170.191\poc:1433 --username WHATEVERUSER --password WHATEVERPASSWORD

A --connection-manager directive seems to be needed. What needs to be added?
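For illustration only, a sketch of the same command with an explicit connection manager; org.apache.sqoop.manager.SQLServerManager ships with Sqoop, but whether it works with list-databases and the named-instance connect string here is exactly what I'm trying to figure out:

sqoop list-databases \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --connect "jdbc:sqlserver://10.226.170.191\poc:1433" \
  --username WHATEVERUSER --password WHATEVERPASSWORD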
06-09-2016
03:32 AM
1 Kudo
@Micheal Kubbo What HDP sandbox version do you use? I am intrigued whether your sandbox supports HTTP 1.0 or 1.1. True, 1.0 is very old and it is unlikely to be the case, but you should still check. I ran the following command on the HDP 2.4 sandbox: curl --head 127.0.0.1 and the result is: HTTP/1.1 200 OK
Date: Thu, 09 June 2016 03:26:25 GMT
Server: gunicorn/19.1.1
...
06-09-2016
03:02 AM
4 Kudos
@sameer lail What you did is not stupid. CSV is a file format, not a data structure in R. What you could do is create a dataframe with a single column with all values separated by commas, then use hdfs write to output that as a file with a .csv extension. Another option is to write map-reduce with R and the streaming API and set the output to be csv, as sketched below. If any of my responses were helpful, please don't forget to vote for them.
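For the second (streaming) option, a rough sketch of what the job submission could look like on HDP; to_csv.R is a hypothetical mapper script that reads stdin and emits comma-separated lines, and this assumes Rscript is installed on every cluster node:

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -D mapreduce.job.reduces=0 \
  -input /user/sameer/input \
  -output /user/sameer/output_csv \
  -mapper to_csv.R \
  -file to_csv.R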
06-06-2016
09:21 PM
4 Kudos
@sameer lail What data format is the file that you assign to the modelfile dataframe? If it is not csv, then you would need to convert it to csv before writing it to HDFS. If it is csv, then check this Q&A: https://community.hortonworks.com/questions/36583/how-to-save-data-in-hdfs-using-r.html