Member since: 09-14-2015
Posts: 79
Kudos Received: 91
Solutions: 22
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2223 | 01-25-2017 04:43 PM
 | 1713 | 11-23-2016 05:56 PM
 | 5602 | 11-11-2016 02:44 AM
 | 1495 | 10-26-2016 01:50 AM
 | 9201 | 10-19-2016 10:22 PM
05-03-2016
04:54 PM
Hi @Bharat Rathi, I am not sure what version of Spark you are using, but this sounds a lot like SPARK-10309 (a known issue in Spark 1.5). Notice that this is specifically related to Tungsten. You can try disabling Tungsten, as suggested by Jit Ken Tan in the JIRA, with the following: sqlContext.setConf("spark.sql.tungsten.enabled", "false")
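If it is easier to toggle this from the SQL side (for example in a %sql cell or over the Thrift Server), the same property can be set with a SET statement, assuming you are on a 1.5.x build where spark.sql.tungsten.enabled exists:
SET spark.sql.tungsten.enabled=false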
04-28-2016
09:34 AM
Hi @Mon key, Spark 1.6.0 is available and deployable via Ambari if you are running HDP 2.4.0. Otherwise, you have the option to deploy Spark 1.6 (Technical Preview) manually on HDP 2.3.x as discussed here.
04-27-2016
08:29 PM
5 Kudos
Ambari is not currently able to manage multiple clusters. That being said, you can manage different hardware profiles within a single cluster using Ambari and config groups: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Ambari_Users_Guide/content/_using_host_config_groups.html
04-23-2016
04:58 AM
7 Kudos
After completing this tutorial you will understand how to:
leverage Spark to infer a schema on a CSV dataset and persist it to Hive without explicitly declaring the DDL
deploy the Spark Thrift Server on the Hortonworks Sandbox
connect an ODBC tool (Tableau) to the Spark Thrift Server via the Hive ODBC driver, leveraging caching for ad-hoc visualization
Assumption 1: It is assumed that you have downloaded and deployed the Hortonworks sandbox, installed the Hive ODBC driver on your host machine, and installed Tableau (or your preferred ODBC-based reporting tool).
Assumption 2: Please ensure that your host machine's /etc/hosts file has the appropriate entry mapping sandbox.hortonworks.com to the IP of your sandbox (e.g., 172.16.35.171 sandbox.hortonworks.com sandbox).
Deploying the Spark Thrift Server
Within Ambari, click on the Hosts tab and then select the sandbox.hortonworks.com node from the list.
Now you can click “Add” and choose Spark Thrift Server from the list to deploy a thrift server.
After installing, start the thrift server via the service menu.
Loading the Data
The code blocks below are each intended to be executed in their own Zeppelin notebook cells. Each cell begins with a '%' indicating the interpreter to be used.
Open Zeppelin and create a new notebook: http://sandbox.hortonworks.com:9995
Download and take a peek at the first few lines of the data:
%sh
wget https://dl.dropboxusercontent.com/u/3136860/Crime_Data.csv
hdfs dfs -put Crime_Data.csv /tmp
head Crime_Data.csv
Load the CSV reader dependency:
%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")
Read the CSV file and infer the schema:
%pyspark
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
data = sqlContext.read.load("/tmp/Crime_Data.csv", format="com.databricks.spark.csv", header="true", inferSchema="true")
data.printSchema()
Persist the data to Hive:
%pyspark
data.registerTempTable("staging")
sqlContext.sql("CREATE TABLE crimes STORED AS ORC AS SELECT * FROM staging")
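If you re-run the notebook and the CREATE TABLE step complains that crimes already exists, you can drop the old table and then re-run the cell above (an optional cleanup step, not part of the original flow):
%sql
drop table if exists crimes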
Verify the data is present and able to be queried:
%sql
select Description, count(*) cnt from crimes
group by Description order by cnt desc
Connecting Tableau via ODBC
Connect using the Hortonworks Hadoop Hive connector:
Run the “Initial SQL” to cache the crimes table:
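The exact statement depends on your table name, but assuming the crimes table created earlier in this tutorial, the Initial SQL is just a cache statement along the lines of:
CACHE TABLE crimes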
Verify the table is cached in the Thrift Server UI: http://sandbox.hortonworks.com:4040/storage/
Select the default schema and drag the crimes table into the tables area.
Go to the worksheet and start exploring the data using the cached table!
03-04-2016
06:12 PM
1 Kudo
Adding to Artem's comment, please make sure that node1 and node2 can ping one another. This looks like either node2 does not know how to resolve node1's IP address or else you do not have network access between the nodes. You should be able to place entries in /etc/hosts on both nodes to correct this and also ensure that iptables is turned off on all nodes.
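For example, each node's /etc/hosts could contain entries like the following (the IPs and hostnames here are placeholders for your own):
# placeholder addresses - replace with the real IPs/hostnames of your nodes
192.168.1.101 node1.example.com node1
192.168.1.102 node2.example.com node2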
03-03-2016
09:08 PM
2 Kudos
Hi @Mark Thorson, I recommend that you start from docs.hortonworks.com and from there navigate to the docs for the version of HDP on which you are installing Ranger. So, in the link you sent you would end up here for version 2.2.4.2, but for the latest version, 2.4.0, you would end up here. To get there, just click on HDP at docs.hortonworks.com and then select the version of HDP that you are running. From the next page you can click on "Non-Ambari Cluster Installation Guide" to get to the manual steps to install Ranger for your version of HDP. It is very important that you follow the steps, and therefore use the appropriate repos, for your version of HDP. Hope this helps,
Brandon
01-07-2016
06:21 PM
2 Kudos
Hi @rbalam There is a MIB for Ambari as of Ambari 2.2. See here.
12-16-2015
04:09 PM
2 Kudos
Hi @Aidan Condron, One option worth considering is Apache Phoenix (https://phoenix.apache.org/). Phoenix uses relational constructs to make working with data in HBase simpler. With HDP we have a simple example of loading CSV data into HBase and querying it using Phoenix. Check it out here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP...
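To give a sense of what this looks like, here is a minimal sketch (the table and column names are illustrative, not taken from the linked HDP example) of defining a Phoenix table over HBase and querying it with SQL, e.g. from sqlline.py; the HDP doc also covers bulk-loading a CSV with the psql.py tool:
-- illustrative table only, not from the linked HDP example
CREATE TABLE IF NOT EXISTS events (
  id BIGINT NOT NULL PRIMARY KEY,
  category VARCHAR,
  city VARCHAR
);
UPSERT INTO events VALUES (1, 'SIGNUP', 'DUBLIN');
SELECT category, COUNT(*) AS cnt FROM events GROUP BY category;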
12-10-2015
03:35 AM
4 Kudos
Hi @Gangadhar Kadam, You've got everything almost right. When you build the jar, you need to move into the build directory and then run the jar -cvf command to avoid having the "build" part of the directory hierarchy put into the JAR. So, the following should work: javac -cp `hadoop classpath` MaxTemperatureWithCompression.java -d /Users/gangadharkadam/hadoopdata/build
cd /Users/gangadharkadam/hadoopdata/build
jar -cvf MaxTemperatureWithCompression.jar .
hadoop jar MaxTemperatureWithCompression.jar org.myorg.MaxTemperatureWithCompression /user/ncdc/input /user/ncdc/output
Try it out and compare the results of jar -tf MaxTemperatureWithCompression.jar. You should see:
[root@sandbox build]# jar -tf MaxTemperatureWithCompression.jar
META-INF/
META-INF/MANIFEST.MF
org/
org/myorg/
org/myorg/MaxTemperatureWithCompression.class
org/myorg/MaxTemperatureWithCompression$Map.class
org/myorg/MaxTemperatureWithCompression$Reduce.class
Whereas currently your steps result in:
[root@sandbox test]# jar -tf MaxTemperatureWithCompression.jar
META-INF/
META-INF/MANIFEST.MF
build/org/
build/org/myorg/
build/org/myorg/MaxTemperatureWithCompression.class
build/org/myorg/MaxTemperatureWithCompression$Map.class
build/org/myorg/MaxTemperatureWithCompression$Reduce.class
This works for me on my HDP 2.3 Sandbox.
12-10-2015
12:43 AM
4 Kudos
Hi Mike, NiFi comes as part of Hortonworks Data Flow. You can grab the bits and install it from this location: http://hortonworks.com/hdp/downloads/#hdf
There are installation and configuration instructions available there as well. Also, if you want to take NiFi for a quick spin in the sandbox then Ali has a great demo here: https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html