Member since: 06-05-2019
Posts: 117
Kudos Received: 127
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 361 | 12-17-2016 08:30 PM |
| | 272 | 08-08-2016 07:20 PM |
| | 619 | 08-08-2016 03:13 PM |
| | 480 | 08-04-2016 02:49 PM |
| | 549 | 08-03-2016 06:29 PM |
08-08-2016
07:20 PM
Hi @João Souza Personally, I'd create a separate script for each individual table. That way I can focus on the one table (if something changes) rather than modify a larger script that encompasses all the tables (which would of course mean more code, creating a steeper learning curve for another developer).
08-08-2016
05:04 PM
Hi @habeeb siddique Take a look at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_performance_tuning/content/ch_setting_memory_usage_for_hive_perf.html What is your YARN container size? It is recommended that the memory be set to 80% of the container - if you are using HDP 2.3+, make sure the TezConfiguration.TEZ_CONTAINER_MAX_JAVA_HEAP_FRACTION property is set to 0.8.
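A minimal sketch of what that could look like in a Hive (beeline) session - the 4096 MB container size is a placeholder for illustration, and both properties can equally be set in hive-site/tez-site via Ambari:

```sql
-- Placeholder value: size the Tez container to fit your YARN container (MB)
SET hive.tez.container.size=4096;
-- Heap at 80% of the container, per the tuning doc linked above
SET tez.container.max.java.heap.fraction=0.8;
```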
08-08-2016
04:53 PM
@Deepak k Are you referring to HDP (Hortonworks Data Platform) as a custom stack?
08-08-2016
03:19 PM
Hi @sankar rao I would recommend using Ambari 2.2.2 (the latest version) and not Ambari 1.6.1 (from 2014).
08-08-2016
03:13 PM
1 Kudo
Hi @Mayank Pandey If you have existing tables (not in ORC format), I'd recommend creating the ORC tables first. Then run:

insert into yourorctable
select * from yourexistingtable;

Is this how you are currently inserting data?
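As a one-step alternative to the create-then-insert above, Hive's CTAS form combines both steps (a sketch using the same placeholder table names):

```sql
-- Create the ORC table directly from the existing table in one statement
CREATE TABLE yourorctable STORED AS ORC
AS SELECT * FROM yourexistingtable;
```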
08-08-2016
03:10 PM
Hi @iwan petrow Do you have a NiFi cluster set up - or are you running on one machine? What are the CPU (# of cores) and memory specs of the machine (or machines)?
08-04-2016
02:49 PM
2 Kudos
Hi @Joshua Adeleke Can you access http://server:8088/proxy/application_1469778114081_4192 ? There should be some helpful logs that tell you exactly what happened (from the MapReduce side).
08-04-2016
02:35 PM
Hi @Mukesh Kumar This may not be what you're looking for, but I'd recommend downloading the HDP 2.5 Sandbox - where Zeppelin is GA.
08-03-2016
06:29 PM
1 Kudo
Hi @john doe (by the way, love the name) Take a look at this https://community.hortonworks.com/questions/44374/nifi-putkafka-and-verifying-published-messages.html -> it looks like the same issue.
08-03-2016
06:09 PM
Hi @Calvin Hung I understand your dilemma (I asked about the use case for Hue to see if you could use the functionality in Ambari instead). There is a great visual tool for Tez (Tez View in Ambari), but since you are using MapReduce, this won't help. Sure - let's install Hue. Following these instructions, you'll see that Hue is not supported on CentOS 7, Ubuntu, or Debian. It is supported on:

- CentOS 6
- Oracle Linux 6
- Red Hat Enterprise Linux (RHEL) 6
- SUSE Linux Enterprise Server (SLES) 11, SP3/SP4

I'm hoping you have one of these Linux OS's installed. Let me know if you have any trouble installing.
08-03-2016
01:47 PM
Hi @Calvin Hung What are you using Hue for?
08-03-2016
01:30 PM
Hi @Saurabh Rathi Create the table in Phoenix first (all lower-case), then run psql.py - problem solved.
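A minimal sketch of that order, with placeholder table and column names (not from the original thread):

```sql
-- 1) Create the table in Phoenix first (all lower-case, per the advice above),
--    e.g., from the sqlline client
CREATE TABLE mytable (id INTEGER PRIMARY KEY, name VARCHAR);
-- 2) Then load the CSV into that same table with psql.py
```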
08-02-2016
10:40 PM
1 Kudo
Hi @kavitha velaga https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 This is the new standard for impersonation (doAs): hive.server2.enable.doAs - impersonate the connected user, default true. The most recent documentation doesn't seem to discuss "hive.server2.enable.impersonation", so I'd stick with doAs.
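To confirm what the server is actually using, a quick check you can run from a beeline session (in Hive, SET with no value prints the current setting):

```sql
-- Prints the effective value, e.g. hive.server2.enable.doAs=true
SET hive.server2.enable.doAs;
```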
08-02-2016
09:50 PM
Hi @Jay Ch Great question. You must register your df1 as a temporary table, like:

val table = sqlContext.read.format("orc").load("/apps/hive/warehouse/yourtable")
table.registerTempTable("yourtable")

and then run:

val tester = sqlContext.sql("select * from yourtable")
tester.columns

You'll get the actual column names.
08-02-2016
07:28 PM
Hi @Kyle Dunn Can you open Ambari, then click on the Admin tab and Versions? Are you installing HDP on the same machines as Pivotal? Are they pointing to a separate instance of HDFS? How much data are you dealing with? I'd suggest using distcp to copy over the necessary data.
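For reference, a basic distcp run copies between clusters along the lines of `hadoop distcp hdfs://sourceNN:8020/data hdfs://destNN:8020/data` - the NameNode hosts and paths here are placeholders for illustration.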
08-02-2016
05:25 PM
Hi @Shiv kumar Can you run the following in Python on each host (this verifies the FQDN matches up)?

import socket
socket.getfqdn()

Do you in fact get centos.test.com, centos1.test.com, centos2.test.com, and centos3.test.com? Also, do you have passwordless SSH enabled from centos.test.com (where I'm guessing ambari-server is running?) to the other hosts centos1.test.com, centos2.test.com, and centos3.test.com?
08-02-2016
04:01 PM
@mike harding -> are you running Zeppelin (within HDP 2.5 Sandbox) or running Zeppelin standalone?
08-02-2016
03:32 PM
Hi @mike harding What version of Zeppelin are you using?
08-02-2016
03:26 PM
Hi @Yaron Idan The region servers should contain the data. If you are dynamically provisioning more servers (for compute) to execute Spark jobs, there will be a hit on the network (since the data isn't local) - but as long as your SLA is being met, it should be fine.
08-02-2016
03:09 PM
Hi @Eric Brosch Have you tried Apache NiFi? It may be easier to use NiFi to set up the architecture you are describing - and you won't have any complicated code (to prevent message duplication). You'd use NiFi to read the log files and ingest into HDFS (for example, a TailFile or GetFile processor feeding PutHDFS - no need for Flume).
08-02-2016
02:23 PM
Hi @Saurabh Rathi Can you try creating the table without the double quotes:

create table employee (id integer primary key, name varchar)

and then run:

bin/psql.py -t employee localhost data.csv
07-26-2016
04:11 PM
2 Kudos
Hi @Yaron Idan Take a look at this HCC article: HDP on AWS (Best Practices). If you need HBase, as suggested by Ancil, you should take a look at an i2 instance.
07-26-2016
03:57 PM
1 Kudo
Hi @Saurabh Rathi The problem is your table name "user" - this is a reserved word. Reserved words are generally a bad naming choice for both tables and columns. If you absolutely must use the word "user" as a table name, I found a way to make this work:

1) Create the table as capital "USER" - this is important, because psql.py needs the capital letters to work with a reserved word.

2) Then call psql.py with USER: bin/psql.py -t USER localhost data.csv

Let me know if that works for you (I tested it and it should work).
07-15-2016
11:28 PM
8 Kudos
Teradata's JDBC connector contains two jar files (tdgssconfig.jar and terajdbc4.jar) that must both be on the classpath. NiFi database processors like ExecuteSQL or PutSQL use a connection pool such as DBCPConnectionPool, which defines your JDBC connection to a database like Teradata. Follow the steps below to integrate the Teradata JDBC connector into your DBCPConnectionPool:

1) Download the Teradata connectors (tdgssconfig.jar and terajdbc4.jar) - you can download the Teradata v1.4.1 connector from http://hortonworks.com/downloads/

2) Extract the jar files (tdgssconfig.jar and terajdbc4.jar) from hdp-connector-for-teradata-1.4.1.2.3.2.0-2950-distro.tar.gz and move them to NIFI_DIRECTORY/lib/

3) Restart NiFi.

4) Under Controller > Controller Services, edit your existing DBCPConnectionPool (if the pool is active, disable it before editing).

5) Under the Controller Service's Properties, define the following:

- Database Connection URL: your Teradata JDBC connection URL
- Database Driver Class Name: com.teradata.jdbc.TeraDriver
- Database Driver Jar Url: leave this empty. Since you added the two jars to the NiFi classpath (nifi/lib), the driver jars will be picked up automatically - you can only add one jar here, and you need two, which is why we added them to the nifi/lib directory.
- Database User: the database user
- Password: the password for the database user

You're all set - you'll now be able to connect to Teradata from NiFi!
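For the connection URL, Teradata JDBC URLs generally take the form `jdbc:teradata://teradata-host.example.com/DATABASE=mydb` - the host and database name here are placeholders, and any additional connection parameters depend on your Teradata setup.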
Labels: Data Ingestion & Streaming, Database, How-To/Tutorial, NiFi, nifi-processor, teradata
07-14-2016
05:33 PM
Hi @Venkata Sridhar Gangavarapu What version of HDP did you upgrade from?
07-14-2016
05:02 PM
Hi @Kashif Khan Is the table gdwi_test stored in ORC format? Are there any indexes on the table?
07-14-2016
04:14 PM
1 Kudo
Hi @Angel Kafazov If you upgraded HDP and Ambari to their latest versions, what worries you about updating your Ubuntu packages (since HDP and Ambari are already at their latest versions)? Referring to https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_ubuntu14.html - if you want a version of Ambari later than 2.2.2, wouldn't you need to add a new repo?

wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.2.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
07-14-2016
04:00 PM
The default scheduler is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler - why did you change this to Fair Scheduler?
07-14-2016
03:54 PM
Hi @Anilkumar Panda It looks like you aren't providing a password ("Since Password is NOT provided, Trying to use UnSecure client with username and password") - are you connecting to Hive via beeline?
07-14-2016
03:52 PM
Hi @Ketan Dikshit In HDP, YARN uses the Capacity Scheduler, and within each queue you can specify an Ordering Policy of Fair. Is this what you mean when you say you are using the fair scheduler? In Ambari, what do you have Yarn > Configs > Advanced > Scheduler > yarn.resourcemanager.scheduler.class set to? I would also recommend using preemption - do you have this enabled?