Member since: 07-31-2019
Posts: 346
Kudos Received: 259
Solutions: 62

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2870 | 08-22-2018 06:02 PM |
| | 1662 | 03-26-2018 11:48 AM |
| | 4099 | 03-15-2018 01:25 PM |
| | 5057 | 03-01-2018 08:13 PM |
| | 1415 | 02-20-2018 01:05 PM |
01-17-2017
06:03 PM
2 Kudos
Hi @Joe Harvy. You can create scripts that will create the databases and tables based on the Hive metastore. This blog walks you through the steps: https://sharebigdata.wordpress.com/2016/06/12/hive-metastore-internal-tables/
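For example, a rough sketch of the kind of metastore query that blog describes (run against the metastore backing database, e.g. MySQL; DBS and TBLS are standard metastore tables). You could feed the output into a script that issues SHOW CREATE TABLE for each entry:

SELECT d.NAME AS db_name, t.TBL_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;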
01-03-2017
04:43 PM
@Sami Ahmad You'll want to use Beeline going forward since the Hive CLI will be deprecated. Beeline's JDBC connection provides a higher level of security than the Hive CLI.
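For reference, a typical Beeline connection looks something like this (the host and user are placeholders for your HiveServer2 endpoint and account):

beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -n myuser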
12-20-2016
02:19 PM
@Isa Mllr See this version matrix https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning
11-29-2016
05:16 PM
1 Kudo
Hi @Dagmawi Mengistu. Make sure you set up the Hive proxy user as described here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-views/content/setup_HDFS_proxy_user.html
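For reference, the proxy user entries end up in core-site.xml and look roughly like this (this assumes the Ambari server runs as root; substitute your actual service account, and restrict the values rather than using * where your policy requires):

hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*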
11-21-2016
06:29 PM
1 Kudo
Hi @Adnan Alvee. Having already narrowed the performance issue down to a particular job, the primary thing you will want to look for within each stage is skew. Skew is when a small number of tasks take significantly longer to execute than the rest. Look specifically at task runtime to see if something is wrong. As you drill down into the slower-running tasks, focus on where they are slow. Is the slowness in writing data, reading data, or computation? This can narrow things down to a problem with a particular node, or perhaps you don't have enough disks to handle the scratch space for shuffles.

Keep in mind that Spark scales linearly, so your processing may be slow simply because there isn't enough hardware. Focus on how much memory and CPU you've allocated to your executors, as well as how many disks you have in each node. It also looks as if your number of executors is quite large; consider having fewer executors with more resources per executor. Also note that executor memory is memory per executor, and the number you have is a bit large. Try playing with those numbers and see if it makes a difference. Hope this helps.
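For illustration only (the numbers are placeholders to show which knobs to experiment with, not a recommendation, and your-app.jar stands in for your application):

spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  your-app.jar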
11-14-2016
04:45 PM
Hi @Clay McDonald. One of the reasons is MapReduce: Hive uses Tez, but Polybase is not yet compatible with Tez, and MapReduce is a batch data processing engine. You will also want to make sure your Hive tables are properly configured using best practices. Try implementing some of these rules where applicable: http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/. Also be aware of your cluster size. MapReduce (like other data processing engines) uses parallel processing, but if you don't have many nodes then you are not taking full advantage of that design. Not sure if it's applicable in your case, but you could also use multiple SQL Servers to parallelize your Polybase query: https://msdn.microsoft.com/en-us/library/mt607030.aspx
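As a rough sketch of the table layout those tips point toward (the table, columns, and partition key are made up for illustration), ORC storage plus partitioning on a commonly filtered column:

CREATE TABLE sales_orc (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');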
11-09-2016
02:12 PM
Thanks @Andrew Grande! That worked! I feel like a noob 🙂 but appreciate all the help!
11-08-2016
09:12 PM
Here is the repeating error in nifi-app.log (attached as nifi-app.txt).
11-08-2016
09:02 PM
I apologize. The actual error is "File name is too long". The names of the files are COWFILE1.DAT and COWFILE1.ARC.
11-08-2016
08:21 PM
I have a couple of rather large hex files I need to convert to another format, with the intention of then stripping out certain attributes and storing the results in SQL Server and/or Hive. The files are 12.7 MB and 3.3 MB. When I use the code from this HCC answer https://community.hortonworks.com/questions/60597/hexdump-nifi-processor-nifi-hexdump-processor.html

import org.apache.nifi.processor.io.InputStreamCallback
import java.io.DataInputStream

// Get the incoming FlowFile; exit if there is nothing to process
def flowFile = session.get()
if (!flowFile) return

def attr = ''
def attr2 = ''
// Read the first 16 bytes (two longs) of the content as hex strings
session.read(flowFile, { inputStream ->
    def dis = new DataInputStream(inputStream)
    attr = Long.toHexString(dis.readLong())
    attr2 = Long.toHexString(dis.readLong())
} as InputStreamCallback)

// Store the hex as a FlowFile attribute and route to success
flowFile = session.putAttribute(flowFile, 'first16hex', attr + attr2)
session.transfer(flowFile, REL_SUCCESS)
in an ExecuteScript processor, I get a "File too large" error. I'm also aware of this JIRA but am looking for a good workaround: https://issues.apache.org/jira/browse/NIFI-2997
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)