Member since: 05-05-2016
Posts: 18
Kudos Received: 16
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1542 | 07-02-2016 03:31 PM
 | 13709 | 07-02-2016 01:07 PM
09-21-2018
06:00 AM
When a NiFi flow file contains a batch of JSON records separated by newlines, EvaluateJsonPath emits only the first record in the flow file and drops all the following ones. Splitting the file into individual per-record flow files is very inefficient. Is there any way to parse the JSON records in batch?
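For reference, a minimal Jackson-based sketch (outside NiFi; the field names are hypothetical) of what parsing the whole batch in one pass means here, assuming the flow file content is newline-delimited JSON:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NdjsonBatch {
    public static void main(String[] args) throws Exception {
        // Two records separated by a newline, as described above.
        String flowFileContent = "{\"id\":1,\"name\":\"a\"}\n{\"id\":2,\"name\":\"b\"}";

        ObjectMapper mapper = new ObjectMapper();
        for (String line : flowFileContent.split("\n")) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            JsonNode record = mapper.readTree(line); // parse one record per line
            System.out.println(record.get("id") + " -> " + record.get("name"));
        }
    }
}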
10-26-2017
06:34 AM
The HDP versions are not listed when setting up a cluster with Ambari 2.5.2.0 on Ubuntu 16.04. All prerequisites have been completed on the machine.
07-29-2016
07:04 AM
1 Kudo
http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/
07-18-2016
01:11 PM
You can refer to https://community.hortonworks.com/questions/9790/orgapachehadoopipcstandbyexception.html for this issue.
07-18-2016
01:08 PM
The binaries downloaded by Ambari will certainly install HDP, so you need to delete them and put your custom binaries in their place before running the cluster setup. Thanks, Puneet
07-18-2016
08:09 AM
Hi Praveen, here are a few points to help:
1. Try running your application without options like "--driver-memory 15g --num-executors 25 --total-executor-cores 60 --executor-memory 15g --driver-cores 2" and check the logs for the memory allocated to RDDs/DataFrames.
2. The driver doesn't need 15g of memory if you are not collecting data on the driver. Try setting it to 4g instead. I hope you are not using .collect() or similar operations that pull all the data to the driver.
3. The error calls for fine-tuning your configuration between executor memory and driver memory. The total number of executors (25) is quite high considering the memory allocated to each (15g). Reduce the number of executors and consider allocating less memory (4g to start with), as in the sketch below.
Thanks, Puneet
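A minimal sketch of point 3 in Java, assuming YARN; the values (10 executors, 4g each) are illustrative starting points only, not verified numbers:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TunedApp {
    public static void main(String[] args) {
        // Executor settings can be set here, before the context starts.
        // Driver memory, however, must be given at submit time
        // (e.g. spark-submit --driver-memory 4g), since the driver JVM
        // is already running by the time this code executes.
        SparkConf conf = new SparkConf()
                .setAppName("tuned-app")
                .set("spark.executor.instances", "10") // down from 25
                .set("spark.executor.memory", "4g")    // down from 15g
                .set("spark.executor.cores", "2");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... application logic ...
        sc.stop();
    }
}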
07-18-2016
05:12 AM
Gaurav, while installing the Hadoop stack through Ambari, the install packages are fetched from the Hortonworks repo, even if you install the Apache Ambari distribution. Check the repo URLs at: https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+2.2.2+from+Public+Repositories
However, if you want to experiment with the installation, here are a few things that could help:
1. Ambari downloads the binaries to be installed to the "/var/lib/ambari-server/resources/stacks/HDP/2.4/services/" directory.
2. You can copy the required service binaries to "/var/lib/ambari-server/resources/stacks/HDP/2.4/services/" on the Ambari server machine and try running the installation.
NOTE: this approach is not verified, but worth a try.
Thanks, Puneet
07-18-2016
04:57 AM
Could you share more details, like the command used to execute the job and the input size?
07-05-2016
08:41 AM
1 Kudo
Could you please share a snippet that reproduces the issue?
07-05-2016
04:05 AM
1 Kudo
1. For the NULL issue, you need to map the columns between HBase and Hive. See the example at the link below:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ColumnMapping
2. The default value for the row key is the '\002' (Ctrl-B) character.
07-04-2016
05:03 AM
1 Kudo
Yes. A projection before any sort of transformation/action helps with both computation time and storage.
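A minimal sketch, assuming Spark 2.x and a hypothetical "events" table, of projecting early:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ProjectEarly {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("project-early").getOrCreate();

        // Select only the needed columns before doing anything else,
        // so later transformations shuffle and cache less data.
        Dataset<Row> events = spark.table("events");
        Dataset<Row> slim = events.select("user_id", "event_time")
                                  .filter("event_time >= '2016-01-01'");

        slim.show();
        spark.stop();
    }
}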
07-02-2016
03:31 PM
2 Kudos
Yes. Reducing the size of a dataset before a JOIN definitely helps, rather than the other way around.
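A minimal sketch, assuming Spark 2.x with hypothetical "orders" and "customers" tables, of shrinking both sides before the join:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class FilterBeforeJoin {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("filter-before-join").getOrCreate();

        Dataset<Row> orders = spark.table("orders");
        Dataset<Row> customers = spark.table("customers");

        // Filter and project both sides first, so the join shuffles less data.
        Dataset<Row> recentOrders = orders.filter("order_date >= '2016-01-01'")
                                          .select("customer_id", "amount");
        Dataset<Row> activeCustomers = customers.filter("active = true")
                                                .select("customer_id", "name");

        Dataset<Row> joined = recentOrders.join(activeCustomers, "customer_id");
        joined.show();
        spark.stop();
    }
}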
07-02-2016
03:29 PM
1 Kudo
There are scenarios (though bad practice) where data insertion requires the columns to be in lexicographic order, e.g. when inserting data into a DB over a JDBC connection. Not sure if jestin ma is facing a similar issue.
07-02-2016
01:07 PM
2 Kudos
@Jestin: Why do you need to sort the columns of a DataFrame? Could you please elaborate? In Java there is no built-in function to reorder columns in place, though a select with the columns in the desired order can serve as a workaround (see the sketch below).
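A minimal sketch of that workaround, assuming Spark 2.x; the table name is hypothetical:

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReorderColumns {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("reorder-columns").getOrCreate();
        Dataset<Row> df = spark.table("some_table"); // hypothetical table

        // Sort the column names lexicographically, then project in that order.
        String[] cols = df.columns();
        Arrays.sort(cols);
        Dataset<Row> reordered = df.select(cols[0],
                Arrays.copyOfRange(cols, 1, cols.length));

        reordered.show();
        spark.stop();
    }
}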
07-02-2016
12:48 PM
2 Kudos
Answer 2: You can use the HBaseStorageHandler for Hive-HBase integration. Please refer to: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

-- Hive table backed by the HBase table "xyz": the Hive "key" column maps to
-- the HBase row key and "value" maps to the cf1:val column.
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz");
05-14-2016
03:17 PM
1 Kudo
Yes, relying on the Spark logs is one solution, but it takes away the freedom to log custom messages. What I am expecting is something like SparkContext.getLogger().info("message") that is lazily evaluated when the action is finally called.
05-14-2016
03:15 PM
1 Kudo
Thanks for the input. Yes, that is a solution, but I don't want to call any action, as I mentioned. What I am expecting is something like SparkContext.getLogger().info("message") that is lazily evaluated when the action is finally called.
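As far as I know, no such lazy logger exists. One workaround, sketched below under the assumption of Spark 2.x (not the API I was asking for), is to put the log call inside a transformation such as mapPartitions, so it executes on the executors only when an action finally runs:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LazyLogDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("lazy-log-demo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> data = sc.parallelize(Arrays.asList(1, 2, 3, 4));

        // The log call lives inside the transformation, so it runs lazily:
        // nothing is logged until an action evaluates the RDD.
        JavaRDD<Integer> doubled = data.mapPartitions(it -> {
            // Obtain the logger inside the lambda to avoid serializing it.
            Logger.getLogger("LazyLogDemo").info("processing a partition");
            List<Integer> out = new ArrayList<>();
            while (it.hasNext()) out.add(it.next() * 2);
            return out.iterator();
        });

        doubled.count(); // the action; the messages appear now, in the executor logs
        sc.stop();
    }
}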
05-13-2016
01:50 PM
3 Kudos
I am trying to capture logs for my application before and after a Spark transformation statement. Because evaluation is lazy, the logs get printed before the transformation is actually evaluated. Is there a way to capture logs without calling a Spark action in the log statements, avoiding unnecessary CPU consumption?