Member since: 09-24-2015
Posts: 178
Kudos Received: 113
Solutions: 28
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3376 | 05-25-2016 02:39 AM |
| | 3591 | 05-03-2016 01:27 PM |
| | 839 | 04-26-2016 07:59 PM |
| | 14395 | 03-24-2016 04:10 PM |
| | 2020 | 02-02-2016 11:50 PM |
12-11-2015
11:21 PM
I see 'Connection Refused', which means either a service is down or you are connecting to the wrong port. As Deepesh said, it appears to be the former and the History Server is down.
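If you want to confirm which case it is, here is a quick check (a minimal sketch; it assumes the default MapReduce JobHistory Server ports, 10020 for IPC and 19888 for the web UI, and a hypothetical host name - adjust both to your configuration):

```bash
# On the host that should run the JobHistory Server: is anything listening on its ports?
netstat -tlnp | grep -E ':10020|:19888'

# From a client machine: probe the History Server REST endpoint.
# "Connection refused" here confirms the service is down or bound to a different port.
curl -sf http://historyserver.example.com:19888/ws/v1/history/info && echo "History Server is up"
```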
12-11-2015
07:42 PM
3 Kudos
@Matthew bird You need a home directory for the user in HDFS, so here is what is needed (log in as root to the sandbox, then switch to the hdfs superuser):
su - hdfs
hdfs dfs -mkdir /user/root
hdfs dfs -chown root:hadoop /user/root
hdfs dfs -chmod 755 /user/root
Try to run the Pig script after you've done the above steps.
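To double-check the result before re-running the job, something like this works (a minimal sketch; the Pig script name is a placeholder):

```bash
# Confirm the home directory exists with the expected owner and permissions
hdfs dfs -ls /user | grep root

# Back as the root user, the job now has a home directory to work in
# ("your_script.pig" is a hypothetical script name)
pig -x mapreduce your_script.pig
```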
12-11-2015
06:45 PM
1 Kudo
@Amit Jain Atlas has a ton of exciting features on the roadmap, and there are definitely plans for two-way metadata exchange with other metadata management tools. As of right now (and this may change), the plan is to exchange lineage information with other tools as well, to provide end-to-end lineage of data from the source system all the way to the final destination.

With that said, it seems very unlikely that in a large enterprise setting you would replace all other metadata tools with one magical tool. Typically, governance tools are expected to tap into data processes automatically and non-intrusively to gather lineage information, and this requires native hooks into those processes. Atlas has, and will continue to expand, native hooks for processing that takes place in a Hadoop cluster, but I doubt there is any interest in tapping natively into processes running in other systems such as data warehousing, transactional, operational, and reporting systems. For those pieces (metadata and lineage) from external systems, Atlas will continue to rely on and integrate with other metadata tools.

Just like Hadoop, the other components in an overall data architecture have their roles and place, so they will continue to exist, and so will the governance tools for those components. Vendors need to, and most likely will, work together to provide a seamless experience to customers. If you haven't watched this presentation from Andrew Ahn, PM for Governance Tools at HWX, I would highly recommend it to better understand where Atlas is going - https://www.youtube.com/watch?time_continue=3&v=LZ...

Hope this helps. Let me know if you have any follow-up questions.
12-11-2015
02:34 PM
1 Kudo
There are a few solutions (a sketch of both is below):
1. The easy solution - grant permission on the files to the root user. In this case, the file itself appears to have wide-open permissions, but because it sits under another user's home directory, the root user may not have access to the guest home directory. So check the permissions on /user/guest and adjust if needed.
2. Use the correct user for the job - I like to create a service ID for data processing rather than use local superusers (root) or HDFS superusers (hdfs). You can use users like guest or the built-in test user ambari-qa. The user is identified based on their local OS identity, so switch to guest before running the process.
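A minimal sketch of both options (assumes the file lives under /user/guest and that you have root access on the node; adjust paths, groups, and permissions to your environment):

```bash
# Option 1: open up the guest home directory so other users can traverse and read it
su - hdfs -c "hdfs dfs -ls -d /user/guest"       # check the current owner and permissions
su - hdfs -c "hdfs dfs -chmod 755 /user/guest"   # let other users list and read the directory

# Option 2: run the job as the owning user instead of root or hdfs
su - guest
hdfs dfs -ls /user/guest                         # verify access as guest, then launch the job as this user
```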
12-10-2015
02:39 AM
@Hajime - The best way to find the NodeManager heap size and other memory settings is to calculate them specifically for your cluster size and hardware spec. Here is the utility that you can use - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...

Usage: hdp-configuration-utils.sh options, where the options are as follows:

Table 1.1. hdp-configuration-utils.sh Options

| Option | Description |
|---|---|
| -c CORES | The number of cores on each host. |
| -m MEMORY | The amount of memory on each host, in GB. |
| -d DISKS | The number of disks on each host. |
| -k HBASE | "True" if HBase is installed, "False" if not. |

The output recommendation is in this format -

Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4096m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
tez.am.resource.memory.mb=6144
tez.am.launch.cmd-opts=-Xmx4096m
hive.tez.container.size=6144
hive.tez.java.opts=-Xmx4096m
hive.auto.convert.join.noconditionaltask.size=1342177000
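For reference, the invocation that would produce the recommendation above looks like the following (options per Table 1.1; the script name comes from the snippet above, so verify it against the linked documentation for your HDP version):

```bash
# 16 cores, 64 GB of memory, 4 data disks, HBase installed
hdp-configuration-utils.sh -c 16 -m 64 -d 4 -k True
```

The values it prints map directly onto the YARN, MapReduce, Tez, and Hive properties listed above, which you can then set through Ambari.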
12-04-2015
08:16 PM
@Neeraj Sabharwal It's not the same error. The exception stack trace pasted by the OP originates in Atlas (org.apache.atlas.web.filters.AuditFilter.doFilter), whereas the one in the JIRA is within Hadoop. Same exception class, different applications.
12-04-2015
06:23 PM
Looking at the ExecuteSQL code here, the capability description reads:

@CapabilityDescription("Execute provided SQL select query. Query result will be converted to Avro format." + " Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on " + "a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. " + "If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the " + "select query. " + "FlowFile attribute 'executesql.row.count' indicates how many rows were selected.")

Even though the description says "Streaming is used so arbitrarily large result sets are supported", it appears this is not referring to JDBC streaming, but to the fact that the ResultSet is broken down into smaller tuples and sent to the next processor as a stream. The code backing that assessment: the query execution in ExecuteSQL calls JDBCCommon.convertToAvroStream, and the convertToAvroStream method reads data using the getObject method. That getObject path does not use streaming alternatives (getAsciiStream, etc.) as described here - https://docs.oracle.com/cd/B28359_01/java.111/b312...
12-04-2015
06:08 PM
Can you help me understand the scenario where this is needed? So the Hive shell is started, but you want it to wait until a query is executed before creating the AM.. does this mean there are situations where the Hive shell is started and then exited without ever executing a query? Wouldn't that be an exceptional scenario, or is it so frequent / regular in your case that a workaround is required? I am sorry, just trying to understand when such a configuration would be needed.
12-04-2015
04:57 AM
This should be updated / corrected then?

Partitioning Recommendations for Slave Nodes

Hadoop Slave node partitions: Hadoop should have its own partitions for Hadoop files and logs. Drives should be partitioned using ext3, ext4, or XFS, in that order of preference. HDFS on ext3 has been publicly tested on the Yahoo cluster, which makes it the safest choice for the underlying file system. The ext4 file system may have potential data loss issues with default options because of the "delayed writes" feature. XFS reportedly also has some data loss issues upon power failure. Do not use LVM; it adds latency and causes a bottleneck.

Source: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_cluster-planning-guide/content/ch_partitioning_chapter.html

A lot of this conflicts with reality (Paul's SmartSense statistics) and with what we are all discussing here.
12-04-2015
01:09 AM
My response to your comment was longer than what's allowed for comments, so I'm adding it as a new answer.