Member since: 07-31-2019
Posts: 346
Kudos Received: 259
Solutions: 62
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2938 | 08-22-2018 06:02 PM |
| | 1693 | 03-26-2018 11:48 AM |
| | 4217 | 03-15-2018 01:25 PM |
| | 5087 | 03-01-2018 08:13 PM |
| | 1434 | 02-20-2018 01:05 PM |
02-23-2016
09:08 PM
3 Kudos
@nasghar though you can export MS Access to a CSV and import that into Hive, I would suggest instead importing the data into SQL Server and using Sqoop.
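For illustration only, a Sqoop import from SQL Server into Hive might look roughly like the sketch below; the connection string, credentials, and table names are placeholders, not anything from this thread.

```bash
# Hypothetical names throughout: adjust host, database, credentials, and table
# to your environment. Imports one SQL Server table into a Hive-managed table.
sqoop import \
  --connect "jdbc:sqlserver://sqlhost:1433;databaseName=AccessExport" \
  --username hadoop_etl \
  --password-file /user/hadoop_etl/.sqlserver.password \
  --table Customers \
  --hive-import \
  --hive-table default.customers \
  -m 1
```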
02-22-2016
08:54 PM
1 Kudo
It's making more sense now. My yarn.nodemanager.resource.memory-mb was set to only 16 GB, which restricted my min and max settings. I'm still not clear what to set the disks value to in a virtual environment in order to get a good baseline.
02-22-2016
08:28 PM
2 Kudos
We are running an 8-node virtualized cluster with 5 datanodes. Each datanode is allocated 8 vcores and 54 GB of RAM, and uses shared SAN storage. The output of yarn-utils (v=8, m=54, d=4) is:

yarn.scheduler.minimum-allocation-mb=6656
yarn.scheduler.maximum-allocation-mb=53248
yarn.nodemanager.resource.memory-mb=53248
mapreduce.map.memory.mb=6656
mapreduce.map.java.opts=-Xmx5324m
mapreduce.reduce.memory.mb=6656
mapreduce.reduce.java.opts=-Xmx5324m
yarn.app.mapreduce.am.resource.mb=6656
yarn.app.mapreduce.am.command-opts=-Xmx5324m
mapreduce.task.io.sort.mb=2662

A couple of questions: 1) What do you put for the disks value when datanode disks are on shared SAN storage? 2) The maximum container size only shows 8 GB even though each node is assigned 54 GB. Does this have something to do with overcommitment in the virtual environment? yarn-utils wants it set to 53 GB.
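For reference, the values above would have come from an invocation of the Hortonworks yarn-utils companion script along these lines; the script name and flag spellings here are from memory, so verify them against the copy shipped with your HDP documentation.

```bash
# Sketch of the yarn-utils run behind the numbers above:
# -c cores, -m memory in GB, -d number of disks, -k whether HBase is installed.
python yarn-utils.py -c 8 -m 54 -d 4 -k False
```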
Labels:
- Apache YARN
02-18-2016
01:35 PM
2 Kudos
@Anshul Sisodia you may want to begin transitioning from Hue to Ambari Views. There is a File Browser view you can use to upload files.
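For anyone who prefers the command line, the equivalent of a File Browser upload is a plain HDFS put; the paths below are placeholders.

```bash
# Placeholder paths: copy a local file into the user's HDFS directory,
# which is what the Ambari Files view does through the browser.
hdfs dfs -mkdir -p /user/anshul/uploads
hdfs dfs -put localfile.csv /user/anshul/uploads/
```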
02-17-2016
01:29 AM
2 Kudos
@Jeremy Salazar since the error states that the user is "ambari", you will need to add the following values to the HDFS custom core-site configuration:

hadoop.proxyuser.ambari.groups=*
hadoop.proxyuser.ambari.hosts=*

Once that's done, follow @Neeraj Sabharwal's step and create your home directory and assign access.
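For the home-directory step, a minimal sketch run as the hdfs superuser is shown below; the username is a placeholder.

```bash
# Placeholder username "jsalazar": create the HDFS home directory and hand it over.
sudo -u hdfs hdfs dfs -mkdir -p /user/jsalazar
sudo -u hdfs hdfs dfs -chown jsalazar:hdfs /user/jsalazar
sudo -u hdfs hdfs dfs -chmod 755 /user/jsalazar
```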
02-09-2016
03:52 AM
2 Kudos
@teru mat make sure you create a user in SQL Server that matches the user specified in the HDP install. This will need to be a native SQL user; you can't use Windows authentication, since HDP on Windows does not currently support Kerberos.
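A rough sketch of creating such a native login with sqlcmd is below; the login, password, and database names are placeholders and should match whatever you entered in the HDP installer.

```bash
# Placeholder names throughout; run against the SQL Server instance hosting the HDP databases.
sqlcmd -S localhost -U sa -P 'SaPassword!' \
  -Q "CREATE LOGIN hadoop WITH PASSWORD = 'HadoopPass1!';"
sqlcmd -S localhost -U sa -P 'SaPassword!' -d HiveMetastore \
  -Q "CREATE USER hadoop FOR LOGIN hadoop; EXEC sp_addrolemember 'db_owner', 'hadoop';"
```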
02-09-2016
03:44 AM
5 Kudos
@Sunile Manjee the Apache HDFS documentation states: "To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader." I would guess that increasing the number of replicas increases the chances that a replica will reside close to the reader. Probably simplistic, but a logical guess.
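If you want to test that on a particular dataset, the replication factor can be raised per path; the path and factor below are purely illustrative.

```bash
# Placeholder path: raise the replication factor to 5 and wait (-w) for it to take effect.
hdfs dfs -setrep -w 5 /data/frequently_read_table
```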
02-08-2016
09:05 PM
1 Kudo
@Sunile Manjee though I have no personal experience with them, there are companies like BlueData who abstract the storage component and provide an interesting private-cloud experience based on containers. An interesting read on this subject is the Google book The Datacenter as a Computer.
02-06-2016
03:07 PM
1 Kudo
@Malek Ben Salem was this resolved? If so, please either accept an answer or post the solution. Thanks!
02-05-2016
05:51 PM
1 Kudo
If you are using the sandbox on VirtualBox, you'll need to add the NiFi port to the port-forwarding rules. (Screenshot: 2016-02-05-11-52-14.png)
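A rough sketch of adding that rule from the host's command line is below; the VM name and the NiFi port (9090 is common for the sandbox) are assumptions you should adjust to match your setup.

```bash
# Assumed VM name and port; adds a NAT port-forwarding rule for NiFi.
# Run with the VM powered off (use "VBoxManage controlvm ... natpf1 ..." while it is running).
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "nifi,tcp,,9090,,9090"
```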