Member since 01-11-2018 · 5 Posts · 0 Kudos Received · 0 Solutions
11-08-2019
03:23 AM
I've followed the steps to configure Azure Data Lake Gen2 with HDP, and hdfs commands run properly. However, when I run Hive to create an external table on the data lake, I get the typical error: ...Permission denied: user=hive...
Instead of the hive user, Hive should be using the configured client id to interact with the data lake. In fact, if a folder has 777 permissions, it is possible to create the table, and the owner is the client-id. However, if I try to remove the table, I again get the error permission denied: user=hive...
It seems that the process is:
1. The hive user asks for rights on the folder. If the folder is 777, hive has rights to write.
2. If access is granted, it writes to the data lake as the client-id user.
How should I configure Hive and Data Lake Gen2 to work properly and avoid giving 777 rights to all the files? ACLs seem the best approach, but I can't add the hive user to the data lake's folders; it must be some client-id from Azure.
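For what it's worth, the standard HDFS ACL commands do work through the ABFS driver, so one option is to grant the Azure AD identity an ACL entry on the folder instead of opening it to 777. A minimal sketch; the storage account, container, path, and object ID below are placeholders, not values from this cluster:

```shell
# Grant the AAD object ID that Hive's requests resolve to rwx on the
# folder, plus a default ACL so new children inherit the entry.
# <object-id>, "container", and "account" are placeholders.
hdfs dfs -setfacl -m user:<object-id>:rwx \
  abfs://container@account.dfs.core.windows.net/warehouse
hdfs dfs -setfacl -m default:user:<object-id>:rwx \
  abfs://container@account.dfs.core.windows.net/warehouse

# Verify the resulting ACLs
hdfs dfs -getfacl abfs://container@account.dfs.core.windows.net/warehouse
```

Note that ADLS Gen2 ACL entries are keyed by AAD object IDs, which is why the local hive user name cannot appear there directly.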
Thank you!!
09-16-2019
03:36 AM
It looks like you are trying to connect to the data lake as the root user. Does root have a group on the machine? You can check the HDFS UGI with the following command:
hadoop org.apache.hadoop.security.UserGroupInformation
Check this for more info:
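To double-check the group side, it can also help to compare the local Unix groups with what the NameNode's group mapping resolves, since the two can differ. A small sketch:

```shell
# Local Unix groups for root
id root

# Groups as resolved by the HDFS group mapping (queries the NameNode)
hdfs groups root
```

If `hdfs groups` returns nothing for the user, HDFS operations will fail group checks even though the local account looks fine.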
11-10-2018
09:47 PM
As you can see, the error message says to check the file /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist. This file stores the last known mount point for each HDFS data directory. In your case, it seems you are trying to mount the HDFS folders on different paths, so the DataNode refuses to start in order to prevent data loss. Fix the file to point to the new mount points and start the DataNode.
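For illustration, the history file is (to my knowledge) a simple two-column CSV of data directory and last seen mount point; the paths below are made-up examples, not values from your cluster:

```shell
cat /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
# Typical contents (example paths only):
#   /grid/0/hadoop/hdfs/data,/grid/0
#   /grid/1/hadoop/hdfs/data,/grid/1
# Edit the mount-point column to match the new mounts, then start
# the DataNode again (e.g. from Ambari).
```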
02-06-2018
06:42 AM
Hello @AntonPuz. Did you find the reason for this issue? I have a similar situation. I was able to achieve different insertion rates by changing the number of rows sent per request: session.setMutationBufferSpace( batchSize ); However, I'm not able to find the bottleneck of the system.
01-12-2018
12:04 AM
Hello,
I'm trying to run an Oozie example from the command line, but I'm getting the error: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]
When I try to view the error logs in HUE, the error is: Error getting logs at node6.agatha-cluster:8041
Given the files below, does anyone know what is happening? How can I get more accurate error logs?
Thank you!
job.properties (in local mode)
nameNode=hdfs://node10:8020
jobTracker=node10:8032
queueName=default
examplesRoot=oozie_examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark
master=yarn-client
workflow.xml (stored in hdfs ${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark)
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPythonPi'>
<start to='spark-node' />
<action name='spark-node'>
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>${master}</master>
<name>Python-Spark-Pi</name>
<jar>pi.py</jar>
<spark-opts>--executor-memory 20G --num-executors 5</spark-opts>
<arg>value=10</arg>
</spark>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
pi.py (stored in hdfs ${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark)
import sys
from random import random
from operator import add
from pyspark import SparkContext

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    sc = SparkContext(appName="Python-Spark-Pi")
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    sc.stop()
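For reference, a workflow like this is normally submitted and inspected with the Oozie CLI, which is also where more detailed per-action logs can be pulled when the HUE log link fails. A sketch; the Oozie server URL (host and port 11000) is an assumption:

```shell
# Submit and run the workflow (Oozie server URL is a guess)
oozie job -oozie http://node10:11000/oozie -config job.properties -run

# Inspect status and logs for the returned job id
oozie job -oozie http://node10:11000/oozie -info <job-id>
oozie job -oozie http://node10:11000/oozie -log <job-id>

# YARN-side logs for the Spark launcher, once you have its application id
yarn logs -applicationId <application-id>
```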