Member since: 12-21-2017
67 Posts
3 Kudos Received
2 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1067 | 10-15-2018 10:01 AM
 | 4040 | 03-26-2018 08:23 AM
10-10-2018
05:57 AM
@Aditya Sirna So by default, up to 1000 lines of results are stored on HDFS for each query? If I increase the limit, will there be any negative effects, such as slow HTTP transfers or failures when receiving the results?
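For my own understanding, a rough illustration of where I assume the cost of a larger limit would go (this is not Zeppelin's actual code, just a sketch): the extra rows mostly mean more driver memory and a bigger payload to ship to the browser.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("result-limit-sketch").getOrCreate()
df = spark.range(1_000_000).toDF("id")   # stand-in for a real query result

max_result = 1000                         # the assumed display limit
rows = df.limit(max_result).collect()     # only this many rows are pulled to the driver
payload = [r.asDict() for r in rows]      # roughly what would be serialized and sent over HTTP
# Raising max_result grows both `rows` (driver memory) and `payload` (HTTP transfer size),
# which matches the slow-transfer / failed-receive concern above.
```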
10-10-2018
04:49 AM
@Aditya Sirna Thanks, Aditya. What about paging? Since the whole result is saved on HDFS in JSON format, if I need to load only part of the result, do I just load the whole JSON file and cut out the requested part in memory based on the given page size and page number? In practice, does Zeppelin run into out-of-memory problems if the result is very large?
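To make sure I understand, a minimal sketch of the paging approach I have in mind (the result path and page parameters are made up, and I assume the stored result is plain JSON that Spark can read):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("result-paging-sketch").getOrCreate()

def read_page(result_path, page_size, page_number):
    """Load the stored JSON result and return only the requested page."""
    df = spark.read.json(result_path)   # the whole result file is parsed here
    end = page_size * (page_number + 1)
    # Collect only up to the end of the requested page and slice off the earlier pages;
    # collecting the entire DataFrame at once is what would risk OOM for huge results.
    # (A real system would also need a stable ordering so pages stay consistent.)
    rows = df.limit(end).collect()
    return rows[end - page_size:end]

page = read_page("hdfs:///tmp/notebook-result.json", page_size=100, page_number=3)
```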
10-10-2018
03:52 AM
I am designing an HDFS query system based on Spark that includes a paging function, and Zeppelin seems like a good reference for it. Now I have a question: I see that Spark and Spark SQL query results still exist even after I refresh or reopen the notebook, so the results must be saved somewhere. Where is this result data stored? If it is saved in a database, what happens when the result data is so large that it causes database performance problems?
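For context, the persistence pattern I am considering for my own system looks roughly like this (the output path is a placeholder and the DataFrame stands in for a real query result): keep the bulky result on HDFS, so a database would only ever hold small metadata about it.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-result-sketch").getOrCreate()

# Stand-in for an actual query result.
result = spark.range(10_000).toDF("id")

# Write the full result to HDFS as JSON. A database would only need to keep
# metadata (path, schema, row count), so huge result sets never become a
# database performance problem.
result.write.mode("overwrite").json("hdfs:///query-results/job-0001")
```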
Labels:
- Apache Spark
- Apache Zeppelin
08-16-2018
07:15 AM
Hi @Jonathan Sneep Fine, thanks. I have added the user and group info on my NameNode. So the typical way to add a new user or group is to create the user and group on the NameNode host and wait for usersync to sync the user info to Ranger? And if I don't care about group policies, creating an internal user in Ranger and specifying it directly in the allow conditions also works? At least it seems to work in practice.
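For reference, a quick way to double-check this from the NameNode host is to ask HDFS which groups it actually resolves for the user, since those are the groups that group-based policies are evaluated against. A small sketch (just the hdfs CLI wrapped in Python; the user name is the one from my policy):

```python
import subprocess

# Ask HDFS which groups it resolves for the user (run on a cluster host with the hdfs CLI).
# A group-based Ranger HDFS policy only takes effect if the group shows up here.
result = subprocess.run(["hdfs", "groups", "test01"], capture_output=True, text=True)
print(result.stdout.strip())   # expected output along the lines of: test01 : test_group01
```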
08-16-2018
01:31 AM
Hi @Jonathan Sneep Not yet. So I need to add the user and the related group on my NameNode host manually?
08-15-2018
09:39 AM
Hi @Jonathan Sneep Thanks for your response. Actually, both the user and the group were created in Ranger, so they are internal to Ranger.
08-15-2018
07:55 AM
I am having a problem with Ranger authorization. Here are the steps to reproduce it:
1. I create an account in Ranger with the username test01.
2. I set it to belong to a group test_group01.
3. In the Ranger HDFS policy, I give test_group01 access to the directory /data/.
If this worked as expected, the test01 user should have access to /data/ through the privilege inherited from the group test_group01. But in practice it cannot access the directory /data. However, if I specify test01 directly under 'Select User', it works well. So it seems that specifying the group in the policy doesn't take effect, while specifying the permitted user does. How can I solve this? Thanks!
Labels:
- Apache Ranger
07-12-2018
08:13 AM
Thanks Jay. I checked the curl and libcurl versions by running "yum list | grep curl"; the versions are:
curl.x86_64 7.19.7-46.el6
libcurl.x86_64 7.19.7-46.el6
python-pycurl.x86_64 7.19.0-8.el6
libcurl.i686 7.19.7-46.el6
libcurl-devel.i686 7.19.7-46.el6
libcurl-devel.x86_64 7.19.7-46.el6
curl -V prints the following info: curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 ... Protocols: ... Features: GSS-Negotiate ...
If I run the alert_spark2_livy_port.py script independently, it runs fine. What confuses me is that all three of my hosts have exactly the same curl version, but only one has the problem above.
07-12-2018
02:45 AM
The Spark Livy alert always reports: Connection failed on host ***:8999. In detail, it prints: ExecutionFailed: Execution of 'curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k http://host:8999/session | grep 200' returned 1, curl: option --negotiate: the installed libcurl version doesn't support this, curl: try curl --help ... I have 3 hosts in this cluster, but only one host reports this alert. I have checked the curl and libcurl versions on each host, and they are all the same. It may be caused by installing Anaconda and the resulting Python version change, but I am not sure, as the default Python version is 2.6. How can I fix it? Thanks!
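As a basic sanity check, Livy itself can also be probed directly over HTTP, which takes curl out of the picture (the host name is a placeholder and this skips the Kerberos --negotiate step, so it only tells me whether Livy answers at all):

```python
import requests

# Placeholder host; the alert ultimately just expects an HTTP 200 from Livy's REST API.
url = "http://host:8999/sessions"
try:
    resp = requests.get(url, timeout=5)
    print(url, "->", resp.status_code)   # 200 means the Livy server is reachable
except requests.exceptions.RequestException as err:
    print("Livy not reachable:", err)
```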
Labels:
- Apache Spark
04-03-2018
03:32 AM
I am trying to read data from Kafka and write it out in Parquet format via Spark Streaming. The problem is that the data from Kafka has a variable structure. For example, app one has columns A, B, C and app two has columns B, C, D, so the DataFrame I read from Kafka ends up with all columns A, B, C, D. When I write the DataFrame to Parquet partitioned by app name, the Parquet files for app one also contain column D, even though column D is empty and holds no data at all. So how can I filter out the empty columns when writing the DataFrame to Parquet? Thanks!
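To make the question concrete, one direction I have been considering (shown on a static DataFrame rather than the streaming one, just to illustrate the column-dropping part; paths and values are made up) is to compute, per app, which columns are entirely null and drop them before writing that app's Parquet output:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("drop-empty-columns-sketch").getOrCreate()

# Stand-in for the combined DataFrame read from Kafka: app one fills A, B, C
# and app two fills B, C, D, so each app sees one completely empty column.
df = spark.createDataFrame(
    [("app1", 1, 2, 3, None), ("app2", None, 4, 5, 6)],
    ["app", "A", "B", "C", "D"],
)

for app in [r["app"] for r in df.select("app").distinct().collect()]:
    sub = df.filter(F.col("app") == app)
    # Count non-null values per column in one pass and keep only non-empty columns.
    counts = sub.agg(*[F.count(F.col(c)).alias(c) for c in sub.columns]).first()
    keep = [c for c in sub.columns if counts[c] > 0]
    sub.select(keep).write.mode("overwrite").parquet("/tmp/out/app={}".format(app))
```

I am not sure whether this is the idiomatic way, hence the question.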
Labels:
- Apache Kafka
- Apache Spark