Member since: 04-17-2016
Posts: 75
Kudos Received: 9
Solutions: 0
05-15-2019
02:13 PM
Hi, I need to get two values in one output using Hive: the total count of records in the table, and the count of records loaded in the last run. Each record has a current-run-date column, so the last-run count is the count of records whose date equals max(date). I know how to get each count separately, but I am not sure how to get them together in one output. Could anyone help me with this? I appreciate your help. Thanks, Jee
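For illustration, one shape I have been sketching, though I am not sure it is the right approach (my_table and run_date stand in for the real table and date column):

-- Sketch only: my_table and run_date are placeholders.
-- One result row: total records plus records belonging to the latest run.
SELECT COUNT(*) AS total_records,
       SUM(CASE WHEN a.run_date = b.max_run_date THEN 1 ELSE 0 END) AS last_run_records
FROM my_table a
CROSS JOIN (SELECT MAX(run_date) AS max_run_date FROM my_table) b;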
Labels:
- Apache Hive
08-18-2018
02:09 PM
Hi there, I have a project where I need to decide on the storage system and then the schema design for the scenario below. We receive the following data as a CSV file every hour, about 1 TB per file:

recipe_id, recipe_name, description, ingredient, active, updated_date, created_date
1, pasta, Italian pasta, tomato sauce, true, 2018-01-09 10:00:57, 2018-01-10 13:00:57
1, pasta, null, cheese, true, 2018-01-09 10:10:57, 2018-01-10 13:00:57
2, lasagna, layered lasagna, cheese, true, 2018-01-09 10:00:57, 2018-01-10 13:00:57
2, lasagna, layered lasagna, blue cheese, false, 2018-01-09 10:00:57, 2018-01-10 13:00:57
...

Requirements:
1. XXX needs to show a page listing all the recipes; when the user clicks a recipe, they see a recipe page with its ingredients. The user should also be able to click an ingredient and see all the recipes linked to that ingredient.
2. Create a data model that can store this data and support the activities above. The model needs to support millions of reads per second. Which persistence system is best for this scenario?
3. Write a Spark job in Scala that takes the CSV shown above and stores it in the chosen storage system using the data model from point 2.

I know Hive will not support this real-time requirement, because it is for batch processing only and has no random-access concept. I think I can go with HBase, but I am not sure about the data model. Once I know the data model, I can write a Spark application in Scala to export the data to HBase accordingly. Could anyone please help with the storage system and the data model? Appreciate your help! Thanks, Krishna
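For context, the ingestion side I have in mind looks roughly like this; Spark 2.x and the input path are my assumptions, and the HBase write is left out since the data model is exactly the open question:

// Sketch only: Spark 2.x assumed; the input path is a placeholder.
import org.apache.spark.sql.SparkSession

object RecipeIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RecipeIngest").getOrCreate()

    // Read the hourly CSV using the columns from the sample above.
    val recipes = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/incoming/recipes.csv")

    // The two access paths suggest two key layouts:
    //   recipe -> ingredients (key by recipe_id)
    //   ingredient -> recipes (key by ingredient)
    recipes.select("recipe_id", "recipe_name", "ingredient", "active").show(5)

    spark.stop()
  }
}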
Labels:
- Apache HBase
- Apache Spark
08-03-2017
03:33 PM
Hi there, SBT gives an error like "The java installation you have is not up to date sbt requires at least version 1.6+, you have version 0", even though I have JDK 1.8 installed. If I run "java -version" in the Windows command prompt, it displays Java 1.8. How do I make SBT point to this JDK? Thanks, Jeeva
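For reference, my understanding is that sbt picks up the JDK from JAVA_HOME; a sketch of setting it in the Windows command prompt (the install path is an example, not my verified setup):

rem Sketch only: adjust the path to the actual JDK 1.8 install directory.
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_121
set PATH=%JAVA_HOME%\bin;%PATH%
sbt sbtVersion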
04-04-2017
06:01 PM
Hi there, I am trying to execute the hdfs groups <username> command on the command line, but it gives the error below. I can execute the same command in the DEV region to see the list of groups a user belongs to, but QA gives the error. FYI, we are using an Isilon (Hadoop) cluster in the QA region with HDP 2.5.3.
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.tools.GetUserMappingsProtocol
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy8.getGroupsForUser(Unknown Source)
at org.apache.hadoop.tools.protocolPB.GetUserMappingsProtocolClientSideTranslatorPB.getGroupsForUser(GetUserMappingsProtocolClientSideTranslatorPB.java:57)
at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:96)
Thanks, Jee
Labels:
- Apache Hadoop
03-23-2017
07:31 PM
Hi Mahesh, There is no way to list all the members of the supergroup. But we can check an individual user to see whether they belong to the group, using the command below: hdfs groups <username> Once again, thanks for your time.
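For anyone landing here later, usage looks like this (the username and output are illustrative):

# Check a single user's group membership; jkris03 is just an example user.
hdfs groups jkris03
# prints something like: jkris03 : hdfs hadoop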
03-23-2017
06:43 PM
Anyway, thanks for your involvement on this issue. Once I resolve it, I will let you know. I used PuTTY to log in but got an exception.
03-23-2017
06:14 PM
Is there any way to find it from Ambari without logging in to the NameNode? If not, how do I log on to the NameNode from Linux (the edge node)? Sorry for the trivial question. I tried to log in from PuTTY and got an exception.
03-23-2017
05:41 PM
Yes. But how do I list all the users in the superusergroup? I found superusergroup : hdfs in hdfs-site.xml, but how do I find the users in the group hdfs?
03-23-2017
05:37 PM
Could you suggest how to check this on the NameNode?
03-23-2017
05:23 PM
Hi Maheshwari, Our superusergroup is hdfs. That's what I did. He asked me where I got these members from, and I told him from the /etc/group file. But my technical lead said something different. Here is the conversation we had:

Jeeva: I got it from the /etc/group file.
Fred: That is correct for this particular server, but how about the NameNode? That's where 'hdfs groups' tells you.
03-23-2017
05:01 PM
Hi Namit Maheshwari, Thanks for taking the time to reply. Your answer saved my life. One more question: how do I see the members of the superusergroup for a NameNode? In my case, the superuser is "hdfs", and I have been asked to find all the members of the superuser group. The /etc/group file is not for the NameNode, so there should be some other way.
03-23-2017
03:56 PM
Hi there, how do I find the name of the supergroup in Hadoop and list all the members of that group? I know hdfs is the superuser in Hadoop, but what is the supergroup name?
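For clarity, this is the kind of lookup I mean; the getconf call is my assumption about how to read the property:

# Sketch: read the configured supergroup name from the cluster configuration.
hdfs getconf -confKey dfs.permissions.superusergroup
# prints the group name, e.g. hdfs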
Labels:
- Apache Hadoop
03-23-2017
03:01 PM
Hi SBandaru, Thanks for taking the time to reply. ACLs have been disabled: getfacl: The ACL operation has been rejected. Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false. Is there a way to see the members of the Admin group in Active Directory? For instance, what should be given for groupName in a search filter like dc=xxxxxx, and group=what?
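For illustration, the sort of query I am trying to build; the server, base DN, and group name below are placeholders, not our real values:

# Sketch only: all names are placeholders.
ldapsearch -x -H ldap://ad.example.com -b "dc=example,dc=com" "(&(objectClass=group)(cn=HadoopAdmins))" member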
03-23-2017
02:09 PM
Hi there, I created a directory /tmp/ran-test with permission 400 in HDFS. When I tried to copy a file into the directory, I got a permission-denied error, but my colleague was able to copy files into it without any error. I am wondering how that works and would appreciate an explanation. For your information, we are using Active Directory.
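For reference, the steps were roughly these (the file name is a placeholder):

# Sketch of the scenario above; test.txt is a placeholder file.
hdfs dfs -mkdir /tmp/ran-test
hdfs dfs -chmod 400 /tmp/ran-test
hdfs dfs -put test.txt /tmp/ran-test/   # denied for me, but works for my colleague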
Thanks,
Jee
Labels:
- Apache Hadoop
03-22-2017
04:32 PM
Hi everyone, the logs are already displayed in the Audits tab. The reason I could not see them is the "Event Time": the Event Time and Updated Time are not synced, so the screen was not showing the latest logs. Once I filtered the logs by source type, I could see my logs. However, I am now getting a different issue. Thank you, guys, for your time.
03-22-2017
02:44 PM
Now it behaves differently. I updated the same policy but still get an error while accessing the directory in HDFS, so I think the policy is not enforced. In the updated policy there is no exclude condition; only the user jkris03 is allowed the permission. Also, please look at the audits for the same service with a different source: for the source datameer, the Ranger policy is likewise not working, yet we can see the audits. For my source, it is not displaying the audits either.
03-22-2017
01:33 PM
Hi Deepak, Thanks for your reply, but that parameter has already been configured. For your information, the Hive plugin works well (its audit source is Solr). For HDFS, I can see the log in the "Admin" tab when I update the policy, and the "Plugin" tab says the policy is synced, but in the "Access" tab I am not seeing any audits. Note that for the same service (HDFS) I do see audits for other sources, just not mine (/tmp/ranger_test).
03-22-2017
12:57 AM
Hi vperiasamy and Namit Maheshwari, Thank you very much for your replies. We are using HDP 2.5.3, and I am not seeing any error. The interesting thing is that jkris03 is the owner of the directory, which has permission 400 in HDFS, yet when I tried to copy a file into it, I got a permission-denied error. My colleague rkurumb was able to copy a file into the directory, while another user (ftam) also got a permission-denied error. Since there is no audit log, I cannot see whether the Ranger ACL or the Hadoop ACL is being enforced. We use Active Directory, and it is synced with Apache Ranger; the group name I used here is from Active Directory.
03-21-2017
08:30 PM
Hi there, at my workplace the HDFS plugin for Ranger has been enabled. I created a policy for the HDFS resource /tmp/ranger_test (which has permission 400 in HDFS). I can see in the Ranger Plugins tab that the policy has been synced, but no audit logs show up and the Ranger policy is not enforced when the directory is accessed in HDFS. For your information, audit logging is enabled and audits are enabled from HDFS to Solr, yet no audit logs are displayed in the Ranger Audit tab. I don't know why it does not work the way I expected. Please let me know what the reason might be and how to troubleshoot it. Thanks, kJ
Labels:
- Apache Ranger
03-09-2017
06:07 PM
Hi Braian, Thanks for the reply. The Git repository is going to be in the same place as mentioned in the Datameer documentation. I just want to add a remote repository for the local repository (which is in /opt/datameer/current/versioning) by running command (1) below, so that we can then run command (2):
1. git remote add origin http://xxxxxxxxxxx/datameer_test.git
2. git push -u origin master
Note: datameer_test.git is a repo in Bitbucket. When I tried command (1), it gave an error like: fatal: Not a git repository (or any parent up to mount point /apps/xfer) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set). Note that I am sure there is a .git directory in the local repository.
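For what it's worth, my understanding is that the commands have to run from inside the local repository so git can find the .git directory; a sketch (the remote URL is redacted as above):

# Sketch: run from the repository root.
cd /opt/datameer/current/versioning
git remote add origin http://xxxxxxxxxxx/datameer_test.git
git push -u origin master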
03-07-2017
07:54 PM
Hi there, at my workplace we have a Datameer server installed, and it uses the Git plug-in. The Git repository has been installed on the Datameer server itself, because the repository cannot be remote; as the Datameer documentation says: "To use this Git plug-in, you need to install Git on the same machine as the Datameer server. The Datameer service depends on the Git repository for constant writes, so a low response time is necessary. As a result, the repository folder needs to be locally set up on the Datameer server and cannot be remote." This is the link to the documentation: https://www.datameer.com/documentation/display/DAS50/Using+the+Git+Versioning+Plug-in But our organization uses Bitbucket. So how do I create a repository in Bitbucket and sync the Datameer server's repo with the Bitbucket repo? That way, if the Datameer server goes down, we still have the Bitbucket repo. Also, keeping the repository history on the Datameer server may affect server performance. I would appreciate it if you shared your knowledge on this. Thanks, Jee
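For context, the kind of sync I have in mind is a periodic mirror push from the local repo; the remote name and Bitbucket URL below are placeholders:

# Sketch only: the Bitbucket URL is a placeholder.
cd /opt/datameer/current/versioning
git remote add bitbucket https://bitbucket.example.com/scm/proj/datameer.git
git push --mirror bitbucket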
Tags:
- data-processing
03-06-2017
01:08 AM
1 Kudo
Hi there,
I am pretty new to Datameer and just want to know:
What is Datameer? How is Datameer related to Hadoop? How is Git used with Datameer?
Thanks in advance.
Labels:
- Apache Hadoop
03-01-2017
04:59 PM
Hi there, I know how to check Kafka broker configuration properties in Ambari, but I am not sure how to check consumer properties in Ambari. I would appreciate your reply on this. Thanks in advance.
Labels:
- Apache Ambari
- Apache Kafka
02-28-2017
08:52 PM
Hi there,
I am executing some Kafka test cases on the console. ZooKeeper and the Kafka brokers are started (I can see this in Ambari); there are 3 ZooKeepers and 2 Kafka brokers running. I ran: bin/kafka-console-producer.sh --broker-list EdgeNodeAddress:xxxx --topic test21
Note: EdgeNodeAddress is the host where I am issuing the producer console command.
Then I ran: bin/kafka-console-consumer.sh --zookeeper zookeeperaddress:2181 --topic test21 --from-beginning
As soon as I ran kafka-console-consumer.sh, it kept generating a message like {metadata.broker.list=xxxxxxx2.xxxx:7903,xxxxxxxx2.xxxxxx:7903, request.timeout.ms=30000, client.id=console-consumer-1927, security.protocol=PLAINTEXT}
This message keeps repeating until I stop it myself, like a while loop whose condition is never met.
I checked with ZooKeeper's zkCli.sh to see whether the broker list is visible (ls /brokers/ids), and it shows the broker IDs [0, 1].
I am not sure what the issue is. Please share your knowledge on this ASAP. Thanks in advance.
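To summarize, the exact commands and the check that looked fine (hosts and ports redacted as above):

# Producer, run from the edge node:
bin/kafka-console-producer.sh --broker-list EdgeNodeAddress:xxxx --topic test21
# Consumer (old ZooKeeper-based consumer):
bin/kafka-console-consumer.sh --zookeeper zookeeperaddress:2181 --topic test21 --from-beginning
# Sanity check inside the ZooKeeper shell, which returned [0, 1]:
bin/zkCli.sh
ls /brokers/ids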
Labels:
- Apache Kafka
02-28-2017
03:38 PM
Hi there, I ran a Sqoop command to list databases and I am getting an error. This is the command I ran: sqoop list-databases --connect jdbc:netezza://xxxxxxxx/ --username xxxx --password xxx; The error I am getting is: ERROR manager.SqlManager: Generic SqlManager.listDatabases() not implemented. For your information, I added the Netezza jar like this: export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:<path to the jar> Please share your knowledge on this. Thanks.
Labels:
- Apache Sqoop
02-27-2017
09:06 PM
Hi there, I am running Kafka on the console for testing purposes. While executing kafka-consumer.sh, I continuously get the following message, without it ever stopping:
{metadata.broker.list=xxxxxxxv.xxxxxxx.com:7xx3,dpxxxxxxx.bxxxx.com:7xxx, request.timeout.ms=30000, client.id=console-consumer-xx36x, security.protocol=PLAINTEXT}
The same line repeats over and over until I stop it myself. Please share your knowledge on how to fix this issue. Thanks in advance.
Labels:
- Apache Kafka
02-24-2017
08:14 PM
Hi there, I am running Sqoop to import data from Netezza to HDFS and I am getting the error below: NzSQLException: LDAP authentication failed for user 'JKRIS03' Please help me work out how to overcome this issue. @Frank Lu @Benjamin Leonhardi
Labels:
- Apache Sqoop
02-24-2017
07:55 PM
Hi Sindhu, Thanks for your reply. My lead said I can simply use HADOOP_CLASSPATH=<path to nzjdbc.jar>, but I don't know this approach. Please advise if you do. Thanks.
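For clarity, my understanding of what the lead meant is the sketch below; the jar path is a placeholder and the connection details are redacted as in my other post:

# Sketch: expose the driver jar to Sqoop without touching sqoop/lib.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/nzjdbc.jar
sqoop list-databases --connect jdbc:netezza://xxxxxxxx/ --username xxxx --password xxx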
02-24-2017
07:39 PM
Hi there, I am running a Sqoop command to connect to a Netezza database, so I need to add the connector jar. At my workplace they don't want me to put it in the sqoop/lib directory. What other ways are there to add the jar so that Sqoop can use it? Right now I am getting an error saying it couldn't find the connector; the connector is not available in the sqoop/lib dir, and I don't have permission to that directory. Please let me know ASAP. Thanks
Labels:
- Apache Sqoop
02-24-2017
07:13 PM
The exact same command as you mentioned.