Member since: 12-10-2014
Posts: 22
Kudos Received: 2
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4229 | 12-11-2014 02:59 AM
11-26-2015
06:42 PM
1 Kudo
Hi, I'm trying to set up a continuous delivery practice for my Cloudera installation and I'm hoping for some pearls of wisdom. Do you do automated testing for your ingestion development? If you do, how do you do it? If you don't, was there something that stopped you? Ideally, I'll be creating an environment where unit testing kicks off on demand by the developer from an integrated development environment, and integration testing kicks off from a continuous delivery product based on source control check-ins and schedules. Thanks, Ty
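By way of illustration, the on-demand unit test run described here can be as small as running one ingestion script in Pig local mode against a checked-in fixture and diffing the result against an expected output. This is only a sketch: the script name, parameters and paths below are hypothetical placeholders.

# run one ingestion script locally against a small fixture, then compare with the expected output
pig -x local -param INPUT=tests/fixtures/sample_input -param OUTPUT=tests/actual/ingest_out ingest.pig
diff <(sort tests/actual/ingest_out/part-*) tests/expected/ingest_out.sorted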
11-25-2015
08:44 PM
Hi, I'm trying to set up a continuous delivery practice for my Cloudera installation and I'm hoping for some pearls of wisdom. Do you do automated testing of your workflows? If you do, how do you do it? If you don't, was there something that stopped you? Ideally, I'll be creating an environment where unit testing kicks off on demand by the developer from an integrated development environment, and integration testing kicks off from a continuous delivery product based on source control check-ins and schedules. Thanks, Ty
11-25-2015
02:46 PM
Hi, I'm not entirely sure if this question should be here or in the getting started board. If I'm in the wrong place please let me know.

Anyway, my organisation has Cloudera as an "innovation" deployment where everyone develops against a single instance of most of the components of the Cloudera stack. There's a lot of manual activity to get software written, tested and artefacts prepared for potential deployment. Naturally, there's a desire to shift to a continuous delivery method as we grow into a "production" deployment. How do you do continuous delivery?

At the moment I'm thinking of an environment where a developer has a personal development environment (i.e. a Cloudera instance and an integrated development environment) hydrated (and dehydrated) and configured on demand, with all artefacts held in a permanent Team Foundation Server source control repository. Code is created and unit tests are run in this space: small scale and no integration. On a half-day schedule the integration environment is hydrated and configured and artefacts are generated. The integration test suite kicks off and produces reports for developers, which are checked back into source control. Finally, the integration environment is dehydrated. If all the tests succeed, the next level of assurance is performed in its own environment, and so on, until production deployment (a manual step).

Is this practical/doable? I'm also interested in how you do testing and integrate source control, but I'll hold that conversation for another thread topic. Thanks, Ty
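A sketch of the half-day integration cycle described above, as it might be driven from a scheduler; hydrate.sh, run_integration_tests.sh and dehydrate.sh are hypothetical placeholders for whatever provisioning and test tooling is chosen.

# run twice a day from cron or the continuous delivery tool of choice
./hydrate.sh && ./run_integration_tests.sh > reports/run-$(date +%F-%H%M).log 2>&1
./dehydrate.sh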
11-25-2015
01:38 PM
Hi, I'm afraid it's been a while since I did this and it's all a blur in the rear-view mirror now. As it happens, I'm getting back into Cloudera again and will be building some new VMs soon. If I have to do an update I'll let you know how I do it. Sorry for not having an answer for you.
07-18-2015
06:43 AM
Thanks Sean. I appreciate the guidance. I now have VMware ESXi set up and an Ubuntu 14.04 Server guest. I'm going through "Installation Path A - Automated Installation by Cloudera Manager". All going well so far.
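For anyone following along, Path A on such a guest boils down to downloading and running the Cloudera Manager installer binary; the URL below is the cm5 installer location documented at the time and may have moved since.

# download and run the Cloudera Manager installer on the Ubuntu guest
wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin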
07-16-2015
11:20 PM
Hi, I've got the QuickStart VM up and running and working pretty well, and I'd like to test out a few of the Cloudera administrative tasks using Cloudera Manager. I'm wondering if anyone has successfully deployed additional nodes from the QuickStart VM and gotten a cluster (virtual or physical) up and running? Thanks Ty
02-18-2015
12:56 PM
Thanks. You're a lifesaver. The cluster now reports as 5.3.1. I guess one thing to learn from this is that it may be worth updating http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/install_upgrade_cdh_maintenance_packages.html to extend the list of example unmanaged services from Mahout, Pig and Whirr to Parquet, HBase, Zookeeper, Solr, Mahout, Pig and Whirr.
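For reference, bringing those package-installed components up to the new maintenance release on the QuickStart VM (which installs CDH from packages rather than parcels) looks roughly like the following; the package name globs are assumptions worth confirming against the installed package list first.

# see which components are still at the old release, then update those packages
yum list installed | grep cdh5.3.0
sudo yum update 'hbase*' 'zookeeper*' 'solr*' 'parquet*'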
02-18-2015
05:30 AM
1 Kudo
I have the CDH 5.3.0 virtual machine and I'd like to upgrade it to the latest 5.3.1 release in the hope that it will resolve a few configuration issues I have run into. I have followed the instructions here:

- Upgrade CM & Java: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ag_upgrade_cm5.html
- Upgrade CDH: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/install_upgrade_cdh_maintenance_packages.html

Unfortunately I end up stuck on this screen at step 6 of the upgrade wizard ("Cloudera Manager checks that hosts have the correct software installed. If the packages have not been installed, a warning displays to that effect. Install the packages and click Check Again. When there are no errors, click Continue."):

Upgrade the CDH running on Cloudera QuickStart
Host Software Detection
Checking if the relevant agents are responsive, the relevant hosts are healthy and have the correct software installed on them.
Multiple releases detected: [CDH 5.3.1, CDH 5.3.0]
The following host(s), quickstart.cloudera, do not have the newly required CDH package installed. They still have the incorrect CDH 5.3.0.

It doesn't seem right to me since my Host Inspector looks like everything is fine. Its output is this:

Inspector Results - Validations
- Inspector ran on all 1 hosts.
- The following failures were observed in checking hostnames...
- No errors were found while looking for conflicting init scripts.
- No errors were found while checking /etc/hosts.
- All hosts resolved localhost to 127.0.0.1.
- All hosts checked resolved each other's hostnames correctly and in a timely manner.
- Host clocks are approximately in sync (within ten minutes).
- Host time zones are consistent across the cluster.
- No users or groups are missing.
- No conflicts detected between packages and parcels.
- No kernel versions that are known to be bad are running.
- Cloudera recommends setting /proc/sys/vm/swappiness to 0. Current setting is 1. Use the sysctl command to change this setting at runtime and edit /etc/sysctl.conf for this setting to be saved after a reboot. You may continue with installation, but you may run into issues with Cloudera Manager reporting that your hosts are unhealthy because they are swapping. The following hosts are affected:
- No performance concerns with Transparent Huge Pages settings.
- CDH 5 Hue Python version dependency is satisfied.
- 0 hosts are running CDH 4 and 1 hosts are running CDH 5.
- All checked hosts in each cluster are running the same version of components.
- All managed hosts have consistent versions of Java.
- All checked Cloudera Management Daemons versions are consistent with the server.
- All checked Cloudera Management Agents versions are consistent with the server.
Version Summary

Cloudera QuickStart — CDH 5
Hosts: quickstart.cloudera

Component | Version | Release | CDH Version
---|---|---|---
Parquet | 1.5.0+cdh5.3.0+52 | 1.cdh5.3.0.p0.27 | CDH 5
Impala | 2.1.1+cdh5.3.1+0 | 1.cdh5.3.1.p0.17 | CDH 5
YARN | 2.5.0+cdh5.3.1+791 | 1.cdh5.3.1.p0.17 | CDH 5
HDFS | 2.5.0+cdh5.3.1+791 | 1.cdh5.3.1.p0.17 | CDH 5
hue-common | 3.7.0+cdh5.3.1+135 | 1.cdh5.3.1.p0.17 | CDH 5
Sqoop2 | 1.99.4+cdh5.3.1+20 | 1.cdh5.3.1.p0.17 | CDH 5
kms | Unavailable | Unavailable | Not installed or path incorrect
HBase | 0.98.6+cdh5.3.0+73 | 1.cdh5.3.0.p0.25 | CDH 5
Sqoop | 1.4.5+cdh5.3.1+61 | 1.cdh5.3.1.p0.17 | CDH 5
Oozie | 4.0.0+cdh5.3.1+335 | 1.cdh5.3.1.p0.17 | CDH 5
Zookeeper | 3.4.5+cdh5.3.0+81 | 1.cdh5.3.0.p0.36 | CDH 5
Hue | 3.7.0+cdh5.3.1+135 | 1.cdh5.3.1.p0.17 | CDH 5
spark | 1.2.0+cdh5.3.1+365 | 1.cdh5.3.1.p0.17 | CDH 5
MapReduce 1 | 2.5.0+cdh5.3.1+791 | 1.cdh5.3.1.p0.17 | CDH 5
Pig | 0.12.0+cdh5.3.1+47 | 1.cdh5.3.1.p0.17 | CDH 5
Crunch (CDH 5 only) | 0.11.0+cdh5.3.1+17 | 1.cdh5.3.1.p0.17 | CDH 5
Llama (CDH 5 only) | 1.0.0+cdh5.3.1+0 | 1.cdh5.3.1.p0.17 | CDH 5
HttpFS | 2.5.0+cdh5.3.1+791 | 1.cdh5.3.1.p0.17 | CDH 5
Hadoop | 2.5.0+cdh5.3.1+791 | 1.cdh5.3.1.p0.17 | CDH 5
Hive | 0.13.1+cdh5.3.1+308 | 1.cdh5.3.1.p0.17 | CDH 5
HCatalog | 0.13.1+cdh5.3.1+308 | 1.cdh5.3.1.p0.17 | CDH 5
sentry | 1.4.0+cdh5.3.1+127 | 1.cdh5.3.1.p0.17 | CDH 5
MapReduce 2 | 2.5.0+cdh5.3.1+791 | 1.cdh5.3.1.p0.17 | CDH 5
Lily HBase Indexer | 1.5+cdh5.3.1+24 | 1.cdh5.3.1.p0.17 | CDH 5
Solr | 4.4.0+cdh5.3.0+314 | 1.cdh5.3.0.p0.25 | CDH 5
Flume NG | 1.5.0+cdh5.3.1+80 | 1.cdh5.3.1.p0.19 | CDH 5
Cloudera Manager Management Daemons | 5.3.1 | 1.cm531.p0.191 | Not applicable
Java 6 | JAVA_HOME=/usr/java/jdk1.6.0_31, java version "1.6.0_31", Java(TM) SE Runtime Environment (build 1.6.0_31-b04), Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) | Unavailable | Not applicable
Java 7 | JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera, java version "1.7.0_67", Java(TM) SE Runtime Environment (build 1.7.0_67-b01), Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) | Unavailable | Not applicable
Java 8 | Unavailable | Unavailable | Not applicable
Cloudera Manager Agent | 5.3.1 | 1.cm531.p0.191.el6 | Not applicable

Once again, I'm grateful for any assistance. Thanks, Ty
Tags: upgrade
02-18-2015
04:41 AM
OK, no luck with the upgrade path. I can't manage to get the upgrade wizard past the 2nd screen... I think it's time I give up on this software...
02-18-2015
02:46 AM
So these are the prerequisites:

1. CDH 5.3.0 (or later) managed by Cloudera Manager 5.3.0 (or later)
2. (Strongly Recommended) Implement Kerberos authentication on your cluster.

The following conditions must also be true when enabling Sentry-HDFS synchronization. Failure to comply with any of these will result in validation errors.

3. You must use the Sentry service, not policy file-based authorization.
4. Enabling HDFS Extended Access Control Lists (ACLs) is required.
5. There must be exactly one Sentry service dependent on HDFS.
6. The Sentry service must have exactly one Sentry Server role.
7. The Sentry service must have exactly one dependent Hive service.
8. The Hive service must have exactly one Hive Metastore role (that is, High Availability should not be enabled).

Do I have them?

1. Yes.
2. Yes.
3. Yes, as per the Cloudera Manager instructions at http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_sentry_service_config.html
4. Yes, as per http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_sg_hdfs_ext_acls.html?scroll=xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--76a9 (although there is a Cloudera Manager configuration parameter for doing this)
5. Yes: I think this is the case for the VM. I looked at the roles and instances and there seems to be only one Sentry service.
6. Yes: again, I think the VM is set up this way. I looked around and it seemed to be the case.
7. Yes: as above.
8. Yes: as above.

As far as following the instructions goes, it seemed to me that by the time I got to this part of the security configuration the system didn't need much changing. Unfortunately it doesn't seem to work for me. Can you tell me what jar I should be looking for on my system so that I can be certain the installation is OK? Thanks.

I'm doing an upgrade at the moment. The VM is 5.3.0 and I'm hoping that 5.3.1 will give me a better experience.
02-18-2015
02:14 AM
I'm not sure I'm heading down the right path here, but after a bunch of Googling it seems that the HDFS ACL sync functionality is implemented in sentry-hdfs-namenode-plugin-1.4.0-cdh5.3.1.jar. When I search my CDH 5.3 virtual machine the jar isn't found. Could there be a problem with the virtual machine configuration?
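For anyone checking the same thing, a quick way to search the usual package and parcel install locations for the plugin jar; the paths below are the standard ones and may differ on other setups.

# look for the Sentry HDFS NameNode plugin jar in the common install locations
find /usr/lib /usr/share /opt/cloudera/parcels -name 'sentry-hdfs-namenode-plugin*.jar' 2>/dev/null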
02-17-2015
08:35 PM
Hi, I have a clean CDH 5.3 virtual machine which I am trying to get Sentry working on. I've followed the authentication and authorization instructions and gotten to the point where I want to enable Sentry automatic syncing of HDFS ACLs. Following the documentation, all the prerequisites are taken care of and it seems all I need to do is: under the Service-Wide category go to Security and check the Enable Sentry Synchronization checkbox. When I do this the HDFS NameNode won't start, and the log indicates that there is an authorization provider class that needs to be configured: Failed to start namenode.
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.sentry.hdfs.SentryAuthorizationProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2079)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1089)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:621)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:607)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.sentry.hdfs.SentryAuthorizationProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2071)
... 7 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.sentry.hdfs.SentryAuthorizationProvider not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
... 8 more

I'm unsure if this is really the problem and, if it is, how to resolve it. I did notice that in the instructions for setting up syncing from the CLI there is this section added to hdfs-site.xml:

<property>
<name>dfs.namenode.authorization.provider.class</name>
<value>org.apache.sentry.hdfs.SentryAuthorizationProvider</value>
</property>

Any assistance would really make the world of difference to my day. Thanks, Ty
02-17-2015
04:44 AM
Thanks for the link to your Kerberos bootstrapper. It seems to work for me. Unfortunately I ran into a secondary problem and I'm not sure how to let Cloudera know about it.

On a clean CDH 5.3 virtual machine I started Cloudera Manager and then ran your Kerberos bootstrapper. Then I ran the Kerberos configuration wizard in Cloudera Manager. The restart failed to complete successfully. It seemed like the YARN NodeManager was having trouble with the topology.map and container-executor.cfg file permissions in the /var/run/cloudera-scm-agent/process/??-yarn-NODEMANAGER/ directory (note: ?? is a number generated each time I tried to restart the NodeManager).

On further inspection, and a couple of snapshot reverts, I think I found the real problem: the /etc and /etc/hadoop directories have group write permissions set. This was identified in the NodeManager logs. I started again and removed the group write permission on both directories, before running the Cloudera Manager Kerberos configuration wizard. This time it seems to have worked. Note: I have not yet had a chance to test Kerberos actually doing anything from an hdfs/hive/pig user perspective.

Only one last glitch in the system. It looks like Spark does not have Kerberos credentials created for it. The Spark History Server is showing as critical health and its log is identifying missing Kerberos credentials. When I look at the Kerberos Credentials screen in Cloudera Manager I see credentials for all the services except Spark. I'm not doing anything with Spark and I don't know anything about it, so I'll just stop the service for now.

I am not sure if changing the directory permissions on /etc and /etc/hadoop will adversely affect other functions, but I hope my little investigation can help others.
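For reference, the permission change described above comes down to the two commands below. Check the current modes first; whether tightening /etc this way has other side effects is exactly the open question raised here.

stat -c '%A %n' /etc /etc/hadoop   # look for group write, e.g. drwxrwxr-x
sudo chmod g-w /etc /etc/hadoop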
02-16-2015
09:39 PM
OK, I posted too soon. I seem to have solved it. I added all the key algorithms that kadmin.local listed when I did a get_principal on the cloudera-scm/admin principal. Restarting the cluster now...
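For anyone hitting the same kinit failure, a sketch of the check described above, assuming the MIT KDC set up by the bootstrap script: the encryption types listed on the principal are what Cloudera Manager's Kerberos encryption type setting needs to cover.

sudo kadmin.local -q 'get_principal cloudera-scm/admin'   # the Key: lines show the enctypes on the principal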
02-16-2015
09:30 PM
I'm afraid it's not smooth sailing on this one. I found the GitHub project here: https://github.com/esammer/krb-bootstrap

It all seems to work OK. I seem to get Kerberos, a realm (CLOUDERA) and a principal (cloudera-scm/admin). After some searching I managed to set the password for cloudera-scm/admin using the command line tool kadmin.local.

Unfortunately, when I get to step 5 (Import KDC Account Manager Credentials) of the Cloudera Manager Kerberos setup wizard I get the following message. I'm afraid I'm stuck again and could use some help if anyone knows how to get past this problem.

/usr/share/cmf/bin/import_credentials.sh failed with exit code 1 and output of <<
+ export PATH=/usr/kerberos/bin:/usr/kerberos/sbin:/usr/lib/mit/sbin:/usr/sbin:/sbin:/usr/sbin:/bin:/usr/bin
+ PATH=/usr/kerberos/bin:/usr/kerberos/sbin:/usr/lib/mit/sbin:/usr/sbin:/sbin:/usr/sbin:/bin:/usr/bin
+ KEYTAB_OUT=/var/run/REDACTED-scm-server/cmf242896655772090475.keytab
+ USER=REDACTED-scm/admin@CLOUDERA
+ PASSWD=REDACTED
+ KVNO=1
+ SLEEP=0
+ RHEL_FILE=/etc/redhat-release
+ '[' -f /etc/redhat-release ']'
+ set +e
+ grep Tikanga /etc/redhat-release
+ '[' 1 -eq 0 ']'
+ '[' 0 -eq 0 ']'
+ grep 'CentOS release 5' /etc/redhat-release
+ '[' 1 -eq 0 ']'
+ '[' 0 -eq 0 ']'
+ grep 'Scientific Linux release 5' /etc/redhat-release
+ '[' 1 -eq 0 ']'
+ set -e
+ '[' -z /etc/krb5.conf ']'
+ echo 'Using custom config path '\''/etc/krb5.conf'\'', contents below:'
+ cat /etc/krb5.conf
+ IFS=' '
+ read -a ENC_ARR
+ for ENC in '"${ENC_ARR[@]}"'
+ echo 'addent -password -p REDACTED-scm/admin@CLOUDERA -k 1 -e des-hmac-sha1'
+ '[' 0 -eq 1 ']'
+ echo REDACTED
+ echo 'wkt /var/run/REDACTED-scm-server/cmf242896655772090475.keytab'
+ ktutil
+ chmod 600 /var/run/REDACTED-scm-server/cmf242896655772090475.keytab
+ kinit -k -t /var/run/REDACTED-scm-server/cmf242896655772090475.keytab REDACTED-scm/admin@CLOUDERA
kinit: Key table entry not found while getting initial credentials
>>
02-16-2015
04:12 PM
Just to be 100% sure: are you saying that it is not possible to implement Sentry with the virtual machine alone, since it does not have any Kerberos functionality built in?
02-16-2015
03:29 PM
I am trying to evaluate Sentry in the CDH 5.3 virtual machine provided by Cloudera. Unfortunately I am having a lot of problems getting it to even work, and I thought I'd check that my assumption that I can get it to work at all is correct. In this documentation ( http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_sentry_service.html ) the prerequisites say:

- CDH 5.1.x (or later) managed by Cloudera Manager 5.1.x (or later). See the Cloudera Manager Administration Guide and Cloudera Installation and Upgrade for instructions.
- HiveServer2 and the Hive Metastore running with strong authentication. For HiveServer2, strong authentication is either Kerberos or LDAP. For the Hive Metastore, only Kerberos is considered strong authentication (to override, see Securing the Hive Metastore).
- Impala 1.4.0 (or later) running with strong authentication. With Impala, either Kerberos or LDAP can be configured to achieve strong authentication.
- Implement Kerberos authentication on your cluster. For instructions, see Enabling Kerberos Authentication Using the Wizard.

I don't have Kerberos or LDAP (since I'm in the virtual machine), so I override the HiveServer2/Hive Metastore requirement for strong authentication. The last prerequisite says I need to implement Kerberos authentication. Is this only needed if I want Impala to work, or will it stop Sentry from working entirely? Thanks Ty
12-15-2014
02:21 AM
I think I may have found the problem. By the way, please forgive me if all this is super simple to everyone else; I'm totally new to this tech.

Anyway, the output from the Cloudera Navigator REST API doesn't seem to be formed in a way that Pig's JSON loading supports. There are braces which need to encapsulate the array/list (I'm unsure of the correct terminology to use for JSON). It could be that when Firefox rendered the result of my REST API query it removed them from around the entire result...

Also, I simply did a copy and paste of the text from the browser window into a text editor. This put linefeeds everywhere to make the JSON readable. The JSON loader for Pig doesn't like this at all and I had to go through the data and remove them. BTW, I was using the elephant-bird JsonLoader.
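A hedged sketch of one way to skip the copy-and-paste reformatting described above: fetch the entities with curl and emit one JSON object per line, a shape that line-oriented JSON loaders such as elephant-bird's JsonLoader handle well. The hostname, credentials and query below are placeholders, and the system python on the VM is assumed to be available.

# fetch the Navigator entities and flatten the array to one JSON object per line, then push to HDFS
curl -s -u admin:admin 'http://hostname:7187/api/v2/entities?query=(sourceType:HIVE)AND(type:field)' \
  | python -c "import json,sys; sys.stdout.write('\n'.join(json.dumps(o) for o in json.load(sys.stdin)))" \
  > entities.json
hadoop fs -put entities.json /tmp/entities.json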
12-14-2014
05:00 PM
Hi, I'm trying to load the output from the Cloudera Navigator REST API with Pig. From the very little I know about Cloudera and Hadoop, this should be simple, but I can't seem to get it working. Any help is greatly appreciated. I'm using a clean Cloudera 5.2 Express VM on VirtualBox. I have Navigator access from a different system, but I'm trying to use the JSON file in the VM. I use the following entities query to get column information for a table in Hive: http://hostname:7187/api/v2/entities?query=(sourceType=HIVE)AND(parentPath:\/default\/sample_07)AND(type:field) This gives me the following result, which I upload into HDFS using the file browser:

[ {
"identity" : "cda810869478499e10b7277d9154d950",
"originalName" : "code",
"sourceId" : "4f3e6518363eaef52d239ff4fae31a1d",
"firstClassParentId" : "7ead74b8ce208fc211ecf7722c91667b",
"parentPath" : "/default/sample_07",
"extractorRunId" : "4f3e6518363eaef52d239ff4fae31a1d##247",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"dataType" : "string",
"originalDescription" : null,
"deleted" : false,
"type" : "FIELD",
"sourceType" : "HIVE",
"internalType" : "hv_column"
}, {
"identity" : "12440810a79c5b16bf51bc693ab7733e",
"originalName" : "description",
"sourceId" : "4f3e6518363eaef52d239ff4fae31a1d",
"firstClassParentId" : "7ead74b8ce208fc211ecf7722c91667b",
"parentPath" : "/default/sample_07",
"extractorRunId" : "4f3e6518363eaef52d239ff4fae31a1d##247",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"dataType" : "string",
"originalDescription" : null,
"deleted" : false,
"type" : "FIELD",
"sourceType" : "HIVE",
"internalType" : "hv_column"
}, {
"identity" : "6b22876ef3316d99d550f345b313c5b3",
"originalName" : "total_emp",
"sourceId" : "4f3e6518363eaef52d239ff4fae31a1d",
"firstClassParentId" : "7ead74b8ce208fc211ecf7722c91667b",
"parentPath" : "/default/sample_07",
"extractorRunId" : "4f3e6518363eaef52d239ff4fae31a1d##247",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"dataType" : "int",
"originalDescription" : null,
"deleted" : false,
"type" : "FIELD",
"sourceType" : "HIVE",
"internalType" : "hv_column"
}, {
"identity" : "ff344358cc3052a125d61cc83060795d",
"originalName" : "salary",
"sourceId" : "4f3e6518363eaef52d239ff4fae31a1d",
"firstClassParentId" : "7ead74b8ce208fc211ecf7722c91667b",
"parentPath" : "/default/sample_07",
"extractorRunId" : "4f3e6518363eaef52d239ff4fae31a1d##247",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"dataType" : "int",
"originalDescription" : null,
"deleted" : false,
"type" : "FIELD",
"sourceType" : "HIVE",
"internalType" : "hv_column"
} ]

I'd like to load the data into Pig and be able to see that it's in there, i.e. describe and dump. Any suggestions or referrals to tutorials/examples which should work? Thanks.
12-11-2014
03:04 AM
Also, the register commands don't seem to be necessary as long as the UDF jars are specified as file resources.
12-11-2014
02:59 AM
All I can say is programming really gives me a headache sometimes. For those who run into the same problem as I have, the solution is to type the class name with the same capitalisation as in the source code of the UDF. The correction that gets it all working is JsonLoader instead of what I typed originally (jsonloader). The correct code is:

A = LOAD '/tmp/test.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
12-10-2014
07:22 PM
Hi, I am having trouble registering a UDF jar and I am hoping someone can walk me through the process. I'm a total newb with the Hadoop software stack and Cloudera. I have the Cloudera 5.2 Express VM running in VirtualBox.

I have the jars in HDFS: /tmp/elephant-bird-core.4.5.jar and /tmp/elephant-bird-pig.4.5.jar. I have a Pig script with two file resources specified, one for each of the jars above. My Pig script looks like this:

register /tmp/elephant-bird-core-4.5.jar;
register /tmp/elephant-bird-pig-4.5.jar;
A = LOAD '/tmp/test.json' USING com.twitter.elephantbird.pig.jsonloader('-nestedLoad');
describe A;

My error looks like this:

ERROR org.apache.pig.tools.grunt.Grunt - Error 101: file '/tmp/elephant-bird-core-4.5.jar' does not exist

There are also a bunch of warnings about deprecated stuff. Note: I have not done any configuration of the system other than the screens which appear when the VM first boots, asking to install all the software and create a user. Any help would be greatly appreciated... I must admit I didn't think I'd get stuck at step #1 of my exploration of Cloudera and Hadoop.
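A quick sanity check worth running before anything else: list exactly what is present, since the jar names above appear once with a dot before the version (elephant-bird-core.4.5.jar) and once with a dash (elephant-bird-core-4.5.jar), and a register fails with "does not exist" when the path doesn't match the real filename.

hadoop fs -ls /tmp/elephant-bird*        # what is actually in HDFS
ls -l /tmp/elephant-bird* 2>/dev/null    # and on the local filesystem, in case register resolves locally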