Member since
09-23-2015
800
Posts
898
Kudos Received
185
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7563 | 08-12-2016 01:02 PM |
| | 2763 | 08-08-2016 10:00 AM |
| | 3776 | 08-03-2016 04:44 PM |
| | 7354 | 08-03-2016 02:53 PM |
| | 1903 | 08-01-2016 02:38 PM |
03-08-2016
03:05 PM
AFAIK you can create different Flume agent configurations on different nodes using host groups. You can also merge the config files of several agents into one, but I don't think there is any way to manage multiple Flume agents on the same host in Ambari.
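For the merge approach, a minimal sketch of what a single flume.conf defining two agents could look like (the agent names a1/a2 and the sources/sinks are placeholders, not a recommendation):

```properties
# Agent a1: netcat source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Agent a2: spooling-directory source -> file channel -> HDFS sink
a2.sources = r2
a2.channels = c2
a2.sinks = k2
a2.sources.r2.type = spooldir
a2.sources.r2.spoolDir = /var/spool/flume
a2.sources.r2.channels = c2
a2.channels.c2.type = file
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = /tmp/flume/events
a2.sinks.k2.channel = c2
```

Each agent process is then started with its own name, e.g. `flume-ng agent -n a1 -f flume.conf -c conf`, which is exactly the per-host part Ambari does not manage for you.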
03-08-2016
12:29 PM
1 Kudo
The Spark History Server uses the YARN Application Timeline Server (ATS) under the covers. Can you check in Ambari whether ATS is running and working correctly? Other tools, such as the Tez view, also use the Timeline Server, so you could try that one to see whether the Timeline Server is working in general.
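As a quick sanity check outside Ambari, you can also hit the ATS REST endpoint directly; this assumes the default webapp port 8188 and a placeholder hostname:

```bash
# Should return a small JSON "About" document if ATS is up and reachable
curl -s http://ats-host.example.com:8188/ws/v1/timeline

# On a Kerberized cluster you typically need SPNEGO negotiation
curl -s --negotiate -u : http://ats-host.example.com:8188/ws/v1/timeline
```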
03-08-2016
11:17 AM
Can you share your workflow? Did you add the hive2 credential to the action? I am actually not completely sure it is needed in a Hive2 action, but I assume so. It gets information such as the keytab from hive-site.xml: https://oozie.apache.org/docs/4.2.0/DG_ActionAuthentication.html Does the same URL work from beeline?
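For reference, a hive2 credential in the workflow XML typically looks roughly like this (hostnames, realm, and the action name are placeholders):

```xml
<credentials>
  <credential name="hive2_cred" type="hive2">
    <property>
      <name>hive2.jdbc.url</name>
      <value>jdbc:hive2://hs2-host.example.com:10000/default</value>
    </property>
    <property>
      <name>hive2.server.principal</name>
      <value>hive/_HOST@EXAMPLE.COM</value>
    </property>
  </credential>
</credentials>

<action name="hive2-action" cred="hive2_cred">
  <hive2 xmlns="uri:oozie:hive2-action:0.1">
    <!-- jdbc-url, script, parameters, etc. go here -->
  </hive2>
  <ok to="end"/>
  <error to="fail"/>
</action>
```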
03-03-2016
09:01 PM
2 Kudos
Remove the newline characters at the end, i.e. merge the last three lines into one. I know it sounds odd, but Hive sometimes seems to have a bug that throws EOF errors. I was able to run the following:

CREATE EXTERNAL TABLE IF NOT EXISTS tablename(
  SOURCE_ID VARCHAR(30), SOURCE_ID_TYPE VARCHAR(30), SOURCE_NAME VARCHAR(30),
  DEVICE_ID_1 VARCHAR(30), DEVICE_ID_TYPE_1 VARCHAR(30),
  DEVICE_ID_2 VARCHAR(30), DEVICE_ID_TYPE_2 VARCHAR(30),
  EVENT_TYPE VARCHAR(30), EVENT_NAME VARCHAR(100), EVENT_IDENTIFIER VARCHAR(30),
  OCCURRENCE_TIME TIMESTAMP, DETECTION_TIME TIMESTAMP,
  REQUEST_ID VARCHAR(256), TRANSACTION_ID VARCHAR(100), HOSTNAME VARCHAR(30),
  CATEGORY VARCHAR(30), CHANNEL_LANG_TYPE VARCHAR(30),
  ACCESS_ID VARCHAR(30), ACCESS_TYPE VARCHAR(30),
  MULTI_FACTOR_AUTHENTICATION_INDICATOR_1 VARCHAR(30), INVOLVED_PARTY_ID_1 VARCHAR(30),
  CARD_NUMBER_1 VARCHAR(30), INVOLVED_PARTY_TYPE_1 VARCHAR(30),
  CARD_ACCESS_FACILITY_ARRANGEMENT_TYPE_1 VARCHAR(30), INVOLVED_PARTY_ROLE_TYPE_1 VARCHAR(30),
  AUTHENTICATION_TYPE_1 VARCHAR(30), INVOLVED_PARTY_ACTIVE_DIRECTORY_ID_1 VARCHAR(30),
  MULTI_FACTOR_AUTHENTICATION_INDICATOR_2 VARCHAR(30), INVOLVED_PARTY_ID_2 VARCHAR(30),
  CARD_NUMBER_2 VARCHAR(30), INVOLVED_PARTY_TYPE_2 VARCHAR(30),
  CARD_ACCESS_FACILITY_ARRANGEMENT_TYPE_2 VARCHAR(30), INVOLVED_PARTY_ROLE_TYPE_2 VARCHAR(30),
  AUTHENTICATION_TYPE_2 VARCHAR(30), INVOLVED_PARTY_ACTIVE_DIRECTORY_ID_2 VARCHAR(30),
  CARD_NUMBER VARCHAR(30), CUSTOMER_ID VARCHAR(30), CUSTOMER_NAME VARCHAR(100),
  CREDENTIAL_TYPE VARCHAR(30), AUTHENTICATION_METHOD_TYPE VARCHAR(30),
  AUTHENTICATION_TIMESTAMP TIMESTAMP,
  ELECTRONIC_DELIVERY_DEVICE_TYPE VARCHAR(30), ELECTRONIC_DELIVERY_DEVICE_ID VARCHAR(30),
  OPERATOR_ID VARCHAR(8), OPERATOR_NAME VARCHAR(30), BRANCH_NUMBER VARCHAR(8),
  SOURCE_SYSTEM_ID VARCHAR(30), SOURCE_SYSTEM_CODE VARCHAR(30), SOURCE_SYSTEM_NAME VARCHAR(60),
  RANK_NUMBER VARCHAR(30))
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'hdfs:///user/test';
03-02-2016
04:20 PM
1 Kudo
Depends what you want to do. By itself Hadoop doesn't support any sentiment analysis, so you need to use a sentiment analytics package. Hadoop is mostly written in Java, so pretty much all Java packages will work, and Java handles strings as UTF, so Arabic is supported out of the box. The biggest packages are Stanford NLP, OpenNLP and GATE. From a quick Google search, both GATE and Stanford support some Arabic features: https://gate.ac.uk/gate/plugins/Lang_Arabic/src/arabic/ http://nlp.stanford.edu/projects/arabic.shtml If you want to run these packages in Hadoop, you will have to decide whether to run them in:
- MapReduce
- Pig UDFs, perhaps (see the sketch below)
- Spark
(Hadoop Streaming and Spark also support Python, so you could use NLTK, but I would suggest Java.)
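To illustrate the Pig UDF route, here is a minimal, hypothetical EvalFunc skeleton; the classify() method is a dummy placeholder where you would call into Stanford NLP, OpenNLP or GATE, ideally loading the model once and reusing it across calls:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical sketch of a sentiment UDF; not tied to any specific NLP library.
public class SentimentUDF extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String text = input.get(0).toString();
        return classify(text);
    }

    // Placeholder: replace with a real call into the NLP package of your choice.
    private String classify(String text) {
        return text.isEmpty() ? "neutral" : "unknown";
    }
}
```

From Pig you would then register the jar and call it, e.g. `REGISTER sentiment-udf.jar; B = FOREACH A GENERATE SentimentUDF(text);` (jar and field names are placeholders).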
03-02-2016
09:37 AM
1 Kudo
If you shut down the OS, all tasks running on that node will be stopped too, so you don't need to worry about recovery. You might kill the running application masters on that node, though. There is no graceful shutdown of a NodeManager that waits for running applications to finish as of yet (AFAIK; if someone knows better, let me know). YARN depends on applications to handle task or AM failures gracefully. https://issues.apache.org/jira/browse/YARN-914
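If you want to see what is still running on a node before taking it down, the YARN CLI can show per-node container information; the node ID below is just an example:

```bash
# List NodeManagers with their IDs, state and number of running containers
yarn node -list

# Show details (state, running containers, memory/vcores used) for one node
yarn node -status worker01.example.com:45454
```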
03-02-2016
09:25 AM
1 Kudo
The NodeManager restart does not recreate the containers; it reattaches to existing containers that are still running. I.e. when a NodeManager is restarted, the server may not have been rebooted, just the NodeManager process. Instead of shutting down all containers and starting fresh, it can reattach to the still-running containers and therefore has less impact on running applications. That is especially good for long-running applications like Spark Streaming and for Application Masters, so SLAs shouldn't be affected. If, for example, the whole node goes down, MapReduce and Tez will still see the dead containers and application masters and recreate them as necessary; YARN recovery has no impact on that. http://hortonworks.com/blog/resilience-of-yarn-applications-across-nodemanager-restarts/
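For reference, work-preserving NodeManager restart is controlled by a few yarn-site.xml properties; a minimal sketch (the recovery directory path is just an example, and the NodeManager port must be a fixed one rather than ephemeral):

```xml
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>
<property>
  <!-- Fixed port so a restarted NM comes back on the same address -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```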
03-01-2016
09:42 AM
1 Kudo
I don't think there is a Pig storage handler that does that, which is a bit weird I suppose. How did you generate that file? Is it just test data you created manually? PigStorage essentially reads/writes delimited files; fields within a tuple can be maps/bags, but I don't think the whole record can be. JsonStorage is JSON format, which is a different syntax. Then there is BinStorage, which I suppose is some kind of sequence file. I might just be missing it, but I think there is no way in Pig, natively and without some transformations, to read data in the format it prints for debugging (DUMP output). Please, someone correct me if I am wrong. http://pig.apache.org/docs/r0.14.0/func.html#load-store-functions
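If round-tripping complex records is the actual goal, one option that does work natively is storing and reloading them as JSON; a minimal sketch in Pig Latin (paths, field names and the schema are assumptions for illustration):

```pig
-- Load simple delimited data and build a complex (bag) field per id
raw = LOAD 'input.tsv' USING PigStorage('\t') AS (id:int, tag:chararray);
grouped = FOREACH (GROUP raw BY id) GENERATE group AS id, raw AS tags;

-- Store the complex records as JSON and reload them with the same schema
STORE grouped INTO '/tmp/json_out' USING JsonStorage();
reloaded = LOAD '/tmp/json_out'
           USING JsonLoader('id:int, tags:{(id:int, tag:chararray)}');
DUMP reloaded;
```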
03-01-2016
09:06 AM
3 Kudos
I don't think so; there is no capture-output element in the Sqoop action XML. However, you could run the Sqoop command in a shell or ssh action (I have done the latter; the former might need the Sqoop libraries added, which is more setup). You could then capture the output of the shell or ssh action. Essentially, run Sqoop in a shell script, grep the output you want, and print it as key=value pairs. Those you can then use in the following actions, for example via wf:actionData().
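A rough sketch of the shell-action approach; the query, file names and connection details are placeholders, not a fixed recipe:

```bash
#!/bin/bash
# Run a Sqoop command and emit the value we care about as key=value on stdout,
# so an Oozie shell action with <capture-output/> can pick it up.
sqoop eval \
  --connect "jdbc:mysql://db-host.example.com/sales" \
  --username etl --password-file /user/etl/.db_password \
  --query "SELECT MAX(id) FROM orders" > sqoop_out.txt 2>&1

# Grep the numeric result out of the Sqoop output
max_id=$(grep -oE '[0-9]+' sqoop_out.txt | tail -n 1)
echo "max_id=${max_id}"
```

A later action could then reference the value as ${wf:actionData('sqoop-shell')['max_id']}, assuming the shell action is named sqoop-shell and declares <capture-output/>.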
03-01-2016
12:29 AM
Now, people with more knowledge about TDE might correct me, but I don't see anything about cp/mv into or out of an encrypted zone, only get/put. Of course you can use MapReduce to read from it and write somewhere else to get unencrypted data, or write into an encrypted zone to get encrypted data, but I am not sure you can use a simple hadoop fs -cp or -mv to do the same. Does anybody with TDE experience know?

What you can do is use the hidden folder /.reserved/raw/ to access the encrypted data directly, to for example copy it to a backup server without having to encrypt/decrypt anything. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/config-use-hdfs-encr.html#copy-to-from-encr-zone "To retain this workflow when using HDFS encryption, a new virtual path prefix has been introduced, /.reserved/raw/. This virtual path gives super users direct access to the underlying encrypted block data in the file system, allowing super users to distcp data without requiring access to encryption keys. This also avoids the overhead of decrypting and re-encrypting data. The source and destination data will be byte-for-byte identical, which would not be true if the data were re-encrypted with a new EDEK."
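As a concrete illustration of the /.reserved/raw/ approach (cluster names and paths are placeholders; per the docs this is a superuser operation and the extended attributes must be preserved):

```bash
# Copy the raw, still-encrypted bytes of an encryption zone to a backup cluster
# without decrypting them; -px preserves the xattrs that carry the encryption metadata.
hadoop distcp -px \
  hdfs://source-nn.example.com:8020/.reserved/raw/data/secure_zone \
  hdfs://backup-nn.example.com:8020/.reserved/raw/backup/secure_zone
```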