Member since
06-20-2016
488
Posts
433
Kudos Received
118
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3602 | 08-25-2017 03:09 PM |
| | 2505 | 08-22-2017 06:52 PM |
| | 4195 | 08-09-2017 01:10 PM |
| | 8972 | 08-04-2017 02:34 PM |
| | 8946 | 08-01-2017 11:35 AM |
09-27-2016
01:17 PM
This produces the results you want:

RAW = LOAD 'filepath' USING PigStorage(';') AS
    (Employee:chararray, Stock:int, Furnisher:chararray, Date:chararray, Value:double);
RANKING = RANK RAW BY Employee, Date DENSE;
GRP = GROUP RANKING BY $0;
SUMMED = FOREACH GRP {
    summed = SUM(RANKING.Value);
    GENERATE $0, summed AS Ranksum;
}
JOINED = JOIN RANKING BY $0, SUMMED BY $0;
FINAL = FOREACH JOINED GENERATE $0, Employee, Stock, Furnisher, Date, Ranksum;
STORE FINAL INTO 'destinationpath' USING PigStorage(',');

Let me know whether this is what you are looking for by accepting the answer. If I did not capture the requirements correctly, please clarify.
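If it helps to sanity-check the logic, here is a minimal Python sketch of the same dense-rank / group / sum / join flow. This is a hypothetical illustration only: the sample rows, names, and values below are made up, not your data.

```python
# Made-up sample rows: (Employee, Stock, Furnisher, Date, Value)
rows = [
    ("alice", 10, "acme", "2016-01-01", 100.0),
    ("alice", 5,  "acme", "2016-01-01", 50.0),
    ("bob",   7,  "init", "2016-01-02", 30.0),
]

# DENSE rank by (Employee, Date): equal keys share one rank.
keys = sorted({(r[0], r[3]) for r in rows})
rank_of = {k: i + 1 for i, k in enumerate(keys)}

# GROUP by rank and SUM(Value), like the nested FOREACH above.
sums = {}
for r in rows:
    k = rank_of[(r[0], r[3])]
    sums[k] = sums.get(k, 0.0) + r[4]

# JOIN the per-rank sum back onto each row, like JOINED/FINAL above.
final = [
    (rank_of[(r[0], r[3])], r[0], r[1], r[2], r[3], sums[rank_of[(r[0], r[3])]])
    for r in rows
]
```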
09-27-2016
12:27 AM
Hmm. Interesting. I downloaded the latest version of the Sandbox (2.5 GA) and now the directory simply has no contents:

[root@sandbox scripts]# ls -l /var/lib/ambari-agent/cache/custom_actions/scripts/
total 0

[root@sandbox scripts]# sandbox-version
Sandbox information:
Created on: 13_09_2016_11_17_36 for
Hadoop stack version: Hadoop 2.7.3.2.5.0.0-1245
Ambari Version: 2.4.0.0-1225
Ambari Hash: 59175b7aa1ddb74b85551c632e3ce42fed8f0c85
Ambari build: Release : 1225
Java version: 1.8.0_101
OS Version: CentOS release 6.8 (Final)

I will contact the sandbox SMEs to communicate the issue.
09-26-2016
12:59 PM
5 Kudos
You need to FLATTEN your nested data

Your grouped data set has (is a bag of) fields, tuples, and bags. You need to extract the fields from the bags and tuples using the FLATTEN operator. Each of your grouped records can be seen as follows:

1; -- field
(7287026502032012,18); -- tuple
{(706)}; -- bag
{(101200010)}; -- bag
{(17286)}; -- bag
{(oz)}; -- bag
2.5 -- field

Using FLATTEN with a tuple is simple, but using it with a bag is more complicated.

Flattening tuples

To look at only tuples, let's assume your data looked like this:

1; -- field
(7287026502032012,18); -- tuple

Then you would use:

data_flattened = FOREACH data GENERATE
    $0,
    FLATTEN($1);

which for the data above would produce: 1; 7287026502032012; 18

Flattening bags

Flattening bags is more complicated, because FLATTEN converts the bag's tuples to fields but cross joins them with the other data in your GENERATE statement. From the Apache Pig docs:

"For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e)."

Using Pig's builtin function BagToTuple() to help you out

Pig has a builtin function BagToTuple() which, as the name says, converts a bag to a tuple. By converting your bags to tuples, you can then easily flatten them as above.

Final code

Your final code will look like this:

data_flattened = FOREACH data GENERATE
    $0,
    FLATTEN($1),
    FLATTEN(BagToTuple($2)),
    FLATTEN(BagToTuple($3)),
    FLATTEN(BagToTuple($4)),
    FLATTEN(BagToTuple($5)),
    $6;

to produce your desired data.

Useful links:
https://pig.apache.org/docs/r0.10.0/basic.html#flatten
http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach
https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/BagToTuple.html

If this answers your question, let me know by accepting the answer. Otherwise, let me know the remaining gaps or issues.
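To make the cross-product behavior concrete, here is a tiny Python sketch of what FLATTEN does to a bag. This is a hypothetical illustration only (Pig itself is not involved); the function name and sample values are made up.

```python
# Hypothetical sketch of Pig's bag-FLATTEN semantics in plain Python.
# Flattening the bag in (a, {(b,c),(d,e)}) yields one output tuple per
# bag element, each prefixed with the other generated fields.
def flatten_bag(prefix, bag):
    return [prefix + t for t in bag]

rows = flatten_bag(("a",), [("b", "c"), ("d", "e")])
# rows is [("a", "b", "c"), ("a", "d", "e")], matching the Pig docs example
```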
09-23-2016
03:28 PM
Yes, I tried these; I should have put that in the description:

[root@sandbox ~]# ls -ld /var/lib/ambari-agent/cache/custom_actions/scripts/
drwxrwxrwx 1 root root 4096 Sep 22 18:55 /var/lib/ambari-agent/cache/custom_actions/scripts/

[root@sandbox ~]# chmod -R a+rX /var/lib/ambari-agent/cache/custom_actions/scripts/
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/check_host.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/check_host.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/clear_repocache.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/clear_repocache.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/install_packages.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/install_packages.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/remove_bits.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/remove_bits.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_execute_tasks.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_execute_tasks.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_set_all.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/ru_set_all.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/update_repo.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/update_repo.pyo': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/validate_configs.py': No such file or directory
chmod: cannot access `/var/lib/ambari-agent/cache/custom_actions/scripts/validate_configs.pyo': No such file or directory
09-23-2016
12:36 PM
3 Kudos
This is a great guide to what gets installed where on HDP: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html You will notice that Kafka should be installed within the cluster and is best given dedicated nodes. As a side note, Hortonworks DataFlow (HDF) is a separate distribution/product provided by Hortonworks. It packages Kafka along with NiFi, Storm, and Ambari, and excels at acquiring, inspecting, routing, transforming, and analyzing data in motion from a diverse range of sources (from sensors to databases), with the output typically landing in Hadoop. Exciting technology and a lot to talk about ... check it out: http://hortonworks.com/products/data-center/hdf/
09-23-2016
12:10 PM
Not sure which instructions you are using, but make sure these were followed: https://community.hortonworks.com/articles/34424/apache-zeppelin-on-hdp-242.html If they were not, I suggest uninstalling Zeppelin and reinstalling using the steps in the link. Also, consider upgrading to HDP 2.5: Zeppelin is GA in that version (no longer a Technical Preview) and the install is done 100% from the Ambari UI. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/ch_installation.html
09-23-2016
12:01 PM
This is a good discussion on setting reducers: https://community.hortonworks.com/questions/28073/how-do-you-force-the-number-of-reducers-in-a-map-r.html As with all performance tuning, it is best to isolate a bottleneck and tune that, rather than simply trying a lot of things at the same time. So yes, among other tuning, set this and see if it works. If not, move on to the next suspected bottleneck.
09-23-2016
01:44 AM
2 Kudos
Note: This is in sandbox.
Simple workflow:
log into Ranger
navigate to the Ranger service (Ranger Admin, Ranger Usersync, and Ranger Tagsync are all running)
navigate to configs
click "Test Connection"
Result: Connection failed
stderr says:
/usr/bin/python: can't open file '/var/lib/ambari-agent/cache/custom_actions/scripts/check_host.py': [Errno 2] No such file or directory
When I ssh into the Ranger host (sandbox) as root and run ls -l /var/lib/ambari-agent/cache/custom_actions/scripts/ I get the following result (be sure to scroll to the bottom):
ls: cannot access scripts/check_host.py: No such file or directory
ls: cannot access scripts/check_host.pyo: No such file or directory
ls: cannot access scripts/clear_repocache.py: No such file or directory
ls: cannot access scripts/clear_repocache.pyo: No such file or directory
ls: cannot access scripts/install_packages.py: No such file or directory
ls: cannot access scripts/install_packages.pyo: No such file or directory
ls: cannot access scripts/remove_bits.py: No such file or directory
ls: cannot access scripts/remove_bits.pyo: No such file or directory
ls: cannot access scripts/ru_execute_tasks.py: No such file or directory
ls: cannot access scripts/ru_execute_tasks.pyo: No such file or directory
ls: cannot access scripts/ru_set_all.py: No such file or directory
ls: cannot access scripts/ru_set_all.pyo: No such file or directory
ls: cannot access scripts/update_repo.py: No such file or directory
ls: cannot access scripts/update_repo.pyo: No such file or directory
ls: cannot access scripts/validate_configs.py: No such file or directory
ls: cannot access scripts/validate_configs.pyo: No such file or directory
total 0
?????????? ? ? ? ? ? check_host.py
?????????? ? ? ? ? ? check_host.pyo
?????????? ? ? ? ? ? clear_repocache.py
?????????? ? ? ? ? ? clear_repocache.pyo
?????????? ? ? ? ? ? install_packages.py
?????????? ? ? ? ? ? install_packages.pyo
?????????? ? ? ? ? ? remove_bits.py
?????????? ? ? ? ? ? remove_bits.pyo
?????????? ? ? ? ? ? ru_execute_tasks.py
?????????? ? ? ? ? ? ru_execute_tasks.pyo
?????????? ? ? ? ? ? ru_set_all.py
?????????? ? ? ? ? ? ru_set_all.pyo
?????????? ? ? ? ? ? update_repo.py
?????????? ? ? ? ? ? update_repo.pyo
?????????? ? ? ? ? ? validate_configs.py
?????????? ? ? ? ? ? validate_configs.pyo
Any idea what is going on?
Labels:
- Apache Ranger
09-22-2016
10:27 AM
Release notes state that Hive 2.1 is available in HDP 2.5 as a tech preview, and LLAP is part of that tech preview: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/tech_previews.html The release email to customers explains the same in a more readable way:

Apache Hive
- Includes Apache Hive 1.2.1 for production and Hive 2.1 (Technical Preview) for cutting-edge performance
- Hive LLAP (Technical Preview): persistent query servers and optimized in-memory caching for blazing-fast SQL. Up to 25x faster for BI workloads. 100% compatible with existing Hive workloads
- Hive ACID and Streaming Ingest certified for production use with Hive 1.2.1
- Dynamic user-based security policies for data masking and filtering
- HPL/SQL: procedural programming within Hive
- Hive View v1.5.0, improved robustness and security
- Parquet format fully certified with Hive 1.2.1 / 2.1