Member since 05-02-2017
360 Posts
65 Kudos Received
22 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 12090 | 02-20-2018 12:33 PM
| 1245 | 02-19-2018 05:12 AM
| 1558 | 12-28-2017 06:13 AM
| 6568 | 09-28-2017 09:25 AM
| 11332 | 09-25-2018 11:19 AM
09-13-2017
06:04 AM
There is currently no dedicated Architect certification or exam. The available certification exams are listed here: https://hortonworks.com/services/training/certification/
08-18-2017
01:54 PM
1 Kudo
@Saurab Dahal Yes, it's achievable, but a few tweaks are needed. The partitioned table should be created with an additional column ("month") alongside sale_date: create the Hive table with month as the partition column. When inserting into the table, extract the month from sale_date and pass it to the INSERT statement; note that in a dynamic-partition insert, the partition column must come last in the SELECT list:

INSERT INTO TABLE table_name PARTITION (month)
SELECT col1, col2, sale_date, MONTH(sale_date) FROM source_table;

The above command should work. Make sure the properties below are enabled:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Hope it helps!
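As a sanity check on the partition derivation above, here is a minimal Python sketch (outside Hive, using hypothetical sample rows) showing the month value being computed from sale_date the same way MONTH(sale_date) would, and rows grouping under their dynamic partition value:

```python
from datetime import date

# Hypothetical source rows: (col1, col2, sale_date)
source_table = [
    ("a", 1, date(2017, 8, 5)),
    ("b", 2, date(2017, 8, 19)),
    ("c", 3, date(2017, 9, 2)),
]

# Mirror of: SELECT col1, col2, sale_date, MONTH(sale_date) FROM source_table
# The dynamic partition column (month) is derived from sale_date and
# must be the last column in the SELECT list.
rows = [(c1, c2, d, d.month) for (c1, c2, d) in source_table]

# Group rows by partition value, the way Hive lays out dynamic partitions.
partitions = {}
for c1, c2, d, month in rows:
    partitions.setdefault(month, []).append((c1, c2, d))

print(sorted(partitions))  # → [8, 9]
```

This illustrates only the derivation logic; the real work happens inside Hive once dynamic partitioning is enabled.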
08-16-2017
12:56 PM
Thanks @Shawn Weeks
08-10-2017
06:56 AM
@Greg Keys Thanks Greg. I got the information I was looking for.
07-20-2017
07:36 AM
@Bala Vignesh N V Your problem statement can be interpreted in two ways. The first (and, to me, more logical) interpretation is that a movie has multiple genres, and you want to count how many movies each genre has:

genres = movies.flatMap(lambda line: line.split(',')[2].split('|'))
genres.countByValue()

We map each line into multiple output items (genres), which is why we use flatMap: first we split each line by ',' and take the 3rd column, then we split the genres by '|' and emit them individually. This gives you:

'Adventure': 2,
'Animation': 1,
'Children': 2,
'Comedy': 4,
'Drama': 1,
'Fantasy': 2,
'Romance': 2

Your 'SQL' query (select genres, count(*)) suggests another interpretation: counting the combinations of genres, for example movies that are Comedy AND Romance. In that case you can simply use:

genre_combinations = movies.map(lambda line: line.split(',')[2])
genre_combinations.countByValue()

This gives you:

'Adventure|Animation|Children|Comedy|Fantasy': 1,
'Adventure|Children|Fantasy': 1,
'Comedy': 1,
'Comedy|Drama|Romance': 1,
'Comedy|Romance': 1
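The difference between the two approaches can be reproduced in plain Python (no Spark needed); the sample lines below are hypothetical MovieLens-style rows in "id,title,genres" format, chosen to match the counts shown above:

```python
from collections import Counter

# Hypothetical movie lines in "id,title,genres" format
movies = [
    "1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy",
    "2,Jumanji,Adventure|Children|Fantasy",
    "3,Grumpier Old Men,Comedy|Romance",
    "4,Waiting to Exhale,Comedy|Drama|Romance",
    "5,Father of the Bride Part II,Comedy",
]

# flatMap + countByValue: one output item per genre per movie
genre_counts = Counter(
    g for line in movies for g in line.split(',')[2].split('|')
)

# map + countByValue: one output item per genre *combination*
combo_counts = Counter(line.split(',')[2] for line in movies)

print(genre_counts['Comedy'])  # → 4 (movies 1, 3, 4, 5)
print(combo_counts['Comedy'])  # → 1 (only movie 5 has exactly this combination)
```

The generator inside Counter plays the role of flatMap: it yields several items per input line, whereas the second Counter yields exactly one item per line, just like map.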
01-03-2018
05:39 PM
@Félicien Catherin The tutorial has a typo. You need to create a normal table first using the following syntax:

CREATE TABLE FIREWALL_LOGS( time STRING, ip STRING, country STRING, status INT )
CLUSTERED BY (time) INTO 25 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/tmp/server-logs' TBLPROPERTIES("transactional"="true");

Once the above table is created, you can convert it to ORC:

CREATE TABLE FIREWALL STORED AS ORC AS SELECT * FROM FIREWALL_LOGS;
06-12-2017
01:44 AM
1 Kudo
1) An edge node normally has the Hadoop client installed; through this HDFS client, data is copied/moved to the DataNodes while the metadata is stored in the NameNode.

2) The HDFS client acts as a staging/intermediate layer between the DataNodes and the NameNode: the client contacts the NameNode, the NameNode inserts the file name into the file system hierarchy and allocates a data block for it, and then responds to the client with the identity of the DataNode and the destination data block.

3) "In turn the worker node doesn't have any role to play here. Is my understanding right?" :- No. The actual task is done by the worker nodes, since the job is assigned to them by the ResourceManager. Job workflow: HDFS client -> NameNode -> ResourceManager -> worker/data nodes -> once all MR tasks complete, the DataNodes hold the actual data and the metadata is stored in the NameNode.

4) Normally the edge node, master nodes (NameNode, ResourceManager) and data nodes are kept separate. Edge node: holds the batch user ID, which is responsible for running the batch jobs. DataNode: holds the physical data of the Hadoop cluster. NameNode: holds the metadata of the Hadoop cluster. Hope this is helpful!
04-10-2018
02:27 PM
@Bala Vignesh N V Your worker node is the same as your data node. Worker nodes are the ones that actually do the work in the cluster.
05-12-2017
05:24 PM
Number of mappers involved in a job = number of input splits, and the number of input splits depends on your block size and file size. If the file size is 256 MB and the block size is 128 MB, the job will involve 2 mappers. @Bala Vignesh N V
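The split arithmetic above can be sketched in a few lines of Python; this assumes the default case for splittable formats, where the input split size equals the HDFS block size:

```python
import math

def num_mappers(file_size_mb, block_size_mb=128):
    """Number of input splits (and hence map tasks) when the
    split size equals the HDFS block size; at least one mapper
    runs even for a file smaller than a block."""
    return max(1, math.ceil(file_size_mb / block_size_mb))

print(num_mappers(256))  # → 2, matching the 256 MB / 128 MB example
```

In practice the split size can deviate from the block size (e.g. via mapreduce.input.fileinputformat.split.minsize/maxsize), so treat this as the default-case rule of thumb.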