Member since
05-16-2016
785
Posts
114
Kudos Received
39
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1866 | 06-12-2019 09:27 AM |
| | 3070 | 05-27-2019 08:29 AM |
| | 5109 | 05-27-2018 08:49 AM |
| | 4480 | 05-05-2018 10:47 PM |
| | 2782 | 05-05-2018 07:32 AM |
02-23-2017
06:09 AM
Just put it under the user's directory and set the ownership and permissions as you would on a Linux file system, using the hadoop fs shell commands hadoop fs -chown and hadoop fs -chmod. In addition, for backup you can configure HDFS snapshots for point-in-time file recovery: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
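The ownership, permission, and snapshot steps above could be scripted roughly as follows. This is only a sketch: the directory /user/alice/data, the owner alice:hadoop, the mode 750, the snapshot name, and the helper names are hypothetical placeholders. It builds the command lines for hadoop fs -chown, hadoop fs -chmod, hdfs dfsadmin -allowSnapshot, and hdfs dfs -createSnapshot. Running them assumes the Hadoop CLI is on the PATH and that you have the required privileges (allowing snapshots normally needs an HDFS superuser).

```python
import subprocess


def build_commands(path, owner, mode, snapshot_name):
    """Return the CLI calls as argument lists, without running anything."""
    return [
        ["hadoop", "fs", "-chown", owner, path],          # set owner[:group]
        ["hadoop", "fs", "-chmod", mode, path],           # set permission bits
        ["hdfs", "dfsadmin", "-allowSnapshot", path],     # make the directory snapshottable
        ["hdfs", "dfs", "-createSnapshot", path, snapshot_name],
    ]


def run_commands(commands):
    """Run each command and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical values, shown for illustration only.
commands = build_commands("/user/alice/data", "alice:hadoop", "750", "backup-1")
for cmd in commands:
    print(" ".join(cmd))
```

Calling run_commands(commands) would execute them in order; printing them first makes it easy to review what will run.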
02-22-2017
07:47 PM
Below are the prerequisites for your requirement.
1. You need a timestamp interceptor to inject a timestamp into every event header, if you don't already have one in your Flume conf .properties file. For example:
tail1.sources.src1.interceptors = ic1
tail1.sources.src1.interceptors.ic1.type = timestamp
2. If it is a multi-tier Flume agent architecture, it is recommended to use hdfs.useLocalTimeStamp, which uses a timestamp generated by the Flume agent running the HDFS sink:
tail1.sinks.sink1.hdfs.useLocalTimeStamp = true
3. To put all the files generated in a month into the same month folder, use only the month and year in the path:
tail1.sinks.sink1.hdfs.path = flume/collector1/%m-%Y
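To see how the %m-%Y path groups events by month, here is a small Python sketch of the substitution. It is not Flume's code: expand_hdfs_path is a hypothetical helper that handles only %Y, %m, and %d, and it interprets the timestamp in UTC as an assumption (Flume uses the agent's local time zone unless configured otherwise).

```python
from datetime import datetime, timezone


def expand_hdfs_path(pattern, timestamp_millis, tz=timezone.utc):
    """Replace %Y, %m and %d in pattern using an epoch-millisecond timestamp."""
    moment = datetime.fromtimestamp(timestamp_millis / 1000, tz=tz)
    replacements = {
        "%Y": f"{moment.year:04d}",   # four-digit year
        "%m": f"{moment.month:02d}",  # zero-padded month
        "%d": f"{moment.day:02d}",    # zero-padded day of month
    }
    for escape, value in replacements.items():
        pattern = pattern.replace(escape, value)
    return pattern


# Two events from the same month land in the same folder.
early = int(datetime(2017, 2, 1, 8, 0, tzinfo=timezone.utc).timestamp() * 1000)
late = int(datetime(2017, 2, 22, 19, 47, tzinfo=timezone.utc).timestamp() * 1000)
print(expand_hdfs_path("flume/collector1/%m-%Y", early))  # flume/collector1/02-2017
print(expand_hdfs_path("flume/collector1/%m-%Y", late))   # flume/collector1/02-2017
```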
02-22-2017
03:24 AM
Here are my thoughts. Please correct me if I have misunderstood your issue.
1. You can use the clause below when you create the table in Impala:
STORED AS PARQUET
2. "For example, with a school_records table partitioned on a year column, there is a separate data directory for each different year value, and all the data for that year is stored in a data file in that directory. A query that includes a WHERE condition such as YEAR=1966, YEAR IN (1989,1999), can examine only the data files from the appropriate directory" (quoted from the Cloudera Impala knowledge base).
3. Would you consider writing a custom interceptor to add the field to the event header? You could also use the UUID interceptor for a unique ID. A second option, though I am not sure about it, is to pull the data from HDFS and run a Python script to add the new field.
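The partition pruning described in the quote can be pictured with a short Python sketch. This is an illustration of the idea, not how Impala implements it; the directory names under /school_records and the prune_partitions helper are hypothetical.

```python
def prune_partitions(partition_dirs, wanted_years):
    """Keep only the directories whose year=<value> matches the filter."""
    selected = []
    for directory in partition_dirs:
        name = directory.rstrip("/").split("/")[-1]  # e.g. "year=1966"
        key, _, value = name.partition("=")
        if key == "year" and int(value) in wanted_years:
            selected.append(directory)
    return selected


dirs = [
    "/school_records/year=1966",
    "/school_records/year=1989",
    "/school_records/year=1999",
    "/school_records/year=2005",
]
# Equivalent in spirit to WHERE YEAR IN (1989, 1999): only two directories are read.
print(prune_partitions(dirs, {1989, 1999}))
```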
02-21-2017
10:20 PM
Question: But now I need to store them into HDFS as "partition" structure shown below. I have been told this is required in order to let Impala effectively read the data.
/hive/warehouse/test/fact_my_service/year=2017/month=2/day=21
Answer: You can use the HDFS sink escape sequences, like
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d
%d day of month (01)
%m month (01..12)
%Y year (2010)
Refer to the HDFS sink section of the Apache Flume User Guide: https://flume.apache.org/FlumeUserGuide.html
Question: Can you please share some hints how to get my data stored that way and how to make Impala to understand already stored data?
Answer: If you have a common shared metastore for Hive and Impala, you can create an external table and point it at the location:
CREATE EXTERNAL TABLE table_name
(userid INT, movid STRING, age TINYINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION ' hive warehouse or any location of your data '
Note: make sure to run INVALIDATE METADATA; because the table was created outside Impala, Impala has to reload the metadata from the Hive metastore before it can query the table.
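For data already stored in year=/month=/day= directories, each directory has to be registered as a partition before it can be queried. The Python sketch below generates the directory name and a matching ALTER TABLE ... ADD PARTITION statement. It rests on assumptions not stated in the thread: that the table is declared with PARTITIONED BY (year INT, month INT, day INT), that it is named fact_my_service, and that the helper names are hypothetical.

```python
from datetime import date


def partition_dir(table_root, day):
    """Build the Hive-style partition directory for one calendar day."""
    return f"{table_root}/year={day.year}/month={day.month}/day={day.day}"


def add_partition_sql(table, table_root, day):
    """Build a statement that registers the directory as a partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year={day.year}, month={day.month}, day={day.day}) "
        f"LOCATION '{partition_dir(table_root, day)}'"
    )


root = "/hive/warehouse/test/fact_my_service"
day = date(2017, 2, 21)
print(partition_dir(root, day))
print(add_partition_sql("fact_my_service", root, day))
```

The generated statement can be run in Hive or impala-shell, one per directory.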
02-20-2017
09:33 PM
1 Kudo
Hive is designed for schema on read, meaning Hive has no control over the underlying storage. You can damage the data and still manage to query it using Hive. If the schema does not match the file contents, Hive will try its best to read the data anyway; for example, it will produce NULL values where a numeric column contains non-numeric strings. In a traditional database, by contrast, you write with update and insert statements and the database controls the storage, so it enforces the schema while writing. That is why it is called schema on write. So to sum up, you won't be able to create NOT NULL constraints on a Hive table and have them enforced. That is by design.
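The schema-on-read behavior can be imitated in a few lines of Python. This is a sketch of the idea only, not Hive's reader: the two-column layout (userid, age), the tab delimiter, and the helper names are assumptions made for illustration.

```python
def read_as_int(raw):
    """Interpret a raw field as an integer, or None (NULL) if it does not parse."""
    try:
        return int(raw)
    except ValueError:
        return None


def apply_schema(lines, delimiter="\t"):
    """Apply an (INT, INT) schema to raw text lines at read time."""
    rows = []
    for line in lines:
        userid, age = line.split(delimiter)
        rows.append((read_as_int(userid), read_as_int(age)))
    return rows


# The second line holds a non-numeric age; nothing rejected it when it was written.
rows = apply_schema(["1\t25", "2\tunknown"])
print(rows)  # [(1, 25), (2, None)]
```

The bad value only surfaces as a NULL at query time, which is why a NOT NULL rule cannot be enforced on write.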
02-20-2017
09:06 AM
Could you run the commands below and post the results? I am curious, what is your replication factor?
hadoop fsck path to directory
hadoop fs -du -s path to directory
The above commands should give us the same results. Both only calculate the raw HDFS data size without considering the replication factor. The command below calculates the size across the nodes (hard disks), including the replication factor:
hadoop fs -count -q /path/to/directory
We can then compare the results to see how much HDFS space has been consumed and check them against the NameNode UI.
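The relationship between the two kinds of numbers is simple arithmetic, sketched below in Python. It assumes every file under the directory uses the same replication factor; the 10 GiB size and the factor of 3 are example values, and disk_space_consumed is a hypothetical helper.

```python
def disk_space_consumed(logical_bytes, replication_factor):
    """Estimate bytes used on DataNode disks for a given logical data size."""
    return logical_bytes * replication_factor


GIB = 1024 ** 3
logical = 10 * GIB  # size reported without replication
print(disk_space_consumed(logical, 3) / GIB)  # 30.0
```

If the figures you post differ by roughly this factor, replication explains the gap.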
02-17-2017
08:11 PM
I believe you have configured a separate host that acts as a proxy and handles the requests along with Kerberos. Hence I think you won't be able to bypass the proxy, because it works like a session facade. https://www.cloudera.com/documentation/enterprise/5-2-x/topics/impala_proxy.html#proxy_kerberos
02-16-2017
10:19 PM
I believe you are missing the realm setting in core-site.xml. Please check the hadoop.registry.kerberos.realm property in your core-site.xml. For reference, this is how the Apache Hadoop core-site.xml documentation describes it:
hadoop.registry.kerberos.realm
The kerberos realm: used to set the realm of system principals which do not declare their realm, and any other accounts that need the value. If empty, the default realm of the running process is used. If neither are known and the realm is needed, then the registry service/client will fail.
02-14-2017
09:17 PM
Do you mean the IP of the host that Impala is running on, or the port? Could you tell me?
02-13-2017
07:45 PM
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: 0.0.0.0/0.0.0.0:8022
From the above error it is clear that the external DataNode is having trouble connecting to the NameNode. Check the status of the NameNode you are connecting to, and do the same for the SecondaryNameNode:
sudo service hadoop-hdfs-namenode status
sudo service hadoop-hdfs-secondarynamenode status
If a service has not started, you can start it by replacing status with start. If you don't have authorization, you should contact the Hadoop admin. Thanks
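Once the services are up, a quick way to confirm that the DataNode host can actually reach the NameNode port is a plain TCP check. The Python sketch below is a generic connectivity test, not a Hadoop tool; can_connect is a hypothetical helper, and the host name in the usage comment is a placeholder for your NameNode's address. Port 8022 is taken from the log line above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage from the DataNode host (placeholder host name):
# print(can_connect("namenode.example.com", 8022))
```

A False result from the DataNode host points to a network, firewall, or address configuration problem rather than a stopped service.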