Member since: 11-12-2015
Posts: 90
Kudos Received: 1
Solutions: 8
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5768 | 06-09-2017 01:52 PM
 | 13494 | 02-24-2017 02:32 PM
 | 11449 | 11-30-2016 02:48 PM
 | 3849 | 03-02-2016 11:14 AM
 | 4667 | 12-16-2015 07:11 AM
06-28-2016
10:30 AM
Hello, this is my problem: I have a string column with values separated by ';', and I want to read it as an array using a cast. Here is what I want to do:
select cast("hello;how;are;you" as ARRAY(separated by ";"));
Is it possible to do this? I'm using Impala 2.5 on CDH 5.7. Regards,
02-25-2016
10:45 AM
When I try to install Oozie it gives me this error: 2016-02-25 18:22:24,260 INFO org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Overriding configuration with system property. Key [oozie.http.port], Value [11000]
2016-02-25 18:22:24,268 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.service.ProxyUserService.proxyuser.hue.hosts]
2016-02-25 18:22:24,268 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.service.GroupsService.hadoop.security.group.mapping]
2016-02-25 18:22:24,269 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.service.ProxyUserService.proxyuser.hue.groups]
2016-02-25 18:22:24,269 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [hadoop.security.credential.provider.path]
2016-02-25 18:22:24,269 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.email.from.address]
2016-02-25 18:22:24,269 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.email.smtp.port]
2016-02-25 18:22:24,269 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.email.smtp.host]
2016-02-25 18:22:24,270 WARN org.apache.oozie.service.ConfigurationService: SERVER[cloudera1] Invalid configuration defined, [oozie.email.smtp.auth]
2016-02-25 18:22:24,274 WARN org.apache.oozie.service.Services: SERVER[cloudera1] System ID [oozie-oozi] exceeds maximum length [10], trimming
2016-02-25 18:22:24,275 INFO org.apache.oozie.service.Services: SERVER[cloudera1] Exiting null Entering NORMAL
2016-02-25 18:22:24,276 INFO oozieops: SERVER[cloudera1] Exiting null Entering NORMAL
2016-02-25 18:22:24,276 INFO org.apache.oozie.service.Services: SERVER[cloudera1] Initialized runtime directory [/tmp/oozie-oozi6638948414612903101.dir] stdout Thu Feb 25 18:22:22 UTC 2016
JAVA_HOME=/usr/lib/jvm/java-7-oracle-cloudera
using 5 as CDH_VERSION
Validate DB Connection stderr Error: Could not connect to the database: org.postgresql.util.PSQLException: The connection attempt failed.
Stack trace for the error was (for debug purposes):
--------------------------------------
java.lang.Exception: Could not connect to the database: org.postgresql.util.PSQLException: The connection attempt failed.
at org.apache.oozie.tools.OozieDBCLI.validateConnection(OozieDBCLI.java:905)
at org.apache.oozie.tools.OozieDBCLI.createDB(OozieDBCLI.java:185)
at org.apache.oozie.tools.OozieDBCLI.run(OozieDBCLI.java:129)
at org.apache.oozie.tools.OozieDBCLI.main(OozieDBCLI.java:80)
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:150)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:30)
at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:393)
at org.postgresql.Driver.connect(Driver.java:267)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.oozie.tools.OozieDBCLI.createConnection(OozieDBCLI.java:895)
at org.apache.oozie.tools.OozieDBCLI.validateConnection(OozieDBCLI.java:901)
... 3 more
Caused by: java.net.UnknownHostException: :
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at org.postgresql.core.PGStream.<init>(PGStream.java:62)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:76)
... 15 more
--------------------------------------
It successfully creates the Oozie database but fails creating the Oozie database tables. I'm using the latest version of CDH on Ubuntu 14.04. Regards,
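For what it's worth, the "Caused by: java.net.UnknownHostException: :" line suggests the JDBC URL that Oozie built had an empty host and port, i.e. the database host/port settings were not picked up. A minimal sketch of that failure mode, using a hypothetical PostgreSQL JDBC URL with the host left blank (the PostgreSQL driver must be on the classpath, and the credentials here are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class OozieDbUrlCheck {
    public static void main(String[] args) {
        // Hypothetical URL with the host and port missing; a correct URL would
        // look like jdbc:postgresql://dbhost:5432/oozie
        String url = "jdbc:postgresql://:/oozie";
        try (Connection c = DriverManager.getConnection(url, "oozie", "oozie")) {
            System.out.println("Connected to " + c.getMetaData().getURL());
        } catch (SQLException e) {
            // With an empty host this fails much like the log above:
            // "The connection attempt failed", caused by an UnknownHostException.
            e.printStackTrace();
        }
    }
}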
12-16-2015
07:11 AM
I solved the problem. I had to create a custom Java interceptor (based on the one you sent me), compile it with Maven and put it in the flume-ng directory. Thanks pdvorak for all the help 🙂
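For anyone hitting the same issue, here is a minimal sketch of what such a custom Flume interceptor can look like. It is not the actual interceptor from this thread; the package and class names are made up, and all it does is copy a "product" field out of the JSON body into a header while leaving the body untouched:

package com.example;  // hypothetical package

import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class JsonHeaderInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // nothing to set up
    }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody());
        // Very naive extraction of "product":"<value>"; a real implementation
        // would use a JSON parser instead.
        int i = body.indexOf("\"product\":\"");
        if (i >= 0) {
            int start = i + "\"product\":\"".length();
            int end = body.indexOf('"', start);
            if (end > start) {
                event.getHeaders().put("product", body.substring(start, end));
            }
        }
        return event;  // body left exactly as it arrived
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override
    public void close() {
        // nothing to clean up
    }

    // The agent config references this nested class, e.g. (hypothetical names):
    // flume1.sources.kafka-source-1.interceptors.i1.type = com.example.JsonHeaderInterceptor$Builder
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new JsonHeaderInterceptor();
        }

        @Override
        public void configure(Context context) {
            // no properties needed for this sketch
        }
    }
}

The compiled jar then has to be on the Flume agent's classpath so the agent can find the Builder class named in the config.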
12-14-2015
03:18 PM
Yes, I tried that. All the fields are set as headers, but the message body is transformed by:
event.setBody("Message modified by Jsoninterceptor".getBytes());
and that makes it useless, because I need the log in its original form. I tried to change the JsonInterceptor.java file inside the .jar using vim, but it can't be done; I think that is because of the .class file. I also tried to create a Java morphline, but I can't get it to compile correctly.
morphlines : [
java {
imports : """
import java.util.List;
import java.util.Map;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import org.apache.log4j.Logger;
"""
code: """
Map<String, String> headers = event.getHeaders();
// example: add / remove headers
if (headers.containsKey("product")) {
headers.put("product", headers.get("product"));
}
if (headers.containsKey("client")) {
headers.put("client", headers.get("client"));
}
return event;
"""
}
]
Regards,
12-14-2015
01:10 PM
Hello, I created a Java file with the custom interceptor, but I don't know how to compile it or package it into a jar file properly. I tried the javac and jar commands, but the interceptor builder is not found.
12-11-2015
10:47 AM
Problem solved. Instead of using:
flume1.sources.kafka-source-1.interceptors.i1.serializers.ser1.type = default
I changed it to:
flume1.sources.kafka-source-1.interceptors.i1.serializers.ser1.type = org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer
and it worked fine. I have two more questions: 1) The - (hyphen) cannot be read as part of a header, so if the value of the header has a -, the event goes to the default channel and not to the corresponding mapping. 2) I want to add a second regex, but how can I map two headers together? For example:
flume1.sources.kafka-source-1.selector.header = header1 header2
flume1.sources.kafka-source-1.selector.mapping.(value1)&(value2) = hdfs-channel-x
Is it possible without programming it? Because I'm not a programmer. Regards,
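On question 1, the behaviour is consistent with the regex in the posted config: \w+ matches letters, digits and underscore only, so for a value like bluecoat-syslog the pattern never reaches the closing quote, no "product" header is set, and the event falls through to the default channel. A small plain-Java sketch of the difference (this is not the poster's config; the widened character class [\w-] is just one possible assumption):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenHeaderDemo {
    public static void main(String[] args) {
        String log = "...,\"product\":\"bluecoat-syslog\",...";

        // The regex from the config: \w+ does not include '-', so the closing
        // quote is never reached and the pattern does not match at all.
        Matcher configRegex = Pattern.compile("\"product\":\"(\\w+)\"").matcher(log);
        System.out.println("config regex matches: " + configRegex.find());   // false

        // One possible widening (an assumption, not from the posted config):
        // allow '-' inside the captured value as well.
        Matcher widened = Pattern.compile("\"product\":\"([\\w-]+)\"").matcher(log);
        if (widened.find()) {
            System.out.println("widened regex captures: " + widened.group(1)); // bluecoat-syslog
        }
    }
}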
12-11-2015
05:27 AM
I changed the regex but it is still not working. The whole config file is this:

# Sources, channels, and sinks are defined per
# agent name, in this case flume1.
flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1 hdfs-channel-2 hdfs-channel-3 hdfs-channel-4 hdfs-channel-5 hdfs-channel-6 hdfs-channel-7 logChannel
flume1.sinks = hdfs-sink-1 hdfs-sink-2 hdfs-sink-3 hdfs-sink-4 hdfs-sink-5 hdfs-sink-6 hdfs-sink-7 logSink

# For each source, channel, and sink, set
# standard properties.
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = 192.168.70.23:2181
flume1.sources.kafka-source-1.topic = kafkatopic
flume1.sources.kafka-source-1.batchSize = 1000
flume1.sources.kafka-source-1.channels = hdfs-channel-1 hdfs-channel-2 hdfs-channel-3 hdfs-channel-4 hdfs-channel-5 hdfs-channel-6 hdfs-channel-7 logChannel

flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-2.channel = hdfs-channel-2
flume1.sinks.hdfs-sink-3.channel = hdfs-channel-3
flume1.sinks.hdfs-sink-4.channel = hdfs-channel-4
flume1.sinks.hdfs-sink-5.channel = hdfs-channel-5
flume1.sinks.hdfs-sink-6.channel = hdfs-channel-6
flume1.sinks.hdfs-sink-7.channel = hdfs-channel-7
flume1.sinks.logSink.channel = logChannel

flume1.channels.hdfs-channel-1.type = memory
flume1.channels.hdfs-channel-2.type = memory
flume1.channels.hdfs-channel-3.type = memory
flume1.channels.hdfs-channel-4.type = memory
flume1.channels.hdfs-channel-5.type = memory
flume1.channels.hdfs-channel-6.type = memory
flume1.channels.hdfs-channel-7.type = memory
flume1.channels.logChannel.type = memory
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 1000
flume1.channels.hdfs-channel-2.capacity = 10000
flume1.channels.hdfs-channel-2.transactionCapacity = 1000
flume1.channels.hdfs-channel-3.capacity = 10000
flume1.channels.hdfs-channel-3.transactionCapacity = 1000
flume1.channels.hdfs-channel-4.capacity = 10000
flume1.channels.hdfs-channel-4.transactionCapacity = 1000
flume1.channels.hdfs-channel-5.capacity = 10000
flume1.channels.hdfs-channel-5.transactionCapacity = 1000
flume1.channels.hdfs-channel-6.capacity = 10000
flume1.channels.hdfs-channel-6.transactionCapacity = 1000
flume1.channels.hdfs-channel-7.capacity = 10000
flume1.channels.hdfs-channel-7.transactionCapacity = 1000
flume1.channels.logChannel.capacity = 10000
flume1.channels.logChannel.transactionCapacity = 1000

#Interceptors setup
flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = regex_extractor
flume1.sources.kafka-source-1.interceptors.i1.regex = "product":"(\\w+)"
flume1.sources.kafka-source-1.interceptors.i1.serializers = ser1
flume1.sources.kafka-source-1.interceptors.i1.serializers.ser1.type = default
flume1.sources.kafka-source-1.interceptors.i1.serializers.ser1.name = product

#checkpoint,smgsyslog, sepsyslog, pgp, bluecoat-syslog,bluecoat
# channel selector configuration
flume1.sources.kafka-source-1.selector.type = multiplexing
flume1.sources.kafka-source-1.selector.header = product
flume1.sources.kafka-source-1.selector.mapping.ckeckpoint = hdfs-channel-1
flume1.sources.kafka-source-1.selector.mapping.smgsyslog = hdfs-channel-2
flume1.sources.kafka-source-1.selector.mapping.sepsyslog = hdfs-channel-3
flume1.sources.kafka-source-1.selector.mapping.pgp = hdfs-channel-4
flume1.sources.kafka-source-1.selector.mapping.bluecoat-syslog = hdfs-channel-5
flume1.sources.kafka-source-1.selector.mapping.bluecoat = hdfs-channel-6
flume1.sources.kafka-source-1.selector.default = hdfs-channel-7 logChannel

# sinks configuration
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /user/root/logs/checkpoint
flume1.sinks.hdfs-sink-1.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-1.hdfs.rollSize=0
flume1.sinks.hdfs-sink-2.type = hdfs
flume1.sinks.hdfs-sink-2.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-2.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-2.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-2.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-2.hdfs.path = /user/root/logs/smgsyslog
flume1.sinks.hdfs-sink-2.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-2.hdfs.rollSize=0
flume1.sinks.hdfs-sink-3.type = hdfs
flume1.sinks.hdfs-sink-3.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-3.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-3.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-3.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-3.hdfs.path = /user/root/logs/sepsyslog
flume1.sinks.hdfs-sink-3.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-3.hdfs.rollSize=0
flume1.sinks.hdfs-sink-4.type = hdfs
flume1.sinks.hdfs-sink-4.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-4.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-4.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-4.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-4.hdfs.path = /user/root/logs/pgp
flume1.sinks.hdfs-sink-4.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-4.hdfs.rollSize=0
flume1.sinks.hdfs-sink-5.type = hdfs
flume1.sinks.hdfs-sink-5.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-5.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-5.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-5.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-5.hdfs.path = /user/root/logs/bluecoatsyslog
flume1.sinks.hdfs-sink-5.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-5.hdfs.rollSize=0
flume1.sinks.hdfs-sink-6.type = hdfs
flume1.sinks.hdfs-sink-6.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-6.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-6.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-6.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-6.hdfs.path = /user/root/logs/bluecoat
flume1.sinks.hdfs-sink-6.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-6.hdfs.rollSize=0
flume1.sinks.hdfs-sink-7.type = hdfs
flume1.sinks.hdfs-sink-7.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-7.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-7.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-7.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-7.hdfs.path = /user/root/logs/otros
flume1.sinks.hdfs-sink-7.hdfs.rollCount=1000
flume1.sinks.hdfs-sink-7.hdfs.rollSize=0
flume1.sinks.logSink.type = logger

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.

I think something is wrong with the channels, but I don't know what the problem is. The logger output without the interceptor part has two headers, timestamp and topic.
12-10-2015
11:44 AM
I added an interceptor that finds the product field in the log and creates a header from it. This is the code, and it is not working. What could be wrong?
#Interceptors setup
flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = regex_extractor
flume1.sources.kafka-source-1.interceptors.i1.regex = "product":"(\\d+)"
flume1.sources.kafka-source-1.interceptors.i1.serializers = ser1
flume1.sources.kafka-source-1.interceptors.i1.serializers.ser1.type = default
flume1.sources.kafka-source-1.interceptors.i1.serializers.ser1.name = product
The product field in the log looks like this: ...,"product":"smgsyslog",...
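For what it's worth, \d+ only matches digits, while the value in the sample line ("smgsyslog") is letters, so the capture group never matches and no product header is created. A small plain-Java sketch of the difference, outside Flume (the class name is just for illustration):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProductRegexDemo {
    public static void main(String[] args) {
        String log = "...,\"product\":\"smgsyslog\",...";

        // Regex from the config above: \d+ matches digits only, so it never
        // matches the value smgsyslog and the header is never populated.
        Matcher digits = Pattern.compile("\"product\":\"(\\d+)\"").matcher(log);
        System.out.println("\\d+ matches: " + digits.find());   // false

        // \w+ matches word characters, so it captures the value as intended.
        Matcher words = Pattern.compile("\"product\":\"(\\w+)\"").matcher(log);
        if (words.find()) {
            System.out.println("\\w+ captures: " + words.group(1));  // smgsyslog
        }
    }
}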
12-10-2015
04:52 AM
This is the result:
2015-12-10 09:38:59,065 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=0
Are the headers status and QTime? And if they are, how can I make a field of a log be read as a header?