Member since: 04-22-2016
Posts: 931
Kudos Received: 46
Solutions: 26
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1498 | 10-11-2018 01:38 AM
 | 1867 | 09-26-2018 02:24 AM
 | 1826 | 06-29-2018 02:35 PM
 | 2417 | 06-29-2018 02:34 PM
 | 5361 | 06-20-2018 04:30 PM
10-24-2016
05:19 PM
This is how I generate these Twitter files (based on demos found online):

flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1 -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080
[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
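One thing worth double-checking before starting the agent is that every source and sink actually references a declared channel, since a silent mismatch there is a common cause of an agent that starts but writes nothing. A minimal sketch of such a check (plain Python; the helper name `check_flume_wiring` and the dict-of-properties input are my own, not part of Flume):

```python
# Sanity check (my own helper, not part of Flume): verify that every
# source and sink in a parsed .properties dict points at a declared channel.
def check_flume_wiring(props, agent="agent1"):
    channels = set(props.get(agent + ".channels", "").split())
    referenced = []
    for src in props.get(agent + ".sources", "").split():
        referenced += props.get(agent + ".sources." + src + ".channels", "").split()
    for snk in props.get(agent + ".sinks", "").split():
        referenced.append(props.get(agent + ".sinks." + snk + ".channel", ""))
    # Return any channel names that were referenced but never declared.
    return [c for c in referenced if c not in channels]
```

Running it against the configuration above should return an empty list, meaning source1 and sink1 are both wired to the declared channel1.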
10-24-2016
05:16 PM
I am attaching the Twitter file (events1476284674520.zip) that was created using Flume. Can you please check whether it has a valid structure? I am unable to read/view this file.
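One hedged guess as to why the file is unreadable as text: org.apache.flume.source.twitter.TwitterSource writes Avro container files rather than plain JSON, and an Avro file begins with the magic bytes `Obj\x01`, which would also explain a JSON parser choking on an unexpected 'O' at the very start of the file. A small sketch to check the header (the helper is my own, not a Flume or Avro tool):

```python
def looks_like_avro(path):
    # Avro object container files start with the 4-byte magic b'Obj\x01'.
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"
```

If this returns True for one of the event files, the data would need an Avro-aware reader (or an Avro-backed Hive table) instead of a JSON SerDe.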
10-24-2016
05:05 PM
Hi Artem, I used your method but I am getting an error. Can you help please?

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
person STRUCT<
screen_name:STRING,
name:STRING,
locations:STRING,
description:STRING,
created_at:STRING,
followers_count:INT,
url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/flume/tweets';
hive>
>
> select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>
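Since this SerDe expects one JSON document per line, one way to narrow the error down is to copy a file out of /user/flume/tweets and find the first line that is not valid JSON. A rough sketch (the helper name is mine):

```python
import json

def first_invalid_json_line(path):
    # Return (line_number, snippet) for the first non-empty line that fails
    # to parse as JSON, or None if every non-empty line is valid.
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            stripped = line.strip()
            if not stripped:
                continue
            try:
                json.loads(stripped)
            except ValueError:
                return (lineno, stripped[:40])
    return None
```

Given the error message ("Unexpected character ('O')" at line 1), the very first line of the file is likely not JSON at all, which points at the file format rather than the table definition.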
10-14-2016
09:05 PM
I can successfully build the Maven package, but when I run the code I get the error shown below. It is looking for gson-2.2.2.jar for some reason, whereas the pom file has version 2.6.2 and that file exists at /root/.m2/repository/com/google/code/gson/gson/2.6.2/gson-2.6.2.jar. The pom.xml file is shown below:

[root@hadoop1 hive-json-master]# bin/find-json-schema /tmp/events.1476299830387.log
Can't find /root/.m2/repository/com/google/code/gson/gson/2.2.2/gson-2.2.2.jar. Please build.
[root@hadoop1 hive-json-master]#
[root@hadoop1 hive-json-master]# more pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-json</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>Hive-JSON</name>
<url>http://hive.apache.org</url>
<repositories>
<repository>
<id>data-nucleus</id>
<name>data-nucleus</name>
<url>http://www.datanucleus.org/downloads/maven2/</url>
</repository>
</repositories>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.6.2</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.10</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.5.1</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<!-- make a jar with the source code -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- make a jar with the javadoc -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9</version>
<configuration>
<show>public</show>
<!-- only include our source, not the protobuf -->
<sourcepath>${basedir}/src/main/java</sourcepath>
</configuration>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-site-plugin</artifactId>
<version>2.0-beta-6</version>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>findbugs-maven-plugin</artifactId>
<version>2.5.2</version>
<configuration>
<xmlOutput>true</xmlOutput>
<xmlOutputDirectory>target/site</xmlOutputDirectory>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.5</version>
<configuration>
<systemProperties>
<name>test.tmp.dir</name>
<value>${project.build.directory}/test/tmp</value>
<name>test.resources.dir</name>
<value>${basedir}/src/test/resources</value>
</systemProperties>
<argLine>-Xms256m -Xmx512m</argLine>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>org.apache.hadoop.hive.json.JsonSchemaFinder</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id> <!-- this is used for inheritance merges -->
<phase>package</phase> <!-- bind to the packaging phase -->
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>2.4</version>
<configuration>
<dependencyLocationsEnabled>false</dependencyLocationsEnabled>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9</version>
<configuration>
<show>public</show>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>findbugs-maven-plugin</artifactId>
<version>2.5.2</version>
</plugin>
</plugins>
</reporting>
</project>
[root@hadoop1 hive-json-master]#
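The "Please build" message suggests the bin/find-json-schema launcher script hardcodes the old gson-2.2.2 repository path while the pom now builds against 2.6.2. I have not verified the script's contents, but if that is the case, patching the hardcoded version string would fix it; a sketch of such a patch (the helper is hypothetical):

```python
def bump_gson_version(script_text, old="2.2.2", new="2.6.2"):
    # Rewrite both the jar file name and the repository path segment
    # in the launcher script's classpath line.
    return (script_text
            .replace("gson-" + old + ".jar", "gson-" + new + ".jar")
            .replace("gson/" + old + "/", "gson/" + new + "/"))
```

Alternatively, editing the script by hand to point at the 2.6.2 jar should have the same effect.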
10-12-2016
09:59 PM
2 Kudos
I fixed this error by using another SerDe: org.openx.data.jsonserde.JsonSerDe
10-12-2016
08:52 PM
I just created a simple empty table with no structs and it shows the same behavior. What am I missing here?

Logging initialized using configuration in file:/etc/hive/2.5.0.0-1245/0/hive-log4j.properties
hive> create external table load_tweets(id BIGINT,text STRING)
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
> LOCATION '/user/flume/tweets';
OK
Time taken: 1.003 seconds
hive> describe load_tweets;
OK
id bigint from deserializer
text string from deserializer
Time taken: 0.268 seconds, Fetched: 2 row(s)
hive>
> ;
hive> select * from load_tweets;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream@5b8402e0; line: 1, column: 2]
Time taken: 1.386 seconds
hive>
10-12-2016
08:42 PM
2 Kudos
I created a Hive table as follows:

CREATE EXTERNAL TABLE tweetdata(created_at STRING,
text STRING,
person STRUCT<
screen_name:STRING,
name:STRING,
locations:STRING,
description:STRING,
created_at:STRING,
followers_count:INT,
url:STRING>
) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe' location '/user/flume/tweets';
but I am getting errors when I select from it. I am using HDP 2.5.

hive>
> describe tweetdata;
OK
created_at string from deserializer
text string from deserializer
person struct<screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string> from deserializer
Time taken: 0.076 seconds, Fetched: 3 row(s)
hive> select person.name, text from tweetdata;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@4668c5ea; line: 1, column: 2]
Time taken: 0.077 seconds
hive>
09-29-2016
06:45 PM
1 Kudo
I fixed this issue by upgrading from Spark 1.6.2 to Spark 2.0; I actually upgraded my HDP 2.4 cluster to HDP 2.5.
09-29-2016
04:27 PM
SPARK_MAJOR_VERSION was set, but SPARK_HOME wasn't. Now it's working. Thanks for your help.
09-29-2016
04:20 PM
I even tried running spark-submit from the spark2 directory, but it still reports version 1.6.2:

[root@hadoop1 ~]# echo $SPARK_MAJOR_VERSION
2
[root@hadoop1 ~]# /usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Type --help for more information.
[root@hadoop1 ~]#
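For anyone hitting the same thing: the spark-submit launcher only derives SPARK_HOME when it is not already set, so a stale SPARK_HOME pointing at the Spark 1.6 install wins even when you invoke .../spark2/bin/spark-submit by its full path. A sketch of that resolution logic (my simplification of the launcher script, not its actual code):

```python
import os

def resolve_spark_home(env, invoked_bin_dir):
    # Mirrors the launcher's behavior: an already-exported SPARK_HOME is
    # respected; only when it is unset is the home derived from the
    # location of the spark-submit script that was actually invoked.
    if env.get("SPARK_HOME"):
        return env["SPARK_HOME"]
    return os.path.dirname(invoked_bin_dir)
```

So either unset the stale SPARK_HOME or export it to the spark2 install before running spark-submit.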