Unable to read JSON file using elephant-bird, please help

Expert Contributor

I am trying to load a JSON file that contains null values using the elephant-bird JsonLoader.

sample.json

{"created_at":"Mon Aug 22 10:48:23 +0000 2016","id":767674772662607873,"id_str":"767674772662607873","text":"KPIT Image Result for https:\/\/t.co\/Nas2ZnF1zZ... https:\/\/t.co\/9TnelwtIvm","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":123,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/Nas2ZnF1zZ","expanded_url":"http:\/\/miltonious.com\/","display_url":"miltonious.com","indices":[24,47]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1471862903167"}

script:

REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar

json = LOAD 'sample.json' USING JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
describe json; dump json;

When I dump json, I get the following output and warning:

(Mon Aug 22 10:48:23 +0000 2016,767674772662607873,767674772662607873,google Image Result for Twitter Web Client,false,1234,12345,3214,43215,,,,,,,,,,,,,,)

WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {complete json}

From the warning, I guess it is hitting NULL values. So how can we load a JSON file that contains null values?

I have also tried another way, i.e.:

json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');

describe json;

Output

Schema for json unknown.

Please advise.

Thanks.

1 ACCEPTED SOLUTION

Expert Contributor

Thanks for your reply, Artem Ervits.

I think it is because of the different versions that I used in my script.

When I used the same version of every elephant-bird jar, as suggested by @gkeys, it worked fine for me.

script:

REGISTER elephant-bird-core-4.1.jar 
REGISTER elephant-bird-hadoop-compat-4.1.jar 
REGISTER elephant-bird-pig-4.1.jar 
REGISTER json-simple-1.1.1.jar

twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();

extracted = foreach twitter generate
    (chararray)$0#'created_at' as created_at,
    (chararray)$0#'id' as id,
    (chararray)$0#'id_str' as id_str,
    (chararray)$0#'text' as text,
    (chararray)$0#'source' as source,
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,
    (boolean)$0#'favorited' as favorited,
    (long)$0#'favorite_count' as favorite_count,
    (long)$0#'retweet_count' as retweet_count,
    (boolean)$0#'retweeted' as retweeted,
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'place') as place;

dump extracted;

And it worked fine.
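
Since the fix came down to registering matching elephant-bird versions, a quick pre-flight check on the REGISTER lines can catch the mix-up before the script runs. The following is a minimal Python sketch and is not part of elephant-bird or Pig; the function names `elephant_bird_versions` and `has_mixed_versions`, and the regular expression, are illustrative assumptions based on how the jar file names appear in this thread.

```python
import re

# Assumption: jars are named elephant-bird-<module>-<version>.jar,
# as in the REGISTER lines of this thread.
JAR_PATTERN = re.compile(r"elephant-bird-([a-z-]+)-(\d+(?:\.\d+)*)\.jar")


def elephant_bird_versions(script_text):
    """Map each elephant-bird module named in a REGISTER line to its version."""
    versions = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.upper().startswith("REGISTER"):
            continue
        match = JAR_PATTERN.search(line)
        if match:
            module, version = match.groups()
            versions[module] = version
    return versions


def has_mixed_versions(script_text):
    """Return True when the registered elephant-bird jars disagree on version."""
    return len(set(elephant_bird_versions(script_text).values())) > 1


# REGISTER lines copied from the original question.
question_script = """
REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
"""

print(elephant_bird_versions(question_script))
print(has_mixed_versions(question_script))
```

Run against the REGISTER lines from the question, the check flags the 4.3/4.1 mix; run against the REGISTER lines in the accepted solution, it should report no mismatch.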

2 REPLIES

Mentor

@Mohan V I took your sample and ran it on the HDP 2.5 Sandbox, so I am using Pig 0.16 rather than 0.15, but everything else is the same. I also renamed the alias json to data.

[guest@sandbox ~]$ cat sample.json
{"created_at":"Mon Aug 22 10:48:23 +0000 2016","id":767674772662607873,"id_str":"767674772662607873","text":"KPIT Image Result for https:\/\/t.co\/Nas2ZnF1zZ... https:\/\/t.co\/9TnelwtIvm","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":123,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/Nas2ZnF1zZ","expanded_url":"http:\/\/miltonious.com\/","display_url":"miltonious.com","indices":[24,47]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1471862903167"}
[guest@sandbox ~]$ hdfs dfs -put sample.json .

I used the jars provided by @gkeys in your other question. The only difference is that you are mixing 4.3 and 4.1 versions; perhaps that is the issue.

REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
REGISTER json-simple-1.1.1.jar

data = LOAD 'sample.json' USING JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');

describe data;
dump data;

Executing with Tez:

[guest@sandbox ~]$ pig -x tez mohan.pig
WARNING: Use "yarn jar" to launch YARN applications.
16/09/11 22:16:12 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/09/11 22:16:12 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/09/11 22:16:12 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/09/11 22:16:12 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/09/11 22:16:12 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-09-11 22:16:13,002 [main] INFO  org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-817 (rUnversioned directory) compiled Jun 26 2016, 11:34:45
2016-09-11 22:16:13,003 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/guest/pig_1473632173001.log
2016-09-11 22:16:13,813 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/guest/.pigbootup not found
2016-09-11 22:16:13,940 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2016-09-11 22:16:14,389 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-mohan.pig-6b630799-b287-476a-ac2f-ea19ee7d25ae
2016-09-11 22:16:14,761 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-09-11 22:16:14,862 [main] INFO  org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-09-11 22:16:15,238 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 698875904 to monitor. collectionUsageThreshold = 489213120, usageThreshold = 489213120
data: {created_at: chararray,id: chararray,id_str: chararray,text: chararray,source: chararray,in_reply_to_status_id: chararray,in_reply_to_status_id_str: chararray,in_reply_to_user_id: chararray,in_reply_to_user_id_str: chararray,in_reply_to_screen_name: chararray,geo: chararray,coordinates: chararray,place: chararray,contributors: chararray,is_quote_status: bytearray,retweet_count: long,favorite_count: chararray,entities: map[],favorited: bytearray,retweeted: bytearray,possibly_sensitive: bytearray,lang: chararray}
2016-09-11 22:16:15,463 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2016-09-11 22:16:15,511 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-09-11 22:16:15,572 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-09-11 22:16:15,696 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/guest/staging and resources directory is /tmp/temp1880338694
2016-09-11 22:16:15,743 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2016-09-11 22:16:15,843 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-09-11 22:16:15,885 [main] INFO  com.hadoop.compression.lzo.GPLNativeCodeLoader - Loaded native gpl library
2016-09-11 22:16:15,889 [main] INFO  com.hadoop.compression.lzo.LzoCodec - Successfully loaded & initialized native-lzo library [hadoop-lzo rev 7a4b57bedce694048432dd5bf5b90a6c8ccdba80]
2016-09-11 22:16:15,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: elephant-bird-core-4.1.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: jackson-core-asl-1.9.13.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.16.0.2.5.0.0-817-core-h2.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: elephant-bird-pig-4.1.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: elephant-bird-hadoop-compat-4.1.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: json-simple-1.1.1.jar
2016-09-11 22:16:16,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: joda-time-2.8.1.jar
2016-09-11 22:16:16,513 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-2: parallelism=1, memory=256, java opts=-Xmx256m
2016-09-11 22:16:16,513 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: data
2016-09-11 22:16:16,513 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: data[6,7]
2016-09-11 22:16:16,513 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex:
2016-09-11 22:16:16,597 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Total estimated parallelism is 1
2016-09-11 22:16:16,688 [PigTezLauncher-0] INFO  org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2016-09-11 22:16:16,718 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.5.0.0-817, revision=85dd709e66a077055a1749469af62f4d1f3818ed, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20160623-1449 ]
2016-09-11 22:16:16,907 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-09-11 22:16:16,913 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
2016-09-11 22:16:17,017 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager to manage Timeline ACLs
2016-09-11 22:16:17,111 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-09-11 22:16:17,117 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Session mode. Starting session.
2016-09-11 22:16:17,120 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.5.0.0-817/tez/tez.tar.gz
2016-09-11 22:16:17,180 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/guest/staging/.tez/application_1473630550492_0001 doesn't exist and is created
2016-09-11 22:16:17,212 [PigTezLauncher-0] INFO  org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1473630550492_0001
2016-09-11 22:16:17,580 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1473630550492_0001
2016-09-11 22:16:17,583 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1473630550492_0001/
2016-09-11 22:16:23,232 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:mohan.pig-0_scope-0
2016-09-11 22:16:23,232 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:mohan.pig, applicationId=application_1473630550492_0001, dagName=PigLatin:mohan.pig-0_scope-0, callerContext={ context=PIG, callerType=PIG_SCRIPT_ID, callerId=PIG-mohan.pig-6b630799-b287-476a-ac2f-ea19ee7d25ae }
2016-09-11 22:16:23,600 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:mohan.pig, applicationId=application_1473630550492_0001, dagName=PigLatin:mohan.pig-0_scope-0
2016-09-11 22:16:23,793 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-09-11 22:16:23,793 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
2016-09-11 22:16:23,801 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:mohan.pig-0_scope-0. Application id: application_1473630550492_0001
2016-09-11 22:16:24,640 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1473630550492_0001
2016-09-11 22:16:24,802 [Timer-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2016-09-11 22:16:28,939 [PigTezLauncher-0] INFO  org.apache.tez.common.counters.Limits - Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=3000, COUNTER_NAME_MAX=64, MAX_COUNTERS=10000
2016-09-11 22:16:28,944 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=SUCCEEDED, progress=TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=Counters: 21
	org.apache.tez.common.counters.DAGCounter
		NUM_SUCCEEDED_TASKS=1
		TOTAL_LAUNCHED_TASKS=1
		DATA_LOCAL_TASKS=1
		AM_CPU_MILLISECONDS=910
		AM_GC_TIME_MILLIS=11
	File System Counters
		HDFS_BYTES_READ=911
		HDFS_BYTES_WRITTEN=253
		HDFS_READ_OPS=4
		HDFS_LARGE_READ_OPS=0
		HDFS_WRITE_OPS=2
	org.apache.tez.common.counters.TaskCounter
		GC_TIME_MILLIS=64
		CPU_MILLISECONDS=2690
		PHYSICAL_MEMORY_BYTES=170393600
		VIRTUAL_MEMORY_BYTES=990400512
		COMMITTED_HEAP_BYTES=170393600
		INPUT_RECORDS_PROCESSED=1
		OUTPUT_RECORDS=1
	TaskCounter_scope_2_INPUT_scope_0
		INPUT_RECORDS_PROCESSED=1
	TaskCounter_scope_2_OUTPUT_scope_1
		OUTPUT_RECORDS=1
	org.apache.pig.PigWarning
		UDF_WARNING_1=1
	org.apache.pig.builtin.JsonLoader
		UDF_WARNING_1=1
2016-09-11 22:16:29,650 [main] WARN  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Encountered Warning UDF_WARNING_1 1 time(s).
2016-09-11 22:16:29,655 [main] INFO  org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:


       HadoopVersion: 2.7.1.2.5.0.0-817
          PigVersion: 0.16.0.2.5.0.0-817
          TezVersion: 0.7.0.2.5.0.0-817
              UserId: guest
            FileName: mohan.pig
           StartedAt: 2016-09-11 22:16:15
          FinishedAt: 2016-09-11 22:16:29
            Features: UNKNOWN


Success!




DAG 0:
                                    Name: PigLatin:mohan.pig-0_scope-0
                           ApplicationId: job_1473630550492_0001
                      TotalLaunchedTasks: 1
                           FileBytesRead: 0
                        FileBytesWritten: 0
                           HdfsBytesRead: 911
                        HdfsBytesWritten: 253
      SpillableMemoryManager spill count: 0
                Bags proactively spilled: 0
             Records proactively spilled: 0


DAG Plan:
Tez vertex scope-2


Vertex Stats:
VertexId Parallelism TotalTasks   InputRecords   ReduceInputRecords  OutputRecords  FileBytesRead FileBytesWritten  HdfsBytesRead HdfsBytesWritten Alias	Feature	Outputs
scope-2            1          1              1                    0              1              0                0            911              253 data		hdfs://sandbox.hortonworks.com:8020/tmp/temp1943556042/tmp-1378487105,


Input(s):
Successfully read 1 records (911 bytes) from: "hdfs://sandbox.hortonworks.com:8020/user/guest/sample.json"


Output(s):
Successfully stored 1 records (253 bytes) in: "hdfs://sandbox.hortonworks.com:8020/tmp/temp1943556042/tmp-1378487105"


2016-09-11 22:16:29,659 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-09-11 22:16:29,673 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-09-11 22:16:29,673 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(Mon Aug 22 10:48:23 +0000 2016,767674772662607873,767674772662607873,KPIT Image Result for https://t.co/Nas2ZnF1zZ. https://t.co/9TnelwtIvm,<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>,false,123,,,,,,,,,,,,,,,)
2016-09-11 22:16:29,782 [main] INFO  org.apache.pig.Main - Pig script completed in 16 seconds and 908 milliseconds (16908 ms)
2016-09-11 22:16:29,789 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Shutting down thread pool
2016-09-11 22:16:29,804 [Thread-34] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Shutting down Tez session org.apache.tez.client.TezClient@5df6b6ad
2016-09-11 22:16:30,214 [Thread-34] INFO  org.apache.tez.client.TezClient - Shutting down Tez Session, sessionName=PigLatin:mohan.pig, applicationId=application_1473630550492_0001
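
The run above still reports `UDF_WARNING_1` from `org.apache.pig.builtin.JsonLoader`, and most fields in the dumped tuple are empty, whereas the accepted solution loads each record without a schema and pulls fields out by key. As a rough illustration of that key-lookup approach (written in Python rather than Pig, and not a description of how elephant-bird is implemented), the sketch below parses one JSON line and reads fields by name, so a JSON `null` or a missing key simply comes back as `None`. The helper name `extract_fields` and the shortened sample record are assumptions made for this example.

```python
import json

# Shortened version of the sample tweet; only a few keys are kept here.
sample_line = (
    '{"created_at":"Mon Aug 22 10:48:23 +0000 2016",'
    '"id":767674772662607873,'
    '"in_reply_to_status_id":123,'
    '"in_reply_to_user_id":null,'
    '"place":null,'
    '"retweet_count":0,'
    '"favorited":false,'
    '"lang":"en"}'
)


def extract_fields(line, keys):
    """Parse one JSON record and look up each key; missing or null gives None."""
    record = json.loads(line)
    return {key: record.get(key) for key in keys}


# "geo" is deliberately absent from the shortened record above.
row = extract_fields(
    sample_line,
    ["created_at", "id", "place", "retweet_count", "favorited", "geo"],
)
print(row)
```

Looking fields up by name means a null or absent key never shifts the position of the fields that follow it, which is the property the map-based `$0#'key'` lookups rely on.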
