Running the Cloudera Twitter example



I'm trying to run the Twitter demo on the Cloudera setup. I can get the partitioned data using Flume. When I try to access the table using Hive, however, I get a NullPointerException (full log below). Has anyone else run into this? I tried using the exact .jar files that came with the Dezyre example and also followed the instructions in the Cloudera readme file to install MySQL, but neither of those seemed to help. I also confirmed that the partition values match the subdirectory names created in HDFS (the kind of check I mean is sketched after the log). If anyone else ran into something similar (or knows of any additional install steps needed with the Cloudera VM) please let me know! The full Hive log follows:

hive> ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
Added [/usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar]
hive> CREATE EXTERNAL TABLE tweets (
> id BIGINT,
> created_at STRING,
> source STRING,
> favorited BOOLEAN,
> retweeted_status STRUCT<
> text:STRING,
> user:STRUCT<screen_name:STRING,name:STRING>,
> retweet_count:INT>,
> entities STRUCT<
> urls:ARRAY<STRUCT<expanded_url:STRING>>,
> user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
> hashtags:ARRAY<STRUCT<text:STRING>>>,
> text STRING,
> user STRUCT<
> screen_name:STRING,
> name:STRING,
> friends_count:INT,
> followers_count:INT,
> statuses_count:INT,
> verified:BOOLEAN,
> utc_offset:INT,
> time_zone:STRING>,
> in_reply_to_screen_name STRING
> )
> PARTITIONED BY (year INT, month INT, dt INT, dthour INT)
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> LOCATION '/twitter';
OK
Time taken: 0.155 seconds
hive> ALTER TABLE tweets ADD PARTITION (year=2015, month=10, dt=21, dthour=03);
OK
Time taken: 0.441 seconds
hive> SELECT * FROM tweets limit 5;
FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607)
at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578)
at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:453)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

hive>
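
For reference, the partition-versus-directory check I mentioned above was along these lines, comparing what Hive has registered against what Flume actually wrote under /twitter (output omitted; the layout on your VM may differ):

hive> SHOW PARTITIONS tweets;
hive> dfs -ls -R /twitter;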

3 Answer(s)



Hi,
This is a known bug in the SerDe, but if you are using Hive 0.13 or a later version it shouldn't be a problem.
You may also be missing a quote somewhere in your command. Please check the command and run it again.
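
For example, the SerDe class name and the LOCATION path both need single quotes, while the jar path in ADD JAR takes none. A minimal sketch (the table name tweets_check is just for illustration):

hive> ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
hive> CREATE EXTERNAL TABLE tweets_check (id BIGINT, text STRING)
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> LOCATION '/twitter';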


Abhijit,

Thank you for your response. I checked and I am using Hive version 1.1 (based on the version of the hive-hwi jar file in /usr/lib/hive/lib). I also double-checked that there are no missing quotes in the commands (the exact commands I used can be seen above). I did see that there is a hive-serdes-1.1.0-cdh5.4.2.jar file in /usr/lib/hive/lib; I tried using that but got the same result. Finally, I tried putting quotes around the jar name in the ADD JAR command, but that resulted in a "does not exist" error message. I also checked Cloudera's site; the only issue I saw there was a need to update the Twitter version in the pom.xml file for the jar, but that had already been taken care of by Cloudera. Do you have any other thoughts on what the issue could be? Thanks!
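
For completeness, the variations I tried looked roughly like this (the first gave the same NullPointerException; the second, with quotes around the jar path, is what produced the "does not exist" error):

hive> ADD JAR /usr/lib/hive/lib/hive-serdes-1.1.0-cdh5.4.2.jar;
hive> ADD JAR '/usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar';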


Hi Robert,
After some searching, I found that this bug was fixed in Hive 1.2.0. You can upgrade your Hive installation to that version to get rid of the bug.

Below is a link to the Cloudera documentation that explains how to upgrade Hive.
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_hive_upgrade.html
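
If you want to confirm which version you are running before and after the upgrade, this from the shell (not the hive> prompt) should print it:

hive --version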

Hope this helps.
Sorry for the inconvenience.
Thanks.
