Facing a problem in the Twitter influencer project




I am doing the Twitter influencer project.

I have configured Flume.

Loaded data from Twitter into HDFS.

Created the table in Hive without partitions. (I have loaded only a few minutes of Twitter data, and I am not using Oozie right now for simplicity, so I didn't partition the Hive table.)

I have loaded the data into the Hive table.

Now a SELECT on the Hive table is giving an error.

Following are the details from before I hit the error:

-------------------------------

[cloudera@localhost twitter_project]$ ls
flume-sources  hive-serdes  oozie-workflows
[cloudera@localhost twitter_project]$ cd flume-sources/
[cloudera@localhost flume-sources]$ ls
dependency-reduced-pom.xml  flume.conf  flume.conf.bkp  pom.xml  src  target
[cloudera@localhost flume-sources]$ flume-ng agent -n TwitterAgent -c conf -f flume.conf
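For reference, the agent was started against a flume.conf along these lines (a sketch of the usual Cloudera Twitter-source layout; the keys, keyword list, and capacities below are placeholders, not my real values):

```properties
# Sketch of a minimal flume.conf for the custom Twitter source.
# All credential values are placeholders.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data

# HDFS sink: escape sequences create the year/month/day/hour directories
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost.localdomain:8020/user/cloudera/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```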

 

 hadoop fs -ls /user/cloudera/tweets/2016/10/16/17/
Found 18 items
-rw-r--r--   3 cloudera cloudera      16793 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593834
-rw-r--r--   3 cloudera cloudera       5520 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593835
-rw-r--r--   3 cloudera cloudera      33873 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593836
-rw-r--r--   3 cloudera cloudera      15032 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593837
-rw-r--r--   3 cloudera cloudera       4602 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593838
-rw-r--r--   3 cloudera cloudera       7619 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593839
-rw-r--r--   3 cloudera cloudera      10952 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593840
-rw-r--r--   3 cloudera cloudera      24772 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593841
-rw-r--r--   3 cloudera cloudera       3444 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593842
-rw-r--r--   3 cloudera cloudera      17275 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593843
-rw-r--r--   3 cloudera cloudera      14158 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593844
-rw-r--r--   3 cloudera cloudera      14446 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593845
-rw-r--r--   3 cloudera cloudera      63896 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593846
-rw-r--r--   3 cloudera cloudera      15910 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593847
-rw-r--r--   3 cloudera cloudera       5090 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593848
-rw-r--r--   3 cloudera cloudera       4947 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593849
-rw-r--r--   3 cloudera cloudera       9247 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593850
-rw-r--r--   3 cloudera cloudera      14730 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593851

 

hive> LOAD DATA INPATH '/user/cloudera/tweets/2016/10/16/17/' INTO TABLE tweets_partitioned;
Loading data to table class7.tweets_partitioned
Table class7.tweets_partitioned stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size: 0, raw_data_size: 0]
OK
Time taken: 0.515 seconds
 

hive> select * from tweets_partitioned;
OK
Failed with exception java.io.IOException:java.io.IOException: Not a file: hdfs://localhost.localdomain:8020/user/cloudera/tweets/2016
Time taken: 0.171 seconds
 

Please can you help me with this.

If I can query the data from Hive, I can finish the project.


5 Answer(s)



Attaching the contents of my hive-site.xml.




Hi Subhra,

Check the highlighted lines in the output:

hive> LOAD DATA INPATH '/user/cloudera/tweets/2016/10/16/17/' INTO TABLE tweets_partitioned;
Loading data to table class7.tweets_partitioned
Table class7.tweets_partitioned stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size: 0, raw_data_size: 0]
OK
Time taken: 0.515 seconds

hive> select * from tweets_partitioned;
OK
Failed with exception java.io.IOException:java.io.IOException: Not a file: hdfs://localhost.localdomain:8020/user/cloudera/tweets/2016
Time taken: 0.171 seconds
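As a side note, the "Not a file" error typically means Hive hit the nested year/month/day/hour directories that Flume created under /user/cloudera/tweets. One possible workaround (a sketch; these property names can vary across Hive versions) is to enable recursive input directories before querying:

```sql
-- Sketch: let Hive/MapReduce descend into Flume's nested
-- year/month/day/hour directories instead of failing on them
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

SELECT * FROM tweets_partitioned LIMIT 5;
```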

While loading the files into tweets_partitioned, no records were written (note num_files: 0 and num_rows: 0 in the table stats).

Could you please share the complete set of queries you used, along with the flume.conf file you used to fetch the Twitter data?

Thanks.



Please find attached:

flume.conf --> the keys shown are not the exact ones I am using

data_download.txt --> contains the Twitter data downloaded into HDFS

table.txt --> the CREATE TABLE script that I have used

 

Thanks




I was able to solve it.

The problem occurred because I had created the Hive table without partitions.

Now I am able to load data into the table as well as query it.
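For anyone hitting the same issue, here is a sketch of what the partitioned table and load can look like (column list abridged; the full schema and the JSONSerDe come from the hive-serdes project used in this exercise):

```sql
-- Sketch only: abridged column list; the real schema comes from
-- the hive-serdes project's JSON SerDe example
CREATE EXTERNAL TABLE tweets_partitioned (
  id BIGINT,
  created_at STRING,
  text STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/tweets';

-- Point a partition at the directory Flume wrote, instead of LOAD DATA:
ALTER TABLE tweets_partitioned
  ADD PARTITION (datehour = 2016101617)
  LOCATION '/user/cloudera/tweets/2016/10/16/17';
```

With the partition registered this way, the files stay where Flume put them and SELECT no longer trips over the nested date directories.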



Hi Subhra,

Thanks for your confirmation.

 
