I am working on the Twitter influencer project.
I have configured Flume and loaded data from Twitter into HDFS.
I created the table in Hive without partitions (I have only loaded a few minutes of Twitter data, and I am not using Oozie right now for simplicity, so I didn't partition the Hive table).
I then loaded the data into the Hive table.
Now a SELECT on the Hive table is giving an error.
Here are the details leading up to the error:
-------------------------------
[cloudera@localhost twitter_project]$ ls
flume-sources hive-serdes oozie-workflows
[cloudera@localhost twitter_project]$ cd flume-sources/
[cloudera@localhost flume-sources]$ ls
dependency-reduced-pom.xml flume.conf flume.conf.bkp pom.xml src target
[cloudera@localhost flume-sources]$ flume-ng agent -n TwitterAgent -c conf -f flume.conf
hadoop fs -ls /user/cloudera/tweets/2016/10/16/17/
Found 18 items
-rw-r--r-- 3 cloudera cloudera 16793 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593834
-rw-r--r-- 3 cloudera cloudera 5520 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593835
-rw-r--r-- 3 cloudera cloudera 33873 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593836
-rw-r--r-- 3 cloudera cloudera 15032 2016-10-16 17:53 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593837
-rw-r--r-- 3 cloudera cloudera 4602 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593838
-rw-r--r-- 3 cloudera cloudera 7619 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593839
-rw-r--r-- 3 cloudera cloudera 10952 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593840
-rw-r--r-- 3 cloudera cloudera 24772 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593841
-rw-r--r-- 3 cloudera cloudera 3444 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593842
-rw-r--r-- 3 cloudera cloudera 17275 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593843
-rw-r--r-- 3 cloudera cloudera 14158 2016-10-16 17:54 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593844
-rw-r--r-- 3 cloudera cloudera 14446 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593845
-rw-r--r-- 3 cloudera cloudera 63896 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593846
-rw-r--r-- 3 cloudera cloudera 15910 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593847
-rw-r--r-- 3 cloudera cloudera 5090 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593848
-rw-r--r-- 3 cloudera cloudera 4947 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593849
-rw-r--r-- 3 cloudera cloudera 9247 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593850
-rw-r--r-- 3 cloudera cloudera 14730 2016-10-16 17:55 /user/cloudera/tweets/2016/10/16/17/FlumeData.1476665593851
hive> LOAD DATA INPATH '/user/cloudera/tweets/2016/10/16/17/' INTO TABLE tweets_partitioned;
Loading data to table class7.tweets_partitioned
Table class7.tweets_partitioned stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size: 0, raw_data_size: 0]
OK
Time taken: 0.515 seconds
hive> select * from tweets_partitioned;
OK
Failed with exception java.io.IOException:java.io.IOException: Not a file: hdfs://localhost.localdomain:8020/user/cloudera/tweets/2016
Time taken: 0.171 seconds
Can you please help me with this?
If I can query the data from Hive, I can finish the project.
Hi Subhra,
Check the highlighted lines in the output:
hive> LOAD DATA INPATH '/user/cloudera/tweets/2016/10/16/17/' INTO TABLE tweets_partitioned;
Loading data to table class7.tweets_partitioned
Table class7.tweets_partitioned stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size: 0, raw_data_size: 0]
OK
Time taken: 0.515 seconds
hive> select * from tweets_partitioned;
OK
Failed with exception java.io.IOException:java.io.IOException: Not a file: hdfs://localhost.localdomain:8020/user/cloudera/tweets/2016
Time taken: 0.171 seconds
While loading the files into tweets_partitioned, no records were written (num_files: 0, num_rows: 0).
Could you please share the complete set of queries you used, along with the flume.conf file you used to fetch the Twitter data.
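Also, the "Not a file" exception usually means the table's input path contains subdirectories (here, tweets/2016/10/...), which Hive does not descend into by default. As a quick check, and only as a workaround rather than the proper fix, you could try enabling recursive directory reads before the query (behavior may vary by Hive version):

```sql
-- Workaround: let Hive/MapReduce descend into subdirectories
-- under the table location instead of failing on them
SET hive.mapred.supports.subdirectories = true;
SET mapred.input.dir.recursive = true;

SELECT * FROM tweets_partitioned LIMIT 10;
```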
Thanks.
Please find attached:
flume.conf --> the keys are not the exact ones I am using
data_download.txt --> contains the Twitter data downloaded into HDFS
table.txt --> the CREATE TABLE script that I used
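Inline, for convenience, the general shape of that flume.conf. It follows the Cloudera cdh-twitter-example layout; the credential values are placeholders, and the exact property values here are illustrative rather than my real settings:

```
# Twitter source -> memory channel -> HDFS sink
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <placeholder>
TwitterAgent.sources.Twitter.consumerSecret = <placeholder>
TwitterAgent.sources.Twitter.accessToken = <placeholder>
TwitterAgent.sources.Twitter.accessTokenSecret = <placeholder>
TwitterAgent.sources.Twitter.keywords = hadoop, big data

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
# %Y/%m/%d/%H produces the tweets/2016/10/16/17/ layout seen above
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost.localdomain:8020/user/cloudera/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```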
Thanks
I was able to solve it.
The problem occurred because I had created the Hive table without partitions.
Now I am able to both load data into the table and query it.
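For anyone hitting the same issue, a sketch of the working approach: create the table partitioned by the date components that Flume writes into the path, then register each hour's directory as a partition. The jar path, column list, and partition value below are illustrative (the SerDe comes from the hive-serdes project mentioned above); adjust them to your own setup:

```sql
-- JSON SerDe built from the hive-serdes project (path is an example)
ADD JAR /home/cloudera/twitter_project/hive-serdes/target/hive-serdes-1.0-SNAPSHOT.jar;

-- A reduced column list for illustration; the real table has more fields
CREATE EXTERNAL TABLE tweets_partitioned (
  id BIGINT,
  created_at STRING,
  text STRING,
  `user` STRUCT<screen_name:STRING, followers_count:INT>
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/tweets';

-- Point a partition at the hourly directory Flume wrote,
-- so SELECT reads files instead of tripping over subdirectories
ALTER TABLE tweets_partitioned ADD PARTITION (datehour = 2016101617)
  LOCATION '/user/cloudera/tweets/2016/10/16/17';

SELECT text FROM tweets_partitioned LIMIT 10;
```

With partitions registered this way there is no need for LOAD DATA INPATH at all; the external table reads the files in place where Flume left them.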