Flume Twitter Project Issue CDH5



Please attach or copy and paste your Flume config file.

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.


# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'TwitterAgent'

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = GRRZiYEHl54xTfeWxFoGYmCMl
TwitterAgent.sources.Twitter.consumerSecret = SNAOn0bz3pJPVl3PFBl1hcCWPER9l0dXYEPwZXuuHbujVu93u8
TwitterAgent.sources.Twitter.accessToken = 2580558602-HOqIMmwrFtxjm7t1jhTcxUIzS52kh6FweUeX8Yp
TwitterAgent.sources.Twitter.accessTokenSecret = UQ0WT16AMgtM7RasxF97ttJ4Wz654cyrdfpaf07RfQbry
TwitterAgent.sources.Twitter.keywords = hadoop, hdfs, bigdata, scientist, soccer

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/cloudera/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

The Java exception

"Caused by: java.lang.ClassNotFoundException: com.cloudera.flume.source.TwitterSource"

indicates that the class com.cloudera.flume.source.TwitterSource was not found.
Something is wrong with the CLASSPATH variable or the corresponding Flume parameter.

Make sure the classpath includes the location of that class or of the jar file that contains it.
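One way to rule out a wrong jar before touching the classpath is to confirm that the jar really contains the class. The following is only a rough sketch: the helper names class_to_entry and jar_has_class are made up for illustration, the jar path in the example is a placeholder, and it assumes the unzip tool is installed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Flume): check that a jar contains a class.

# Convert a fully qualified class name into the entry path used inside a jar,
# e.g. com.example.Foo -> com/example/Foo.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Return 0 if the jar lists the class, non-zero otherwise.
# A jar is a zip archive, so `unzip -l` can list its entries.
jar_has_class() {
    jar_path=$1
    class_name=$2
    unzip -l "$jar_path" 2>/dev/null | grep -qF "$(class_to_entry "$class_name")"
}

# Example call (the jar path is an assumption; use your own location):
# jar_has_class /path/to/flume-sources.jar com.cloudera.flume.source.TwitterSource \
#     && echo "class found" || echo "class missing"
```

If the class is missing from the jar, no classpath setting will help; if it is present, the problem is in how the jar is passed to the agent.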

Thanks Igor, I noticed that as well, and FLUME_CLASSPATH in the flume-env.sh file points to the jar file.

It's included in the plugins as a reference in the Flume agent configuration, and a -D classpath option is also included in the additional Java options, but it still doesn't seem to recognize the jar file.

Alright, I think I resolved it. You need to manually add this classpath to the Java configuration options for Flume:
-Djava.library.path= "/usr/lib/flume-sources-1.0-SNAPSHOT.jar"
This seems to be the key part. It doesn't matter whether you add it to the flume-env.sh file; for me it didn't work until I added it to the Java config section.

Here /usr/lib/ is the location of the jar file. Also make sure the flume user has rx access to the jar file; otherwise you will receive a permission denied message.
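The permission point can be checked quickly from a shell. This is a minimal sketch with a made-up helper name (check_readable); it only tests whether the user running it can read the file, so it would need to be run as the account the agent uses (assumed here to be named flume).

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can read a file.
check_readable() {
    if [ -r "$1" ]; then
        echo "readable: $1"
    else
        echo "not readable: $1"
        return 1
    fi
}

# Example call (run it as the agent's user, e.g. via sudo -u flume,
# assuming the service account is called flume):
# check_readable /usr/lib/flume-sources-1.0-SNAPSHOT.jar
```

A "not readable" result for the flume user would match the permission denied message mentioned above.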

Would be nice to update the LMS for CDH5 with these instructions.

-Kartik

Hi Kartik,
Could you please explain how you updated -Djava.library.path?

Thanks.
Rg,
Adda

First, you need to change the HDFS path to TwitterAgent.sinks.HDFS.hdfs.path = namenode/user/flume/tweets.........., where namenode could be something like master.example.com. Simply giving the path will not work.