Sqoop configuration problem



0
I encountered a problem while trying to import an existing table from MySQL to HDFS. The command I used is the following:

sqoop import --connect jdbc:mysql://localhost/DeZyre --table player -username root -P --target-dir /user/hduser/sqoopOut3 -m 1

and this is what I got (sorry, I don't see a way to attach a screenshot):

Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Enter password:
14/07/07 20:23:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/07/07 20:23:15 INFO tool.CodeGenTool: Beginning code generation
14/07/07 20:23:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `player` AS t LIMIT 1
14/07/07 20:23:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `player` AS t LIMIT 1
14/07/07 20:23:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hduser/compile/acd15f3e515c5e6961cf66340ca9d790/player.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/07/07 20:23:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/acd15f3e515c5e6961cf66340ca9d790/player.jar
14/07/07 20:23:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/07/07 20:23:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/07/07 20:23:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/07/07 20:23:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/07/07 20:23:17 INFO mapreduce.ImportJobBase: Beginning import of player
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/07/07 20:23:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/07/07 20:23:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/07 20:23:18 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/07/07 20:23:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/07/07 20:23:18 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser1836752559/.staging/job_local1836752559_0001
14/07/07 20:23:18 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/sqoop/sqoop-1.4.4.jar
14/07/07 20:23:18 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/sqoop/sqoop-1.4.4.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:600)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
-----------------------------------
I did make sure to define the appropriate paths/variables in the .bashrc file, and I also modified the sqoop-env.sh file in the following way:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=$HADOOP_HOME

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=$HADOOP_HOME

#set the path to where bin/hbase is available
export HBASE_HOME=$HBASE_HOME

#Set the path to where bin/hive is available
export HIVE_HOME=$HIVE_HOME

----

All variable names are correct and refer to the correct paths. I am using Apache Hadoop installed on an Ubuntu machine. It would be great if you could provide some documentation on how to set the environment variables for the Apache distribution instead of only for the Cloudera distribution. At the very least, it could list which files need to be modified and give some general intuition for the required changes.

Thanks!

13 Answer(s)


0

Hi Seda,


"PriviledgedActionException as:hduser (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/sqoop/sqoop-1.4.4.jar
14/07/07 20:23:18 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/sqoop/sqoop-1.4.4.jar"

At first glance I can see that the jar file is being searched for on HDFS, which should not be the case; the client should pick it up from the local machine. (Some services do stage their jar files on HDFS, but that is not what should be happening here.) You need to ensure that the environment variables are properly set up.

Set $SQOOP_HOME and add it to your $PATH.
export HADOOP_MAPRED_HOME=/usr/local/hadoop-0.20-mapreduce
export HADOOP_HOME=/usr/local/hadoop-0.20-mapreduce
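
For an Apache (non-CDH) installation like yours, the equivalent entries would look more like the following sketch. The paths are assumptions taken from the log output above; substitute your actual install locations.

# assumed install locations, taken from the log output above
export SQOOP_HOME=/usr/local/sqoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export PATH=$PATH:$SQOOP_HOME/bin:$HADOOP_HOME/bin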

Adjust the above exports to match your install locations. First try them from the command line and check that the import runs without errors; once it runs successfully, you can add them to your ~/.bashrc file. Hope this helps! Let me know if you need any further assistance.

Thanks
Deb.

0

Actually, as I mentioned, my variables are set correctly. I am already working from the command line; that is the only way I connect to that machine. I know something must be misconfigured, but everything you suggested is already set correctly in my .bashrc file. I am still not sure why it is looking for that file on HDFS ...

0

OK, I fixed my problem by putting the appropriate files into HDFS. I couldn't get it working otherwise.

0

Could you tell me how to fix this problem? I have met the same problem, but I can't solve it.

0

Have you fixed the problem? I got the exact same problem.

0

If you have fixed the problem, please do tell. Thanks.

0

Hi Guys,

Sorry for the late response. Yes, I did fix the problem by putting the jar file it was looking for (sqoop-1.4.4.jar) onto HDFS, following exactly the same path that was in the error. In my case the error was "File does not exist: hdfs://localhost:9000/usr/local/sqoop/sqoop-1.4.4.jar".

I tried to find a different solution to this but failed. The only way it worked was to put the file it was looking for into the directory mentioned in the error message.

I still was not able to figure out why it was looking for this file on HDFS. All my path variables were set correctly.
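
In case it helps anyone asking how the copy was done, here is a minimal sketch assuming the same paths as in my error message (substitute your own Sqoop version and location):

# create the expected directory on HDFS and copy the local Sqoop jar into it
hdfs dfs -mkdir -p /usr/local/sqoop
hdfs dfs -copyFromLocal /usr/local/sqoop/sqoop-1.4.4.jar /usr/local/sqoop/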

Hope this helps!

Best,
Seda

0

Seda,
How did you copy the jar file to HDFS?
Thanks,
dave

0

Hello all,

Can we use a schema other than the default schema while importing? It seems the current client jar does not support the "currentschema" functionality.
How do we specify the table name along with the schema?

Any pointers would be appreciated.

Regards
Kunal saxena

0

Where did you copy this jar? I am also getting the same kind of error.

0

I had the same issue and had to do this workaround as well. I am using Hadoop 2.5.1 and sqoop-1.4.5.bin__hadoop-0.23.

bin/hdfs dfs -mkdir -p /home/hassan/hadoop/sqoop-1.4.5.bin__hadoop-0.23/lib
bin/hdfs dfs -copyFromLocal ../sqoop-1.4.5.bin__hadoop-0.23/lib/avro-mapred-1.7.5-hadoop2.jar /home/hassan/hadoop/sqoop-1.4.5.bin__hadoop-0.23/lib/.
bin/hdfs dfs -copyFromLocal ../sqoop-1.4.5.bin__hadoop-0.23/sqoop-1.4.5.jar /home/hassan/hadoop/sqoop-1.4.5.bin__hadoop-0.23/.

0

I also got the same problem, and it was fixed by copying the Sqoop folder into HDFS (/usr/local/sqoop), matching the path set in .bashrc.
After that I realized this, deleted the path from HDFS, and set $SQOOP_HOME/lib on the path instead, and it worked fine.

0

Please execute the commands below, which should resolve your problem. They copy your files into HDFS.

hdfs dfs -copyFromLocal /opt/hadoop/sqoop-1.4.6 hdfs://localhost:9000/opt/hadoop/sqoop-1.4.6

hdfs dfs -copyFromLocal /opt/hadoop/sqoop-1.4.6/lib hdfs://localhost:9000/opt/hadoop/sqoop-1.4.6/lib

hdfs dfs -copyFromLocal /opt/hadoop/sqoop-1.4.6/sqoop-1.4.6.jar hdfs://localhost:9000/opt/hadoop/sqoop-1.4.6/sqoop-1.4.6.jar

Similarly, copy any other file that HDFS reports as missing.

Note: In the commands above, /opt/hadoop/sqoop-1.4.6 is my system's Sqoop installation location.
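
To double-check that the copy worked, you can list the HDFS path the job complains about (adjust the path to your own installation):

hdfs dfs -ls hdfs://localhost:9000/opt/hadoop/sqoop-1.4.6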

Thanks,
Iqubal Mustafa Kaki