Loading data using Pig



On the AWS machine the user has now been changed to hduser, but files can still be moved to /home/ec2-user.
I moved emp.txt to /home/ec2-user using WinSCP and then tried the commands below to load the data:
A = load 'emp.txt' using pigstorage(',') as (name:chararray,age:int,salary:int,dept_id:int);
A = load '/home/ec2-user/emp.txt' using pigstorage(',') as (name:chararray,age:int,salary:int,dept_id:int);

The data load fails. Please suggest the path I need to provide so that emp.txt is picked up.
I am also unable to move this file to user/hduser because I do not have the permissions.

6 Answer(s)



Hi Sugandha,
Please switch to the hduser user: su - hduser
You can then save your file under /user/hduser/
Hope this helps.
Thanks


Hi Abhijit
In WinSCP I am unable to log in as hduser (su - hduser): access denied.
On the AWS machine I am already logged in as hduser via su - hduser.


Please check whether the hostname also needs to be changed when logging in to WinSCP.


Hi Sugandha,
With WinSCP, you don't have to switch to hduser.
You can upload your file anywhere you want.

Thanks


Hi Abhijit
I loaded the data with the command below:
A = load 'emp.txt' using PigStorage(',') as (name:chararray,age:int,salary:int,dept_id:int);
Then, on running Dump A, the error below appears.
It looks like a backend error.

Output(s):
Failed to produce result in "hdfs://ip-10-0-0-28.ec2.internal:8020/tmp/temp-291835293/tmp915702883"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1453801387162_0022


2016-02-11 03:28:22,802 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-02-11 03:28:22,805 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
Details at logfile: /home/hduser/pig_1455179139165.log


Hi Sugandha,
One thing I would like to mention is that emp.txt must be in HDFS before you load it with PigStorage().
If I understand correctly, emp.txt is currently in /home/ec2-user/ on the local file system.
Please upload the file to HDFS. To do that:
- su - hduser    # switch user, because hduser has permission to access /user/hduser/
- hadoop fs -put /home/ec2-user/emp.txt /user/hduser/sugandha/emp.txt    # upload the file (full local path, since su - hduser changes the working directory)

Hope this helps. Please correct me if I have misunderstood.
Thanks.
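
For anyone following the steps above, here is a minimal sketch of the upload as a small shell script. The paths are the ones mentioned in this thread. The function name stage_to_hdfs is a hypothetical helper, not part of Hadoop. It assumes you are already running as hduser, that hduser can read the local file, and that the hadoop command is on the PATH.

```shell
#!/bin/sh
# Sketch only: stage_to_hdfs is a hypothetical helper name, not a Hadoop command.

# Copy one local file into an HDFS directory, creating the directory if needed.
stage_to_hdfs() {
    local_file=$1
    hdfs_dir=$2
    hadoop fs -mkdir -p "$hdfs_dir" || return 1            # create the target directory if missing
    hadoop fs -put "$local_file" "$hdfs_dir/" || return 1  # upload from the local file system
    hadoop fs -ls "$hdfs_dir"                              # list the directory to confirm the upload
}

# Run as hduser (su - hduser). Assumes hduser can read /home/ec2-user/emp.txt.
if command -v hadoop >/dev/null 2>&1; then
    stage_to_hdfs /home/ec2-user/emp.txt /user/hduser/sugandha
else
    echo "hadoop is not on PATH; run this on the cluster node as hduser"
fi
```

Once emp.txt shows up in the listing, the load statement in the Grunt shell would point at the HDFS path instead of the local one, for example:
A = load '/user/hduser/sugandha/emp.txt' using PigStorage(',') as (name:chararray,age:int,salary:int,dept_id:int);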
