Regular expression




Hi Amit,

Here is the approach for converting XML into CSV using REPLACE and REGEX_EXTRACT_ALL:

Note: This uses the XMLLoader available in piggybank.jar, so first run the REGISTER piggybank.jar command.

1. Load the XML file using XMLLoader, passing property as the tag that marks each record.

A = load '/hdfs-site.xml' using org.apache.pig.piggybank.storage.XMLLoader('property') as (x:chararray);
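XMLLoader comes from piggybank, so its exact behavior is best checked against that jar's documentation. Conceptually, XMLLoader('property') hands Pig each <property>...</property> element, tags included, as one chararray record (which is why step 2's example still shows the tags). The Java sketch below approximates that with a regex rather than a real XML parser. That is a swapped-in technique, not how XMLLoader is implemented, and the class name PropertyRecords and the sample XML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of XMLLoader('property'): every
// <property>...</property> element becomes one record string.
class PropertyRecords {
    // DOTALL lets '.' cross line breaks; the lazy '*?' stops at the
    // first closing tag instead of running on to the last one.
    private static final Pattern RECORD =
            Pattern.compile("<property>.*?</property>", Pattern.DOTALL);

    static List<String> split(String xml) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(xml);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample in the shape of hdfs-site.xml.
        String xml = "<configuration>\n"
                + "  <property>\n    <name>Name1</name>\n    <value>Value1</value>\n  </property>\n"
                + "  <property>\n    <name>Name2</name>\n    <value>Value2</value>\n  </property>\n"
                + "</configuration>\n";
        for (String record : split(xml)) {
            System.out.println("RECORD: " + record);
        }
    }
}
```

Each record still spans several lines, which is what step 2 deals with.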

2. Remove the newline characters so that each property element is on a single line (step 3 relies on this, because . does not match a newline):

Example - (<property>      <name>Name</name>      <value>Value</value>    </property>)

B = foreach A generate REPLACE(x,'[\\n]','') as x;
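As far as I know, Pig's REPLACE treats its second argument as a Java regular expression, much like Java's String.replaceAll. The hypothetical StripNewlines class below mirrors step 2 on a single made-up record, so you can see what B contains.

```java
// Hypothetical Java mirror of step 2: REPLACE(x, '[\\n]', '').
class StripNewlines {
    static String strip(String record) {
        // "[\\n]" in Java source is the regex [\n]: a character class
        // containing only the newline character.
        return record.replaceAll("[\\n]", "");
    }

    public static void main(String[] args) {
        String record = "<property>\n  <name>Name</name>\n  <value>Value</value>\n</property>";
        // Prints the record on one line; the indentation spaces remain.
        System.out.println(strip(record));
    }
}
```

Only the newlines disappear; the spaces between tags stay, which matches the example above.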

3. Copy the text between the <name> and <value> tags (the parts outside the angle brackets) and return it in brackets, in the form:

Example - ((name,value))

C = foreach B generate REGEX_EXTRACT_ALL(x,'.*(?:<name>)([^<]*).*(?:<value>)([^<]*).*');
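Assuming, as I read the Pig documentation, that REGEX_EXTRACT_ALL requires the pattern to match the entire string and returns all capture groups, step 3 corresponds roughly to Java's Matcher.matches() followed by a loop over the groups. The class ExtractNameValue is hypothetical, and returning null when nothing matches is this sketch's choice; check your Pig version for what REGEX_EXTRACT_ALL itself returns in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java mirror of step 3: collect every capture group,
// or return null when the pattern does not match the whole record.
class ExtractNameValue {
    private static final Pattern NAME_VALUE =
            Pattern.compile(".*(?:<name>)([^<]*).*(?:<value>)([^<]*).*");

    static List<String> extractAll(String record) {
        Matcher m = NAME_VALUE.matcher(record);
        if (!m.matches()) {            // whole-string match, not a substring search
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        String record = "<property>  <name>Name</name>  <value>Value</value></property>";
        System.out.println(extractAll(record));   // [Name, Value]
    }
}
```

Running the same pattern on a record that still contains a newline returns null, which is why step 2 comes first.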

4. Flatten the output to remove the unnecessary brackets:

Example - (name,value)

D = FOREACH C GENERATE FLATTEN($0);
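Put together, steps 2 to 4 turn each property record into one name,value line. The hypothetical PropertiesToCsv class below is a plain-Java sketch of that pipeline, not Pig code; it simply skips records that don't match, and the sample records are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical end-to-end sketch of steps 2-4: strip newlines, pull out
// the two groups, and "flatten" them into one comma-separated line.
class PropertiesToCsv {
    private static final Pattern NAME_VALUE =
            Pattern.compile(".*(?:<name>)([^<]*).*(?:<value>)([^<]*).*");

    static List<String> toCsv(List<String> records) {
        List<String> lines = new ArrayList<>();
        for (String record : records) {
            String oneLine = record.replaceAll("[\\n]", "");   // step 2
            Matcher m = NAME_VALUE.matcher(oneLine);            // step 3
            if (m.matches()) {
                // Step 4: emit the fields directly, with no extra nesting.
                lines.add(m.group(1) + "," + m.group(2));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "<property>\n  <name>Name1</name>\n  <value>Value1</value>\n</property>",
                "<property>\n  <name>Name2</name>\n  <value>Value2</value>\n</property>");
        for (String line : toCsv(records)) {
            System.out.println(line);   // Name1,Value1 then Name2,Value2
        }
    }
}
```

In Pig itself, D only becomes a comma-separated file once it is stored, for example with STORE ... USING PigStorage(',').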


Hope this helps.

Thanks.


Thank you, sir,

But the newline character is \n, right? Then why do we need to write [\\n]?

Also, what do ., *, ?, : and ([^<]) mean in the third step?

Do you have a link about regular expressions in Pig?

Also, could you give some examples of regular expressions?


Hi Amit,

Pig uses Java regular expressions.

Please refer to the following link to understand the regular expression syntax Pig uses:

Link : http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
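To answer the specific questions: . matches any single character except a line terminator; * repeats the preceding element zero or more times, so .* means "any run of characters"; (?:...) is a non-capturing group, where ? and : belong together right after the opening parenthesis and do not act as a quantifier; [^<] matches any one character other than <, so ([^<]*) captures the text up to the next tag. On [\\n]: in a Java string literal \\ stands for a single backslash, so the regex engine receives [\n], which it reads as a newline; as far as I know, Pig string literals go through a similar escaping step. The hypothetical RegexBasics class below demonstrates each point in Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the regex pieces used in step 3.
class RegexBasics {
    // True when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Capture group 1, or null if the whole input does not match.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // '.' is any single character except a line terminator (by default).
        System.out.println(whole("a.c", "abc"));     // true
        System.out.println(whole("a.c", "a\nc"));    // false

        // '*' repeats the previous element zero or more times.
        System.out.println(whole("ab*c", "ac"));     // true
        System.out.println(whole("ab*c", "abbbc"));  // true

        // ([^<]*) captures a run of characters that are not '<'.
        System.out.println(group1("<name>([^<]*)</name>", "<name>Name</name>"));  // Name

        // (?:<name>) groups without capturing, so ([^<]*) is still group 1.
        System.out.println(group1("(?:<name>)([^<]*)", "<name>Name"));           // Name

        // "[\\n]" is the regex [\n]; "[\n]" puts a real newline in the class.
        // Both match a newline character.
        System.out.println("a\nb".replaceAll("[\\n]", "|"));  // a|b
        System.out.println("a\nb".replaceAll("[\n]", "|"));   // a|b
    }
}
```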

Hope this helps.

Thanks.