All overlapping substrings matching a Java regex


Is there an API method that returns all (possibly overlapping) substrings that match a regular expression?

For example, I have a text string:

    String t = "04/31 412-555-1235";

and I have a pattern that matches strings of two or more digits:

    Pattern p = Pattern.compile("\\d\\d+");

The matches are: 04, 31, 412, 555, 1235.

How can I get the overlapping matches?

I want the code to return: 04, 31, 41, 412, 12, 55, 555, 55, 12, 123, 1235, 23, 235, 35.

Theoretically it should be possible: there is an obvious O(n^2) algorithm that enumerates all substrings and checks each one against the pattern.
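As a rough illustration of that brute-force idea, here is a minimal sketch. The class name OverlappingMatches and the method findAll are made up for this example, and it assumes that "checks against the pattern" means a full match of each extracted substring. Because each substring is matched in isolation, constructs such as \b or lookarounds cannot see the surrounding text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the java.util.regex API.
class OverlappingMatches {

    // Returns every non-empty substring of text that the regex matches in full,
    // ordered by start index and then by length.
    static List<String> findAll(String text, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= text.length(); j++) {
                // The substring is checked on its own, without its context.
                String candidate = text.substring(i, j);
                if (p.matcher(candidate).matches()) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("04/31 412-555-1235", "\\d\\d+"));
    }
}
```

For the example string and pattern above, this should produce the fourteen substrings listed in the question, in the same order.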

Edit

Rather than enumerating all substrings, it is safer to use the region(int start, int end) method in Matcher. Checking the pattern against a separate, extracted substring might change the result of the match (e.g. if there is a non-capturing group or a word boundary check at the start/end of the pattern).

Edit 2

Actually, it's unclear whether region() does what you would expect for zero-width matches. The specification is vague, and experiments yield disappointing results.

For example:

    String line = "xx90xx";
    String pat = "\\b90\\b";
    System.out.println(Pattern.compile(pat).matcher(line).find()); // prints false
    for (int i = 0; i < line.length(); ++i) {
      for (int j = i + 1; j <= line.length(); ++j) {
        Matcher m = Pattern.compile(pat).matcher(line).region(i, j);
        if (m.find() && m.group().length() == (j - i)) {
          System.out.println(m.group() + " (" + i + ", " + j + ")"); // prints 90 (2, 4)
        }
      }
    }

I'm not sure what the most elegant solution is. One approach is to take a substring of line and pad it with the appropriate boundary characters before checking whether pat matches.
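One option that the post does not try is the pair of Matcher methods useTransparentBounds(true) and useAnchoringBounds(false). With transparent bounds, lookarounds and boundary constructs such as \b can see the text outside the region. With anchoring bounds turned off, ^ and $ refer to the real ends of the input. The sketch below assumes that this is the behavior wanted here; the class name TransparentRegionMatches and the method findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating region() with transparent bounds.
class TransparentRegionMatches {

    // Returns "match (start, end)" for every region [start, end) of text
    // that the regex matches in full, judged in the context of the whole text.
    static List<String> findAll(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.useTransparentBounds(true);  // \b and lookarounds see outside the region
        m.useAnchoringBounds(false);   // ^ and $ mean the ends of the whole input
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= text.length(); j++) {
                m.region(i, j);        // resets the matcher but keeps the bound settings
                if (m.matches()) {     // the match must span the entire region
                    found.add(m.group() + " (" + i + ", " + j + ")");
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("xx90xx", "\\b90\\b"));   // no word boundary around 90
        System.out.println(findAll("xx 90 xx", "\\b90\\b")); // 90 stands alone as a word
    }
}
```

With these settings, "xx90xx" should no longer report the spurious match at (2, 4), because \b now sees the x on either side of 90. The matcher is also compiled once and reused across all regions rather than being rebuilt for every pair of indices.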

Edit 3

Here is the full solution that I came up with. It can handle zero-width patterns, boundaries, etc. in the original regular expression. It looks through all substrings of the text string and checks whether the regular expression matches only at that specific position, by padding the pattern with the appropriate number of wildcards at the beginning and end. It seems to work for the cases I tried, although I haven't done extensive testing. It is certainly less efficient than it could be.

    public static void allMatches(String text, String regex) {
      for (int i = 0; i < text.length(); ++i) {
        for (int j = i + 1; j <= text.length(); ++j) {
          // Pin the match to [i, j): exactly i characters before it
          // and text.length() - j characters after it.
          String positionSpecificPattern = "((?<=^.{" + i + "})(" + regex + ")(?=.{" + (text.length() - j) + "}$))";
          Matcher m = Pattern.compile(positionSpecificPattern).matcher(text);
          if (m.find()) {
            System.out.println("Match found: \"" + (m.group()) + "\" at position [" + i + ", " + j + ")");
          }
        }
      }
    }

Edit 4

Here's a better way of doing this: https://stackoverflow.com/a/11372670/244526

Edit 5

The JRegex library supports finding all overlapping substrings matching a Java regex (although it appears not to have been updated in a while). Specifically, the documentation on non-breaking search specifies:

Using non-breaking search you can find all possible occurrences of a pattern, including those that are intersecting or nested. This is achieved by using the Matcher's method proceed() instead of find()

I faced a similar situation and tried the above answers, but in my case too much time went into setting the start and end index of the matcher. I think I've found a better solution, so I'm posting it here for others. Below is my code snippet.

    if (textToParse != null) {
      Matcher matcher = PLACEHOLDER_PATTERN.matcher(textToParse);
      // Keep searching until the matcher reports that it hit the end of the input.
      while (!matcher.hitEnd()) {
        boolean result = matcher.find();
        int count = matcher.groupCount();
        System.out.println("result " + result + " count " + count);
        if (result && count == 1) {
          mergeFieldName = matcher.group(1);
          mergeFieldNames.add(mergeFieldName);
        }
      }
    }

I have used the matcher.hitEnd() method to check whether I have reached the end of the text.

Hope this helps. Thanks!

