C# - Getting Matched HTML Value with Regex


OK, to start: I know I should not be using regex to parse HTML; it's not reliable, not 100% safe, etc. However, this is as much a learning exercise in regex as anything else.

So my example uses the BBC website: http://www.bbc.co.uk/sport/football/premier-league/table.

The project is parsing the tbody of the first table. I'm trying to search its rows so that the elements matching a search value are returned. For example, given the search "manc", I want the tr tags for Manchester City and Manchester United (matched on the URL).

What I have so far is <tr\b[^>]*>(.*?)manc(.*?)</tr>, which matches from the first tr to the closing tr after Man City, and then returns the expected result for Man Utd. Could someone point out where I've gone wrong with the regex?

Edit: source (trimmed)

<tbody id="trc-20-118996114-3">   <tr id="team-138824012" class="team first">     <td class="statistics"></td>     <td class='position'>       <span class='moving-up'>moving up</span>       <span class='position-number'>1</span>     </td>     <td class="team-name">       <a href='http://www.bbc.co.uk/sport/football/teams/arsenal'>arsenal</a>     </td>     <td class="played">0</td>      <td class="home-won">       <span>0</span>     </td>     <td class="home-drawn">0</td>     <td class="home-lost">0</td>     <td class="home-for">0</td>     <td class="home-against">0</td>     <td class="away-won">       <span>0</span>     </td>     <td class="away-drawn">0</td>     <td class="away-lost">0</td>     <td class="away-for">0</td>     <td class="away-against">0</td>     <td class="goal-difference">0</td>     <td class="points">0</td>     <td class="last-10-games">       <ol>         <li class="win" title="win">           <span>win</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win last" title="win">           <span>win</span>         </li>       </ol>     </td>     <td class="status">       <a class="report" href="http://www.bbc.co.uk/sport/0/football/17973141">report</a>     </td>   </tr>   <tr id="team-137316633" class="team">     <td class="statistics"></td>     <td class='position'>       <span class='moving-up'>moving up</span>       <span 
class='position-number'>2</span>     </td>     <td class="team-name">       <a href='http://www.bbc.co.uk/sport/football/teams/aston-villa'>aston villa</a>     </td>     <td class="played">0</td>      <td class="home-won">       <span>0</span>     </td>     <td class="home-drawn">0</td>     <td class="home-lost">0</td>     <td class="home-for">0</td>     <td class="home-against">0</td>     <td class="away-won">       <span>0</span>     </td>     <td class="away-drawn">0</td>     <td class="away-lost">0</td>     <td class="away-for">0</td>     <td class="away-against">0</td>     <td class="goal-difference">0</td>     <td class="points">0</td>     <td class="last-10-games">       <ol>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="loss last" title="loss">           <span>loss</span>         </li>       </ol>     </td>     <td class="status">       <a class="report" href="http://www.bbc.co.uk/sport/0/football/17973120">report</a>     </td>   </tr>   <tr id="team-137318151" class="team">     <td class="statistics"></td>     <td class='position'>       <span class='moving-down'>moving down</span>       <span class='position-number'>7</span>     </td>     <td class="team-name">       <a href='http://www.bbc.co.uk/sport/football/teams/manchester-city'>man city</a>     </td>     <td 
class="played">0</td>      <td class="home-won">       <span>0</span>     </td>     <td class="home-drawn">0</td>     <td class="home-lost">0</td>     <td class="home-for">0</td>     <td class="home-against">0</td>     <td class="away-won">       <span>0</span>     </td>     <td class="away-drawn">0</td>     <td class="away-lost">0</td>     <td class="away-for">0</td>     <td class="away-against">0</td>     <td class="goal-difference">0</td>     <td class="points">0</td>     <td class="last-10-games">       <ol>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="win last" title="win">           <span>win</span>         </li>       </ol>     </td>     <td class="status">       <a class="report" href="http://www.bbc.co.uk/sport/0/football/17973148">report</a>     </td>   </tr>   <tr id="team-137318152" class="team">     <td class="statistics"></td>     <td class='position'>       <span class='moving-down'>moving down</span>       <span class='position-number'>8</span>     </td>     <td class="team-name">       <a href='http://www.bbc.co.uk/sport/football/teams/manchester-united'>man utd</a>     </td>     <td class="played">0</td>      <td class="home-won">       <span>0</span>     </td>     <td class="home-drawn">0</td>     <td class="home-lost">0</td>     <td class="home-for">0</td>     <td 
class="home-against">0</td>     <td class="away-won">       <span>0</span>     </td>     <td class="away-drawn">0</td>     <td class="away-lost">0</td>     <td class="away-for">0</td>     <td class="away-against">0</td>     <td class="goal-difference">0</td>     <td class="points">0</td>     <td class="last-10-games">       <ol>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="draw" title="draw">           <span>draw</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="loss" title="loss">           <span>loss</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win" title="win">           <span>win</span>         </li>         <li class="win last" title="win">           <span>win</span>         </li>       </ol>     </td>     <td class="status">       <a class="report" href="http://www.bbc.co.uk/sport/0/football/17973162">report</a>     </td>   </tr> </tbody> 

The problem is that your regular expression is too broad. You're asking for:

<tr\b[^>]*>(.*?)manc(.*?)</tr> 

Let's simplify it a little bit.

<tr>.*?manc.*?</tr> 

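So you're saying: I need to match a tr, followed by anything, then manc, then anything, and then a closing tr. What happens, of course, is that the regex starts at the first tr and says: OK, I've got a tr, let me keep matching until I find manc. In the meantime, it has passed a bunch of other tr tags. The regex doesn't care. The lazy .*? only keeps the match as short as possible from where it started; it does not stop the match from crossing a </tr>.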
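To see the over-matching concretely, here is a minimal C# sketch. The three-row Html string is a hypothetical, heavily trimmed stand-in for the table in the question, and the class and method names are made up for illustration. RegexOptions.Singleline is an assumption added so that . can also cross line breaks, as it would need to on the real multi-line page.

```csharp
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // Hypothetical stand-in for the BBC table: one row per team.
    public const string Html =
        "<tr id=\"a\"><td><a href='/teams/arsenal'>arsenal</a></td></tr>" +
        "<tr id=\"b\"><td><a href='/teams/manchester-city'>man city</a></td></tr>" +
        "<tr id=\"c\"><td><a href='/teams/manchester-united'>man utd</a></td></tr>";

    // Runs the pattern from the question unchanged.
    public static MatchCollection Run()
    {
        // Singleline lets '.' match newlines too (not needed for this
        // one-line sample, but needed for real page source).
        return Regex.Matches(
            Html,
            @"<tr\b[^>]*>(.*?)manc(.*?)</tr>",
            RegexOptions.Singleline);
    }
}
```

Calling LazyMatchDemo.Run() on this sample gives two matches. The first starts at the arsenal row and runs through to the closing tr of the man city row; the second is the man utd row on its own.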

Try this:

<tr>(?:(?!</tr>).)*manc.+?</tr> 

Or, I guess, in your example:

<tr\b[^>]*>(?:(?!</tr>).)*manc.+?</tr> 
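As a rough sketch of how this could be used from C#: the (?:(?!</tr>).)* part consumes one character at a time, but only while the next characters are not </tr>, so the match can no longer run past the end of the row it started in. The RowSearch class and FindRows method below are hypothetical names, not part of the original question. Escaping the search term with Regex.Escape and using RegexOptions.Singleline and RegexOptions.IgnoreCase are assumptions on my part.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowSearch
{
    // Returns every <tr>...</tr> whose content contains the search term.
    public static List<string> FindRows(string html, string term)
    {
        // (?:(?!</tr>).)* : any characters, as long as none of them
        // begins a closing </tr>, so the match stays inside one row.
        string pattern =
            @"<tr\b[^>]*>(?:(?!</tr>).)*" + Regex.Escape(term) + @".+?</tr>";

        var rows = new List<string>();
        // Singleline: '.' also matches newlines in the page source.
        // IgnoreCase: "manc" also finds "Manchester".
        var options = RegexOptions.Singleline | RegexOptions.IgnoreCase;
        foreach (Match m in Regex.Matches(html, pattern, options))
        {
            rows.Add(m.Value);
        }
        return rows;
    }
}
```

With the page source in a string named html, RowSearch.FindRows(html, "manc") should return one entry per matching row, each running from its own opening tr to its own closing tr.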
