c# - How can I parse this HTML to get the content I want?

April 15, 2015

I'm trying to parse an HTML document to retrieve all of the footnotes inside it; the document contains dozens and dozens of them. I can't figure out which expressions to use to extract the content I want. The thing is, the classes (e.g. "calibre34") are randomized in every document. The only way I can see to locate the footnotes is to search for "hide": the footnote text comes right after it and is closed with a </td> tag. Below is an example of one of the footnotes in the HTML document; I want its text. Any ideas? Thanks, guys!

<td class="calibre33">1.<span><a class="x-xref" href="javascript:void(0);"> [hide]</a></span></td> <td class="calibre34"> Among other factors on which the premium is based are the average size of the losses experienced, a margin for contingencies, a loading to cover the insurer's expenses, a margin for profit or addition to the insurer's surplus, and perhaps the investment earnings the insurer can realize from the time the premiums are collected until the losses must be paid.</td>

Use HtmlAgilityPack to load the HTML document and extract the footnotes with this XPath:

//td[.//a[normalize-space(.)='[hide]']]/following-sibling::td[1]

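Basically, this first selects every td that contains the [hide] link and then moves to its next sibling, which is the next td holding the footnote text. Once you have that collection of nodes, you can read each node's text through the InnerText property that HtmlAgilityPack provides in C#.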
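A minimal sketch of the whole extraction, written so it runs without the HtmlAgilityPack NuGet package: it uses .NET's built-in System.Xml instead, which only works here because the sample rows (wrapped in a hypothetical <table>) happen to be well-formed XML. Real ebook HTML usually is not, so in practice you would load it with HtmlAgilityPack's HtmlDocument.LoadHtml and pass the same XPath string to doc.DocumentNode.SelectNodes. The expression anchors on the <a> element because the "[hide]" text sits inside the nested link, with a leading space, rather than directly in the <td>; the trailing [1] keeps only the next cell. The ExtractFootnotes helper and the sample rows are illustrative assumptions, and the program assumes a .NET SDK that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical sample: two footnote rows shaped like the one in the question,
// plus one ordinary row, wrapped in <table> so it parses as a single XML document.
string sample =
    "<table>" +
    "<tr><td class=\"calibre33\">1.<span><a class=\"x-xref\" href=\"javascript:void(0);\"> [hide]</a></span></td> " +
    "<td class=\"calibre34\"> First footnote text.</td></tr>" +
    "<tr><td class=\"calibre51\">2.<span><a class=\"x-xref\" href=\"javascript:void(0);\"> [hide]</a></span></td> " +
    "<td class=\"calibre52\"> Second footnote\n   text.</td></tr>" +
    "<tr><td>Not a footnote</td><td>ignored</td></tr>" +
    "</table>";

foreach (string footnote in ExtractFootnotes(sample))
    Console.WriteLine(footnote);

static List<string> ExtractFootnotes(string xml)
{
    // Anchor on the "[hide]" link text instead of the randomized calibreNN classes,
    // then take only the first <td> that follows that cell.
    const string xpath =
        "//td[.//a[normalize-space(.)='[hide]']]/following-sibling::td[1]";

    var doc = new XmlDocument();
    doc.LoadXml(xml); // throws XmlException if the input is not well-formed XML

    var footnotes = new List<string>();
    foreach (XmlNode td in doc.SelectNodes(xpath))
    {
        // InnerText concatenates all descendant text; collapse whitespace runs and trim.
        footnotes.Add(Regex.Replace(td.InnerText, @"\s+", " ").Trim());
    }
    return footnotes;
}
```

Running it prints the two footnote texts, one per line, and skips the row without a [hide] link. One difference to watch for if you switch to HtmlAgilityPack: by default its SelectNodes returns null rather than an empty collection when nothing matches, so check for null before looping.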
