Converting HTML UTF-8 charset to ISO-8859-1 via C#
I've been struggling to convert the HTML value of an attribute, without success.
Here is the HTML I'm trying to convert (the charset is not shown here, but I can see it in the page).
<a href="https://sistemas.usp.br/jupiterweb/listargradecurricular?codcg=12&codcur=12012&codhab=1&tipo=n" target="_blank">administração – são paulo – diurno</a>
All right, the value of the HtmlNode is "administração - são paulo - diurno".
I am using HtmlAgilityPack to parse the HTML page, and once I reach the node, its InnerText value is: administração â são paulo â diurno
I am assuming the original charset of the page is UTF-8, because that is the encoding the tag in the HTML declares.
How can I convert this weird string to: administração - são paulo - diurno?
I've tried these threads: thread one, thread two. Nothing solved the issue.
Edit: I am getting the page via a C# WebRequest GET.
Edit 2: added the HtmlAgilityPack tag.
The problem is isolated: WebRequest is sometimes messing up the HTML.
Is there another way to set the encoding? I am trying: _webreq.encoding = "iso-8859-1"
Thanks in advance.
After a small test, I can see that the string is not being decoded back to its original form.
Sample test:
    var item = "administração - são paulo - diurno";
    Console.WriteLine(item);
    var buffer = Encoding.UTF8.GetBytes(item);
    var item2 = Encoding.Default.GetString(buffer);
    Console.WriteLine(item2);
This prints:
    administração - são paulo - diurno
    administraÃ§Ã£o - sÃ£o paulo - diurno
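As you can see, the original string is converted to bytes using UTF-8, and those bytes are then converted back to a string using the default encoding.
This is wrong: the bytes must be decoded with the same encoding that produced them.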
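If a string has already been decoded the wrong way, the mismatch can sometimes be reversed. The sketch below assumes the wrong decode used ISO-8859-1, which maps every byte to exactly one character, so no bytes are lost. The names `MojibakeDemo`, `Garble` and `Repair` are made up for illustration. If the wrong decode actually used Windows-1252 (a common value of `Encoding.Default` on Western Windows), that encoding would have to be used instead, and a few byte values might not survive the round trip.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so the round trip is lossless.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Reproduces the bug: UTF-8 bytes decoded with the wrong single-byte encoding.
    public static string Garble(string text)
    {
        return Latin1.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Reverses it: recover the raw bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        return Encoding.UTF8.GetString(Latin1.GetBytes(garbled));
    }

    public static void Main()
    {
        var original = "administração – são paulo – diurno";
        var garbled = Garble(original);

        Console.WriteLine(garbled);                     // mis-decoded text
        Console.WriteLine(Repair(garbled) == original); // True
    }
}
```

This is only a workaround for text that is already damaged. Decoding the response with the right encoding in the first place avoids the problem entirely.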
If WebRequest.GetResponse() is giving you a string with the wrong value, the problem is in that step. Try setting the TransferEncoding property on the HttpWebRequest to UTF8.
Before you can set the TransferEncoding property, you must first set the SendChunked property to true. Clearing TransferEncoding by setting it to null has no effect on the value of SendChunked. Values assigned to the TransferEncoding property replace any existing contents.
Or you can try setting the encoding to UTF8 on the StreamReader you open. Can I see the code?
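The StreamReader suggestion could look roughly like the sketch below. Since the asker's code is not shown, this is an assumption about its shape: `ResponseDecoding` and `ReadAll` are hypothetical names, and a `MemoryStream` stands in for the stream returned by `response.GetResponseStream()` so the example runs without a network call.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResponseDecoding
{
    // Reads the whole stream as text with an explicit encoding instead of a default.
    public static string ReadAll(Stream body, Encoding encoding)
    {
        using (var reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for response.GetResponseStream(): the page bytes as sent in UTF-8.
        var bytes = Encoding.UTF8.GetBytes("administração – são paulo – diurno");

        // Decoded with the encoding the page declares: text comes out intact.
        Console.WriteLine(ReadAll(new MemoryStream(bytes), Encoding.UTF8));

        // Decoded with ISO-8859-1: the same bytes turn into mis-decoded text.
        Console.WriteLine(ReadAll(new MemoryStream(bytes), Encoding.GetEncoding("ISO-8859-1")));
    }
}
```

With a real request, the stream from `GetResponseStream()` would be passed to `ReadAll`, and the `CharacterSet` property of `HttpWebResponse` can be checked to see which charset the server declares.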