Preface

Hello everyone, I am Xiao Chen. I haven't updated my blog for a long time, so today I'm bringing you something practical: a small tool that grabs the Cnblogs home page every 5 minutes and sends the collected articles to your mailbox at 9 o'clock the next morning. For example, when I arrive at the office at 9 on February 14, 2018, I receive an email with the articles that appeared on the Cnblogs home page on February 13, 2018. I wrote this tool because I have a habit of reading the blogs every day, but lately, for various reasons, I may go a few days without looking, and I would be upset to miss something good. So I made a tool that sends the articles to my mailbox every day, and I no longer have to worry about missing a good one. Why grab only the home page? Because the quality of the articles on the Cnblogs home page is relatively high.

Preparation

This is a tool that runs continuously, so it cannot go without logging. I use NLog to record logs; its log archiving feature is very handy. HTTP requests may fail because of network problems, so I use Polly to retry them. HtmlAgilityPack is used to parse the web page, which requires some understanding of XPath. The components used are listed below:

Component        Purpose                             GitHub
NLog             Logging                             https://github.com/NLog/NLog
Polly            Retry when an HTTP request fails    https://github.com/App-vNext/Polly
HtmlAgilityPack  Web page parsing                    https://github.com/zzzprojects/html-agility-pack
MailKit          Sending mail                        https://github.com/jstedfast/MailKit

If you are not familiar with one of these components, you can read up on it on its GitHub page.

Reference: http://www.jb51.net/article/112595.htm

Fetching & parsing the data

To fetch the Cnblogs home page I use HttpWebRequest for the HTTP requests. Here is the simple wrapper class I wrote:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    namespace CnBlogSubscribeTool
    {
        /// <summary>
        /// Simple Http Request Class
        /// .NET Framework >= 4
        /// Author:stulzq
        /// CreatedTime:2017-12-12 15:54:47
        /// </summary>
        public class HttpUtil
        {
            static HttpUtil()
            {
                // Set connection limit, Default limit is 2
                ServicePointManager.DefaultConnectionLimit = 1024;
            }

            /// <summary>
            /// Default Timeout 20s
            /// </summary>
            public static int DefaultTimeout = 20000;

            /// <summary>
            /// Is Auto Redirect
            /// </summary>
            public static bool DefalutAllowAutoRedirect = true;

            /// <summary>
            /// Default Encoding
            /// </summary>
            public static Encoding DefaultEncoding = Encoding.UTF8;

            /// <summary>
            /// Default UserAgent
            /// </summary>
            public static string DefaultUserAgent =
                "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36";

            /// <summary>
            /// Default Referer
            /// </summary>
            public static string DefaultReferer = "";

            /// <summary>
            /// httpget request
            /// </summary>
            /// <param name="url">Internet Address</param>
            /// <returns>string</returns>
            public static string GetString(string url)
            {
                var stream = GetStream(url);
                string result;
                using (StreamReader sr = new StreamReader(stream))
                {
                    result = sr.ReadToEnd();
                }
                return result;
            }

            /// <summary>
            /// httppost request
            /// </summary>
            /// <param name="url">Internet Address</param>
            /// <param name="postData">Post request data</param>
            /// <returns>string</returns>
            public static string PostString(string url, string postData)
            {
                var stream = PostStream(url, postData);
                string result;
                using (StreamReader sr = new StreamReader(stream))
                {
                    result = sr.ReadToEnd();
                }
                return result;
            }

            /// <summary>
            /// Create Response
            /// </summary>
            /// <param name="url"></param>
            /// <param name="post">Is post Request</param>
            /// <param name="postData">Post request data</param>
            /// <returns></returns>
            public static WebResponse CreateResponse(string url, bool post, string postData = "")
            {
                var httpWebRequest = WebRequest.CreateHttp(url);
                httpWebRequest.Timeout = DefaultTimeout;
                httpWebRequest.AllowAutoRedirect = DefalutAllowAutoRedirect;
                httpWebRequest.UserAgent = DefaultUserAgent;
                httpWebRequest.Referer = DefaultReferer;
                if (post)
                {
                    var data = DefaultEncoding.GetBytes(postData);
                    httpWebRequest.Method = "POST";
                    httpWebRequest.ContentType = "application/x-www-form-urlencoded;charset=utf-8";
                    httpWebRequest.ContentLength = data.Length;
                    using (var stream = httpWebRequest.GetRequestStream())
                    {
                        stream.Write(data, 0, data.Length);
                    }
                }

                try
                {
                    var response = httpWebRequest.GetResponse();
                    return response;
                }
                catch (Exception e)
                {
                    // Wrap the original exception so the failing request can be identified in the log
                    throw new Exception(
                        string.Format("Request error, url:{0}, IsPost:{1}, Data:{2}, Message:{3}", url, post, postData, e.Message),
                        e);
                }
            }

            /// <summary>
            /// http get request
            /// </summary>
            /// <param name="url"></param>
            /// <returns>Response Stream</returns>
            public static Stream GetStream(string url)
            {
                var stream = CreateResponse(url, false).GetResponseStream();
                if (stream == null)
                {
                    throw new Exception("Response error, the response stream is null");
                }
                else
                {
                    return stream;
                }
            }

            /// <summary>
            /// http post request
            /// </summary>
            /// <param name="url"></param>
            /// <param name="postData">post data</param>
            /// <returns>Response Stream</returns>
            public static Stream PostStream(string url, string postData)
            {
                var stream = CreateResponse(url, true, postData).GetResponseStream();
                if (stream == null)
                {
                    throw new Exception("Response error, the response stream is null");
                }
                else
                {
                    return stream;
                }
            }
        }
    }

    // Fetch the home page data
    string res = HttpUtil.GetString("https://www.cnblogs.com");

Parsing the data

At this point we have successfully fetched the HTML, but how do we extract the information we need? This is where our sharp sword comes out: HtmlAgilityPack, a component that parses web pages based on XPath.

Load the HTML we fetched earlier:

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

Looking at the page source, we can see that all the information for each article sits inside a div with class post_item, and the fields we need are in its child div with class post_item_body. We first select all of those divs:

    // Get all article data items
    var itemBodys = doc.DocumentNode.SelectNodes("//div[@class='post_item_body']");

Continuing the analysis, we can see that the article title is in the a tag under the h3 tag inside the class=post_item_body div, the summary is in the p tag with class=post_item_summary, and the publish time and author are in the div with class=post_item_foot. With the analysis done, we can extract the data we want:

    foreach (var itemBody in itemBodys)
    {
        // Title element
        var titleElem = itemBody.SelectSingleNode("h3/a");
        // Get the title
        var title = titleElem?.InnerText;
        // Get the URL
        var url = titleElem?.Attributes["href"]?.Value;

        // Summary element
        var summaryElem = itemBody.SelectSingleNode("p[@class='post_item_summary']");
        // Get the summary, stripping line breaks and surrounding whitespace
        var summary = summaryElem?.InnerText.Replace("\r\n", "").Trim();

        // Footer element of the data item
        var footElem = itemBody.SelectSingleNode("div[@class='post_item_foot']");
        // Get the author
        var author = footElem?.SelectSingleNode("a")?.InnerText;
        // Get the publish time, e.g. "2018-02-13 17:25".
        // Regex.Match throws on null input, so fall back to an empty string.
        var publishTime = Regex.Match(footElem?.InnerText ?? "", @"\d+-\d+-\d+ \d+:\d+").Value;

        Console.WriteLine($"Title: {title}");
        Console.WriteLine($"URL: {url}");
        Console.WriteLine($"Summary: {summary}");
        Console.WriteLine($"Author: {author}");
        Console.WriteLine($"Published: {publishTime}");
        Console.WriteLine("---------------- divider ----------------");
    }

Running this prints the title, URL, summary, author and publish time of every article on the home page, so we have successfully obtained the information we want. Now we define a Blog class to hold it.

    public class Blog
    {
        /// <summary>
        /// Title
        /// </summary>
        public string Title { get; set; }

        /// <summary>
        /// URL
        /// </summary>
        public string Url { get; set; }

        /// <summary>
        /// Summary
        /// </summary>
        public string Summary { get; set; }

        /// <summary>
        /// Author
        /// </summary>
        public string Author { get; set; }

        /// <summary>
        /// Publish time
        /// </summary>
        public DateTime PublishTime { get; set; }
    }
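The parsing loop extracts the publish time as a string, while Blog.PublishTime is a DateTime, so a conversion step is needed somewhere. The following is a minimal sketch of one way to do it, using only the .NET base class library. The class name PublishTimeParser and the assumed timestamp shape "yyyy-MM-dd HH:mm" are illustrative assumptions, not part of the original tool.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of the original tool.
public static class PublishTimeParser
{
    // Assumed footer timestamp shape: "yyyy-MM-dd HH:mm", e.g. "2018-02-13 17:25".
    private static readonly Regex TimePattern =
        new Regex(@"\d{4}-\d{2}-\d{2} \d{2}:\d{2}");

    // Returns true and the parsed time when the footer text contains a timestamp;
    // returns false for null, empty, or unrecognized input instead of throwing.
    public static bool TryParse(string footerText, out DateTime publishTime)
    {
        publishTime = default(DateTime);
        if (string.IsNullOrEmpty(footerText))
        {
            return false;
        }

        Match match = TimePattern.Match(footerText);
        if (!match.Success)
        {
            return false;
        }

        return DateTime.TryParseExact(
            match.Value,
            "yyyy-MM-dd HH:mm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out publishTime);
    }
}
```

A caller could pass footElem?.InnerText to TryParse and only assign Blog.PublishTime when it returns true, so that an article with an unexpected footer is skipped or logged rather than crashing the run.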

Retrying failed HTTP requests

We use Polly to retry when an HTTP request fails, set to retry 3 times.

    // Initialize the retry policy
    _retryTwoTimesPolicy =
        Policy
            .Handle<Exception>()
            .Retry(3, (ex, count) =>
            {
                _logger.Error("Executed failed! Retry {0}", count);
                _logger.Error("Exception from {0}", ex.GetType().Name);
            });

Test: when an exception is thrown, Polly retries up to three times for us. If the third retry also fails, it gives up and the exception propagates to the caller.

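To make the retry semantics concrete, here is a small hand-written sketch of "retry 3 times" using only the base class library. It is not Polly's implementation and the name RetrySketch is hypothetical; it only illustrates the behavior described above: one initial attempt, then up to three retries, with a callback before each retry and the last exception propagating once the retries are used up.

```csharp
using System;

// Hypothetical illustration of retry semantics; Polly itself is not used here.
public static class RetrySketch
{
    // Runs action once, then retries up to retryCount more times on any exception.
    // onRetry (optional) receives the exception and the 1-based retry number
    // before each retry. When the retries are exhausted, the last exception
    // is not caught and propagates to the caller.
    public static T Execute<T>(Func<T> action, int retryCount, Action<Exception, int> onRetry)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex) when (attempt < retryCount)
            {
                // Only reached while retries remain; otherwise the filter is
                // false and the exception leaves this method.
                onRetry?.Invoke(ex, attempt + 1);
            }
        }
    }
}
```

With retryCount set to 3, a call that always fails runs four times in total before the exception reaches the caller, which is worth keeping in mind when choosing the request timeout.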
Sending mail

We use MailKit to send mail. It supports the IMAP, POP3 and SMTP protocols and works well across platforms. The following wrapper is based on a helper class shared earlier by a fellow Cnblogs member:

    using System.Collections.Generic;
    using CnBlogSubscribeTool.Config;
    using MailKit.Net.Smtp;
    using MimeKit;

    namespace CnBlogSubscribeTool
    {
        ///
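Before a message can be sent, the collected articles have to be turned into a message body. The sketch below shows one possible way to render a list of Blog objects as an HTML fragment using only the base class library. MailBodyBuilder is a hypothetical name, the HTML layout is an assumption, and the Blog class is repeated here only so that the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Copy of the Blog class from the article, repeated so the sketch is self-contained.
public class Blog
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string Summary { get; set; }
    public string Author { get; set; }
    public DateTime PublishTime { get; set; }
}

// Hypothetical helper: renders articles as a simple HTML list for the mail body.
public static class MailBodyBuilder
{
    public static string Build(IEnumerable<Blog> blogs)
    {
        var sb = new StringBuilder();
        sb.Append("<ul>");
        foreach (var blog in blogs)
        {
            // HtmlEncode keeps scraped text from breaking the markup.
            sb.Append("<li>");
            sb.AppendFormat("<a href=\"{0}\">{1}</a>",
                WebUtility.HtmlEncode(blog.Url),
                WebUtility.HtmlEncode(blog.Title));
            sb.AppendFormat("<p>{0}</p>", WebUtility.HtmlEncode(blog.Summary));
            sb.AppendFormat("<small>{0} {1:yyyy-MM-dd HH:mm}</small>",
                WebUtility.HtmlEncode(blog.Author),
                blog.PublishTime);
            sb.Append("</li>");
        }
        sb.Append("</ul>");
        return sb.ToString();
    }
}
```

The resulting string could then be used as the text of an HTML body part when building the message with MimeKit.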

