Golang crawler: scraping Douban group images and submitting them to a Chevereto image host via the API

Source: Laoji Blog (老季博客)
Date: 2017-2-10
Author: Tencent Cloud / Server and VPS Recommendations and Reviews / Vultr

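An earlier post covered a Python crawler that scrapes images from Douban groups and submits them to a Chevereto image host through its API. Later, with some time to spare, I wrote a Golang script that does the same job.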

Related post: an illustrated tutorial on uploading images through the API in the free version of Chevereto

Image host address: http://788to.com

Before running the script, set up a Golang environment and install the one required package:

go get github.com/PuerkitoBio/goquery

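The script accepts three parameters:

-u  the group's URL, ending in "start=", for example: https://www.douban.com/group/meituikong/discussion?start=

-e  the start= value at which to stop, usually taken from the group's last page. Pages are requested for start = 0, 25, 50, ... while start is less than this value, so the page at exactly this value is not fetched.

-k  the Chevereto API key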
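How -u and -e combine into the list of pages to crawl can be sketched as follows. This is a minimal illustration, not code from the repository: pageURLs and its pageSize parameter are hypothetical names, and the step of 25 is taken from the loop in the script's main function.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageURLs is a hypothetical helper. It returns one listing URL per page,
// stepping start by pageSize from 0 up to, but not including, end.
func pageURLs(base string, end, pageSize int) []string {
	if pageSize <= 0 {
		return nil // avoid an endless loop on a bad step
	}
	var urls []string
	for start := 0; start < end; start += pageSize {
		urls = append(urls, base+strconv.Itoa(start))
	}
	return urls
}

func main() {
	base := "https://www.douban.com/group/265201/discussion?start="
	for _, u := range pageURLs(base, 100, 25) {
		fmt.Println(u) // start=0, 25, 50 and 75
	}
}
```

With -e=100 this yields four pages; to include the page at start=100 as well, pass a value above 100.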

A complete example invocation:

go run get-douban-image.go -u="https://www.douban.com/group/265201/discussion?start=" -e="700" -k="laoji.org"

Git repository: https://github.com/qsbaq/doubanImage

The source follows. The listing here is for demonstration only; the code in the Git repository is authoritative:

package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/url"
	"regexp"
	"strconv"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
)

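// GetUrl fetches the given address and returns the response body.
// It returns nil if the request or the read fails.
func GetUrl(target string) []byte {
	resp, err := http.Get(target)
	if err != nil {
		log.Printf("GET %s failed: %v", target, err)
		return nil
	}
	defer resp.Body.Close()
	data, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Printf("reading %s failed: %v", target, err)
		return nil
	}
	return data
}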

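// imagePattern matches large group-topic image URLs on Douban's image hosts.
// Dots are escaped and \w+ replaces the greedy (.*), so each URL matches on its own.
var imagePattern = regexp.MustCompile(`https://\w+\.doubanio\.com/view/group_topic/large/public/\w+\.jpg`)

// getImage scans one topic page for image URLs and submits each one to Chevereto.
func getImage(image_url string, k string) {
	body := string(GetUrl(image_url))
	for _, value := range imagePattern.FindAllString(body, -1) {
		// The image address is passed to the upload API in the "source" parameter.
		submit_url := "http://788to.com/api/1/upload/?key=" + url.QueryEscape(k) + "&source=" + url.QueryEscape(value)
		fmt.Println(submit_url)
		res := make(map[string]interface{})
		if err := json.Unmarshal(GetUrl(submit_url), &res); err != nil {
			log.Printf("%s -> invalid response: %v", value, err)
			continue
		}
		log.Printf("%s -> %v \n", value, res["status_code"])
	}
}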

// getGroupList loads one page of the discussion list and processes every topic it links to.
func getGroupList(target_url string, k string) {
	defer wg.Done() // release the WaitGroup slot even on an early return
	fmt.Printf("Begin Url : %s\n", target_url)
	doc, err := goquery.NewDocument(target_url)
	if err != nil {
		log.Printf("loading %s failed: %v", target_url, err)
		return
	}
	// Each topic title in the list is a link inside td.title.
	doc.Find("td.title a").Each(func(i int, s *goquery.Selection) {
		if href, ok := s.Attr("href"); ok {
			getImage(href, k)
		}
	})
}

var wg sync.WaitGroup

func main() {
	k := flag.String("k", "laoji.org", "Chevereto Key")
	endStartInt := flag.Int("e", 100, "End Start Int Value")
	defaultUrl := flag.String("u", "https://www.douban.com/group/meituikong/discussion?start=", "Group Url")
	flag.Parse()
	// The discussion list is paged through start=, advancing 25 topics at a time.
	for i := 0; i < *endStartInt; i += 25 {
		wg.Add(1)
		go getGroupList(*defaultUrl+strconv.Itoa(i), *k)
		time.Sleep(3 * time.Second) // stagger the page fetches
	}
	wg.Wait()
}
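The two steps at the heart of getImage, pulling image URLs out of a topic page and building the upload request, can be tried in isolation without touching the network. The sketch below is an illustration under stated assumptions rather than the repository's code: extractImages, uploadURL and imageRe are hypothetical names, the HTML fragment is made up, and the pattern is a stricter variant written for this sketch (escaped dots, \w+ in place of .*) so that two images on one line produce two matches.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// imageRe is a stricter variant of the article's pattern (an assumption of this sketch).
var imageRe = regexp.MustCompile(`https://\w+\.doubanio\.com/view/group_topic/large/public/\w+\.jpg`)

// extractImages returns every large group-topic image URL found in html.
func extractImages(html string) []string {
	return imageRe.FindAllString(html, -1)
}

// uploadURL builds the GET address used for submission, with both
// parameters query-escaped. url.Values.Encode sorts parameters by name.
func uploadURL(endpoint, key, source string) string {
	q := url.Values{}
	q.Set("key", key)
	q.Set("source", source)
	return endpoint + "?" + q.Encode()
}

func main() {
	// Made-up fragment with two images on a single line.
	page := `<img src="https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg">` +
		`<img src="https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg">`
	for _, img := range extractImages(page) {
		fmt.Println(uploadURL("http://788to.com/api/1/upload/", "laoji.org", img))
	}
}
```

Building the query with url.Values escapes the key as well as the image address, which matters if the key ever contains characters such as & or +.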

Output:

2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p44724331.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p65569545.jpg -> 200
2017/02/10 08:18:10 https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg -> 200
Begin Url : https://www.douban.com/group/265201/discussion?start=500
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p47029205.jpg -> 200
2017/02/10 08:18:10 https://img5.doubanio.com/view/group_topic/large/public/p33682186.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979344.jpg -> 200
2017/02/10 08:18:11 https://img5.doubanio.com/view/group_topic/large/public/p47029206.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979345.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p48717685.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p50772901.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p45223799.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p47758309.jpg -> 200
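The 200 at the end of each line is the status_code field of the JSON that the upload API returns. A typed struct is one way to read it instead of a map of interface{} values. The sketch below assumes status_code is a JSON number; uploadResult and parseStatus are hypothetical names, and the sample body is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uploadResult holds the one field the script reads. Any other fields
// in the response are ignored by json.Unmarshal.
type uploadResult struct {
	StatusCode int `json:"status_code"`
}

// parseStatus decodes a response body and returns its status code.
func parseStatus(body []byte) (int, error) {
	var r uploadResult
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	return r.StatusCode, nil
}

func main() {
	// Made-up body; only the status_code field name comes from the article.
	code, err := parseStatus([]byte(`{"status_code": 200}`))
	if err != nil {
		fmt.Println("invalid response:", err)
		return
	}
	fmt.Println(code)
}
```

Returning the error lets the caller tell a failed or non-JSON response apart from a real status code, which the map lookup cannot do.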

Link to this article: https://jiloc.com/43114.html
