-
Web Crawling Without Help, Follow Along (6): Crawling Novels with Thread Scheduling
2019-04-23 21:57 -
Web Crawling Without Help, Follow Along (6): Crawling Novels with Thread Scheduling
2019-04-23 21:56 The full code can be viewed on GitHub:
https://github.com/Wangyanjunai/novel.git
-
Web Crawling Without Help, Follow Along (6): Crawling Novels with Thread Scheduling
2019-04-23 21:53 This is the method that throws the exception:
public void renderChapterBean(Response response) {
    Chapter chapter = null;
    try {
        if (log.isDebugEnabled()) {
            log.debug("[crawler] start crawling the chapter data of the current page");
        }
        BusinessConstants.lock.lock();
        try {
            chapter = response.render(Chapter.class);
            chapter.setIndex(BusinessConstants.CURRENT_CHAPTER_INDEX);
            if (log.isDebugEnabled()) {
                log.debug("bean resolve res={}, url={}", chapter, response.getUrl());
            }
            NovelChapter novelChapter = NovelChapter.builder().build();
            if (chapter != null) {
                chapter.setId(UUIDUtil.gen32UUID());
                String htmlContent = null;
                if (StringUtils.isNotBlank(chapter.getContent())) {
                    chapter.setStarturl(BusinessConstants.CURRENT_GET_DATA_URL);
                    chapter.setUrl(BusinessConstants.CURRENT_GET_BOOK_DATA_URL);
                    htmlContent = chapter.getContent();
                    // convert the HTML content only when it is not blank
                    chapter.setContent(FlexmarkHtmlParser.parse(htmlContent));
                }
                BeanUtils.copyProperties(chapter, novelChapter);
                String title = chapter.getNovelName();
                String categoryText = chapter.getCategoryCNText();
                NovelInfo novelInfo = novelInfoServiceReader.findByTitleAndCategoryText(title, categoryText);
                if (novelInfo != null) {
                    novelChapter.setBookId(novelInfo.getId());
                    novelChapter.setNewestChapterTitle(novelInfo.getNewestChapterTitle());
                } else {
                    novelChapter.setBookId(UUIDUtil.gen32UUID());
                    novelChapter.setNewestChapterTitle("last chapter lookup failed");
                }
                NovelChapter novelChapterTemp = chapterServiceWriter.save(novelChapter);
                String lastChapterName = null;
                if (novelChapterTemp != null) {
                    lastChapterName = novelChapterTemp.getNewestChapterTitle();
                    List<NovelChapter> chapterList = chapterServiceReader.findByChaperTitle(lastChapterName);
                    if (chapterList != null && chapterList.size() > 0) {
                        NovelChapter lastChapter = chapterList.get(0);
                        String lastChapterId = lastChapter.getId();
                        String bookId = lastChapter.getBookId();
                        NovelInfo novelInfo2 = novelInfoServiceReader.find(bookId);
                        if (novelInfo2 != null) {
                            novelInfo2.setNewestChapterId(lastChapterId);
                            novelInfoServiceWriter.update(novelInfo2);
                            if (log.isDebugEnabled()) {
                                log.debug("[crawler] updated the novel's newest chapter id, data={}", lastChapterId);
                            }
                        }
                    }
                }
                if (log.isDebugEnabled()) {
                    log.debug("convert copy store success in db, novel chapter info={}", novelChapterTemp);
                }
            }
            if (BusinessConstants.CURRENT_CHAPTER_INDEX >= BusinessConstants.CURRENT_TOTAL_CHAPTERS) {
                // all pages of the current type are done, wake up the parent level
                BusinessConstants.CURRENT_CHAPTER_INDEX = BusinessConstants.CURRENT_TOTAL_CHAPTERS; // reset this variable to 0
                BusinessConstants.CURRENT_CHAPTER_INDEX = 0;
                BusinessConstants.conditionPoolStart.signal();
                BusinessConstants.lock.unlock(); // released manually after calling signal()
            }
        } catch (Exception e) {
            log.error("[admin] failed to crawl novel chapter info", e);
        } finally {
            if (log.isDebugEnabled()) {
                log.debug("render success chapter info={}", chapter);
            }
        }
        // wake up the parent thread
        BusinessConstants.conditionPoolBook.signal();
        BusinessConstants.lock.unlock();
    } catch (Exception e) {
        log.error("[crawler] error while crawling the chapter data of the current page", e);
        // an exception occurred, wake up the parent level
        BusinessConstants.conditionPoolBook.signal();
        BusinessConstants.lock.unlock();
    } finally {
        if (log.isDebugEnabled()) {
            log.debug("[crawler] finished crawling the chapter data of the current page");
        }
    }
}
-
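The IllegalMonitorStateException shown in the entries below is what Condition.signal() throws when the calling thread does not hold the associated ReentrantLock: once the if-branch at the end of the inner try block has already called BusinessConstants.lock.unlock(), the later conditionPoolBook.signal() and unlock() calls run without the lock. As a rough sketch (not the project's actual implementation), the handshake could be restructured so the lock is acquired once and released exactly once in a finally block; it assumes the same BusinessConstants fields and service beans as in the snippet above, with the rendering and persistence details elided:
public void renderChapterBean(Response response) {
    BusinessConstants.lock.lock();                           // acquire the lock once
    try {
        Chapter chapter = response.render(Chapter.class);
        chapter.setIndex(BusinessConstants.CURRENT_CHAPTER_INDEX);
        // ... build the NovelChapter, look up the NovelInfo and save it, as above ...

        if (BusinessConstants.CURRENT_CHAPTER_INDEX >= BusinessConstants.CURRENT_TOTAL_CHAPTERS) {
            BusinessConstants.CURRENT_CHAPTER_INDEX = 0;     // reset the chapter counter
            BusinessConstants.conditionPoolStart.signal();   // still holding the lock here
        }
        BusinessConstants.conditionPoolBook.signal();        // wake the parent thread while the lock is held
    } catch (Exception e) {
        log.error("failed to crawl chapter data", e);
    } finally {
        BusinessConstants.lock.unlock();                     // release exactly once, even on exceptions
    }
}
Note that signal() only marks a waiting thread to be woken; the waiter actually resumes after unlock() releases the lock, so there is no need to unlock manually right after signalling.
-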
Web Crawling Without Help, Follow Along (6): Crawling Novels with Thread Scheduling
2019-04-23 21:51 This exception is thrown:
java.lang.IllegalMonitorStateException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939)
at com.potato369.novel.crawler.crawlers.DuShu88NovelCrawler.renderChapterBean(DuShu88NovelCrawler.java:452)
at cn.wanghaomiao.seimi.core.SeimiProcessor.doLambdaCallback(SeimiProcessor.java:177)
at cn.wanghaomiao.seimi.core.SeimiProcessor.run(SeimiProcessor.java:116)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-04-23 21:41:28,487 - 1652879 [pool-2-thread-20] ERROR c.w.seimi.core.SeimiProcessor - null
java.lang.IllegalMonitorStateException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939)
at com.potato369.novel.crawler.crawlers.DuShu88NovelCrawler.renderChapterBean(DuShu88NovelCrawler.java:457)
at cn.wanghaomiao.seimi.core.SeimiProcessor.doLambdaCallback(SeimiProcessor.java:177)
at cn.wanghaomiao.seimi.core.SeimiProcessor.run(SeimiProcessor.java:116)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745) -
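For reference, a ReentrantLock condition is designed so that signal() may only be called by the thread that currently holds the lock; any other call fails with exactly this top frame. A small self-contained snippet (demo class name made up here) that reproduces it:
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class SignalWithoutLockDemo {
    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        Condition condition = lock.newCondition();

        lock.lock();
        lock.unlock();      // the lock is no longer held by this thread

        // throws java.lang.IllegalMonitorStateException from AbstractQueuedSynchronizer$ConditionObject.signal
        condition.signal();
    }
}
-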
Web Crawling Without Help, Follow Along (6): Crawling Novels with Thread Scheduling
2019-04-21 22:05 I don't quite understand the logic. Could you explain it in more detail?
-
Web Crawling Without Help, Follow Along (6): Crawling Novels with Thread Scheduling
2019-04-21 15:47 The novel details under each category are not crawled in full, so how is the novel info obtained? Only chapter data is fetched, and I can't tell which novel a chapter belongs to.
-
Here is my Java Spring Boot code for crawling novel data from an entire novel site; it never works as expected and I can't find the problem, hoping an expert can point me in the right direction
2019-04-18 20:25 How would that be implemented? Could you be more specific? It has been bothering me and I want to build it myself.
-
Here is my Java Spring Boot code for crawling novel data from an entire novel site; it never works as expected and I can't find the problem, hoping an expert can point me in the right direction
2019-04-18 15:54 I don't know where to add it; please advise.
-
Here is my Java Spring Boot code for crawling novel data from an entire novel site; it never works as expected and I can't find the problem, hoping an expert can point me in the right direction
2019-04-18 11:47 I still don't know how to change the path.
-
Here is my Java Spring Boot code for crawling novel data from an entire novel site; it never works as expected and I can't find the problem, hoping an expert can point me in the right direction
2019-04-18 11:07 This only crawls the categories, so it has no impact.
jack
Newbie
- 0 tags
- 1 post
- 11 replies
GitHub: https://github.com/Wangyanjunai/novel/tree/master/novel_crawler