This project is a QQ bot built to deliver game news for Princess Connect! Re:Dive. In practice, it touches a wide range of basics and tools: web scraping, APIs, JSON handling, SQL databases, Java Spring, Maven, and GitLab. Along the way, it also involves a number of small but useful technical details that tend to show up in real development.
Project notes and scattered essentials
A few practical points came up repeatedly during development:
- How to recover from Git's `MERGING` state:

```bash
# Roll back to the last commit (this discards uncommitted changes)
git reset --hard HEAD
```
- An `.msi` file is a binary installer package.
- Log redirection is related to Spring configuration files.
- Files such as `.xml` and `.yml` are configuration files that can be recognized and used by programming languages.
- There are two common ways to clone a Git repository:
  - HTTP
  - SSH
- A typical request flow is: the frontend calls a controller, the controller processes the request, and then delegates business logic to a service.
- This layered structure is still common across internet companies.
- Frontend and backend are separated.
- The frontend sends a request.
- The request reaches the controller.
- The controller handles the request and returns a response.
- The service handles business logic.
- The DAO layer is responsible for data interaction.
- Compared with VS Code, Typora feels simpler and faster for image insertion and some Markdown-related editing.
- Every piece of information on the web has a unique network address: that is a URL.
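The controller, service, and DAO flow described in the list above can be sketched in plain Java. This is only an illustration of the layering, not the project's actual code: the class names (`NewsDao`, `NewsService`, `NewsController`), the in-memory map standing in for the SQL database, and the response format are all assumptions, and no Spring annotations are used so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** DAO layer: the only place that touches stored data (a map stands in for SQL here). */
final class NewsDao {
    private final Map<Integer, String> rows = new HashMap<>();

    void save(int id, String text) {
        rows.put(id, text);
    }

    Optional<String> findById(int id) {
        return Optional.ofNullable(rows.get(id));
    }
}

/** Service layer: business logic, with no knowledge of requests or responses. */
final class NewsService {
    private final NewsDao dao;

    NewsService(NewsDao dao) {
        this.dao = dao;
    }

    String latestNews(int id) {
        // Hypothetical business rule: trim stored text, fall back to a default message
        return dao.findById(id).map(String::trim).orElse("no news yet");
    }
}

/** Controller layer: receives the request, delegates to the service, returns a response. */
final class NewsController {
    private final NewsService service;

    NewsController(NewsService service) {
        this.service = service;
    }

    String handleGet(int id) {
        return "200 OK: " + service.latestNews(id);
    }
}

final class LayeredFlowSketch {
    public static void main(String[] args) {
        NewsDao dao = new NewsDao();
        dao.save(1, "  New event announced  ");
        NewsController controller = new NewsController(new NewsService(dao));
        System.out.println(controller.handleGet(1));
        System.out.println(controller.handleGet(2));
    }
}
```

Each layer only talks to the one directly below it, so the storage behind the DAO could be swapped without touching the controller.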
GitLab setup
A private Git repository was used through GitLab. GitLab Community Edition is open source (released under the MIT license) and offers relatively high confidentiality for private code.
<table> <thead> <tr> <th>URL</th> <th>account</th> <th>password</th> </tr> </thead> <tbody> <tr> <td>http://...:***/</td> <td>[email protected]</td> <td>XXXXXXX</td> </tr> </tbody> </table>

Ignoring files in Git
There are two different ideas behind “ignoring” files in Git: never tracking them in the first place, or stopping updates to files that are already present.
- Two ways to ignore file changes:
  - `.gitignore`
  - Use a command to stop updates
- Reference: Git ignore article link

```bash
# Mark the file so that its later changes are not picked up into the index tree
git update-index --assume-unchanged [filename]
# Restore version control for the file
git update-index --no-assume-unchanged [filename]
```
- One thing that is easy to forget with `.gitignore`: when new content has already been added and should no longer be committed, the cache must be cleared.

```bash
# Don't forget the final "."
git rm -r --cached .
# Re-stage everything; files matched by .gitignore are now left out
git add .
```
`.gitignore` is a double-edged sword. When switching branches in particular, any untracked files excluded by `.gitignore` should be backed up first.
Connecting to MySQL
When building the database connection string, a `jdbc:mysql:` URL needs the server address, port, and database name; the username and password are supplied along with it.
- Address: `<XXXXXXXXXXXXXXXXXXX.com>`
- The exact syntax still needs deeper study and memorization.

Because JDBC connects directly to the database, the port is the database's own port, not the Tomcat/HTTP interface port.
- For MySQL, the default port is `3306`.
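As a rough illustration of the URL syntax, the sketch below assembles a MySQL JDBC URL from its parts. The host `db.example.com` and the database name `pcr_news` are made-up placeholders, not values from the project, and the helper `buildUrl` is hypothetical.

```java
final class JdbcUrlSketch {
    static final int MYSQL_DEFAULT_PORT = 3306;

    /** Builds a URL of the form jdbc:mysql://host:port/database. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Placeholder host and database name, for illustration only
        String url = buildUrl("db.example.com", MYSQL_DEFAULT_PORT, "pcr_news");
        System.out.println(url);
        // With the MySQL Connector/J driver on the classpath, this URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```

The username and password are passed separately to `DriverManager.getConnection`, which keeps credentials out of the URL string.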
Scraping dynamic updates
The target endpoint was:
Scraped page: https://api.vc.bilibili.com/dynamic_svr/v1/dynamic_svr/space_history?host_uid=353840826
General approach
The overall logic is fairly standard:
- Get the page HTML
- Filter out the needed information through `formatFilter`
- Convert it into a single string
- Output the dynamic update
Implementation process
1. Fetch the page HTML / JSON
- The `HttpURLConnection` class can be used.
- `URL.getContent()` returns a handler-dependent object (often an `InputStream`), not the page text itself.
- In the Java API, it is important to distinguish between abstract classes and the parts that can actually be used directly.

```java
// Shortest approach, using org.apache.commons.io.IOUtils (Apache Commons IO)
String pageContent = IOUtils.toString(URI.create("[urlAddress]"), StandardCharsets.UTF_8);
```
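For comparison, the same fetch can be sketched with only the JDK's `HttpURLConnection`, without Apache Commons IO. This is a minimal sketch rather than the project's code: the class name `PageFetchSketch`, the helpers `readAll` and `fetch`, and the 5-second timeouts are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class PageFetchSketch {
    /** Drains the stream completely and decodes the bytes as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Sends a GET request and returns the response body as text. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the address to fetch as the single command-line argument
        if (args.length == 1) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

Reading from `getInputStream()` is what yields the page content; the connection object itself only describes the request and response.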
2. Filter the required fields
Using the Jackson package is simpler and more efficient, and it can replace Gson's `JsonParser` / `JsonObject` in this case.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

ObjectMapper objectMapper = new ObjectMapper();
try {
    JsonNode node = objectMapper.readValue(pageContent, JsonNode.class);
    // Just chain get(...) calls to walk down the tree.
    // get(...) returns null for a missing field; path(...) is the null-safe alternative.
    JsonNode brandNode = node.get("data").get("cards").get(0).get("card");
    String theCard = brandNode.asText();
} catch (IOException e) {
    e.printStackTrace();
}
```
Regular expressions were then used to extract the content itself.
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Set up the regular expression
Pattern pattern = Pattern.compile("description.*category");
Matcher dataMatcher = pattern.matcher(bilibiliCardContent);
String BilibiliCardUnicodeContent;
// dataMatcher.find() must be called first; a while loop can keep extracting further matches
if (dataMatcher.find()) {
    String messageTemp = dataMatcher.group();
    // Cut out the real content by dropping the pattern's own text, assuming the match
    // looks like description":"...","category (14 leading and 11 trailing characters)
    BilibiliCardUnicodeContent = messageTemp.substring(14, messageTemp.length() - 11);
} else {
    // Fallback message ("nothing here"); it is too short for the substring call above
    BilibiliCardUnicodeContent = "没有哦";
}
```
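A capture group is one way to avoid the fixed `substring` offsets used above. The sketch below is an alternative, not the project's method: the class name, the helper `extractDescription`, and the sample JSON layout (a `description` field immediately followed by a `category` field) are assumptions for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescriptionExtractSketch {
    // The reluctant (.*?) stops at the first ","category" rather than the last one;
    // DOTALL lets the description span line breaks.
    private static final Pattern DESCRIPTION =
            Pattern.compile("\"description\":\"(.*?)\",\"category\"", Pattern.DOTALL);

    /** Returns the text between the two field markers, or empty if there is no match. */
    static Optional<String> extractDescription(String cardJson) {
        Matcher m = DESCRIPTION.matcher(cardJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample; the real card layout may differ
        String sample = "{\"item\":{\"description\":\"hello\",\"category\":\"daily\"}}";
        System.out.println(extractDescription(sample).orElse("nothing found"));
    }
}
```

Returning an `Optional` forces the caller to handle the no-match case explicitly, which removes the need for a placeholder string.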
A few additional notes matter here:
- The `JsonParser` constructor and its instance `parse(...)` methods have already been marked `@Deprecated` in Gson.
- `JsonParser` can instead be used directly through a static method call.

```java
// Create the JsonObject (JsonParser and JsonObject come from com.google.gson)
String pageContent = "XXXXXXX"; // placeholder for the fetched JSON text
JsonObject json = JsonParser.parseString(pageContent).getAsJsonObject();
```
3. Convert everything into a single string
This step includes converting Unicode hexadecimal text into readable content.
- Reference: Unicode2String article link
- Related pieces of knowledge:
Also worth noting:
- `replaceAll(regex, string)` returns a modified string instead of changing the original string in place.
- A visible plaintext line break is written as `\n`.
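The conversion step can be sketched with `java.util.regex` alone. This is a minimal sketch, not the approach from the referenced article: the class name `UnicodeUnescapeSketch`, the helper `unescape`, and the sample input are assumptions. It decodes each backslash-u escape into its character and then shows that the result of `replaceAll` has to be assigned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescapeSketch {
    // Matches a literal backslash, the letter u, and exactly four hex digits
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces every backslash-u escape in the text with the character it encodes. */
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' in the decoded character
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up input: two escaped characters followed by a plaintext line-break marker
        String raw = "\\u4f60\\u597d\\nPCR";
        String decoded = unescape(raw);
        // replaceAll returns a new string, so the result must be assigned
        String readable = decoded.replaceAll("\\\\n", "\n");
        System.out.println(readable);
    }
}
```

Calling `decoded.replaceAll(...)` without using the return value would leave `decoded` unchanged, because Java strings are immutable.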
4. Output the dynamic update
Once the content is extracted, cleaned, and converted, it can be sent out as the bot's dynamic update message.
Image record from the project

A small extra from the project: a strangely magical captcha.