I need a script to build a graph of some Wikipedia pages. Instead of coding it myself, I experimented with using an LLM to generate it for me. After two hours in front of my computer, this is the prompt I had written (each step is followed below by a rough Python sketch of what it amounts to):
I am building a graph from a first-level list of wiki pages with URLs like this: https://zh.wikipedia.org/wiki/%E9%99%B3%E7%99%BE%E7%A5%A5
For each URL, I want to call the summary URL: https://zh.wikipedia.org/api/rest_v1/page/summary/%E9%99%B3%E7%99%BE%E7%A5%A5
It will return a JSON object. In the JSON object, read the "wikibase_item" attribute. Append the value to the Wikidata URL: https://wikidata.org/wiki/Special:EntityData/{wikibase_item}.json
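These first two calls are mechanical. In Python they might look roughly like this (a sketch assuming the requests library; the function names are mine, and error handling beyond `raise_for_status` is omitted):

```python
import requests
from urllib.parse import quote

WIKI_API = "https://zh.wikipedia.org/api/rest_v1"
WIKIDATA = "https://wikidata.org/wiki/Special:EntityData"

def get_wikibase_item(title: str) -> str:
    """Resolve a zh.wikipedia.org page title to its Wikidata item ID (e.g. "Q698507")."""
    resp = requests.get(f"{WIKI_API}/page/summary/{quote(title)}", timeout=30)
    resp.raise_for_status()
    return resp.json()["wikibase_item"]

def get_entity(item_id: str) -> dict:
    """Fetch the Wikidata entity JSON and unwrap the "entities" envelope shown below."""
    resp = requests.get(f"{WIKIDATA}/{item_id}.json", timeout=30)
    resp.raise_for_status()
    return resp.json()["entities"][item_id]
```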
The returned JSON is like this:
```
{
  "entities": {
    "{wikibase_item}": {
      "pageid": 657639,
      "ns": 0,
      "title": "Q698507",
      "lastrevid": 2295483018,
      "modified": "2025-01-06T11:50:12Z",
      "type": "item",
      "id": "Q698507",
      "labels": {
        "zh-hans": {
          "language": "zh-hans",
          "value": "陈百祥"
        },
        "zh-hant": {
          "language": "zh-hant",
          "value": "陳百祥"
        },
        ...
      },
      "descriptions": {...},
      "aliases": {...},
      "claims": {
        "P21": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P21",
              "hash": "85ad4b1c7348f7a5aac521135040d74e91fb5939",
              "datavalue": {
                "value": {
                  "entity-type": "item",
                  "numeric-id": 6581097,
                  "id": "Q6581097"
                },
                "type": "wikibase-entityid"
              },
              "datatype": "wikibase-item"
            },
            "type": "statement",
            "id": "q698507$1EB77DE7-E96D-4BEB-B578-C59DFD04CA40",
            "rank": "normal"
          }
        ],
        "P26": [...],
        ...
      }
    }
  }
}
```
- Read the claim "P31" (which means "instance of").
- If the "id" of the "datavalue" is "Q5" (which means "human"), read the title and the values of "zh-hant" and "en" in the "labels". Then, mark the entity type as "human". | |
- If the "id" of the "datavalue" is "Q43229" (which means "organization"), read the title and the values of "zh-hant" and "en" in the "labels". Then, mark the entity type as "organization". | |
- If the "id" of the "datavalue" is something else | |
- Go to the item's JSON using: https://wikidata.org/wiki/Special:EntityData/{id}.json | |
- Read the claim "P279" (which means "subclass of") | |
- If it is "Q5" or "Q43229", extract the labels and mark the entity type accordingly. | |
- Otherwise, recursively go to the item's JSON using https://wikidata.org/wiki/Special:EntityData/{id}.json until reaching a depth of 10. | |
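This branch is the heart of the prompt: a bounded walk up Wikidata's "instance of"/"subclass of" hierarchy. Here is a rough sketch, reusing `get_entity` from above (the recursion shape is my reading of the spec, not necessarily what the LLM produced):

```python
HUMAN, ORGANIZATION = "Q5", "Q43229"

def classify(entity: dict, depth: int = 0) -> str | None:
    """Walk P31 ("instance of") first, then P279 ("subclass of"), to a depth of 10."""
    if depth >= 10:
        return None
    prop = "P31" if depth == 0 else "P279"
    for claim in entity.get("claims", {}).get(prop, []):
        datavalue = claim["mainsnak"].get("datavalue")
        if not datavalue:  # "novalue"/"somevalue" snaks carry no datavalue
            continue
        target = datavalue["value"]["id"]
        if target == HUMAN:
            return "human"
        if target == ORGANIZATION:
            return "organization"
        result = classify(get_entity(target), depth + 1)
        if result is not None:
            return result
    return None  # neither human nor organization: the caller ignores the entity
```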
Now, call https://zh.wikipedia.org/api/rest_v1/page/html/%E9%99%B3%E7%99%BE%E7%A5%A5.
It returns an HTML document. Extract all href attributes from <a> tags with `rel="mw:WikiLink"`. Store these links in a second-level list and process each item the same way as the first-level list. Additionally, create a relationship between each first-level entity and the second-level entities linked from it.
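Extracting those links is a few lines with any HTML parser; here is a sketch with BeautifulSoup (my choice of parser, nothing in the prompt requires it):

```python
from urllib.parse import unquote

from bs4 import BeautifulSoup

def extract_wikilinks(html: str) -> list[str]:
    """Collect the targets of <a rel="mw:WikiLink"> anchors in a Parsoid HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for a in soup.select('a[rel~="mw:WikiLink"]'):
        href = a.get("href", "")
        if href.startswith("./"):  # Parsoid emits relative hrefs like "./%E9%99%B3..."
            titles.append(unquote(href[2:]))
    return titles
```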
Make sure to cache the results and serialize them to a directory so that, even if I terminate the program with Ctrl+C, I can continue building the graph without fetching any URL twice.
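The caching requirement boils down to a get-or-fetch wrapper that persists every response to disk. A minimal sketch (the cache layout is my assumption):

```python
import hashlib
import json
from pathlib import Path

import requests

CACHE_DIR = Path("cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_get_json(url: str) -> dict:
    """Fetch a JSON URL at most once across runs; the cache survives Ctrl+C."""
    path = CACHE_DIR / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = requests.get(url, timeout=30).json()
    tmp = path.with_suffix(".tmp")  # write then rename, so an interrupt
    tmp.write_text(json.dumps(data), encoding="utf-8")
    tmp.replace(path)               # never leaves a half-written cache entry
    return data
```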
Finally, save the graph as a CSV file in a triplet format.
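The output step is one row per edge; for instance (the file name and column headers are my choice):

```python
import csv

def save_triplets(edges: list[tuple[str, str, str]], path: str = "graph.csv") -> None:
    """Write the graph as (source, relation, target) rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "relation", "target"])
        writer.writerows(edges)
```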
It is a short prompt if you remove the JSON sample. You might think it shouldn't take two hours to write—but think again.
Programming is the process of instructing a computer on exactly what to do. So, when I was writing the prompt, I was designing how the script would work. At the same time, I had to figure out which data to gather and study the best way to retrieve it.
The resulting prompt is a pseudocode-level specification. The time invested was worth it: the script worked almost right away, and the few minor bugs were easy to fix.
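For a sense of how the pieces fit together, the whole script reduces to a driver loop over the two levels. The generated code was longer and differed in detail, but the skeleton, wired from the sketches above, is roughly:

```python
from urllib.parse import quote

import requests

def process_page(title: str) -> tuple[str, str] | None:
    """Resolve a page title to (label, entity_type), or None if the page is ignored."""
    entity = get_entity(get_wikibase_item(title))
    kind = classify(entity)
    if kind is None:
        return None
    label = entity["labels"].get("zh-hant", {}).get("value", title)
    return label, kind

def build_graph(first_level: list[str]) -> list[tuple[str, str, str]]:
    """Classify each first-level page, then link it to its classified neighbours."""
    edges = []
    for title in first_level:
        src = process_page(title)
        if src is None:
            continue
        html = requests.get(f"{WIKI_API}/page/html/{quote(title)}", timeout=30).text
        for linked in extract_wikilinks(html):
            dst = process_page(linked)
            if dst is not None:
                edges.append((src[0], "links_to", dst[0]))  # "links_to" is a placeholder
    return edges
```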
Here is what Grok 3 commented:
In short, the prompt’s apparent simplicity hides a dense web of interlocking steps, each requiring careful thought, validation, and articulation. Two hours is reasonable for distilling such a process into a coherent set of instructions, especially if you were simultaneously designing the workflow and documenting it. It’s a bit like writing code and its documentation at the same time—except you’re doing it in natural language, which adds an extra layer of effort to keep it intuitive yet precise.
Yes, for hobby projects you can do vibe coding: blindly accepting AI suggestions, copying error messages back and forth, and hoping for the best. If you are working on real projects with deadlines, it is better to learn software engineering properly.