Let’s build a content extract endpoint — part 1

The Internet is a great source of data. Many domains serve data of many kinds. A news website, for example, returns a number of news items. You usually view this data in a browser: you enter the domain of the news website, and small pieces of news are laid out on the page, wrapped in HTML.

What if you wanted just a small piece of the information? Like here:

A “pen” on the website codepen.com

This “pen” (a sort of online sandbox where you can quickly build small prototypes using HTML, CSS and JavaScript) shows 36 of the most popular color sets at Adobe.

The marked piece on the website contains statistics about this particular pen. It is served by the CodePen domain, and as such is not available to the builder (me) of the pen.

As it happens, that is precisely the piece of information I would also like to show inside my pen. This leads to the goal of this little project:

Build an endpoint which can extract values from a given website.

What does that mean? Well, imagine that you had an endpoint (a URL) which could look like this:

myendpoint.com/?url=codepen.com&data={total:'.single-stat'}

You then point to “myendpoint.com” and ask the service at the endpoint to get some information from the URL “codepen.com”. You want the value found using the CSS selector “.single-stat”. The value should be returned as “total”.
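One detail the URL above glosses over is that braces, colons and quotes have to be percent-encoded in a real query string. Here is a minimal sketch of how a client could build such a request and how the service could read it back. It assumes that the “data” parameter carries a JSON object mapping result names to CSS selectors and that the endpoint is served over https; the function names are made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_request_url(endpoint: str, target: str, fields: dict[str, str]) -> str:
    """Build the request URL; urlencode percent-encodes both parameters."""
    # Assumption: "data" is sent as a JSON object of name -> CSS selector.
    query = urlencode({"url": target, "data": json.dumps(fields)})
    return f"{endpoint}/?{query}"


def parse_request_url(request_url: str) -> tuple[str, dict[str, str]]:
    """Recover the target site and the name -> selector mapping."""
    params = parse_qs(urlsplit(request_url).query)
    return params["url"][0], json.loads(params["data"][0])


# Hypothetical round trip using the values from the example above.
request_url = build_request_url(
    "https://myendpoint.com", "codepen.com", {"total": ".single-stat"}
)
target, fields = parse_request_url(request_url)
print(request_url)
print(target, fields)
```

Sending the selectors as JSON keeps the parsing on the service side down to one json.loads call, at the cost of a less readable URL.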

The response could be like this — some JSON:

{ "total": "4,793" }
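To make the idea a bit more concrete, here is a rough sketch of the extraction step using only the Python standard library. It is not a real implementation: it only understands simple “.class” selectors, it takes the text of the first matching element, and it skips fetching the page altogether. The sample HTML is invented for the example and is not CodePen’s actual markup; the class and function names are hypothetical too.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given class."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.depth = 0       # > 0 while inside the matched element
        self.done = False    # True once the matched element has closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract(html: str, fields: dict[str, str]) -> str:
    """Return a JSON string with one extracted value per requested name."""
    result = {}
    for name, selector in fields.items():
        if not selector.startswith("."):
            raise ValueError("this sketch only supports '.class' selectors")
        parser = ClassTextExtractor(selector[1:])
        parser.feed(html)
        parser.close()
        result[name] = "".join(parser.parts).strip()
    return json.dumps(result)


# Invented sample markup, standing in for the fetched page.
sample_html = '<div class="stats"><span class="single-stat"> 4,793 <br></span> views</div>'
response = extract(sample_html, {"total": ".single-stat"})
print(response)
```

A real service would first have to download the page and would probably lean on a proper CSS selector library rather than a hand-rolled parser like this one.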

That’s it! That is all I want :-) Now I just need to code it and host it somewhere…

You can read about building such an endpoint in a future post; I will update this post with a link to part 2. Please feel free to share any thoughts about this idea, including suggestions on how to implement such an endpoint.

Part 2 ->