The Reddit JSON API lets you get a JSON document by adding /.json to any Reddit URL, which makes it possible to extract various data from any subreddit. To show how to do that in HTML5 with the Phaser framework, we will create a Reddit Image Scraper application.
To start, try the following URL to get a JSON document for the /r/pics subreddit:
https://www.reddit.com/r/pics/.json
This data can be used for many purposes, such as building an image viewer/browser for any subreddit.
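For instance, a subreddit listing JSON has the shape sketched below. The sample object is a trimmed, hypothetical stand-in for the real response, but the field paths (data.children, data.after, and the per-post title, url, thumbnail and score) match what Reddit returns:

```javascript
// A minimal sketch of pulling post data out of a Reddit listing JSON.
// The `listing` object mimics the shape returned by
// https://www.reddit.com/r/pics/.json (heavily trimmed, sample values).
var listing = {
  data: {
    after: "t3_abc123",  // pagination cursor for loading the next page
    children: [
      { data: { title: "A sunset", url: "http://i.imgur.com/sunset.jpg",
                thumbnail: "http://thumbs.example/sunset.jpg", score: 1234 } },
      { data: { title: "A cat", url: "http://i.imgur.com/cat.jpg",
                thumbnail: "http://thumbs.example/cat.jpg", score: 567 } }
    ]
  }
};

// Each child record carries exactly the fields a scraper needs.
var records = listing.data.children.map(function (child) {
  return {
    title: child.data.title,
    url:   child.data.url,
    icon:  child.data.thumbnail,
    score: child.data.score
  };
});

console.log(records.length);    // 2
console.log(records[0].title);  // "A sunset"
```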
Furthermore, the images and their data scraped from Reddit's JSON could also be used to make some simple games. Here are some ideas:
- guessing which of two images has the higher (or lower) upvote score
- guessing the right range of upvotes for an image, choosing between 2 or 3 different ranges
- guessing the title of an image as fast as possible, choosing between 2 or 3 different titles
For now, we will create a Reddit Image Scraper tool that generates a database of all scraped images, which will later be used in a game. It is developed in HTML5 using the Phaser framework and the DebugOut script.
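To get a feel for how the output file is assembled, here is a minimal stand-in for the DebugOut object (the real debugout.js script does more, including saving its buffer to a file via downloadLog()): the scraper simply log()s one fragment of JSON text per call, and the concatenated buffer ends up as a valid JSON document.

```javascript
// A minimal, hypothetical stand-in for debugout.js, only to illustrate
// how the scraper builds its JSON output one log() call at a time.
function FakeDebugout() {
  this.lines = [];
}
FakeDebugout.prototype.log = function (text) {
  this.lines.push(text);  // the real library also timestamps entries
};
FakeDebugout.prototype.getLog = function () {
  return this.lines.join('\n');
};

var output = new FakeDebugout();
output.log('{"data":');
output.log(' [');
output.log('  {"title":"Example","url":"http://i.imgur.com/x.jpg","score":42}');
output.log(' ]');
output.log('}');

// The accumulated buffer parses as valid JSON.
var db = JSON.parse(output.getLog());
console.log(db.data[0].score);  // 42
```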
Here you can try the application to see what we are going to make. After you tap the screen, the program starts fetching data from the /r/pics subreddit and generates an output JSON file, database.json, at the end of the process.
The program works as a state machine, fetching and processing data through the following states:
- At the start, the program just waits for a mouse click in STATE_START.
- In the next step (STATE_LOAD_JSON), it loads an input JSON file from https://www.reddit.com/r/pics/top.json, which contains the number of input data records defined by the LIMIT variable.
- When the input JSON file is loaded, the program goes to STATE_LOAD_COMPLETE.
- The first input record, with the URL of the first image, is retrieved in STATE_GET_IMAGE_URL.
- The validity of the image is checked in STATE_CHECK_IMAGE_URL.
- If the image is not valid, the program goes to STATE_IMAGE_FAIL and tries to fix its URL, but only if the link points to an Imgur page.
- If the image is valid, the program goes to STATE_IMAGE_OK and writes its data record (title, url, thumbnail and score) to the output using the DebugOut script.
- When an image is completely processed, the program goes to STATE_IMAGE_PROCESSED and does the following:
- If all data records from the input JSON file are processed, then:
- either go back to STATE_LOAD_JSON to load the next input JSON file (when fewer than MIN_OUT_RECORDS output records have been written)
- or go to STATE_FINISH to save the output JSON file locally to disk and finish the program.
- Else, go back to STATE_GET_IMAGE_URL to fetch the next image.
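The transitions above can be sketched as a pure function, independent of Phaser, to make the flow easier to follow. The state names mirror the scraper's constants, and the ctx fields (imageValid, fixedImgurUrl, index, total, records, minRecords) are illustrative stand-ins for the scraper's variables:

```javascript
// A compact sketch of the scraper's state transitions as a pure function.
function nextState(state, ctx) {
  switch (state) {
    case 'START':           return 'LOAD_JSON';       // on mouse click
    case 'LOAD_JSON':       return 'LOAD_COMPLETE';
    case 'LOAD_COMPLETE':   return 'GET_IMAGE_URL';   // once the file is loaded
    case 'GET_IMAGE_URL':   return 'CHECK_IMAGE_URL';
    case 'CHECK_IMAGE_URL': return ctx.imageValid ? 'IMAGE_OK' : 'IMAGE_FAIL';
    case 'IMAGE_FAIL':      // an Imgur page link may still be fixable
                            return ctx.fixedImgurUrl ? 'IMAGE_OK' : 'IMAGE_PROCESSED';
    case 'IMAGE_OK':        return 'IMAGE_PROCESSED';
    case 'IMAGE_PROCESSED':
      if (ctx.index < ctx.total) return 'GET_IMAGE_URL';  // more records left
      // all records done: finish if enough output records, else load next page
      return ctx.records > ctx.minRecords ? 'FINISH' : 'LOAD_JSON';
  }
  return state;  // FINISH/IDLE: stay put
}

// Example: a valid image leads straight to the writing state.
console.log(nextState('CHECK_IMAGE_URL', { imageValid: true }));  // "IMAGE_OK"
```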
And here is the fully commented code of the scraper.js script:
/***********************************************************************************/

var game;

// define all App states
var STATE_START = 1;
var STATE_LOAD_JSON = 2;
var STATE_LOAD_COMPLETE = 3;
var STATE_GET_IMAGE_URL = 4;
var STATE_CHECK_IMAGE_URL = 5;
var STATE_IMAGE_FAIL = 6;
var STATE_IMAGE_OK = 7;
var STATE_IMAGE_PROCESSED = 8;
var STATE_FINISH = 9;
var STATE_IDLE = 10;

// reddit's url for scraping
var REDDIT_URL = 'https://www.reddit.com/r/pics/';

// name of the output data file
var OUTPUT_FILE = 'database.json';

var LIMIT = 5; // maximum number of items to return per reddit's json input file
var MIN_OUT_RECORDS = 5; // minimal number of output data records to be generated

var imageValidity = 0; // image validity status = {0-image checking, 1-image error, 2-image is valid}

/***********************************************************************************/

window.onload = function () {
    // Google Chrome and Firefox block cross-domain images in WebGL
    // (because of a security issue). To bypass this, use Phaser.CANVAS!
    game = new Phaser.Game(480, 800, Phaser.CANVAS, 'game');
    game.state.add('Scraper', Scraper);
    game.state.start('Scraper');
};

/***********************************************************************************/

var Scraper = function(game){};

Scraper.prototype = {

    create : function(){
        // set scale options
        game.scale.scaleMode = Phaser.ScaleManager.SHOW_ALL;
        game.scale.pageAlignVertically = true;
        game.scale.pageAlignHorizontally = true;
        game.scale.setScreenSize(true);

        // set stage and world options
        game.stage.backgroundColor = "#ecf0f1";
        game.world.setBounds(0, 0, game.width, game.height);

        // create a text object for displaying event logs
        txtEventLog = game.add.text(10, 60, '\n', {font: "18px Arial", fill: "#222"});

        // create a fixed top layer to display App title
        var grpTitleLayer = game.add.group();
        grpTitleLayer.fixedToCamera = true;
        grpTitleLayer.cameraOffset.setTo(0, 0);

        var graphics = game.make.graphics(0, 0);
        graphics.beginFill(0x34495e);
        graphics.drawRect(0, 0, game.width, 80);

        var title = game.make.text(
            game.width/2, 40,
            'Reddit\'s Image Scraper 1.0',
            {font: "30px Arial", fill: "#fff", align: "center"}
        );
        title.anchor.setTo(0.5, 0.5);

        grpTitleLayer.add(graphics);
        grpTitleLayer.add(title);

        // create an output object for writing data in a json output file
        // see more about DebugOut script on https://github.com/inorganik/debugout.js
        output = new debugout();

        updateEventLog("Tap to start scraping data from Reddit's Json at:\n"+REDDIT_URL+"\n\n");

        // set initial App state
        state = STATE_START;
    },

    update : function(){
        switch(state){

            case STATE_START:
                // start scraping data when mouse is down
                if (game.input.mousePointer.isDown || game.input.pointer1.isDown){
                    updateEventLog("Data scraping started.\n");
                    output.log('{\"data\":');
                    output.log(' [');
                    counter = 0; // counter of currently generated output data records
                    after = '';  // after field used for loading the next reddit's json input file
                    state = STATE_LOAD_JSON;
                }
                break;

            case STATE_LOAD_JSON:
                // load reddit's json input file
                updateEventLog(" Loading JSON input file...\n");
                isInputFileLoaded = false; // flag to know if a reddit's json input file is loaded
                var loader = new Phaser.Loader(game);
                loader.json('input_file', getInputFileName());
                loader.onLoadComplete.addOnce(onInputFileLoaded, this);
                loader.start();
                state = STATE_LOAD_COMPLETE;
                break;

            case STATE_LOAD_COMPLETE:
                // wait on reddit's json input file to be loaded and then fetch data from it
                if (isInputFileLoaded){
                    items = game.cache.getJSON('input_file').data.children;
                    after = game.cache.getJSON('input_file').data.after;
                    index = 0; // set index to the first record in items array
                    state = STATE_GET_IMAGE_URL;
                }
                break;

            case STATE_GET_IMAGE_URL:
                // get image url from the current record in items array and start checking its validity
                updateEventLog(" Checking image... ");
                checkImageUrl(items[index].data.url);
                state = STATE_CHECK_IMAGE_URL;
                break;

            case STATE_CHECK_IMAGE_URL:
                // wait on checking image validity to be completed
                if (imageValidity == 1) state = STATE_IMAGE_FAIL;
                else if (imageValidity == 2) state = STATE_IMAGE_OK;
                break;

            case STATE_IMAGE_FAIL:
                // given a link to a web page, try to get a direct link to the image
                // (currently supporting only imgur page links)
                var isImgur = false;
                var imgurPattern = /imgur\.com/;
                if (imgurPattern.test(items[index].data.url)) {
                    // make sure it's not gifv (gifv won't display right)
                    var gifvPattern = /.gifv$/;
                    if (!gifvPattern.test(items[index].data.url)) {
                        items[index].data.url = items[index].data.url + '.png';
                        isImgur = true;
                    }
                }
                if (isImgur){
                    state = STATE_IMAGE_OK;
                } else {
                    updateEventLog("Error!\n");
                    state = STATE_IMAGE_PROCESSED;
                }
                break;

            case STATE_IMAGE_OK:
                // write json output data for a valid image
                counter++;
                updateEventLog("OK > Writing data record "+counter+".\n");
                if (counter > 1) output.log(' ,');
                output.log(' {\"title\":\"' + items[index].data.title.replace(/"/g, '\\"') + '\",');
                output.log(' \"url\":\"' + items[index].data.url + '\",');
                output.log(' \"icon\":\"' + items[index].data.thumbnail + '\",');
                output.log(' \"score\":' + items[index].data.score + '}');
                state = STATE_IMAGE_PROCESSED;
                break;

            case STATE_IMAGE_PROCESSED:
                index++; // set index to the next record in items array
                if (index == items.length){
                    // if all records from items array are fetched
                    // then either finish with scraping or load the next json input file
                    if (counter > MIN_OUT_RECORDS) state = STATE_FINISH;
                    else state = STATE_LOAD_JSON;
                } else {
                    // else get the next image url
                    state = STATE_GET_IMAGE_URL;
                }
                break;

            case STATE_FINISH:
                // finish scraping data
                updateEventLog("Data scraping finished.\n" + counter + " data records created.\n");
                output.log(' ]');
                output.log('}');
                // generate the output json file
                output.downloadLog(OUTPUT_FILE);
                updateEventLog("Output file "+OUTPUT_FILE+" generated!\n");
                state = STATE_IDLE;
                break;

            case STATE_IDLE:
                break;
        }
    }
}

// Updates event log with a new message.
function updateEventLog(message){
    txtEventLog.setText(txtEventLog.text + message);
    var newBoundWidth = txtEventLog.x + txtEventLog.width + txtEventLog.x;
    var newBoundHeight = txtEventLog.y + txtEventLog.height;
    game.world.setBounds(
        0, 0,
        newBoundWidth > game.width ? newBoundWidth : game.width,
        newBoundHeight > game.height ? newBoundHeight : game.height
    );
    game.camera.y = txtEventLog.height;
}

// Gets the name of a reddit's json input file.
function getInputFileName(){
    return REDDIT_URL + 'top.json?sort=top&t=all&limit=' + LIMIT + '&after=' + after;
}

// Sets isInputFileLoaded flag to true when a reddit's json input file is loaded.
function onInputFileLoaded(){
    isInputFileLoaded = true;
}

// Checks if url points to an actual image.
function checkImageUrl(url) {
    // one of many javascript regexes for checking url validity
    var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;
    if (urlPattern.test(url)){
        // if url is valid then start checking if this is an actual image
        imageValidity = 0; // set image validity to checking status
        var img = new Image();
        img.onerror = function(){ imageValidity = 1; }; // not an image
        img.onload = function(){ imageValidity = 2; };  // valid image
        img.src = url;
    } else {
        imageValidity = 1; // set image validity to error status because url is not valid
    }
}
Here is an example of the generated output JSON file with 5 records:
{"data":
 [
  {"title":"I've spent the past two years shooting drone aerials around the world. Here are 38 images which would be totally illegal today.",
   "url":"http://imgur.com/a/J9iOB.png",
   "icon":"http://b.thumbs.redditmedia.com/EJnhYQkBCyrCrLYqoXVSlRzBI2WhXxVBzpYHwx7UDko.jpg",
   "score":22524}
  ,
  {"title":"Got divorced, lost my job, so me and my buddy got on our motorcycles and rode North to the Alaskan Arctic until the road ran out.",
   "url":"http://imgur.com/a/J7kZJ.png",
   "icon":"http://a.thumbs.redditmedia.com/5647fC_ALfU94wel5BdgM4fj4tl--3pxe2Yadxo1GT8.jpg",
   "score":16822}
  ,
  {"title":"I sent Tom Hanks a 1934 Smith Corona typewriter with a typed invitation to come on my podcast. This was his response.",
   "url":"http://i.imgur.com/ppBPV.jpg",
   "icon":"http://f.thumbs.redditmedia.com/4pC-IVgF1KFnqWp5.jpg",
   "score":14563}
  ,
  {"title":"My grandpa, my dad, and myself. Making three generations of wives worried sick.",
   "url":"http://i.imgur.com/Tgay4hL.jpg",
   "icon":"http://b.thumbs.redditmedia.com/B0p-xKV4Kd6tfANI.jpg",
   "score":12881}
  ,
  {"title":"Garbagemen taking a break.",
   "url":"http://i.imgur.com/EWLWdxc.jpg",
   "icon":"http://b.thumbs.redditmedia.com/nBzQi4NxNoOZWXIS0JndRzVXd7uXUmbAylc2EKmMGAY.jpg",
   "score":12759}
 ]
}
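As a quick sanity check of the format (and a taste of the next part), here is a sketch of loading such a database back and picking the highest-scoring record, e.g. for one of the guessing games mentioned above. The embedded sample data is a hypothetical, trimmed stand-in for the real database.json:

```javascript
// A sketch of consuming the generated database.json. In the browser you
// would load the file itself; here a small JSON string stands in for it.
var raw = JSON.stringify({
  data: [
    { title: "First image",  url: "http://i.imgur.com/a.jpg", score: 100 },
    { title: "Second image", url: "http://i.imgur.com/b.jpg", score: 250 },
    { title: "Third image",  url: "http://i.imgur.com/c.jpg", score: 50 }
  ]
});

var db = JSON.parse(raw);

// Pick the record with the highest upvote score.
var top = db.data.reduce(function (best, rec) {
  return rec.score > best.score ? rec : best;
});

console.log(top.title);  // "Second image"
```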
In the next part, we will see how to use this output JSON file to make a real game. So stay tuned!