Snap! to WMR

This project will create a '''Snap! interface for running WMR map-reduce''' computing jobs.

=Project Overview=
 * Snap! to WMR spec
 * Snap! to WMR plans
 * Snap! to WMR protocol

Brief Summary:
" On Thu, 21 Jun 2012 14:28:06 -0700, "Dan Garcia" said: Snap! is our outstanding in-the-browser visual programming language, inspired and strongly influenced by Scratch.

WebMapReduce is Dick Brown's great site to put mapreduce in the hands of the novice users. http://webmapreduce.sourceforge.net/

'''I want to modify snap! so that you can call WebMapReduce from Snap! but with a simpler interface --'''

[in Snap!] 1) Map [ map-function ] then Reduce with [reduce-function] over [ list ]

So to do sum of squares of 1,2,10 (1^2 + 2^2 + 10^2 = 105), I would call Map [ square(_) ] then Reduce with [ (_)+(_) ] over [ (1 2 10) ] ==> 105

One line!!

and more general with interesting data:

2) Map [ map-function ] then Reduce with [reduce-function] over URL [ URL ] ...where URL points to a web file and would have the following on every line

 1 2 10 

Map [ square(_) ] then Reduce with [ (_)+(_) ] over URL [ http://me.com/num.txt ]

allows more general input (from a web file)

dan

p.s. Dick says this is 'easy' from his end. "
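The block semantics Dan sketches above can be written out in plain JavaScript, Snap!'s implementation language. This is only an illustrative sketch; the function names are hypothetical and not part of Snap! or WMR:

```javascript
// Hypothetical sketch of the proposed block:
//   Map [ mapper ] then Reduce with [ reducer ] over [ list ]
function mapThenReduce(mapper, reducer, list) {
  // Apply the monadic mapper to every element ...
  var mapped = list.map(mapper);
  // ... then fold the dyadic reducer over the results.
  return mapped.reduce(reducer);
}

// Dan's example: sum of squares of 1, 2, 10
var result = mapThenReduce(
  function (x) { return x * x; },      // square(_)
  function (a, b) { return a + b; },   // (_)+(_)
  [1, 2, 10]
);
// result === 105  (1 + 4 + 100)
```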

Dick's plan ideas:

 * 1) Put up a wiki for development.
 * 2) Sort out the spec of what we're building.
 * 3) Figure out how to write Snap! scripts for turning Snap! programs into Linux executables.
 * 4) Add a single-thread implementation of map-reduce computations to Snap!, implemented locally in Javascript, corresponding to the WMR Test option.
 * 5) Implement and test the protocol for communicating from Snap!/Javascript to WMR/Django.
 * 6) Implement a block (or variant of prior blocks) for map-reduce that uses Hadoop via WMR.
 * 7) Cleanup -- finish documentation, polish code, etc.

Giovanni's earlier list:

 * 1) Design document?
 * 2) Error handling?
 * 3) 1st coding effort?
 * 4) 1st test data?
 * 5) Documentation???

=People=

Dick Brown

Jens Mönig: I’m a former Smalltalk/V programmer turned lawyer experimenting with the Scratch Source Code from MIT http://www.chirp.scratchr.org/blog/

Brian Harvey (easily one of the world’s best programming teachers): http://www.cs.berkeley.edu/~bh/ http://www.eecs.berkeley.edu/Faculty/Homepages/harvey.html

Dan Garcia http://www.cs.berkeley.edu/~ddgarcia/ http://www.eecs.berkeley.edu/Faculty/Homepages/garcia.html

Giovanni http://sites.google.com/site/berkeleytip/ http://wiki.debian.org/Smartphone

=Software & Hardware=

Software
'''SNAP! 4.0''' Progress http://snap.berkeley.edu/ http://byob.eecs.berkeley.edu/snapsource/ - 4? http://byob.berkeley.edu/run/

Javascript

Canvas

WMR

BYOB 3.1 — Build Your Own Blocks (a/k/a SNAP!) http://byob.berkeley.edu/

New Reference Manual with lots of examples and tutorials by Brian http://byob.berkeley.edu/BYOBManual.pdf http://scratch.mit.edu/galleries/view/79892

Scratch http://scratch.mit.edu/

UCB Scheme is a modified version of STk 4.0.1 by Erick Gallesio. http://inst.eecs.berkeley.edu/~scheme/

STk is a free R4RS Scheme interpreter which can access the  Tk graphical package. http://kaolin.unice.fr/STk/

Hadoop

Smalltalk

Hardware
St. Olaf Beowulf cluster

St. Olaf Hadoop Cluster

UCB Hadoop Cluster

=Communication=

Mailing List

http://groups.google.com/a/stolaf.edu/group/wmr-snap Email Dick Brown to request being added to the mail list.

http://www.gnu.org/software/mailman/index.html St. Olaf maybe could run their own lists w/ mailman.

VOIP

Skype

GoToMeeting.com

=Detailed Plans=

Re: Goals for next 10 days = 60 hrs?? - by Dan Garcia
On Tue, 26 Jun 2012 04:19:25 -0700, "Dan Garcia"  said:

Here's what I would say would be the best use of your 60 hours. Your job is to connect two wonderful worlds. Dick's WebMapReduce one, and our Snap! one.

Take a look at the 2-hour lab exercise we do with our CS10 class Beauty and Joy of Computing: http://sage.cs.berkeley.edu/course/view.php?id=24 (scroll down to lab 20 -- distributed computing)

We basically have them implement a fully functional-style MapReduce. The top-level block takes in a Mapper (monadic function), Reducer (dyadic function) and list. It maps the function over the list, then calls the reducer which reduces the list to a single return value. It's much much simpler than full raw MapReduce in Java/Python/etc. It's quite elegant, purely functional-style, but all the computation occurs on one machine.

Dick has the ability to support many languages that run on his cluster. I'd like to connect the two, so you're sending the computation to his cluster. To make things distinct, I'd suggest writing a block

WebMapReduce with mapper _mapper-function_ and reducer _reducer-function_ over _list_

(in my original email I talked about using files -- do that after this first step works)

Jens is the Snap! expert. Dick is the WebMapReduce expert. I couldn't be the technical lead on either; I'm more of the 'vision' guy, designing the Spec.

When we did this with Scheme we had to export the entire Scheme environment over, so that any global variables were available to the Mapper and Reducer. It was exported to another interpreter of Scheme written in Java (allowing us to call the underlying Hadoop code -- see the paper for more details).

We provided the full MapReduce, as well as the simpler one I describe; the important thing is the simplification happened on *our* end -- it was basically an abstraction over the raw functionality. I'd be happy with that model here too.

Here are illustrations that show what I mean --

Original MapReduce

http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-handout.pdf http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-code-handout.pdf http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-example-handout.pdf
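For contrast with the simpler list-based version, the full model in these handouts works over (key, value) pairs, with a grouping ("shuffle") step between map and reduce. A toy single-machine sketch in plain JavaScript (all names are illustrative, not from Snap! or WMR):

```javascript
// Toy single-machine sketch of the classic (key, value) MapReduce model.
// mapper:  item -> list of [key, value] pairs
// reducer: (key, list of values) -> result value
function mapReduceKV(mapper, reducer, items) {
  // Map phase: emit (key, value) pairs for every input item.
  var groups = {};
  items.forEach(function (item) {
    mapper(item).forEach(function (pair) {
      var key = pair[0], value = pair[1];
      if (!groups.hasOwnProperty(key)) { groups[key] = []; }
      groups[key].push(value);               // "shuffle": group values by key
    });
  });
  // Reduce phase: one reducer call per distinct key.
  var out = {};
  Object.keys(groups).forEach(function (key) {
    out[key] = reducer(key, groups[key]);
  });
  return out;
}

// Classic word count over lines of text:
var counts = mapReduceKV(
  function (line) {                          // mapper emits (word, 1)
    return line.split(/\s+/).filter(Boolean)
               .map(function (w) { return [w, 1]; });
  },
  function (word, ones) {                    // reducer sums the 1s
    return ones.reduce(function (a, b) { return a + b; }, 0);
  },
  ['the quick fox', 'the lazy dog']
);
// counts.the === 2, counts.fox === 1
```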

Simpler MapReduce

http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-simpler-handout.pdf http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-diffs-simpler-handout.pdf http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-reduction-simpler-handout.pdf http://csillustrated.berkeley.edu/PDFs/handouts/mapreduce-example-simpler-handout.pdf

I think Dick would have the best idea of how to proceed next; have a conversation with him and his team, and see if you can bring Jens in -- that will be a good meeting.

=Q & A=

Re: 2. Sort out the spec of what we're building - unfinished thread
(put mail list thread here when done... On Sat, 7 Jul 2012 07:15:20 -0500, "Dick Brown" said:)

Brian,

I was assuming that a Snap! programmer might potentially want to use map-reduce without building it him/herself first, which comes from a prior assumption that Snap! would someday be used outside of the current course. Now I understand that a map-reduce block won't be useful for students in the current course, so please ignore my "It's useful for students" comment.

But it still seems to me that Giovanni could take on this little exercise in order to figure out for himself how to build a Snap! block and to get feedback on the geometric design of that block, while my team and I sort out what WMR should ask from Snap!, in terms of communication. (We have to either add a new communication interface to WMR or modify the current WMR communication strategy, because WMR talks to its current simple web page in a way that won't work for talking to Snap!, a Javascript program that presumably has other things to do besides blocking until Hadoop gets done.)

If Giovanni already knows how to implement Snap! blocks and if the geometric design of the new block is set, then maybe there's nothing for him to do right now on the implementation.

Dick

On Fri, Jul 6, 2012 at 10:30 PM, Brian Harvey wrote:

> On Jul 6, 2012, at 10:38 AM, Dick Brown wrote:
>
> > However, I still think a *Local Test version* is a good idea:
> >
> > * It provides a worthy initial goal for development on the Snap! side. It's essentially a prototype that has UI, data handling, etc., but without having to deal with networking to WMR.
> > * It's feasible, I assume, since the students build Map-Reduce in Snap! in one of their labs (#20). This would just be implementing inside of Snap! instead of "outside."
> > * It's useful for students, because Hadoop gives lousy diagnostics (e.g., between mapper and reducer), and because the within-Snap! implementation will perform much better on the small data sets most of them will use, compared with Hadoop, which is tuned for terabytes.
>
> I think we already have this, except that it's written /in/ Snap! instead of as a primitive. Even plain old MAP and COMBINE (what we call reduce) aren't primitives; they're written in terms of cons/car/cdr. (This has the advantage that you get to choose your favorite COMBINE behavior for lists of length < 2.) It's the networking to WMR that we need developed!
>
> Am I missing something?
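The communication problem Dick raises in this thread -- Snap! is a JavaScript program with other things to do besides blocking until Hadoop finishes -- suggests a submit-then-poll protocol on the Snap! side. The sketch below is purely an assumption about how such a client could be shaped; no such WMR API exists yet. The transport is injected so that browser code could wrap an XMLHttpRequest while a test can use a stub:

```javascript
// Sketch of a non-blocking status poller for a long-running WMR job.
// `transport.getStatus(jobId, cb)` is a hypothetical async call; in the
// browser it would wrap an XMLHttpRequest to the WMR/Django server.
function pollJob(transport, jobId, onDone) {
  transport.getStatus(jobId, function (status) {
    if (status.state === 'DONE') {
      onDone(status.result);        // hand the Hadoop output back to Snap!
    } else {
      // Still running: schedule another poll instead of blocking.
      // (A real client would use setTimeout(..., delay) here.)
      pollJob(transport, jobId, onDone);
    }
  });
}

// Stub transport that reports RUNNING twice, then DONE:
function makeStubTransport(result) {
  var calls = 0;
  return {
    getStatus: function (jobId, cb) {
      calls += 1;
      cb(calls < 3 ? { state: 'RUNNING' }
                   : { state: 'DONE', result: result });
    }
  };
}

var got;
pollJob(makeStubTransport(105), 'job-42', function (r) { got = r; });
// got === 105 after three (synchronous) polls
```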

Re: Creating Linux executables from Snap! code
"On Sun, 1 Jul 2012 10:42:47 +0200, "Jens Mönig"  said: Hi, Dick.

I'm not sure whether I understand the script you sent me, but I'll try to answer your questions to the best of my Snap! knowledge:

1. Snap is all plain JavaScript and Canvas; it's not dependent on a particular JS VM, so you should be able to use any standards-compliant one (Rhino, V8...)

2. I believe that JavaScript absolutely is the right base language for any server-side work. Brian is right in pointing out that our Snap! evaluator borrows liberally from Scheme, but it also borrows heavily from Smalltalk, takes some aspects from Scratch, while inventing a few gimmicks of its own, so it is not at all a pure Scheme language. Also - which is perhaps the critical information here - Snap currently does not (and AFAIK probably will not ever) have the capability to export scripts to Scheme.

3. Yup, this sounds like the right approach to me. You probably want to import most of the Snap interpreter, which is primarily to be found in the file "threads.js", and also some of the relevant data structures for blocks and their parts ("blocks.js") and for the micro world ("objects.js"). Then you probably want to take out or disable the multi-threading stuff, because you'll be dispatching jobs at some other, earlier level, right? But this should only be necessary if you try to come up with a *general* mechanism for running Snap in a cluster. If it's just certain operations you want to parallelize, then you can take a much more narrowed-down approach, which would be much easier to implement in all likelihood. In this case we could simply agree on a public interface on your side and add a "Web-Map-Reduce" primitive to our side of Snap. Maybe if I could understand better what it is exactly that you're after I could give you more precise advice.

4. Ah, same issue here. As I just pointed out, it depends how generic you want this to be...

Is this of any help? -Jens

Am 29.06.2012 um 23:40 schrieb Dick Brown:

> Jens,
>
> As I mentioned somewhere, we extend WebMapReduce (WMR) to support "mapper" and "reducer" functions written in a particular programming language by wrapping those functions in Linux executables: one applies the "mapper" to each line of its standard input and puts the results on its standard output, and the other applies the "reducer" to each line of its standard input (obtained from the mapper-executable's output) and delivers the final result on its standard output. For an interpreted language, we typically implement the mapper-executable and reducer-executable as Linux scripts in that language. I've attached an example Linux script by Stephen (thanks!) for the Javascript language, implemented for the Rhino Javascript interpreter. (Note: this is not a good example of a mapper-executable, but only shows how to execute Javascript using stdin and stdout.)
>
> Our questions:
>
> * We used the Rhino interpreter for this example. Does the Snap! development environment use a particular Javascript interpreter, or will any interpreter do?
> * Is Javascript the right base language? Brian's comment suggests that it may be possible to export Snap! code as Scheme instead of Javascript. We can make Scheme Linux scripts (or some other language) just as easily as Javascript Linux scripts, if there is some advantage of one language over another.
> * We're assuming that we can make Snap! Linux scripts by importing some subset of the Snap! distribution into a Javascript Linux script (or whatever the right base language may be). Do you have any advice or warnings about this approach?
> * As Dan pointed out, Berkeley folks implemented a Scheme-to-Hadoop interface a few years ago for the CS3 course (this actually motivated WMR). They also needed to pass the Scheme environment to Hadoop, since mappers or reducers might use global bindings. Will we likewise have to pass the Snap! environment (or at least closures for the mapper and reducer functions) to WMR? If so, can you suggest how to do that?
>
> Thanks for any help with these (from you or others).
>
> Dick "
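The wrapper executables Dick describes are line-oriented filters: read a line of stdin, apply the user's function, write the results to stdout. Factored as a pure function over an array of lines (so the I/O can be supplied by whatever interpreter hosts it -- Rhino in Stephen's example, Node, etc.), a sketch might look like the following. All names here are illustrative, not actual WMR code:

```javascript
// Sketch of the line-oriented wrapper WMR builds around a user "mapper".
// The core is a pure function from input lines to output lines; the host
// script only has to feed it stdin and print the result to stdout.
function wrapMapper(mapperFn) {
  return function (inputLines) {
    var outputLines = [];
    inputLines.forEach(function (line) {
      // Each value the mapper emits becomes one line of standard output.
      mapperFn(line).forEach(function (v) { outputLines.push(String(v)); });
    });
    return outputLines;
  };
}

// Example mapper: emit the square of every number on the line.
var squaresMapper = wrapMapper(function (line) {
  return line.split(/\s+/).filter(Boolean).map(function (s) {
    var n = Number(s);
    return n * n;
  });
});
// squaresMapper(['1 2 10']) → ['1', '4', '100']
//
// Under Node, the host script would be roughly:
//   var lines = require('fs').readFileSync(0, 'utf8').split('\n');
//   console.log(squaresMapper(lines).join('\n'));
```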

SNAP interpreter is integrated into the development environment, written in JavaScript
On Sun, 24 Jun 2012 12:07:32 +0200, "Jens Mönig"  said:

"The current SNAP interpreter is integrated into the development environment, and everything is written in JavaScript. You can always download the whole current sources from

http://snap.berkeley.edu/snapsource/snap.zip

In there you'll find a file named

threads.js

which is where the bulk of the SNAP evaluator lives (it's mostly inspired by Smalltalk and Scheme). The only missing part is the scheduler, which is part of the Morphic framework and handled by the Stage object (in the file objects.js).

I think it's probably best to make yourself familiar with these sources, and then to start thinking about how we can communicate and spread the bits and pieces of the interpreter (blocks and contexts) among a distributed environment.

Again, if anything is unclear in the sources, please just ask, and I'll try to answer!

-Jens

Am 23.06.2012 um 21:22 schrieb Dick Brown:

> Jens,
>
> Good!
>
> For getting my students started, is there a particular version of a SNAP interpreter we should use? WMR works by wrapping some code around submitted mapper (or reducer) code in order to create an executable from it, then remotely launching a Hadoop job on a cluster using those two executables via Hadoop's "streaming" interface. For an interpreted language, the executables are typically script files for that language.
>
> If there is some sort of back-end language processor for SNAP (ideally for Linux), we can start getting familiar with SNAP and working towards a WMR plugin to produce executables from submitted mapper and reducer function definitions automatically. "
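Hadoop's "streaming" interface mentioned above treats the mapper and reducer as arbitrary line-oriented executables; the same data flow can be approximated locally with a shell pipeline, with `sort` standing in for the shuffle. The `awk` one-liners below are just stand-ins for the generated wrapper scripts, using the running sum-of-squares example:

```shell
# Approximate the Hadoop streaming data flow locally:
#   input -> mapper -> shuffle (sort) -> reducer -> output
# Here the "mapper" squares each number and the "reducer" sums the lines.
printf '1 2 10\n' \
  | awk '{ for (i = 1; i <= NF; i++) print $i * $i }' \
  | sort -n \
  | awk '{ s += $1 } END { print s }'
# prints 105
```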

invent streams (lazy lists) for Snap!?
"On Fri, 22 Jun 2012 00:15:58 -0700, "Dan Garcia"  said: not for CS10. We never did it with CS3 when we did mapreduce. The hope is that the returned value is small...

dan

On Thu, Jun 21, 2012 at 06:59:34 PM -0700, Brian Harvey wrote:

> If you want to use mapreduce on "real data," do you plan to invent streams (lazy lists) for Snap!? It's not that hard to do, since we have special forms. "
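The streams (lazy lists) Brian mentions can be modeled in plain JavaScript with thunks; Snap! would build them from its special forms, but the idea is the same. A minimal sketch, with all names illustrative:

```javascript
// Minimal thunk-based lazy list ("stream"): the head is computed,
// the tail is a function that produces the rest only when asked.
function consStream(head, tailThunk) {
  return { head: head, tail: tailThunk };
}

// An infinite stream of integers starting at n -- impossible as an
// ordinary list, cheap as a stream.
function integersFrom(n) {
  return consStream(n, function () { return integersFrom(n + 1); });
}

// Force the first n elements into an ordinary array.
function takeStream(s, n) {
  var out = [];
  while (n-- > 0 && s) {
    out.push(s.head);
    s = s.tail();     // forcing the thunk materializes the next cell
  }
  return out;
}

// takeStream(integersFrom(1), 5) → [1, 2, 3, 4, 5]
```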

=Project Startup History=

On Sat, 23 Jun 2012 07:51:27 -0500, "Dick Brown" said: "John/Giovanni,

Greetings! I wanted to confirm that we are excited about adding a '''WebMapReduce (WMR) interface for use with SNAP''' over the Internet.

FYI, we are in the process of putting together a new release that is running on our systems and which we should be putting up on sourceforge soon, so we'll be adding the SNAP interface to that new version. We'll start sending you (and Jens?) questions early next week.

Glad to be getting started!

Dick"

On Sat, 23 Jun 2012 15:16:17 +0200, "Jens Mönig"  said: "Hi, all,

'''this sounds like a /very/ exciting project. Please keep me posted''' and feel free to ask me anything you need or want to know about SNAP!

Cheers, -Jens"

=Miscellaneous=

To Be Filed Links:
CS 10. The Beauty and Joy of Computing http://www.eecs.berkeley.edu/Courses/Data/855.html

CS10 : The Beauty and Joy of Computing Summer 2012 http://inst.eecs.berkeley.edu/~cs10/su12/

CS61A: Structure and Interpretation of Computer Programs http://inst.eecs.berkeley.edu/~cs61a/sp11/

AP CS Principles http://www.csprinciples.org/

The Beauty and Joy of Computing http://bjc.berkeley.edu/

http://meta.wikimedia.org/wiki/Help:Editing

Comments
2012.06.30 1:57 AM> Giovanni: I'm unable to enter an Enter/newline character; that key is grayed out on my android tablet. ... I'm also unable to paste a URL here. 2:06 AM: Curiously, I'm able to paste text, & the Enter key is lit OK on the "My watchlist" edit page.

120701 G> I'm unable to paste anything here w/ the Lenovo tablet, though I seem to be able to copy.

2012.07.02 8pm Dick> Giovanni, I'm sorry your tablet and our wiki aren't talking well to each other. Glad cutting and pasting are making a workaround. We may not be able to schedule an upgrade of this MediaWiki in time to serve the main Snap!-WMR development... Should the project look elsewhere for a site?

120707 G> Dick - No problem looking elsewhere for a site, this is fine. I'm just noting these problems here for possible reference confirmation by others. :)

120706 G> Wiki editing problems on the CS10 lab Macs under Firefox: 1) Paste a URL, then type an "enter": the newline goes in as the 1st character at the top of the page, i.e. not following the URL just pasted. Also, there seem to be spurious newlines after the page is saved, which weren't showing up during the preview??? 2) This doesn't seem to be happening yet as I start editing on the Ubuntu desktops in the other lab.