<?xml version="1.0" encoding="UTF-8" standalone="yes"?><oembed><version><![CDATA[1.0]]></version><provider_name><![CDATA[Gigaom]]></provider_name><provider_url><![CDATA[http://gigaom.com]]></provider_url><author_name><![CDATA[Derrick Harris]]></author_name><author_url><![CDATA[http://search.gigaom.com/author/dharrisstructure/]]></author_url><title><![CDATA[Google open sources a MapReduce framework for C/C++]]></title><type><![CDATA[link]]></type><html><![CDATA[<p>Google announced on Wednesday that the company is open sourcing a MapReduce framework that <a href="http://google-opensource.blogspot.com/2015/02/mapreduce-for-c-run-native-code-in.html">will let users run native C and C++ code in their Hadoop environments</a>. Depending on how much traction <a href="https://github.com/google/mr4c">MapReduce for C, or MR4C</a>, gets and by whom, it could turn out to be a pretty big deal.</p>
<p>Hadoop is famously, or infamously, <a href="http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201010.mbox/%3C823278.93647.qm@web39702.mail.mud.yahoo.com%3E">written in Java</a> and as such can suffer from performance issues compared with native C++ code. That&#8217;s why Google&#8217;s original MapReduce system was written in C++, as is the Quantcast File System, Quantcast&#8217;s <a href="https://gigaom.com/2012/09/27/quantcast-releases-bigger-faster-stronger-hadoop-file-system/">homegrown alternative to the Hadoop Distributed File System</a>. And, as the blog post announcing MR4C notes, &#8220;many software companies that deal with large datasets have built proprietary systems to execute native code in MapReduce frameworks.&#8221;</p>
<p>This is the same sort of rationale behind <a href="https://gigaom.com/2011/12/09/facebook-speeds-php-development-with-hiphop-vm/">Facebook&#8217;s HipHop efforts</a> and database startup MemSQL, whose system <a href="https://gigaom.com/2012/06/18/ex-facebookers-launch-memsql-to-make-your-database-fly/">converts SQL to C++</a> before executing it.</p>
<p><img  src="https://gigaom2.files.wordpress.com/2015/02/mr4c.png?w=804" alt="MR4C"   data-attribution="Google" class="aligncenter size-full wp-image-915620" /></p>
<p>MR4C was developed by satellite imagery company Skybox Imaging, which <a href="http://google-opensource.blogspot.com/2015/02/mapreduce-for-c-run-native-code-in.html">Google acquired last June</a>, and was optimized for geospatial data and computer vision code libraries. Of course, open sourcing MR4C presents the opportunity to open up this capability to a broader range of users, whether they work in fields dominated by C libraries or just don&#8217;t like or aren&#8217;t comfortable writing programs in Java. When Google <a href="https://gigaom.com/2014/06/11/why-google-is-sowing-the-seeds-of-container-based-computing/">announced its open-source Kubernetes</a> container-management system last year, it was quickly ported from Google Compute Engine <a href="https://gigaom.com/2014/07/10/with-microsoft-ibm-and-red-hat-backing-it-googles-kubernetes-is-a-peace-pipe-and-trojan-horse/">to run in several other environments</a>.</p>
<p>It will be interesting to see how much traction MR4C gets at this point, especially given <a href="https://gigaom.com/2014/06/28/4-reasons-why-spark-could-jolt-hadoop-into-hyperdrive/">the surge in interest around Apache Spark</a>. Spark is a faster data-processing framework than MapReduce, already draws a lot of attention, and natively supports Scala, Python and Java, although it does not support C/C++.</p>
<p>The future of Hadoop and big data processing will certainly be a big topic of conversation at our <a href="https://events.gigaom.com/structuredata-2015/">Structure Data</a> conference next month in New York, which features Google VP of infrastructure Eric Brewer, Spark co-creator (and Databricks CEO) Ion Stoica and the CEOs of all three major Hadoop vendors.</p>
]]></html><thumbnail_url><![CDATA[https://i1.wp.com/gigaom2.files.wordpress.com/2015/01/data-binary-code.jpg?fit=440%2C330&quality=80&strip=all]]></thumbnail_url><thumbnail_height><![CDATA[293]]></thumbnail_height><thumbnail_width><![CDATA[440]]></thumbnail_width></oembed>