<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>visualization on irq5 test</title><link>https://irq5-7854a1fdb9f4.pages.dev/tag/visualization/</link><description>Recent content in visualization on irq5 test</description><language>en-us</language><lastBuildDate>Fri, 26 Dec 2014 01:18:00 +0000</lastBuildDate><atom:link href="https://irq5-7854a1fdb9f4.pages.dev/tag/visualization/feed/" rel="self" type="application/rss+xml"/><item><title>Visualizing Binary Features with matplotlib</title><link>https://irq5-7854a1fdb9f4.pages.dev/2014/12/visualizing-binary-features-with-matplotlib/</link><pubDate>Fri, 26 Dec 2014 01:18:00 +0000</pubDate><guid>https://irq5-7854a1fdb9f4.pages.dev/2014/12/visualizing-binary-features-with-matplotlib/</guid><description>&lt;p>Some time ago, I started playing around with data analysis and machine learning.
One of the more popular tools for such tasks is &lt;em>IPython Notebook&lt;/em>,
a browser-based interactive REPL built on &lt;a href=http://ipython.org/ rel=noopener target=_blank class=external>IPython&lt;/a>.
Each session becomes a &amp;ldquo;notebook&amp;rdquo; that records every input and its (cached) output, and can be saved for later review or exported to another format such as HTML.
This capability, combined with &lt;a href=http://matplotlib.org/ rel=noopener target=_blank class=external>matplotlib&lt;/a> for plotting and &lt;a href=http://pandas.pydata.org/ rel=noopener target=_blank class=external>pandas&lt;/a> for slicing and dicing data, makes it a handy tool for analyzing and visualizing data.
To give you an idea of how useful this tool can be, take a look at some example notebooks using &lt;a href=http://nbviewer.ipython.org/ rel=noopener target=_blank class=external>the online notebook viewer&lt;/a>.&lt;/p>&lt;p>In this quick post, I&amp;rsquo;ll describe how I visualize binary features (present/not present) and clustering of such data.
I am assuming that you already have experience with all of the above-mentioned libraries.
For this example, I&amp;rsquo;ve used &lt;a href=https://code.google.com/p/androguard/ rel=noopener target=_blank class=external>Androguard&lt;/a> to extract the permissions (&lt;code>uses-permission&lt;/code>) and features (&lt;code>uses-feature&lt;/code>) used by a set of Android apps. The resulting visualization looks like this:&lt;/p>&lt;p>&lt;picture>&lt;source srcset=/posts/2014/img/apps-binary-features-viz.png.webp type=image/webp>&lt;img src=https://irq5-7854a1fdb9f4.pages.dev/posts/2014/img/apps-binary-features-viz.png alt="visualization of binary features" width=630 height=386>&lt;/picture>&lt;/p>&lt;p>Each row represents one app and each column represents one feature.
More specifically, each column indicates whether a particular permission or feature is used by the app.
Such a visualization makes it easy to see patterns, such as which permissions or features are used more frequently by apps (shown as downward streaks), or whether an app uses more or fewer features than other apps (which shows up as horizontal streaks).&lt;/p>&lt;p>While this may look relatively trivial, when the number of samples increases to thousands of apps, it becomes difficult to make sense of all the rows and columns in the data table just by staring at it.&lt;/p>&lt;p>&lt;a href="https://irq5-7854a1fdb9f4.pages.dev/2014/12/visualizing-binary-features-with-matplotlib/#more">Continue reading…&lt;/a>&lt;/p></description></item></channel></rss>