1. The wordcloud package for R is great, but all the examples I found used the tm package to process a large amount of raw text (web pages, text files, Google Docs, etc.).

    But what if you already have normalized data, with each word paired with its frequency? Or what if you want whole phrases in the wordcloud, such as the terms users have entered into a web search?

    I pull the data from a source via PHP and write it out as CSV, sorted in descending order by frequency.

    The relevant part of the PHP script (after populating the array $terms):

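    $cwd = getcwd();
    $local_path = $cwd.'/csv/';
    $filename = $local_path.'searchterms.csv';
    $fp = fopen($filename, 'w');
    fputcsv($fp, array('term', 'freq')); // header row
    arsort($terms); // sort by frequency, descending, keeping the term keys
    $max_terms = 100;
    $i = 0;
    // $min_freq is assumed to be defined earlier in the script
    foreach ($terms as $q => $v) {
        if ($v <= $min_freq) break; // sorted descending, so nothing later qualifies
        fputcsv($fp, array($q, $v));
        if (++$i >= $max_terms) break; // stop once $max_terms rows are written
    }
    fclose($fp);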

    Here is the sample data:

    term,freq
    "target black friday",8239
    "walmart layaway",6502
    "america idol",1777
    "american idol episodes",1741
    "mexican train domino game",1585
    "jc penny outlet store",1159
    "the chicago code",1130
    ...

    The R script:

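    library(wordcloud)
    library(RColorBrewer)
    # Read term/frequency pairs; keep terms as strings, frequencies as numbers
    datain <- read.csv("csv/searchterms.csv", colClasses = c("character", "numeric"))
    pal2 <- brewer.pal(8, "Dark2")
    png("wordcloud.png", width = 1000, height = 1000)
    # Pass the precomputed frequencies directly; no tm preprocessing is needed
    wordcloud(datain$term, datain$freq, scale = c(8, .4), min.freq = 1, max.words = Inf,
              random.order = FALSE, rot.per = .15, colors = pal2)
    dev.off()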

    One consideration: if a search phrase is too long to fit on the canvas, wordcloud issues a warning and leaves it out of the plot, so you need to compensate by enlarging the image dimensions. It may be possible to scale the image dynamically, based on the string length of the highest-frequency result.
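
    The dynamic scaling idea could be sketched roughly as follows. This is only a heuristic, not something the wordcloud package provides: the helper estimate_canvas_px and its px_per_char, padding, and min_px values are hypothetical tuning knobs I made up for illustration, and the three-row data frame just repeats the first rows of the sample data. It assumes the highest-frequency term is drawn at the largest size in scale (8 in the script above).

    ```r
    # Heuristic sketch: pick a square canvas wide enough for the top term.
    # px_per_char, padding, and min_px are assumed tuning values, not wordcloud parameters.
    estimate_canvas_px <- function(terms, freqs, max_cex = 8,
                                   px_per_char = 7, padding = 1.5,
                                   min_px = 1000) {
      # The highest-frequency phrase is the one drawn at the largest size
      top_term <- terms[which.max(freqs)]
      # Rough pixel width: characters * assumed width per character at cex 1 * largest cex
      needed <- nchar(top_term) * px_per_char * max_cex * padding
      # Never go below the fixed size used in the original script
      as.integer(max(min_px, ceiling(needed)))
    }

    # First rows of the sample data, inlined so the sketch runs on its own
    datain <- data.frame(
      term = c("target black friday", "walmart layaway", "america idol"),
      freq = c(8239, 6502, 1777),
      stringsAsFactors = FALSE
    )

    size_px <- estimate_canvas_px(datain$term, datain$freq)

    # The estimate would then replace the fixed dimensions:
    # png("wordcloud.png", width = size_px, height = size_px)
    ```

    If terms still get dropped with a warning, raising padding or lowering the first value in scale are the obvious things to try.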

    Here is the resulting wordcloud:

    For more on R, visit http://www.r-bloggers.com/