But what if you have normalized data, where each word is paired with its frequency? Or what if you have phrases you want in a wordcloud, such as terms users have entered into a web search?
In my case, I pull the data from a source via PHP and write it out as CSV, sorted in descending order by frequency.
The relevant part of the PHP script (after populating the array $terms):
$cwd = getcwd();
$local_path = $cwd . '/csv/';
$filename = $local_path . 'searchterms.csv';
$fp = fopen($filename, 'w');
fputcsv($fp, array('term', 'freq')); // header row
arsort($terms); // sort by frequency, highest first
$max_terms = 100;
$min_freq = 1; // only keep terms searched more than once
$i = 0;
foreach ($terms as $q => $v) {
    if ($v > $min_freq) fputcsv($fp, array($q, $v));
    if (++$i >= $max_terms) break; // stop after $max_terms rows
}
fclose($fp);
Here is the sample data:
term,freq
"target black friday",8239
"walmart layaway",6502
"america idol",1777
"american idol episodes",1741
"mexican train domino game",1585
"jc penny outlet store",1159
"the chicago code",1130
...
The R script:
library(wordcloud)
library(RColorBrewer)

# read the term column as character, the freq column as numeric
datain <- read.csv("csv/searchterms.csv", colClasses = c("character", "numeric"))
pal2 <- brewer.pal(8, "Dark2")

png("wordcloud.png", width = 1000, height = 1000)
wordcloud(datain$term, datain$freq, scale = c(8, .4), min.freq = 1,
          max.words = Inf, random.order = FALSE, rot.per = .15, colors = pal2)
dev.off()
One consideration: if a search phrase is too long to fit, wordcloud() emits a warning and omits that phrase from the plot, so you need to compensate with larger image dimensions (or a smaller scale). It may be possible to size the image dynamically based on the string length of the highest-frequency term.
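As a rough sketch of that dynamic-sizing idea, you could derive the canvas size from the longest term before opening the device. The pixels-per-character factor below is an assumption, not a measured value, and would need tuning for your font and scale settings:

# Hypothetical sizing heuristic: assume roughly 80 px per character
# for the largest words at scale = c(8, .4); tune this factor as needed.
longest <- max(nchar(datain$term))
side <- max(1000, longest * 80)  # never go below the original 1000 px
png("wordcloud.png", width = side, height = side)

Everything after the png() call stays the same as in the script above.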