Links to Software
Some Java codes related to recent papers
> Our latest CVPR paper mentions computing inversions. Here is the Java code for simultaneous stable sorting and inversion counting in N log(N) compares. The inversion count is the number of adjacent exchanges required to restore a permutation of N natural numbers to its original ordering.
> The Java code to extract color histogram features from an arbitrary video can be found here. The command to execute the code is
    java -cp VideoColorHistogram.jar edu.buffalo.cse.VideoProcessing.ColorHistogramFromVideo data/videos data/features/ColorHist 8
The code calls the ffmpeg utility in Linux to extract frames. If the number of bins per color channel is 8, then the total number of bins for the histogram is 8x8x8 = 512. The sample data directory consisting of just one example test video is here.
> Java code to perform multithreaded N-Way merge. The easiest way to time the code on arrays with random doubles in [0,10) is
    time java -cp NWayMerge.jar NWayMerge -t 100 -l 1000000
A better way is to use the code as needed. The test client can be found in the NWayMerge.java file. The general usage is:
    java -cp NWayMerge.jar NWayMerge -t <numThreads> -l <numItemsToSort> -i <inputFile> -o <outputFile> [--verbose]
Type
    java -cp NWayMerge.jar NWayMerge -h
for usage.
The AToM Framework
I worked on AToM (Another Topic Model), a C++ framework for statistical topic modeling codes. [ Gibbs-LDA download]
Note: This version implements only a basic Gibbs sampler type LDA (Latent Dirichlet Allocation).
The software here is intended as quick and dirty prototypes for beginners. Some additional comments:
Prototypes - require polishing
Topic Model Codes
> Public code repositories are being shifted to GitHub.
> The CorrLDA and MMLDA versions on GitHub contain the prediction code, which finds the most probable training observations from one modality given the other modality. For example, predict the top M words given a test image or video features.
Tag-Topic Model Codes from 2011
> Some C++ prototype codes from our 2011 CIKM paper, "Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives":
- Download TagLDA in C++
  The model names in the code may differ from those in the paper. Some refactoring may follow if time permits.
- Download MMLDA in C++
- Download CorrMMLDA in C++
- Download METag2LDA in C++
- Download CorrMETag2LDA in C++
> Download Variational Bayesian LDA in C++ with an asymmetric Dirichlet prior (over document-level topic proportions only). The code uses material from "Estimating a Dirichlet distribution" by Tom Minka.
For more information on the effect of Dirichlet priors in LDA, the paper titled "Rethinking LDA: Why Priors Matter" serves as a good reference.
Some Topic Model Codes from Earlier Times
- Download Variational Bayesian LDA in C++
- Download Gibbs sampling based LDA in C++
Notes on Topic Model Codes
NOTE: All ids for documents and observed data (terms, document-level tags and word-level tags) start from 0 and are sequential with an increment of 1. The comments in the source files can be misleading due to copy-pasting. These might be fixed if time and motivation permit.
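The id convention in the note above (ids start at 0 and increase by 1 with no gaps) can be met by remapping raw symbols in order of first appearance. The following is a minimal Java sketch of that remapping only; the class name SequentialIdMap and its methods are hypothetical and are not part of the released C++ codes, whose input file format is not described here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the released code): assigns each distinct
// symbol a 0-based id in order of first appearance, so ids are 0, 1, 2, ...
final class SequentialIdMap {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> symbolOf = new ArrayList<>();

    // Returns the existing id of the symbol, or assigns the next free one.
    int idFor(String symbol) {
        Integer id = idOf.get(symbol);
        if (id == null) {
            id = symbolOf.size();   // the next id always equals the current count
            idOf.put(symbol, id);
            symbolOf.add(symbol);
        }
        return id;
    }

    // Reverse lookup, useful when printing top words per topic.
    String symbolFor(int id) {
        return symbolOf.get(id);
    }

    int size() {
        return symbolOf.size();
    }

    public static void main(String[] args) {
        SequentialIdMap terms = new SequentialIdMap();
        for (String w : "the cat saw the dog".split(" ")) {
            System.out.println(w + " -> " + terms.idFor(w));
        }
    }
}
```

One such map would be kept per id space (documents, terms, document-level tags, word-level tags), since each space is numbered from 0 independently under this assumption.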
This is the Variational Bayesian version of LDA (David Blei's LDA) written in C++. The code follows the EM (Expectation Maximization) framework and can be integrated with AToM with some effort; for now it can be used as a standalone module.
TA Corner
TA1. Adhoc Datastructures [ This Eclipse CDT project] serves as a repository for standard algorithms that are not found in the standard C++ STL (except heaps). Source code has been borrowed from several sources. Written entirely in C++, this package currently includes implementations of Multi-Way-Merge (K-Way-Merge), a B+ Tree, a minimal Trie, Heaps using vectors and a minimal on-disk binary search (often used for searching an inverted index by query words). WishList: include code from Google's sparse hash, SGI STL hash_map and TPIE.
See the readme file inside the tarball for compilation instructions.
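For readers unfamiliar with the K-Way-Merge listed above, here is a minimal heap-based sketch in Java. It is not the package's C++ implementation; the class name KWayMergeSketch and the choice of in-memory int arrays as input are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: merges k individually sorted runs with a min-heap.
final class KWayMergeSketch {
    // Each run must already be sorted in ascending order.
    static List<Integer> merge(List<int[]> runs) {
        // Heap entries are {value, runIndex, positionInRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (runs.get(r).length > 0) {
                heap.add(new int[] {runs.get(r)[0], r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();          // smallest head among all runs
            out.add(top[0]);
            int[] run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.length) {          // refill from the same run
                heap.add(new int[] {run[next], top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> runs = new ArrayList<>();
        runs.add(new int[] {1, 4, 9});
        runs.add(new int[] {2, 3, 10});
        runs.add(new int[] {});
        runs.add(new int[] {0, 5});
        System.out.println(merge(runs));      // prints [0, 1, 2, 3, 4, 5, 9, 10]
    }
}
```

The heap never holds more than one entry per run, so merging N items from k runs takes on the order of N log(k) comparisons.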
The SGI STL hash_map can be accessed through the standard C++ library on most *nix systems and can be used in code as:
#include <ext/hash_map>
using namespace __gnu_cxx;
TA2. CSE 4/535 IR course - Fall 2010 [ Warmup code]
This is an ad hoc implementation for the first project. The code is expected to extract internal wiki links from wiki markup. For more information on what the markup files look like, please see the files under the "Wiki" subdirectory of the data directory.
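As a rough illustration of the link-extraction task described above, the sketch below pulls link targets out of markup with a regular expression. It is written in Java and is not the warmup code itself. It assumes the common MediaWiki forms [[Target]], [[Target|label]] and [[Target#Section|label]]; the class name WikiLinkSketch is hypothetical, and what the project counted as an "internal" link (for example, whether File: or Category: targets were excluded) is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: extracts the target page of each [[...]] link.
final class WikiLinkSketch {
    // Group 1 is the target: everything after "[[" up to "|", "#" or "]]".
    // An optional "#section" and an optional "|label" are skipped.
    private static final Pattern LINK = Pattern.compile(
        "\\[\\[([^\\]\\[|#]+)(?:#[^\\]|]*)?(?:\\|[^\\]]*)?\\]\\]");

    static List<String> targets(String markup) {
        List<String> out = new ArrayList<>();
        Matcher m = LINK.matcher(markup);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String markup = "See [[Information retrieval]] and "
            + "[[Inverted index|indexes]] or [[Trie#Applications|tries]].";
        System.out.println(targets(markup));
    }
}
```

A single regular expression does not handle links nested inside other links (as in image captions), so a real solution may need a small parser instead.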
(+) First project.
(+) Second project.
(+) Third project.
TA3. CSE 4/535 IR course - Fall 2009 [ Warmup notes on C++ STL]
The content in this PDF was geared towards use in the first project. The focus of the project was on document language identification using character bi-grams. The STL bitset was used for Unicode extraction. The format for Unicode is well described in the UTF-8 article on Wikipedia.
(*) Find the UTF-8 code chart here
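To make the character bi-gram idea concrete, here is a minimal Java sketch that counts bi-grams over Unicode code points decoded from UTF-8 bytes. It relies on Java's built-in UTF-8 decoder rather than the bitset-based extraction used in the course notes, and the class name CharBigramSketch is hypothetical. How the counts were turned into a language decision in the project is not covered here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: counts adjacent code point pairs in UTF-8 text.
final class CharBigramSketch {
    static Map<String, Integer> bigramCounts(byte[] utf8) {
        // Decode the bytes, then work on code points so that characters
        // outside the Basic Multilingual Plane count as one symbol each.
        String text = new String(utf8, StandardCharsets.UTF_8);
        int[] cps = text.codePoints().toArray();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            String bigram = new String(cps, i, 2);   // code points i and i+1
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] sample = "na\u00efve banana".getBytes(StandardCharsets.UTF_8);
        System.out.println(bigramCounts(sample));
    }
}
```

A language profile could then be built by collecting these counts over training text for each language and comparing a new document's counts against each profile.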
Please report bugs to me. My email can be found at the bottom
right corner of this page with "university domain" being substituted
with "buffalo . edu". Thanks!