24 January 2010
A Question for you
A question for you: do you feel that you give off any impressions to people around you? What have you noticed about men, and especially women you know well? What 'aura' do they give off?
04 January 2010
Exploring the structure of digital audio
An audio source can be thought of as a sinusoidal function. Suppose we have a speaker (audio monitor) whose position as a function of time is given by f(t)=cos(440*2PI t). This speaker will produce sound that our ears pick up as a pure tone at 440Hz. However, such a function is difficult to find or fit for arbitrary data, e.g. the sound recorded by a mic, so digital audio is instead analysed with Fast Fourier Transforms (FFTs) of the function f(t). Applied to one short window of f(t) after another (strictly speaking, a short-time Fourier transform), the FFT turns f(t) into a function F:time*frequency->intensity. Theoretically the time and frequency are continuous values, but in practice, and in digital audio, these are discretized.
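To make the time*frequency picture concrete, here is a minimal sketch in plain Python (standard library only). It samples the 440Hz cosine from above, cuts it into overlapping windows, and takes the FFT of each one. The sample rate (8000), window size (1024), hop (512), the Hann taper and the names `fft` and `spectrogram` are my own choices for illustration, not anything prescribed by a particular audio library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(samples, window=1024, hop=512):
    """FFT magnitudes of successive windows: rows are time, columns are frequency."""
    # Hann taper, so the abrupt window edges do not smear the spectrum as much.
    taper = [0.5 * (1 - math.cos(2 * math.pi * i / (window - 1)))
             for i in range(window)]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = samples[start:start + window]
        spectrum = fft([s * w for s, w in zip(chunk, taper)])
        # Keep only the non-negative frequencies; the rest mirror them for real input.
        frames.append([abs(c) for c in spectrum[:window // 2]])
    return frames


SAMPLE_RATE = 8000  # samples per second (assumed)
WINDOW = 1024

# Half a second of the tone f(t) = cos(440 * 2*pi * t), discretized in time.
tone = [math.cos(440 * 2 * math.pi * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 2)]

grid = spectrogram(tone, window=WINDOW, hop=512)

# The strongest frequency bin in the first time frame, converted back to Hz.
peak_bin = max(range(len(grid[0])), key=lambda k: grid[0][k])
peak_hz = peak_bin * SAMPLE_RATE / WINDOW
print(len(grid), "time frames x", len(grid[0]), "frequency bins; peak near", peak_hz, "Hz")
```

Each frequency bin here is SAMPLE_RATE / WINDOW (about 7.8Hz) wide, so the peak can only land on the bin nearest 440Hz rather than on 440Hz exactly. A longer window narrows the bins but blurs the timing, which is the basic trade-off of discretizing both axes.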
Since the FFT of a sound function produces a 2D function, we can play with something that closely resembles 2D images. Right now my inspiration for transformations comes from visual filters. I'm curious about what applying a Gaussian blur or a Sobel edge filter to an audio sample would produce. Further, can we take derivatives in this 2D space that can be used to find similar features between two songs (à la SIFT in computer vision)? Can we use k-means to learn the characteristics of a set of audio samples, perhaps to perform voice recognition? These are some more and less ambitious ideas I have that all stem from thinking about audio samples visually. What would it be like to play greyscale images over speakers? Could this be used to guide people whose vision is impaired?
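As a rough sketch of what "treating audio like an image" could look like, the snippet below runs two standard 3x3 image kernels over a tiny, made-up time*frequency grid. The grid (5 time frames by 7 frequency bins with a steady tone in bin 3), the helper name `filter2d` and the clamped border handling are all assumptions for illustration. Note that it applies the kernels as a correlation, without flipping them, which is how image filters are commonly written.

```python
# Standard 3x3 kernels borrowed from image processing.
GAUSSIAN_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]
# Sobel kernel that responds to changes along the frequency (column) axis.
SOBEL_FREQ = [[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]]


def filter2d(grid, kernel):
    """Correlate a 2D grid with a 3x3 kernel, clamping indices at the borders."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += kernel[dr + 1][dc + 1] * grid[rr][cc]
            out[r][c] = acc
    return out


# Toy grid: rows are time frames, columns are frequency bins.
# A steady tone shows up as a vertical stripe of energy in one bin.
toy = [[1.0 if c == 3 else 0.0 for c in range(7)] for _ in range(5)]

blurred = filter2d(toy, GAUSSIAN_3X3)  # spreads the tone into neighbouring bins
edges = filter2d(toy, SOBEL_FREQ)      # marks where the tone starts and stops in frequency

print([round(v, 2) for v in blurred[2]])
print(edges[2])
```

In this toy case the blur leaks half of the tone's energy into the two neighbouring bins, and the Sobel output is positive on the low-frequency side of the stripe, negative on the high side and zero on the stripe itself. What either would sound like after transforming back to audio is exactly the open question.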
I intend to explore FFTs further to see exactly how these ideas can be done. Two good sources I found are:
(1) HMS and
(2) this other one.
Further, I have been trying to decide on a platform to work in. Python, Ruby, SML, C, and Java come to mind, but as of right now I have decided to try out ObjC and the CoreAudio SDK. Since I have never programmed anything in ObjC, this project is currently at the stage of me learning ObjC. Here I can refer you to:
(1) a decent tutorial by Apple
(2) Programming in Objective-C by Stephen Kochan
(3) Stack Overflow