Wednesday, August 28, 2013

Activity 12 - Playing Notes by Image Processing

This activity aims to play a musical scoresheet using image processing.

Since Scilab has a built-in function for producing sound, the only problem at hand is how to read the scoresheet. We will use all our image processing skills to identify the notes, and then we will be able to listen to them.

First, let's crop the staffs so that the clefs are removed and only the staff lines and the notes are left. We could also remove the clefs with morphological operations, but cropping is more convenient. Later, I want you to recognize which song I used for this activity.

If you don't know much about morphological operations yet, you had better read my previous blogs on the topic. I won't go into detail on the cleaning and enhancement of the scoresheet image.

So first, let's try this sample scoresheet. The first thing to do is to eliminate all unnecessary symbols so that we are left with blobs of notes. Using erosion, as implemented in the following code, I obtained the results shown in the images below.

//reading the scoresheet
clear
a = imread('C:\Users\Jazz\Desktop\Notes\1st staf.png');
imshow(a)



Figure 1. Sample staff, cropped from a bigger and complete scoresheet.

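// Invert the image and binarize it
im = im2bw(~a,0.6);
imshow(im);
// Structuring elements (SEcircle is used later for closing)
SEvline = CreateStructureElement('vertical_line', 2);
SEhline = CreateStructureElement('horizontal_line', 3);
SEcircle = CreateStructureElement('circle',5);
// Erode with the vertical line, then with the horizontal line
b = ErodeImage(im,SEvline);
imshow(b);
c = ErodeImage(b,SEhline);
imshow(c);
imwrite(c,'C:\Users\Jazz\Desktop\Notes\notes.png')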

Figure 2. Cleaned image of the sample staff.

Figure 2 shows an image that can already be analyzed. Even with this simplification, we can still identify what kind of note each blob is, and its height on the staff from its pixel coordinates.
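To see why erosion with short line elements removes the staff lines but keeps the note heads, here is a minimal sketch in plain Scilab on a made-up 5x6 binary image. It does not use the toolbox functions above; the shifted-AND erosion and the variable names v and h are only my illustration of the idea, not what ErodeImage does internally.

```scilab
// Toy binary image: row 3 is a 1-pixel-thick "staff line" and
// the 3x3 block in columns 2-4 is a "note head".
im = [0 0 0 0 0 0
      0 1 1 1 0 0
      1 1 1 1 1 1
      0 1 1 1 0 0
      0 0 0 0 0 0] == 1;

// Erosion by a 2-pixel vertical line: a pixel survives only if it
// and the pixel directly below it are both set.
v = im(1:$-1, :) & im(2:$, :);

// Erosion by a 3-pixel horizontal line: a pixel survives only if it
// and its two right-hand neighbours are all set.
h = v(:, 1:$-2) & v(:, 2:$-1) & v(:, 3:$);

disp(v);   // the line pixels outside the note head are gone
disp(h);   // a small blob of the note head remains
```

The staff line is only one pixel thick, so no pixel on it has a set pixel below it except where it crosses the note head. That is why the line vanishes while the thicker head survives as a smaller blob.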

However, if we apply the SearchBlobs function, which is thoroughly discussed in the previous blog (Activity 11), it detects 9 blobs instead of 8. This error comes from the half note (6th blob from the left), which is detected as 2 blobs.

To eliminate this error, we again use morphological operations to enhance the image (especially the half note). I implemented the following code using the CloseImage function. Again, if you are lost on the morphological operations, please see my previous blogs (Activity 10, Morphological Operations, and Activity 11, Binary Operations).

d = CloseImage(c,SEcircle);
imshow(d);
// Do not assign the result of imwrite back to d; d is still needed as the image
imwrite(d,'C:\Users\Jazz\Desktop\Notes\distinctnotes.png');

Figure 3. Enhanced image of Figure 2.

We now have 8 clear blobs in our image, while preserving the difference between the eighth notes and the half note.

Since we know how to call each blob and can compare their areas, we can easily identify the note type of each blob. Here we have 7 eighth notes and 1 half note (6th blob).

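//AnalyzeBlobs
// Label the connected blobs in the enhanced image d (see Activity 11)
Blobs = SearchBlobs(d);
blobmax = double(max(Blobs));    // number of blobs found

IsCalculated = CreateFeatureStruct(%f);
IsCalculated.Centroid = %t;

BlobStatistics = AnalyzeBlobs(Blobs,IsCalculated);

// Area (pixel count) of the 6th blob, the half note
Area = size(find(Blobs==6),2);
disp(Area);

xpixel = zeros(blobmax,1);
ypixel = zeros(blobmax,1);
arealist = zeros(blobmax,1);

// Centroid and area of every blob
for i=1:blobmax
    xpixel(i,1) = BlobStatistics(i).Centroid(1);
    ypixel(i,1) = BlobStatistics(i).Centroid(2);
    arealist(i,1) = size(find(Blobs==i),2);
end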
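If you want to see what the area and centroid numbers mean without the toolbox, here is a rough sketch in plain Scilab. The label matrix below is made up (0 for background, 1 and 2 for two blobs), and the names nblobs, area, xc and yc are mine, not from the toolbox.

```scilab
// Hypothetical label image: 0 = background, 1 and 2 = two blobs.
Blobs = [0 1 1 0 0 0
         0 1 1 0 0 0
         0 0 0 0 2 2
         0 0 0 0 2 2
         0 0 0 0 2 2];

nblobs = max(Blobs);
area = zeros(nblobs, 1);
xc = zeros(nblobs, 1);
yc = zeros(nblobs, 1);

for i = 1:nblobs
    [rows, cols] = find(Blobs == i);   // pixel coordinates of blob i
    area(i) = length(rows);            // area = number of pixels
    xc(i) = mean(cols);                // centroid column (x)
    yc(i) = mean(rows);                // centroid row (y), counted from the top
end

disp([xc, yc, area]);
```

The area is just the pixel count of a label, and the centroid is the mean of the pixel coordinates, so the row coordinate of the centroid tells you how high the blob sits on the staff.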

Lastly, we need to know which note each blob is. I sampled the pixel range in which each note can be found and assigned it to its equivalent note. With the previous code, I also obtained the centroid of each blob. These centroids identify the level, or pitch, of the corresponding blob.
Figure 5. Sampled staff with the pixel range of different notes.

Figure 5 shows how we will know the pitch of a note, whether it is a C, D, E, and so on. For this image, the red pixel is D, at around 65-68 pixels; E is at around 60-64 pixels, F at around 54-58 pixels, and G at around 49-52 pixels.
As you may notice, as we go higher on the staff (which means higher notes), the centroid values decrease. This is because the y-pixel coordinate starts from 0 at the top.
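The range lookup can be sketched as a small helper in plain Scilab. The pixel bands and frequencies are the ones I use in the conversion code later in this post; the names ylow, yhigh, freq and pitch, and the three sample heights, are made up for illustration only.

```scilab
// Pixel bands and their frequencies in Hz (order: G, A, B, C1).
ylow  = [49 43 38 33];
yhigh = [52 47 41 37];
freq  = [784.00 880.00 987.76 1046.50];

function f = pitch(y, ylow, yhigh, freq)
    // Index of the band that contains the centroid height y
    k = find(y > ylow & y < yhigh);
    if isempty(k) then
        f = 0;            // no band matched: treat as silence
    else
        f = freq(k(1));
    end
endfunction

ycentroids = [50.2 45.1 35.0];   // made-up centroid heights
f = zeros(1, length(ycentroids));
for i = 1:length(ycentroids)
    f(i) = pitch(ycentroids(i), ylow, yhigh, freq);
end
disp(f);   // frequencies of G, A and C1 for these three heights
```

Keeping the pixel heights and the frequencies in separate variables avoids overwriting the centroids, which makes it easier to check a wrong note afterwards.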

That's it! I guess all the problems have been covered. As a summary, we identify the note type from the blob areas and the pitch from the blob centroids.

All that is left now is to convert it to sound! Using Scilab's built-in functions to produce and save sound, I implemented this code.

Conversion to Music

//MusicPart

function n = note(f, t)
n = sin (2*%pi*f*t);
endfunction;

C = 261.63*2;
D = 293.66*2;
E = 329.63*2;
F = 349.23*2;
G = 392.00*2;
A = 440.00*2;
B = 493.88*2;
C1 = 523.25*2;
D1 = 587.33*2;
E1 = 659.26*2;
F1 = 698.46*2;
G1 = 783.99*2;



ypixel(find(ypixel >49 & ypixel< 52)) = G;
ypixel(find(ypixel >43 & ypixel< 47)) = A;
ypixel(find(ypixel >38 & ypixel< 41)) = B;
ypixel(find(ypixel >33 & ypixel< 37)) = C1;


ypixel(find(ypixel >28 & ypixel< 31)) = D1;
ypixel(find(ypixel >23 & ypixel< 26)) = E1;
ypixel(find(ypixel >18 & ypixel< 21)) = F1;
ypixel(find(ypixel >11 & ypixel< 16)) = G1;

arealist(find(arealist<70)) = 1.0;
arealist(find(arealist>70)) = 0.5;

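BlobDetails = [xpixel,ypixel,arealist]

// Build the tune: one sine tone per blob, lasting that blob's duration
notelist = [];
for j=1:blobmax
    t = soundsec(BlobDetails(j,3));    // time samples for this note's duration
    notelist = [notelist, note(BlobDetails(j,2),t)];
end
s = notelist;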

With this code, I obtained all the necessary details, with their corresponding conversion in the musical scale.

BlobDetails

x pixel coordinate (order)   note frequency (Hz)   t (sec)
14.03370786516854            784.0                 0.5
93.04494382022472            784.0                 0.5
188.72826086956522           880.0                 0.5
298.13793103448273           784.0                 0.5
406.95652173913044           1046.5                0.5
532.9666666666667            987.76                1.0
718.7176470588236            784.0                 0.5
797.9333333333333            784.0                 0.5

The x pixel coordinate just tells you which note is played first; basically, it's already in order. The second column is the note frequency, obtained from reference [2], and the third column distinguishes the half note from the eighth notes: I assigned 0.5 seconds to the eighth notes and 1 second to the half note.
So yeah, we're done! Try to identify the sound I produced; I'm sure you're familiar with it. Anyway, I saved it using the following code and uploaded it to the web so you can play it.

savewave('C:\Users\Jazz\Desktop\Notes\happybirthday.mp3',s1)
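If you want to reproduce the sound without going through the image processing, here is a minimal sketch that builds the signal directly from the BlobDetails table. The 22050 Hz sample rate, the output file name, and the variable names are my assumptions for this sketch; this is not necessarily how the uploaded file was made.

```scilab
rate = 22050;   // samples per second (assumed)

// Frequencies (Hz) and durations (s) copied from the BlobDetails table
freq = [784 784 880 784 1046.5 987.76 784 784];
dur  = [0.5 0.5 0.5 0.5 0.5 1.0 0.5 0.5];

s = [];
for j = 1:length(freq)
    t = soundsec(dur(j), rate);        // time samples for note j
    s = [s, sin(2*%pi*freq(j)*t)];     // append the sine tone
end

outfile = fullfile(TMPDIR, 'tune.wav');   // hypothetical output path
savewave(outfile, s, rate);
// sound(s, rate);                        // uncomment to play it directly
```

Each note gets its own time vector, so the half note really lasts twice as long as the others, and the tones are joined end to end into one row vector.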

ENJOY!
Here is the link to the sound; please comment if you have problems accessing it.
https://soundcloud.com/jazzlisten-1/ap-186
--------
I forgot to say this, but I did this for all the staffs on the original scoresheet, so it's a complete tune. I hope it resembles the original. Hahahaha

References:
[1] AP 186 Handouts. Activity 12 - Playing Notes by Image Processing. Maricor Soriano 2013
[2] http://www.phy.mtu.edu/~suits/notefreqs.html
-----------------------------------------------------------------------------------------------------

I will give myself an 11/10 if you can identify the sound I produced :). I guess that's the main point of this activity.
