I have always wondered whether it would be possible to detect the tempo (or beats per minute, or BPM) of a piece of music using a neural network-based approach. After a small experiment a while back, I decided to make a more serious second attempt. Here's how it went.
Approach
Initially I had to throw around a few ideas regarding the best way to represent the input audio, the BPM, and what would be an ideal neural network architecture.
Input data format
One of the first decisions to make here is what general form the network's input should take. I don't know a whole lot about the physics side of audio, or frequency data more generally, but I am familiar with Fourier analysis and spectrograms.
I figured a frequency spectrogram would serve as an appropriate input to whatever network I was planning on training. These basically contain time on the x-axis and frequency bins on the y-axis. The values (pixel colour) then indicate the intensity of the audio signal at each frequency and time step.
An example frequency spectrogram from a few seconds of electronic music. Note the kick drum on each beat in the lowest frequency bin.
Output data format (to be predicted by the network)
I had a few different ideas here. First I thought I might try predicting the BPM directly. Then I decided I could save the network some trouble by having it try to predict the location of the beats in time. The BPM could then be inferred from this. I achieved this by constructing what I call a ‘pulse vector’ as follows:
Say we had a two second audio clip. We might represent this by a vector of zeroes of length 200 - a resolution of 100 frames per second.
Then say the tempo was 120 BPM, and the first beat was at the start of the clip. We could then create our target vector by setting (zero-indexed) elements [0, 50, 100, 150] of this vector to 1 (as 120 BPM implies 2 beats per second).
We can relatively easily infer BPM from this vector (though its resolution will determine how accurately). As a bonus, the network will also (hopefully) tell us where the beats are, in addition to just how often they occur. This might be useful, for instance if we wanted to synchronise two tracks together.
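As a concrete illustration, here's roughly how the pulse vector for that two-second, 120 BPM example could be constructed, and the BPM read back off it (a quick numpy sketch; the variable names are just for illustration):

```python
import numpy as np

frames_per_second = 100          # resolution of the pulse vector
clip_seconds = 2
bpm = 120

# Start with a vector of zeroes, one element per frame (length 200 here)
pulse = np.zeros(frames_per_second * clip_seconds)

# 120 BPM = 2 beats per second = a beat every 50 frames
frames_per_beat = frames_per_second * 60.0 / bpm
beat_frames = np.arange(0, len(pulse), frames_per_beat).astype(int)
pulse[beat_frames] = 1           # elements [0, 50, 100, 150] set to 1

# Inferring the BPM back from the vector
inferred_bpm = 60.0 * frames_per_second / np.diff(beat_frames).mean()
```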
This image overlays the target output pulse vector (black) over the input frequency spectrogram of a clip of audio.
Neural network architecture
My initial architecture involved just dense layers, and I was working in Lasagne. However, I soon discovered the magic of Keras when looking for a way to apply the same dense layer to every time step. After switching to Keras, I also added a convolutional layer, so the current architecture is essentially a convolutional neural network. My intuition behind the inclusion and order of specific network layers is covered further below.
Creating the training data
The main training data was obtained from my Traktor collection. Traktor is a DJing program, which is quite capable of detecting the BPM of the tracks you give it, particularly for electronic music. I have not had Traktor installed for a while, but a lot of the mp3 files in my music collection still have the Traktor-detected BPM stored with the file.
I copied around 30 of these mp3s to a folder, but later realised that they still needed a bit more auditing - files needed to start exactly on the first beat, and needed to stay in time throughout the song at the assumed BPM. I therefore opened each in Reaper (a digital audio workstation), chopped each song to start exactly on the first beat, ensured they didn't go out of time, and then exported them to wav.
Going from mp3/wav files to training data is all performed by the `mp3s_to_fft_features.py` script.
~~I then converted¹ these to wav and read them into Python (using wavio). I also read the BPM from each mp3 into Python (using id3reader).~~
-> I now already have the songs in wav format, and the BPMs are read from the filenames, which I entered manually.
The wav is then converted to a spectrogram. This is achieved by:

- Taking a sample of length `fft_sample_length` (default 768) every `fft_step_size` (default 512) samples
- Performing a fast Fourier transform (FFT) on each of these samples
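In code, that step amounts to something like the following (a simplified numpy sketch rather than the exact code in the script):

```python
import numpy as np

def wav_to_spectrogram(samples, fft_sample_length=768, fft_step_size=512):
    # Slide a window of fft_sample_length samples along the signal, moving
    # fft_step_size samples each time, and take the magnitude of the FFT of
    # each window. Each window becomes one time frame of the spectrogram.
    starts = range(0, len(samples) - fft_sample_length, fft_step_size)
    windows = np.array([samples[s:s + fft_sample_length] for s in starts])
    spectrogram = np.abs(np.fft.rfft(windows, axis=1))
    # (The frequency bins can then be downsampled/averaged to give the
    # smaller number of bins used as network input.)
    return spectrogram  # shape: (n_time_frames, n_frequency_bins)
```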
The target pulse vector matching the wav's BPM is then created using the function `get_target_vector`.
Then random subsets of length `desired_X_time_dim` are taken in pairs from both the spectrogram and the target pulse vector. In this way, we generate lots of training inputs and outputs of a more manageable length from just the one full-length spectrogram and pulse vector. Each sample represents about 6 seconds of audio, with different offsets for where the beats are placed (so our model has to predict where the beats are, as well as how often they occur).
For each ~6 second sample, we now have a 512x32 matrix as training input - 512 time frames and 32 frequency bins (the number of frequency bins can be reduced by increasing the `downsample` argument) - and a 512x1 pulse vector as training output.
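The sampling step looks roughly like this (an illustrative sketch; the function name and defaults aren't taken from the script):

```python
import numpy as np

def sample_training_pairs(spectrogram, pulse_vector, n_samples,
                          desired_X_time_dim=512):
    # Take matching random windows from the spectrogram (time x frequency)
    # and the pulse vector, so each training example gets a random beat offset.
    n_frames = spectrogram.shape[0]
    X, y = [], []
    for _ in range(n_samples):
        start = np.random.randint(0, n_frames - desired_X_time_dim)
        X.append(spectrogram[start:start + desired_X_time_dim, :])
        y.append(pulse_vector[start:start + desired_X_time_dim])
    return np.array(X), np.array(y)
```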
In the latest version of the model, I have 18 songs to sample from. I create a training set by sampling from the first 13 songs, and validation and test sets by sampling from the last 5 songs. The training set contains 28,800 samples.
Specifying and training the neural network
Network architecture - overview
As described above, I decided to go with a convolutional neural network architecture. It looked something like this:
An overview of the neural network architecture.
In words, the diagram/architecture can be described as follows:
- The input spectrogram is passed through two sequential convolutional layers
- The output is then reshaped into a 'time by other' representation
- Keras' TimeDistributed Dense layers are then used (in these layers, each time step is passed through the same dense layer; this substantially reduces the number of parameters needed to be estimated)
- Finally, the output is reduced to one dimension, and passed through some additional dense layers before producing the output
Network architecture - details
The below code snippets give specific details as to the network architecture and its implementation in Keras.
First, we have two convolution layers:
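Something along these lines, using the current Keras API (the filter counts and kernel sizes here are illustrative rather than the exact values used):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
# Input: 512 time frames x 32 frequency bins x 1 channel
model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=(512, 32, 1)))
# Max-pool over the frequency axis only, so time granularity is preserved
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
```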
I limited the amount of max-pooling. Max-pooling over the first dimension would reduce the time granularity, which I feel is important in our case, and in the second dimension we don't have much granularity as it is (just the 32 frequency bins). Hence I only performed max pooling over the frequency dimension, and only once. I am still experimenting with the convolutional layers' setup, but the current configuration seems to produce decent results.
I then reshape the output of the convolution filters so that we again have a 'time by other stuff' representation. This allows us to add some `TimeDistributed` layers. We have a matrix input of something like 512x1024 here, with the 1024 representing the outputs of all the convolutions. The `TimeDistributed` layers allow us to go down to something like 512x256, but with only one (1024x256) weight matrix. This dense layer is then used at all time steps. In other words, these layers densely connect the outputs at each time step to the inputs in the corresponding time steps of the following layer. The overall benefit of this is that far fewer parameters need to be learned.
The intuition behind this is that if we have a 1024-length vector representing each time step, then we can probably learn a useful representation at a lower dimension of that time step, which will get us to a matrix size that will actually fit in memory when we try to add some dense layers afterwards.
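Continuing the model sketched above, the reshape and TimeDistributed layers look roughly like this (again, the exact sizes are illustrative):

```python
from keras.layers import Reshape, TimeDistributed, Dense

# The convolution output above is (512, 16, 64): time x frequency x filters.
# Reshape it to 'time by everything else' (512 x 1024), then apply the same
# dense layer at every time step, taking each 1024-length vector down to 256.
model.add(Reshape((512, 16 * 64)))
model.add(TimeDistributed(Dense(256, activation='relu')))
```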
Finally, we flatten everything and add a few dense layers. These simultaneously take into account both the time and frequency dimensions. This should be important, as the model can try to incorporate things like the fact that beats should be evenly spaced over time.
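The final layers look something like this (layer sizes are illustrative; the output length matches the pulse vector):

```python
from keras.layers import Flatten, Dense

# Flatten the time-by-features matrix so the dense layers can look at the
# whole clip at once, then output one value per time frame (the pulse vector).
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(512, activation='sigmoid'))
# Loss and optimizer choices here are assumptions, not taken from the repo
model.compile(optimizer='adam', loss='binary_crossentropy')
```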
Results
Usually the model got to a point where validation error stopped reducing after 9 or so epochs.
With the current configuration, the model appears to be able to detect beats in the music to some extent. Note that I’ve actually switched to inputs and outputs of length 160 (in the time dimension), though I was able to achieve similar results on the original 512-length data.
This first plot shows typical performance on audio clips within the training set:
Predicted (blue) vs actual (green) pulses - typical performance over the training set.
Performance is not as good when trying to predict pulse vectors derived from songs that were not in the training data. That said, on some songs the network still gets it (nearly) right. It also often gets the frequency of the beats correct, even though those beats are not in the correct position:
Predicted (blue) vs actual (green) pulses - typical performance over the validation set.
If we plot these predictions/actuals over the input training data, we can compare our own intuition to that of the neural network:
Predicted (black) vs actual (white) pulses plotted over spectrogram - typical performance over the training set.
Take this one validation set example. I would find it hard to tell where the beats are by looking at this image, but the neural net manages to figure it out at least semi-accurately.
Predicted (black) vs actual (white) pulses plotted over spectrogram - typical performance over the validation set.
Next steps
This is still a work in progress, but I think the results so far have shown that this approach has potential. From here I'll be looking to:

- Use far more training data - I think many more songs are needed for the neural network to learn the general patterns that indicate beats in music
- Read up on convolutional architectures to better understand what might work best for this particular situation
- An approach I've been thinking might work better: adjust the network architecture to do 'beat detection' on shorter chunks of audio, then combine the output of this over a longer duration. This longer output can then serve as the input to a neural network that 'cleans up' the beat predictions by using the context of the longer duration
I still need to clean up the code a bit, but you can get a feel for it here.
Random other thoughts
- I first thought of approaching this problem using a long short-term memory (LSTM) network. The audio signal would be fed in frame-by-frame as a frequency spectrogram, and then at each step the network would output whether or not that time step represents the start of a beat. This is still an appealing prospect, however I decided to try a network architecture that I was more familiar with
- I tried a few different methods for producing audio training data for the network. For the proof-of-concept phase, I created a bunch of wavs with just sine tones at varying pitches, decaying quickly and played only on the beat, at various BPMs (see the sketch below). It was quite easy to get the network to learn to recognise the BPM from these. A step up from this was taking various tempo-synced break beats, and saving them down at different tempos. These actually proved difficult to learn from - just as hard as real audio files
- It might also be interesting to try working with the raw wav data as the input
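The proof-of-concept wavs mentioned above can be generated with something like this (the function name and parameters are illustrative):

```python
import numpy as np
import wavio

def write_click_track(path, bpm, seconds=30, sample_rate=44100, pitch_hz=440.0):
    # A sine tone that decays quickly after every beat, at the given BPM
    t = np.arange(int(seconds * sample_rate)) / float(sample_rate)
    time_since_beat = t % (60.0 / bpm)
    signal = np.sin(2 * np.pi * pitch_hz * t) * np.exp(-30.0 * time_since_beat)
    wavio.write(path, (signal * 32767).astype(np.int16), sample_rate, sampwidth=2)
```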
Footnotes
1: In the code, the function `convert_an_mp3_to_wav(mp3_path, wav_path)` tells Linux to use mpg123 to convert the input `mp3_path` to the output `wav_path`. If you are on Linux, you may need to install mpg123. If you are using a different operating system, you may need to replace this with your own function that converts the input mp3 to the output wav.
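For reference, a minimal version of that helper might look like this (the exact command used in the code may differ):

```python
import subprocess

def convert_an_mp3_to_wav(mp3_path, wav_path):
    # mpg123's '-w' flag decodes the mp3 and writes the result as a wav file
    subprocess.check_call(['mpg123', '-w', wav_path, mp3_path])
```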
Tempo mapping in Reaper can be quick and easy — so switch off that click and get playing!
If you're starting your production with live audio recordings, but want to add MIDI parts or sampled loops later on, it often makes sense to ask the performers to play along to a programmed click track (or simply Reaper's internal metronome) during the tracking sessions. That way, their performances should line up nicely with Reaper's bars/beats grid. However, it's not always possible (or, indeed, musically desirable!) to use a click track, so it's handy, on occasion, to know how to synchronise the grid to a free‑running recorded performance after the fact. This kind of tempo mapping can be quite laborious, though, so in this month's Reaper workshop I'm going to suggest a simple approach that gets the job done efficiently.
Essential Preparations
The first thing you need to do is select all the Items in your project (Ctrl‑A), open the Item Properties dialogue window (press F2 and click the subsequent 'All At Once' button), and select 'Time' from the Item Timebase pull‑down menu. This prevents any Items from time stretching during the tempo‑mapping process.
Right‑click the Locking button in the main toolbar and set it up so that there are ticks in both the Enable Locking and Items (Prevent Left/Right Movement) boxes. This stops any audio shifting around while you're extracting the tempo information. If your initial recording is in multitrack format, it also makes sense to Group all those Items together for the time being (select them all and type 'G'). However, for the sake of simplicity I'll assume that we're working with a single mono/stereo audio Item.
Now open the Actions window by typing '?' (Shift‑'/') and check that the Actions listed at the end of this article are all assigned to convenient locations on your keyboard.
Slice & Dice
A big danger is trying to map the tempo in too much detail, because that tends to leave you with lumpy‑sounding MIDI parts. Much better to work in terms of bar‑long chunks instead, so, first of all, play your project from the start and 'Split Items At Play Cursor' ('S' by default) in real time on each bar's downbeat. Splitting on each downbeat while the project plays through makes the project easier to work on, bar by bar, and the more accurately you can do this, the easier the fine editing work will be later.
When you've finished, zoom in on your first slice point, switch off Snap using Alt‑S, and align the slice point with the downbeat's waveform as accurately as possible. Drag on the slice point to adjust the boundaries of both Items simultaneously without creating gaps or overlaps; the trim cursor should show a double‑headed horizontal arrow if you're doing it right. As a safety measure against inadvertent overlaps, activate the main toolbar's Auto‑crossfade button, so that any overlaps become more obvious visually. When you're done with the first slice point, make sure the track is selected and use 'Select And Move To Next Item' to bring up the next slice point without changing your zoom setting.
Don't worry if you're not sure exactly which bit of the signal waveform actually constitutes the downbeat; just guess as best you can. However, if there's no downbeat event at all in the audio for that bar, select the Items on both sides of the slice and 'Heal Splits In Items'.
Building The Tempo Map
Once you've got all your slice points in their 'best guess' positions, you're ready to start building the tempo map:
If all your sliced Items are single bars, you can generate the tempo map by simply selecting the first Item and repeating the following sequence of Actions: Set Time Selection To Items, Create Measure From Time Selection (Detect Tempo) and Select And Move To Next Item.
For any sliced Item that is more than a bar long, use Create Measure From Time Selection (Detect Tempo, New Time Signature) instead of Create Measure From Time Selection (Detect Tempo) in the Action sequence, and fill in the appropriate number of bars in the dialogue box. You may also need to do this for single‑bar Items if you have a time‑signature change mid‑project. (You jazzer, you!)
Fine Tuning The Grid
You should now have a little tempo marker in the main ruler at each slice point, but the trickiest bit of the process is still to come: adjusting the results by ear. The first thing I do is switch on Reaper's metronome click (the leftmost button in the main toolbar), insert the Jesusonic Time Adjustment plug‑in on the sliced track, and then adjust the delay amount to get the best overall sense of 'lock' between the click and the performance. The click will inevitably drift a bit against the audio from time to time, though, so don't obsess too much over this yet: just listen for the best fit, so that an overall timing offset doesn't fool you into wasting bags of time on detailed tweaks.
If the click appears to be rushing in a section, that Item is probably too short, whereas a dragging click suggests an over‑long Item. Whichever it is, just grab the offending slice boundary and adjust it a little. Now delete not only the tempo marker above the adjusted boundary, but also the previous one, and recalculate both of them to reflect the new Item lengths. This should maintain the correct positions of all subsequent tempo markers. If the tempo map is still out of sync, repeat the process again: there's no substitute for trial and error at this juncture.
One common thing to look for is where there's a short‑term tempo fluctuation over two bars: the first bar's tempo is unusually high and the second bar's tempo is unusually low (or vice versa). This often stems from the performer having just misplaced the downbeat note, and you may find you get smoother tempo‑mapping results in that case by healing those two bars together and recalculating the map for the longer section. Of course, you could equally well slice things up more finely if you feel the tempo changes need to be more fluid, although that's very rarely necessary in my experience. Whatever you decide, do make a point of listening for at least two bars before any section that you're adjusting, so that you can get your ear 'into the zone'. Sensations of drifting timing can be very subtle, and are very dependent on the listening context, so this simple precaution typically saves a lot of tail chasing.
When you've completed a map you're happy with, it's as well to double‑check that you've not inadvertently moved a sliced Item. Select all the Items and use Heal Splits In Items to remove all the slices. If any remain, then something's moved and you'll have to backtrack. The Locking button should have prevented this, but if not, it's better to remedy things straight away.
An Alternative Method
An alternative time‑mapping method is to 'Insert Marker At Current Position' for each downbeat, then use Go To Next/Previous Marker and Set Start/End Point to create the time selections that 'Create Measure From Time Selection' needs in order to operate. However, while this approach works fine, I personally prefer the slicing method, because I rely heavily on Markers for general timeline navigation purposes and I like the convenience of using Reaper's Item colouring to track my progress.
These are the Actions you'll need to have at your fingertips (see 'Essential Preparations'):
- Select And Move To Next/Previous Item.
- Split Items At Play Cursor.
- Heal Splits In Items.
- Set Time Selection To Items.
- Create Measure From Time Selection (Detect Tempo), which is Alt‑Shift‑C by default.
- Create Measure From Time Selection (Detect Tempo, New Time Signature).
To match Reaper's global tempo to the first bar of audio and set up a count‑in:
- Select the Item containing the first downbeat and Set Time Selection To Items.
- Right‑click in the main time ruler and choose 'Set Project Tempo From Time Selection' from the context‑sensitive menu.
- Fill in the dialogue box's Time Signature and Bars fields and hit Return to match Reaper's global tempo setting to that of the first bar of audio.
- Switch Snap back on and select all Items in the project.
- De‑activate the Locking button and then drag the same 'first‑downbeat' Item to bar nine, taking everything else along for the ride. This gives you a sensible eight‑bar count‑in for the sake of any later overdubbing.
- Switch Locking back on.