[ot][notes] how to automatically tag actions in video data
just tried this; it isn't hard. i followed https://mmaction2.readthedocs.io/en/1.x/get_started.html . i pursued this toolkit because it is used by the major new research models (although those models aren't in it quite yet).

1. install mmengine and mmcv using mim:

    pip3 install -U openmim
    mim install mmengine 'mmcv>=2.0.0rc1'

2. install mmaction2 from source (to quickstart with the demo data):

    git clone https://github.com/open-mmlab/mmaction2.git
    cd mmaction2
    git checkout 1.x  # or dev-1.x for cutting edge
    pip3 install -v -e .

3. download an action recognition model. i believe this model (tsn) is relatively old (2018?), but it is their example, and it is the one i just tried successfully:

    mim download mmaction2 --config \
        tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb --dest .

4. use the demo script and the downloaded model to select the top 5 labels from a list for an mp4 video. note: add --device=cpu if you want to run outside nvidia cuda. see the python sketch below for doing this without the demo script.

    python demo/demo.py tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py \
        tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb_20220906-2692d16c.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt

    - the .py is the model config
    - the .pth is the model weights, the larger file
    - the .txt appears to be a list of labels for it to choose from
    - i tried a different .mp4 and it seemed to work despite my not normalizing the resolution, which is quite surprising to me

I don't know how to do this "properly" or whether all of the above is correct, but it worked for me. I like to store information once I have successes because I never know when I might dissociate away.

The next new cutting-edge model for doing this is InternVideo, one url for which is https://github.com/OpenGVLab/InternVideo . The weights were just released publicly 2 days ago, although I'm still waiting on confirmation to access the google drive myself. It's set up based on mmaction. It looks like the latest model available in stable mmaction is videoswin (2022); I have not tried it myself at this time. The surrounding framework can also perform many, many other tasks besides simple action recognition, such as plotting human skeleton vertices or filling in missing video data.
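for scripting this instead of calling demo/demo.py, here is a minimal python sketch using mmaction2's init_recognizer / inference_recognizer api, assuming the config and checkpoint from step 3 sit in the current directory. the attribute holding the per-class scores has changed names between 1.x releases, so check it against your installed version:

    # minimal sketch: top-5 action labels for one video via the mmaction2 1.x api
    import torch
    from mmaction.apis import init_recognizer, inference_recognizer

    config = 'tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py'
    checkpoint = ('tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_'
                  'kinetics400-rgb_20220906-2692d16c.pth')

    model = init_recognizer(config, checkpoint, device='cpu')  # or 'cuda:0'
    result = inference_recognizer(model, 'demo/demo.mp4')

    # the label map is one label per line; index = line number
    with open('tools/data/kinetics/label_map_k400.txt') as f:
        labels = [line.strip() for line in f]

    # result is an ActionDataSample; on some 1.x versions the scores live at
    # result.pred_scores.item instead of result.pred_score
    scores = result.pred_score
    top5 = torch.topk(scores, k=5)
    for score, idx in zip(top5.values.tolist(), top5.indices.tolist()):
        print(f'{labels[idx]}: {score:.4f}')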
to find better models (again, i don't know if this is the "proper" way):

1. mim can only download models that are associated with installed packages, so to find models from the dev-1.x branch, it must be installed:

    git checkout dev-1.x
    pip3 install -e .

2. use the search command to find the best-performing model with an installed config. note: there are also tiny, low-ram variants of models, for low-end systems, which don't perform as well:

    mim search mmaction2 --sort kinetics-700/top_1_accuracy --descending

3. for dev-1.x, the model performing best on kinetics-700 is presently swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb , which is a 782MB download. for stable 1.x, the model that comes up is slowonly_imagenet-pretrained-r50_16xb16-8x8x1-steplr-150e_kinetics700-rgb , although i had to switch to dev-1.x to get the download to succeed; maybe the url changed.

    mim download mmaction2 --config \
        swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb --dest .

4. use the new model to run the demo (maybe? a kinetics-700 model presumably wants the k700 label map, tools/data/kinetics/label_map_k700.txt, rather than the k400 one):

    python3 demo/demo.py \
        swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb.py \
        swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb_20220930-f8d74db7.pth \
        demo/demo.mp4 tools/data/kinetics/label_map_k400.txt --device=cpu
    # crashes for me; i don't have the ram on my old system for the large model

sha256sums:

2692d16c712e24994aaa3cfb48f957a521e053ffb81c474e2c0b3e579c888650  tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb_20220906-2692d16c.pth
690bdf66675aa117e37298de8feb44112f80cc0fecd1290b21a4cfc1433d26b4  tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py
adf57e1194172fe30ee1d44715aa820915f7586997e775aa975d0e7c24de4342  slowonly_imagenet-pretrained-r50_16xb16-8x8x1-steplr-150e_kinetics700-rgb_20220901-4098e1eb.pth
c1640f2a3ae2e962a9a9f8ea2c370c4f34c37f78d96fd77a852239f5587ab619  slowonly_imagenet-pretrained-r50_16xb16-8x8x1-steplr-150e_kinetics700-rgb.py
f8d74db74207c862162ba47b6c9296c0a1641f23621f9dd47555e8d8e56b7494  swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb_20220930-f8d74db7.pth
84239680a94b874841c67bdc49739799bac0077437e1578960ddef598a8a3cb3  swin-large-p244-w877_in22k-pre_16xb8-amp-32x2x1-30e_kinetics700-rgb.py
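to verify one of these downloads against the sums above, here is a minimal stdlib-only python sketch (an assumed helper, not part of mmaction2):

    # minimal sketch: stream a file and compare its sha256 to an expected sum,
    # so multi-hundred-MB .pth checkpoints don't need to fit in memory
    import hashlib
    import sys

    def sha256sum(path, chunk_size=1 << 20):
        digest = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                digest.update(chunk)
        return digest.hexdigest()

    if __name__ == '__main__':
        # usage: python3 checksum.py <expected_sha256> <path>
        expected, path = sys.argv[1], sys.argv[2]
        actual = sha256sum(path)
        print('OK' if actual == expected else 'MISMATCH: ' + actual)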
i typed an email and it went away. the last model from the last email was corrupt; the download is broken. here are checksums for a large model and smaller swin models:

c7a3cc5bdd743507e44b4254b81621bf6ac0253d52506a7569d24bef03ad0528  ircsn_ig65m-pretrained-r152_8xb12-32x2x1-58e_kinetics400-rgb_20220811-c7a3cc5b.pth
7cb0556d8147be647601e6fa3ac7175f0d59e50bb6be66fd61e9ad964b767cc4  ircsn_ig65m-pretrained-r152_8xb12-32x2x1-58e_kinetics400-rgb.py
e91ab986e24280425f1b5ea0851e4a9c194b6409bf12772d47761542d1bb53a3  swin-small-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb_20220930-e91ab986.pth
0c03449f4e7476f4c0b3307e0eb440e4c3eb1dec5efdeeee93d5d97955434840  swin-small-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb.py
241016b2a95f8cf039fea2948be1fe5889b0ed319cda7f4e5b7dbd5ac2b89851  swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb_20220930-241016b2.pth
a6ba5eaf50f46808451c6c63eb3c6aff3b1aa280684249e7c45fa53115526350  swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb.py

i did not try swin-small. both swin-tiny and ircsn gave 99% confidence for the arm-wrestling demo example, much better than the 2018 model. ircsn was very, very slow. swin-tiny is still likely not quite fast enough to perform live analysis on my system without something added, like a framerate drop (sketched below).
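to make the framerate-drop idea concrete, a hypothetical glue sketch (not an mmaction2 api): keep 1 of every N frames from a capture with opencv, buffer short clips, and classify each clip. assumes opencv-python plus the swin-tiny config/checkpoint from the checksum list above, linux-style temp files, and the same caveat about score attribute names as the earlier sketch:

    # minimal sketch of the framerate-drop idea: keep 1 of every SKIP frames
    # from a live capture, buffer short clips, and classify each clip.
    # hypothetical glue code, not an mmaction2 api.
    import tempfile
    import cv2
    from mmaction.apis import init_recognizer, inference_recognizer

    CONFIG = 'swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_kinetics400-rgb.py'
    CKPT = ('swin-tiny-p244-w877_in1k-pre_8xb8-amp-32x2x1-30e_'
            'kinetics400-rgb_20220930-241016b2.pth')
    SKIP = 4        # drop 3 of every 4 frames
    CLIP_LEN = 32   # decimated frames per analyzed clip

    model = init_recognizer(CONFIG, CKPT, device='cpu')  # or 'cuda:0'

    cap = cv2.VideoCapture(0)  # webcam; pass a filename for file input
    frames, count = [], 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        count += 1
        if count % SKIP:      # the actual framerate drop
            continue
        frames.append(frame)
        if len(frames) < CLIP_LEN:
            continue
        # write the decimated clip somewhere the recognizer can read it
        with tempfile.NamedTemporaryFile(suffix='.mp4') as tmp:
            h, w = frames[0].shape[:2]
            writer = cv2.VideoWriter(tmp.name,
                                     cv2.VideoWriter_fourcc(*'mp4v'),
                                     30 / SKIP, (w, h))
            for f in frames:
                writer.write(f)
            writer.release()
            result = inference_recognizer(model, tmp.name)
            # may be result.pred_scores.item on some 1.x versions
            print('top label index:', int(result.pred_score.argmax()))
        frames.clear()
    cap.release()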