[data science] CVPR 2022: Robot Learning Tutorial (Vision, Text)

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Mon Jun 20 10:23:39 PDT 2022

Tutorial Part 2

Under Socratic Models, the code.

To use a language model to select steps, the model is prompted with
a sequence of code in which the instructions are written as comments,
and a number of diverse examples are given.
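
A minimal sketch of what such a prompt might look like when assembled
in Python. The example instructions, the robot.pick_and_place call,
and the build_prompt helper are all hypothetical names chosen for
illustration; they are not taken from the tutorial's colab.

```python
# Hypothetical few-shot examples: each instruction is a comment and each
# response is code calling an assumed robot API.
EXAMPLES = [
    ("put the blue block in the green bowl",
     ['robot.pick_and_place("blue block", "green bowl")']),
    ("stack the red block on the yellow block",
     ['robot.pick_and_place("red block", "yellow block")']),
]


def build_prompt(objects, instruction, examples=EXAMPLES):
    """Assemble a code-style prompt that ends on the new instruction comment."""
    lines = ["objects = " + repr(list(objects))]
    for text, steps in examples:
        lines.append("# " + text + ".")
        lines.extend(steps)
    # The prompt stops right after the new comment, so the language model
    # is expected to continue with the code for that instruction.
    lines.append("# " + instruction + ".")
    return "\n".join(lines) + "\n"


prompt = build_prompt(["blue block", "red block", "green bowl"],
                      "put the red block in the green bowl")
print(prompt)
```

Only successful examples are listed here, which keeps the prompt
short; that matches the observation that failure cases are left out.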

[this sequence looks to me like the result of manual prompt tuning:
trying things and seeing what kinds of prompts produce good results]

They did this tuning in the OpenAI Playground (available in the public
OpenAI beta). It's just language model prompting.

[It's notable that they don't include examples of failure conditions,
leaving that to other handling. This lets them stuff more examples
into the relatively short prompt, which produces better performance.]

With sets of named objects already provided for the model, GPT can
instruct the robot by passing words from its pretraining dataset to
the robot's method functions.
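
One way the generated text could be turned into robot calls is
sketched below. FakeRobot, pick_and_place, and run_generated_steps
are assumptions made up for this sketch, not the API from the talk.
The sketch parses each line with Python's ast module instead of
calling exec, so only plain calls on "robot" with string arguments
get through.

```python
import ast


class FakeRobot:
    """Stand-in for the real robot; records calls instead of moving anything."""

    def __init__(self, known_objects):
        self.known_objects = set(known_objects)
        self.log = []

    def pick_and_place(self, pick, place):
        for name in (pick, place):
            if name not in self.known_objects:
                raise ValueError("unknown object: " + name)
        self.log.append((pick, place))


def run_generated_steps(robot, generated_text):
    """Run lines of the form robot.method("a", "b"); skip everything else."""
    for line in generated_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            node = ast.parse(line, mode="eval").body
        except SyntaxError:
            continue
        # Accept only a call on an attribute of the bare name `robot`.
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "robot"):
            continue
        # Accept only positional string-literal arguments.
        if node.keywords or not all(
                isinstance(a, ast.Constant) and isinstance(a.value, str)
                for a in node.args):
            continue
        name = node.func.attr
        method = getattr(robot, name, None)
        if callable(method) and not name.startswith("_"):
            method(*[a.value for a in node.args])


robot = FakeRobot(["blue block", "red block", "green bowl", "yellow bowl"])
generated = '''robot.pick_and_place("blue block", "green bowl")
robot.pick_and_place("red block", "yellow bowl")
# done.'''
run_generated_steps(robot, generated)
print(robot.log)
```

The object names only work because they were registered with the
robot beforehand; an unknown name raises an error in this sketch.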

- Recommends increasing the training steps above the hardcoded default
of 4000 in the colab, for better performance.

ViLD is used to generate descriptions of the objects automatically;
these are fed as context [likely prompt material] to the language model.
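
A rough sketch of how detector output might become prompt context.
The (label, score) pair format, the 0.5 threshold, and the
describe_scene helper are assumptions for illustration; the real
detector's output format was not shown in these notes.

```python
def describe_scene(detections, min_score=0.5):
    """Turn (label, score) pairs into a one-line context string for a prompt."""
    labels = []
    for label, score in detections:
        # Keep confident detections only, and list each label once.
        if score >= min_score and label not in labels:
            labels.append(label)
    return "objects = " + repr(labels)


# Hypothetical detections; the scores are made-up values.
detections = [("blue block", 0.91), ("green bowl", 0.84),
              ("blue block", 0.40), ("purple bowl", 0.12)]
context = describe_scene(detections)
print(context)  # objects = ['blue block', 'green bowl']
```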

Their example actually generates the expected language instructions,
with its control method, and passes them on to be reparsed. There's
mention of improved performance.

The demonstrator comes up with a novel prompt, "place the blocks into
mismatched bowls", for which the language model crafts 2 steps that
match the meaning; they wanted 3 steps, something went as they didn't
expect.

Limitations For Creative People to Address
- Scene representation is naive: more fine-grained attributes, more
complex spatial relationships (doesn't like "pick the thing that's on
top of"), more complex objects?
- Bounded perception outputs: image -> rich description, instead of
image -> object labels -> scores
- No closed-loop feedback to replan, or to handle when the models miss
parts of the scenario
- Bounded planning outputs: how to learn new low-level skills?
combination of known primitives can seem limited

Says lots more things will come out in coming months.

Slides are on website. Talk was recorded.

I likely won't take notes on the rest of this in this way right now.
